Abstract

1. Introduction

<aside> ๐Ÿ‘ฉ๐Ÿผโ€๐Ÿซ In this paper we focus on integer quantization for neural network inference, where networks are modified to use integer weights and activations so that integer math pipelines can be used for many operations

</aside>


2. Related works

3. Quantization Fundamentals

3-1. Range Mapping

Let $[\beta, \alpha]$ be the range of representable real values chosen for quantization and $b$ be the bit-width of the signed integer representation. Uniform quantization transforms the input value $x \in [\beta, \alpha]$ to lie within $[-2^{b-1}, 2^{b-1} - 1]$, where inputs outside the range are clipped to the nearest bound.
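
For example, with $b = 8$ (int8) the representable integer range is $[-2^{7}, 2^{7} - 1] = [-128, 127]$.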

3-1-1. Affine Quantization

Affine transform function: $f(x) = s \cdot x + z$

$x_q = \text{quantize}(x, b, s, z) = \text{clip}(\text{round}(s \cdot x + z), -2^{b-1}, 2^{b-1} - 1)$

$\hat{x} = \text{dequantize}(x_q, s, z) = \frac{1}{s}(x_q - z)$
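
A minimal sketch of these two mappings, assuming NumPy and a signed $b$-bit integer target; the function names and the example scale/zero-point values are illustrative, not taken from the paper:

```python
import numpy as np

def quantize(x, b, s, z):
    """Affine quantization: map real x onto a signed b-bit integer grid.

    s is the scale, z the zero-point; values outside the representable
    range [-2^(b-1), 2^(b-1) - 1] are clipped to the nearest bound.
    """
    qmin, qmax = -2 ** (b - 1), 2 ** (b - 1) - 1
    return np.clip(np.round(s * x + z), qmin, qmax).astype(np.int32)

def dequantize(x_q, s, z):
    """Approximate inverse of quantize: recover a real value from x_q."""
    return (x_q - z) / s

# Illustrative usage: quantize a few reals to int8 (b = 8).
x = np.array([-1.0, 0.0, 0.5, 1.2])
s, z = 100.0, 0.0            # assumed scale and zero-point for this example
x_q = quantize(x, b=8, s=s, z=z)
x_hat = dequantize(x_q, s, z)
print(x_q)    # [-100    0   50  120]
print(x_hat)  # [-1.   0.   0.5  1.2]
```

With this choice of $s$ and $z$ the round trip is exact; in general $\hat{x}$ only approximates $x$, and the gap is the quantization error.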