INQ (Incremental Network Quantization)

Summary of "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights"


1. Problem

- Deep CNNs place a heavy burden on memory and other computational resources
- Reducing that burden (e.g., by compressing or quantizing the network) typically causes performance loss (accuracy loss)

2. Solution

- INQ (Incremental Network Quantization) :
  Converts any pre-trained full-precision CNN into a low-precision version whose weights are constrained to be either powers of two or zero

3. How

- Three interdependent operations: weight partition, group-wise quantization, and re-training



1) Weight Partition

- Divide the weights in each layer of a pre-trained CNN model into two disjoint groups


- Weights in the first group
   + Responsible for forming a low-precision base
   + Thus, quantized to powers of two or zero (stored with a variable-length encoding)

- Weights in the second group
   + Responsible for compensating for the accuracy loss caused by the quantization
   + Thus, kept in full precision and retrained (a minimal partition sketch follows below)
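
A minimal NumPy sketch of the magnitude-based ("pruning-inspired") partition strategy the paper reports working best; the function name and the `portion` argument (the cumulative fraction of weights chosen for quantization) are illustrative, not from an official implementation.

```python
import numpy as np

def partition_weights(W, portion):
    """Split one layer's weights into a to-be-quantized group and a
    to-be-retrained group by absolute magnitude.

    Returns a binary mask T following the paper's convention:
    T == 0 marks weights chosen for quantization (frozen afterwards),
    T == 1 marks weights that stay full-precision and get retrained.
    """
    flat = np.abs(W).ravel()
    k = int(np.ceil(portion * flat.size))               # number of weights to quantize
    threshold = np.sort(flat)[::-1][k - 1] if k > 0 else np.inf
    T = (np.abs(W) < threshold).astype(W.dtype)          # 1 = keep trainable
    return T
```

The overall procedure (sketched under "Overall Procedure of INQ" below) calls this with an increasing `portion`, so the quantized group grows at every step.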

2) Group-Wise Quantization


- The two operations above can be formulated as the following joint optimization problem:
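
  In the paper's notation (reconstructed here), L(W_l) is the network loss, R(W_l) a regularizer with coefficient λ, P_l the set of admissible low-precision values for layer l, and T_l a binary mask in which T_l(i,j) = 0 marks an entry that must take a value from P_l:

  $$\min_{W_l}\; E(W_l) = L(W_l) + \lambda R(W_l) \quad \text{s.t.}\quad W_l(i,j) \in P_l \;\;\text{if}\;\; T_l(i,j) = 0,\;\; 1 \le l \le L$$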

- By employing the SGD (Stochastic Gradient Descent) method, the weight update scheme is:
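
  With learning rate γ, only entries still marked trainable (T_l(i,j) = 1) receive gradient updates, so the already-quantized entries stay frozen:

  $$W_l(i,j) \leftarrow W_l(i,j) - \gamma \, \frac{\partial E}{\partial W_l(i,j)} \, T_l(i,j)$$

  In code this amounts to multiplying each layer's gradient element-wise by its mask before the usual SGD step.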

3) Re-Training

- The above operations are repeated on the remaining full-precision weights until all weights have been converted into low-precision ones


- Overall Procedure of INQ
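
A compact sketch of the incremental loop under the assumptions above; `run_inq`, `retrain`, and the accumulated portions (the paper's AlexNet schedule accumulates roughly 50%, 75%, 87.5%, 100%) are illustrative, and `quantize_to_powers_of_two` is sketched under the Performance section below.

```python
import numpy as np

def run_inq(weights, bits=5, accumulated_portions=(0.5, 0.75, 0.875, 1.0)):
    """Sketch of the INQ loop over a list of per-layer weight arrays.

    Each step quantizes (and freezes) a larger cumulative portion of
    every layer, then retrains the remaining full-precision weights
    to compensate for the quantization error.
    """
    masks = [np.ones_like(W) for W in weights]            # 1 = still trainable
    for portion in accumulated_portions:
        for l, W in enumerate(weights):
            T = partition_weights(W, portion)              # sketch under 1) above
            newly_frozen = (masks[l] == 1) & (T == 0)
            s = np.abs(W).max()                            # layer's largest weight
            W[newly_frozen] = quantize_to_powers_of_two(W[newly_frozen], bits, s)
            masks[l] = np.minimum(masks[l], T)
        retrain(weights, masks)    # placeholder: SGD with each gradient * mask
    return weights
```

The `retrain` placeholder is ordinary SGD training in which every layer's gradient is multiplied element-wise by its mask, matching the update rule above, so quantized weights never move.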



4. Performance

- 5-bit quantization achieves slightly improved accuracy over the 32-bit floating-point reference
  + Variable-length encoding
  + 1 bit to represent the zero value
  + 4 bits for at most 16 different power-of-two values
  (a worked example and a quantization sketch follow this list)

- The combination of network pruning and INQ shows impressive results
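
As a concrete illustration of the 5-bit encoding: if a layer's largest absolute weight is, say, s ≈ 0.9, then n_1 = floor(log2(4s/3)) = floor(log2(1.2)) = 0 and n_2 = n_1 − 7 = −7, so the candidate set is P_l = {0, ±2^0, ±2^−1, ..., ±2^−7}. Below is a minimal NumPy sketch that snaps weights to the nearest candidate; the helper name is hypothetical and the paper's exact rounding intervals differ slightly, so treat it as illustrative.

```python
import numpy as np

def quantize_to_powers_of_two(values, bits=5, s=None):
    """Snap values to the nearest element of P = {0, +/-2^n2, ..., +/-2^n1}.

    n1 is derived from the layer's largest absolute weight s, and the
    bit-width fixes how many exponents are kept: 1 bit encodes zero,
    the remaining bits-1 bits index the signed powers of two
    (16 values for 5 bits).
    """
    if s is None:
        s = np.abs(values).max()
    n1 = int(np.floor(np.log2(4 * s / 3)))
    n2 = n1 + 1 - 2 ** (bits - 1) // 2                  # e.g. n1 - 7 for 5 bits
    candidates = np.array([0.0] + [sign * 2.0 ** k
                                   for k in range(n2, n1 + 1)
                                   for sign in (-1.0, 1.0)])
    # nearest-candidate rounding (the paper's exact intervals differ slightly)
    idx = np.abs(values[..., None] - candidates).argmin(axis=-1)
    return candidates[idx]
```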



