DFFG: Fast Gradient Iteration for Data-free Quantization
Abstract
Model quantization optimizes neural network computation by converting parameters and activation values from high-precision floating-point numbers to low-bit integer or fixed-point representations, reducing storage and computational costs and improving computational efficiency. Common quantization methods, such as QAT and PTQ, optimize quantization parameters using training data to achieve the best performance. In practical industrial applications, however, little or no data may be available for downstream model quantization due to restrictions such as privacy and security, so studying how to perform model quantization with little or no data is essential. This article proposes DFFG, a data-free quantization technique based on fast gradient iteration. DFFG uses information learned by the full-precision model, such as the statistics stored in its BN layers, to recover the distribution of the original training data. We propose, for the first time, an FGSM-style gradient iteration strategy with a momentum term to update the generated data. This iteration strategy perturbs the optimized data quickly, and we ensure the diversity of the generated data by manipulating the variability of the gradients. Thanks to this design, we also propose using the intermediate data produced during the iteration process as calibration data for subsequent model quantization, greatly improving the speed of data generation. We demonstrate the effectiveness of the proposed approach through empirical evaluations: the generated data preserves model quantization performance and is produced significantly faster than with other similar data generation techniques. Specifically, our approach is 10X faster than ZeroQ.
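To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of data synthesis via BN-statistic matching combined with a momentum FGSM-style sign update, where intermediate iterates are kept as extra calibration samples. It assumes a PyTorch model with BatchNorm2d layers; names such as `generate_calibration_data`, `step_size`, `momentum`, and `keep_every` are illustrative choices, not from the paper.

```python
import torch
import torch.nn as nn

def generate_calibration_data(model, num_steps=32, batch_size=16,
                              image_size=(3, 224, 224), step_size=0.01,
                              momentum=0.9, keep_every=4):
    """Sketch: synthesize calibration data by matching the batch statistics
    of BN-layer inputs to the stored running statistics of a full-precision
    model, updated with a momentum FGSM-style sign step (assumed setup)."""
    model.eval()

    # Hook the input of every BatchNorm2d layer so we can compare the batch
    # mean/variance it sees against its stored running statistics.
    bn_layers, captured, handles = [], {}, []
    def make_hook(idx):
        def hook(module, inputs, output):
            x = inputs[0]
            captured[idx] = (x.mean(dim=(0, 2, 3)), x.var(dim=(0, 2, 3)))
        return hook
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            handles.append(m.register_forward_hook(make_hook(len(bn_layers))))
            bn_layers.append(m)

    x = torch.randn(batch_size, *image_size, requires_grad=True)
    g = torch.zeros_like(x)          # momentum accumulator
    samples = []

    for step in range(num_steps):
        captured.clear()
        model(x)
        # BN-statistic matching loss: batch stats vs. stored running stats.
        loss = sum(
            torch.norm(captured[i][0] - bn.running_mean)
            + torch.norm(captured[i][1] - bn.running_var)
            for i, bn in enumerate(bn_layers)
        )
        grad, = torch.autograd.grad(loss, x)

        # Momentum FGSM-style update: accumulate normalized gradients, then
        # take a signed descent step (we minimize the matching loss).
        g = momentum * g + grad / (grad.abs().mean() + 1e-12)
        x = (x - step_size * g.sign()).detach().requires_grad_(True)

        # Reuse intermediate iterates as additional calibration data.
        if (step + 1) % keep_every == 0:
            samples.append(x.detach().clone())

    for h in handles:
        h.remove()
    return torch.cat(samples, dim=0)
```

In this sketch, collecting every `keep_every`-th iterate is what lets a single short optimization run yield a pool of calibration samples, which is the source of the speed advantage the abstract describes; the exact loss, normalization, and schedule used by DFFG may differ.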