Deep Compression

Summary of "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

- Neural networks are both computationally intensive and memory intensive
- This makes them difficult to deploy on embedded systems with limited hardware resources

- Deep Compression reduces the storage requirement of neural networks by 35x to 49x without affecting accuracy

- 3 stage pipeline

- Prune the network by learning only the important connections
- This reduces the number of connections by 9x to 13x
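
The pruning stage can be illustrated with a minimal NumPy sketch of magnitude-based pruning. The function name `prune_by_magnitude` and the `sparsity` parameter are hypothetical names chosen for this note, and the sketch covers only the thresholding step; in the paper, the remaining weights are retrained after pruning, which is not shown here.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Returns the pruned weights and the boolean mask of kept connections.
    (Hypothetical helper for illustration; retraining is omitted.)
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # The k-th smallest magnitude serves as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold     # keep only the larger weights
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned, mask = prune_by_magnitude(w, sparsity=0.9)
print(f"kept {mask.sum()} of {w.size} connections")
```

Keeping 10% of the weights corresponds to a 10x reduction in connections, which falls within the 9x to 13x range reported above. In practice the mask would be stored in a sparse format so that the zeros cost no storage.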

- Quantize the weights to enforce weight sharing
- This reduces the number of bits that represent each connection from 32 to 5
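
Weight sharing can be sketched as one-dimensional k-means clustering: with 5 bits per connection there are 2^5 = 32 shared values, and each weight stores only the index of its centroid. The helper `kmeans_quantize` below is a hypothetical illustration, not the authors' implementation; it uses linearly spaced initial centroids and skips the fine-tuning of centroids during retraining that the paper's "trained quantization" refers to. It also clusters all weights, whereas in the pipeline only the weights that survive pruning would be quantized.

```python
import numpy as np

def kmeans_quantize(weights, bits=5, iters=20):
    """Cluster weights into 2**bits shared values with plain 1-D k-means.

    Returns per-weight centroid indices and the codebook of shared values.
    (Hypothetical helper for illustration; centroid fine-tuning is omitted.)
    """
    flat = weights.ravel()
    k = 2 ** bits
    # Linear initialization: centroids evenly spaced over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assign every weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return idx.reshape(weights.shape).astype(np.uint8), centroids

rng = np.random.default_rng(0)
w = rng.normal(size=(32, 32))
codes, codebook = kmeans_quantize(w, bits=5)
w_hat = codebook[codes]                    # reconstructed (shared) weights
print("mean squared error:", np.mean((w_hat - w) ** 2))
```

Storage then consists of one 5-bit index per weight plus a small codebook of 32 full-precision values, instead of a 32-bit float per weight.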

- Apply Huffman Coding
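
Huffman coding assigns shorter bit strings to more frequent symbols, which helps when the quantized weight indices are unevenly distributed. The sketch below builds a Huffman code with Python's standard `heapq` module; `huffman_code` and the toy index list are hypothetical, chosen only to show the effect on a skewed distribution.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix-free code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                     # degenerate case: a single symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)    # two least frequent subtrees
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Toy example: 100 quantized indices with a skewed distribution.
indices = [0] * 50 + [1] * 25 + [2] * 15 + [3] * 10
codes = huffman_code(indices)
total_bits = sum(len(codes[s]) for s in indices)
print(total_bits)  # 175 bits, versus 200 bits for a fixed 2-bit code
```

The saving depends entirely on how skewed the index distribution is; a uniform distribution would gain nothing from this stage.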

- ImageNet

+ AlexNet: reduces the storage requirement by 35x, from 240MB to 6.9MB, without loss of accuracy

+ VGG-16: reduces the storage requirement by 49x, from 552MB to 11.3MB