Research on a 2D Weight Stationary Dataflow Architecture
Abstract: With the development of artificial intelligence algorithms, convolutional neural networks (CNNs) are used ever more widely in image, audio, and other applications, and the amount of computation CNN algorithms require keeps growing. Weight stationary (WS) is a dataflow that fixes the weights in registers to maximize convolutional reuse and filter reuse. However, current WS dataflow structures take a long time to fill the pipeline. This paper studies a 2D WS dataflow structure that removes the FIFOs between PE (Processing Element) rows and instead connects the PE rows with adders. When computing AlexNet, this 2D WS dataflow structure reduces the pipeline filling time by nearly a factor of 2.7 and can flexibly adjust the convolution stride.
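To make the dataflow concrete, below is a minimal behavioral sketch in Python of how a 2D weight-stationary array computes one output pixel. The function name ws_conv2d and all details are illustrative assumptions, not the paper's hardware design: every PE keeps one weight of the K×K filter in its register, each PE row produces the partial sum of one filter row, and an adder chain between rows, standing in for the removed FIFOs, merges those partial sums.

```python
# Minimal behavioral sketch (not the paper's RTL) of a 2D weight-stationary
# PE array: each PE permanently holds one weight of a KxK filter, each PE row
# handles one filter row, and an adder chain between PE rows (replacing the
# inter-row FIFOs) merges the row-level partial sums into one output pixel.

import numpy as np

def ws_conv2d(ifmap, weights, stride=1):
    """2D convolution computed the way a weight-stationary array would:
    weights stay fixed in the PEs; input pixels stream past them."""
    K = weights.shape[0]
    H, W = ifmap.shape
    out_h = (H - K) // stride + 1
    out_w = (W - K) // stride + 1
    ofmap = np.zeros((out_h, out_w))

    for oy in range(out_h):
        for ox in range(out_w):
            # Each PE row r computes the partial sum of filter row r ...
            row_psums = [
                sum(weights[r, c] * ifmap[oy * stride + r, ox * stride + c]
                    for c in range(K))          # MACs inside one PE row
                for r in range(K)
            ]
            # ... and the adder chain between PE rows sums them directly,
            # so no FIFO is needed to align the rows in time.
            ofmap[oy, ox] = sum(row_psums)
    return ofmap

# Sanity check with an adjustable stride, echoing the abstract's claim
# that the structure can change the convolution stride flexibly.
x = np.arange(36, dtype=float).reshape(6, 6)
w = np.ones((3, 3))
print(ws_conv2d(x, w, stride=2))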
Key words: convolutional neural network / weight stationary / dataflow / CNN acceleration / pipeline
Figure 1 Comparison of the cost of moving data across the memory hierarchy levels[3]
References:
[1] DENG L, LI J Y, HUANG J T, et al. Recent advances in deep learning for speech research at Microsoft[C]//Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, BC, Canada: IEEE, 2013: 8604-8608. DOI: 10.1109/ICASSP.2013.6639345.
[2] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: ACM, 2012: 1097-1105.
[3] SZE V, CHEN Y H, YANG T J, et al. Efficient processing of deep neural networks: a tutorial and survey[J]. Proceedings of the IEEE, 2017, 105(12): 2295-2329. DOI: 10.1109/JPROC.2017.2761740.
[4] CHEN Y H, EMER J, SZE V. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks[C]//Proceedings of 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture. Seoul, South Korea: IEEE, 2016: 367-379. DOI: 10.1109/ISCA.2016.40.
[5] FARABET C, POULET C, HAN J Y, et al. CNP: an FPGA-based processor for convolutional networks[C]//Proceedings of 2009 International Conference on Field Programmable Logic and Applications. Prague, Czech Republic: IEEE, 2009: 32-37. DOI: 10.1109/FPL.2009.5272559.
[6] CHEN Y H, KRISHNA T, EMER J S, et al. Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks[J]. IEEE Journal of Solid-State Circuits, 2017, 52(1): 127-138. DOI: 10.1109/JSSC.2016.2616357.
[7] LU W Y, YAN G H, LI J J, et al. FlexFlow: a flexible dataflow accelerator architecture for convolutional neural networks[C]//Proceedings of 2017 IEEE International Symposium on High Performance Computer Architecture. Austin, TX, USA: IEEE, 2017: 553-564. DOI: 10.1109/HPCA.2017.29.
[8] DONAHUE J. Caffenet[EB/OL].
[9] LIN M, CHEN Q, YAN S C. Network in network[J]. arXiv: 1312.4400, 2013.
[10] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770-778. DOI: 10.1109/CVPR.2016.90.