End-to-end 100-TOPS/W Inference With Analog In-Memory Computing: Are We There Yet? | Gianmarco Ottavi, Geethan Karunaratne, Francesco Conti, Irem Boybat, Luca Benini, and Davide Rossi

In-Memory Acceleration (IMA) promises major efficiency improvements in deep neural network (DNN) inference, but challenges remain in the integration of IMA within a digital system. We propose a heterogeneous architecture coupling 8 RISC-V cores with an IMA in a shared-memory cluster, analyzing the benefits and trade-offs of in-memory computing on the realistic use case of a MobileNetV2 bottleneck layer. We explore several IMA integration strategies, analyzing performance, area, and energy efficiency. We show that while pointwise layers achieve significant speed-ups over software implementation, on depthwise layer the inability to efficiently map parameters on the accelerator leads to a significant trade-off between throughput and area. We propose a hybrid solution where pointwise convolutions are executed on IMA while depthwise on the cluster cores, achieving a speed-up of 3x over SW execution while saving 50% of area when compared to an all-in IMA solution with similar performance.

Download the full paper.

End-to-end 100-TOPS/W Inference With Analog In-Memory Computing: Are We There Yet? | Gianmarco Ottavi, Geethan Karunaratne, Francesco Conti, Irem Boybat, Luca Benini, and Davide Rossi

Previous PostSemiconductor veterans gather to design customizable, chiplet-based RISC-V server processors | Chris Williams, The Register

Next PostAndes Technology and Cyberon Collaborate to Provide Edge-Computing Voice Recognition Solution on DSP-capable RISC-V Processors | Intrado

Stay Connected With RISC-V