An Efficient Implementation of TensorFlow Lite for RISC-V Vectors - Mostafa Hagog, SiFive When deploying a neural network (NN) as part of a low-power edge application such as mobile or IoT devices, designers must trade-off flexibility and power efficiency. Hard-wired accelerators are often chosen for their inherent parallelism and performance; however, accelerators are rigid, may be difficult to program, and are not necessarily suited for some NNs that can’t take advantage of the parallelism provided. Conversely, general-purpose processors, ubiquitous in edge applications, may lack the compute efficiency needed under a strict power budget. A third option is a processor, optimized for parallelizable workloads, that can scale to many-core to deliver the necessary performance of multiple tera-ops per second that machine learning algorithms may require. Consequently, demand for vector-enabled, general-purpose processors that are compiler-friendly is rapidly growing. Using the widely deployed MobileNet CNN as an
Hide player controls
Hide resume playing