Background on the M.2 Intelligent Inference Card

You You
Admin
Joined: 2024-11-04 03:57:22
2025-01-03 08:40:00

Background

2024 marks the first year of large-scale model deployment at the edge and endpoint levels. This year, leading smartphone manufacturers began embedding sub-3B (billion parameter) local large models in their flagship devices, demonstrating the commercial viability of such models.

In the broader IoT field, which has a far larger installed base and a wider range of applications, existing host devices face challenges in upgrading to intelligent systems. These include NAS units, NVRs, industrial PCs, edge computing gateways, and various SBCs (Single Board Computers) such as the Raspberry Pi 5 and Intel N100-based mini PCs. With only limited CPU/GPU resources, such devices struggle to run large models efficiently for multimodal inference, function calling, or semantic understanding of visual data.

This not only rules out deploying mainstream large models or more capable multimodal models, but also taxes the host's limited compute and bandwidth, destabilizing its primary workload and degrading the end-user experience.

To address these pain points in industry-specific intelligent upgrades, we collaborated with hardware ecosystem partners to introduce the AX650N-based M.2 intelligent inference card. This solution aims to provide community developers with a more flexible, user-friendly way to expand edge AI computational capacity, fostering innovation in deploying advanced algorithms and large models on various platforms. This is a step toward making AI more accessible and inclusive.


Target Audience

The M.2 intelligent inference card aims to address the following issues for key market segments:

SBC Smart Upgrades

Single Board Computers (SBCs) like the Raspberry Pi and similar domestic alternatives primarily use ARM-based CPUs. Their modest compute is not a shortcoming in itself, since SBCs were never designed for AI workloads, but there are plenty of scenarios where users want to add AI capabilities.

Most SBCs expose an M.2 2280 slot for expansion, so with an M.2 intelligent inference card, developers can extend their capabilities in plug-and-play fashion. Raspberry Pi did release an M.2 module based on Israel's Hailo chip earlier this year, but current Hailo-based modules run Transformer models inefficiently, lack video stream encoding/decoding capabilities, and cannot meet the demands of mainstream models such as Llama 3 or Qwen2.5.

NVR/NAS Smart Upgrades

For home NVRs and NAS products, M.2 2280 slots are typically reserved for expanding storage. In scenarios where users prioritize data privacy and prefer local analysis over cloud-based solutions, the M.2 intelligent inference card offers an effective way to bring AI-powered insights directly to local devices.

Industrial PC (IPC) Upgrades

In industrial applications, existing IPCs need additional compute to deploy state-of-the-art visual or multimodal models with higher precision. For instance, Transformer-based models such as DepthAnything or SAM can improve video-analysis accuracy. Adding the M.2 intelligent inference card offsets the limitations of CPU-bound computing in these scenarios.

Robotics Applications

Robotics applications benefit from offloading visual perception tasks, especially Vision Transformer workloads, from the main control platform to the inference card, reducing computational strain and improving overall system stability and reliability.

Exploration of AIPC Capabilities

The M.2 intelligent inference card also enables exploration of what AI PCs (AIPCs) actually require, countering inflated claims such as the purported 40 TOPS compute requirement promoted by certain industry players.

You You
Admin
Joined: 2024-11-04 03:57:22
2025-01-03 09:00:49

Specifications

Processor: Octa-core Cortex-A55 @ 1.7 GHz
Memory: 4 GiB or 8 GiB, 64-bit LPDDR4x
Storage: 16 MiB NOR flash
NPU: 18 TOPS @ INT8, 72 TOPS @ INT4; supports CNN and Transformer model deployment, including LLMs and VLMs
Video encoding: H.264/H.265, 16-channel 1080p@30fps
Video decoding: H.264/H.265, 32-channel 1080p@30fps
Host adaptation: Intel, AMD, NXP, Xilinx, Raspberry Pi, Rockchip, etc.
Host OS: Linux (Ubuntu, Debian, CentOS, etc.); Windows support in progress
Form factor: M.2 2280, M key
Operating voltage: 3.3 V
System power consumption: < 8 W
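
For a rough sense of what the memory spec means for the advertised LLM/VLM support, here is a back-of-the-envelope sizing sketch in Python. The parameter counts and quantization widths are illustrative assumptions, and it counts weight storage only; the KV cache, activations, and runtime overhead consume additional memory.

    # Rough weight-footprint estimate for quantized models (assumption-laden:
    # counts weights only, ignoring KV cache, activations, and runtime overhead).
    GIB = 1024 ** 3

    def weight_footprint_gib(params_billions: float, bits_per_weight: int) -> float:
        """Approximate weight storage for a quantized model, in GiB."""
        return params_billions * 1e9 * bits_per_weight / 8 / GIB

    for name, params_b in [("0.5B", 0.5), ("3B", 3.0), ("7B", 7.0)]:
        print(f"{name}: ~{weight_footprint_gib(params_b, 8):.1f} GiB at INT8, "
              f"~{weight_footprint_gib(params_b, 4):.1f} GiB at INT4")

    # Approximate output:
    #   0.5B: ~0.5 GiB at INT8, ~0.2 GiB at INT4
    #   3B: ~2.8 GiB at INT8, ~1.4 GiB at INT4
    #   7B: ~6.5 GiB at INT8, ~3.3 GiB at INT4
    # So a 7B-class model quantized to INT4 fits in the 8 GiB variant with headroom.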

You You
Admin
Joined: 2024-11-04 03:57:22
2025-01-03 09:08:58

Verified platforms

Host platforms: Raspberry Pi 5, Radxa 5B, Orange Pi 5, Firefly 3576, Firefly 3568, Intel, AMD, NXP, …

Host OS: Ubuntu 24.04, Ubuntu 22.04, Debian 12, openKylin, openEuler, …

You You
Admin
Joined: 2024-11-04 03:57:22
2025-01-03 09:10:10

Software architecture

To make the card plug-and-play, we introduced the AXCL software architecture. AXCL is a C and Python API library for developing deep neural network inference, transcoding, and other applications on the Axera chip platform. It provides APIs for runtime resource management, memory management, model loading and execution, and media data processing.
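
As a rough illustration of the call flow such an API implies, here is a minimal Python sketch. The module and function names below (axcl_demo, init, set_device, load_model, and so on) are hypothetical placeholders, not the actual AXCL API; consult the AXCL documentation for the real bindings.

    # Hypothetical sketch only: names are placeholders, not the real AXCL API.
    # It mirrors the stages described above: runtime resource management,
    # memory management, model loading, and model execution.
    import numpy as np
    import axcl_demo as axcl  # hypothetical Python binding

    def run_inference(model_path: str, frame: np.ndarray) -> np.ndarray:
        axcl.init()                              # runtime resource management
        axcl.set_device(0)                       # select the M.2 card over PCIe
        try:
            model = axcl.load_model(model_path)  # load a compiled model
            dev_in = axcl.malloc(frame.nbytes)   # device-side memory management
            axcl.memcpy_host_to_device(dev_in, frame)
            dev_out = model.execute(dev_in)      # run inference on the NPU
            return axcl.memcpy_device_to_host(dev_out)
        finally:
            axcl.finalize()                      # release card resources

The point of the sketch is the lifecycle: initialize the runtime, bind a device, move data across the PCIe boundary explicitly, then release resources, matching the resource- and memory-management APIs listed above.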

 
