Identify different environments (indoor, outdoor, in-car) using a simple microphone.
The goal of Acoustic Scene Classification (ASC) is to classify the current environment into one of three predefined classes (indoor, outdoor, in-vehicle) based on the audio captured by a single digital microphone. The demo runs on SensorTile, a small form-factor board, and comes with a smartphone application connected over Bluetooth Low Energy.
We built this example with the FP-AI-SENSING1 function pack, running on an STEVAL-STLKT01V1 board. The ASC configuration captures audio at 16 kHz (16-bit, 1 channel) using the on-board MEMS microphone.
Every millisecond, a DMA interrupt delivers the latest 16 PCM audio samples. These samples are accumulated in a sliding window of 1024 samples with 50% overlap, so every 512 samples (i.e., 32 ms) the window is passed to the ASC preprocessing for feature extraction. The ASC preprocessing extracts the audio features into a 30x32 LogMel spectrogram.
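The capture and windowing stage can be sketched in C as follows. This is a minimal illustration, not the FP-AI-SENSING1 source: the names `sliding_window_t`, `sw_init` and `sw_push` are hypothetical, and the sketch assumes each 1 ms DMA interrupt hands over exactly 16 `int16_t` samples. With these constants, the first frame is ready after 1024 samples (64 ms) and every subsequent frame after another 512 samples (32 ms), each new frame reusing the newest half of the previous one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SAMPLE_RATE_HZ 16000
#define PCM_CHUNK         16   /* samples delivered per 1 ms DMA interrupt */
#define FRAME_LEN       1024   /* sliding window length in samples */
#define HOP_LEN          512   /* 50% overlap: one new frame every 512 samples */
#define HOP_MS (HOP_LEN * 1000 / SAMPLE_RATE_HZ)   /* = 32 ms */

_Static_assert(FRAME_LEN % PCM_CHUNK == 0, "window must hold whole DMA chunks");
_Static_assert(HOP_LEN % PCM_CHUNK == 0, "hop must be a whole number of DMA chunks");

/* Hypothetical sliding-window state; not the FP-AI-SENSING1 data structure. */
typedef struct {
    int16_t buf[FRAME_LEN];   /* samples in arrival order, oldest first */
    int     count;            /* number of valid samples in buf */
} sliding_window_t;

static void sw_init(sliding_window_t *sw) {
    memset(sw->buf, 0, sizeof sw->buf);
    sw->count = 0;
}

/* Append one DMA chunk. When the window is full, copy it to frame_out,
   keep its newest FRAME_LEN - HOP_LEN samples for the 50% overlap and
   return 1; otherwise return 0. */
static int sw_push(sliding_window_t *sw, const int16_t chunk[PCM_CHUNK],
                   int16_t frame_out[FRAME_LEN]) {
    assert(sw->count + PCM_CHUNK <= FRAME_LEN);
    memcpy(&sw->buf[sw->count], chunk, PCM_CHUNK * sizeof(int16_t));
    sw->count += PCM_CHUNK;
    if (sw->count < FRAME_LEN)
        return 0;
    memcpy(frame_out, sw->buf, sizeof sw->buf);
    /* slide by HOP_LEN: the second half becomes the first half of the next frame */
    memmove(sw->buf, &sw->buf[HOP_LEN], (FRAME_LEN - HOP_LEN) * sizeof(int16_t));
    sw->count = FRAME_LEN - HOP_LEN;
    return 1;
}
```

Whether the framing runs in the DMA callback or in the main loop is not stated above; firmware would typically keep the interrupt short and defer the 2 KB copy. Each frame returned by `sw_push` is the input to the ASC feature extraction step.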
To optimize computation and memory usage, the feature extraction step is divided into two routines:
Every 1024 ms, the 30x32 LogMel spectrogram is fed to the input of the ASC convolutional neural network, which classifies it into one of the output labels: indoor, outdoor, or in-vehicle.
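Since each 512-sample hop (32 ms) yields one spectrogram column, 32 columns span 32 x 32 ms = 1024 ms, which matches the classification period and implies that consecutive spectrograms do not overlap. The C sketch below only illustrates that bookkeeping: computing a column (FFT, mel filter bank, log scaling) is left out, and the names `logmel_buffer_t` and `logmel_push_column`, as well as the band-major [30][32] memory layout, are assumptions rather than the layout the deployed network necessarily expects.

```c
#include <assert.h>
#include <string.h>

#define N_MELS  30   /* mel bands: rows of the 30x32 network input */
#define N_COLS  32   /* time columns: one per 512-sample hop */
#define HOP_MS  32   /* 512 samples at 16 kHz */

/* Hypothetical buffer for the network input; band-major layout is an assumption. */
typedef struct {
    float input[N_MELS][N_COLS];   /* LogMel values, [band][column] */
    int   col;                     /* index of the next column to write */
} logmel_buffer_t;

static void logmel_init(logmel_buffer_t *b) {
    memset(b, 0, sizeof *b);
}

/* Store one LogMel column (N_MELS values computed from the latest 32 ms hop).
   Returns 1 when all N_COLS columns are filled, i.e. every
   N_COLS * HOP_MS = 1024 ms; the caller should then run the classifier on
   b->input (or copy it) before the next call starts overwriting column 0. */
static int logmel_push_column(logmel_buffer_t *b, const float column[N_MELS]) {
    assert(b->col >= 0 && b->col < N_COLS);
    for (int m = 0; m < N_MELS; m++)
        b->input[m][b->col] = column[m];
    if (++b->col < N_COLS)
        return 0;
    b->col = 0;   /* the next spectrogram does not overlap this one */
    return 1;
}
```

Keeping a running column index instead of shifting the whole 30x32 matrix avoids moving about 3.8 KB of floats every 32 ms, at the cost of requiring inference to finish (or the input to be copied) within one hop.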
Model: ST Convolutional Neural Network, quantized
Input size: 30x32
Complexity: 517 K MACC
Memory footprint:
  31 KB Flash for weights
  18 KB RAM for activations
Performance on STM32L476 (Low Power) @ 80 MHz:
  Use case: 1 classification/sec
  Pre/Post-processing: 3.7 MHz
  NN processing: 6 MHz
Power consumption (1.8 V)
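The performance figures can be turned into a rough CPU budget. The sketch below assumes that the MHz values denote average MCU load, that is, millions of clock cycles per second spent on each task. Under that reading, one classification per second costs about 6 million cycles of NN processing, roughly 75 ms at 80 MHz, and the whole ASC workload uses about 12% of the core. The `asc_budget` function and its field names are illustrative only.

```c
#include <assert.h>

/* Figures from the performance summary above. */
#define CLOCK_HZ              80.0e6   /* STM32L476 core clock */
#define PREPOST_LOAD_HZ        3.7e6   /* pre/post-processing load (assumed cycles/s) */
#define NN_LOAD_HZ             6.0e6   /* NN processing load (assumed cycles/s) */
#define CLASSIFICATIONS_PER_S  1.0
#define MACC_PER_INFERENCE   517.0e3

typedef struct {
    double cpu_share;            /* fraction of the core clock used by ASC */
    double nn_ms_per_inference;  /* CPU time of one inference at CLOCK_HZ */
    double cycles_per_macc;      /* average over the whole inference */
} asc_budget_t;

static asc_budget_t asc_budget(void) {
    asc_budget_t b;
    double nn_cycles = NN_LOAD_HZ / CLASSIFICATIONS_PER_S;   /* cycles per inference */
    b.cpu_share = (PREPOST_LOAD_HZ + NN_LOAD_HZ) / CLOCK_HZ;
    b.nn_ms_per_inference = nn_cycles / CLOCK_HZ * 1000.0;
    b.cycles_per_macc = nn_cycles / MACC_PER_INFERENCE;
    assert(b.cpu_share < 1.0);   /* the workload must fit in the core clock */
    return b;
}
```

The cycles-per-MACC figure (about 11.6) is an average over the whole inference, so it also absorbs per-layer overhead and memory traffic, not only the multiply-accumulate operations themselves.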
Confusion matrix
X-CUBE-AI, a free STM32Cube expansion package, allows developers to automatically convert pretrained AI algorithms, such as neural network and machine learning models, into optimized C code for STM32.
The STM32 family of 32-bit microcontrollers based on the Arm Cortex®-M processor is designed to offer new degrees of freedom to MCU users. It offers products combining very high performance, real-time capabilities, digital signal processing, low-power / low-voltage operation, and connectivity, while maintaining full integration and ease of development.