Image Analytics
Image analytics is a classic AI application area. The availability of huge numbers of images on the web and of pre-classified data sets has enabled recognition of various object types. Running that recognition in the cloud has a cost, however: real-time recognition of a constantly changing scene based on video streaming, for example, requires high data bandwidth. Alternatively, AI on the Edge enables local analysis of the visual scene in various flavors, such as understanding the scene for context analysis, simultaneous multi-object detection and recognition for obstacle avoidance, people identification for secure access, and more.
More use cases include:
- Surveillance and Monitoring: Deep Learning-enabled smart cameras could locally process captured images to identify and track multiple objects and people, detecting suspicious activities directly on the edge node. These smart cameras minimize communication with the remote servers by only sending data on a triggering event, also reducing remote processing and memory requirements. Intruder monitoring for secure homes and monitoring of elderly people are typical applications.
- Autonomous Vehicles: A smart automotive camera can recognize vehicles, traffic signs, pedestrians, roads, and other objects locally, sending only the information needed to perform autonomous driving to the main controller. A similar concept can be applied to robots and drones.
- Expression Analysis to improve shopping, advertising, or driving: An individual’s emotional reaction can provide clues to their degree of acceptance of a service, their like or dislike of various products shown on the shelves in a shop, or their level of stress, which can be used to understand and modulate the type and amount of information delivered.
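The event-triggered reporting described for smart cameras can be sketched as follows. This is a minimal illustration, not a real camera API: the `Detection` type, the `TRIGGER_LABELS` set, the `MIN_CONFIDENCE` threshold, and the `send` callback are all hypothetical names, and the detections are assumed to come from some on-device model that is not shown.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # object class reported by the on-device model (assumed)
    confidence: float  # model confidence in the range 0.0 .. 1.0


# Hypothetical configuration: which labels count as a triggering event.
TRIGGER_LABELS = {"person"}
MIN_CONFIDENCE = 0.8


def events_to_send(detections):
    """Keep only the detections worth transmitting to the remote server."""
    return [
        d for d in detections
        if d.label in TRIGGER_LABELS and d.confidence >= MIN_CONFIDENCE
    ]


def process_frame(detections, send):
    """Filter one frame locally and call `send` only on a triggering event.

    Returns True if anything was transmitted, False otherwise.
    """
    events = events_to_send(detections)
    if events:
        send(events)
        return True
    return False
```

Frames with no triggering detection never leave the device, which is where the bandwidth and remote-processing savings come from.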
Audio Analytics
Just as AI and Deep Learning can analyze a visual scene in all its elements, an audio scene can be split into its basic parts to enable the following functions.
- Audio Scene Classification can help understand the user's location in order to trigger features such as ad hoc noise reduction, a location-specific voice interface, or disabling a smartphone's touch/write capabilities when in a car (driver mode).
- Audio Event Detection: Detecting sounds such as a baby crying, glass breaking, or a gunshot can trigger an action, including notifications or location detection via triangulation. Understanding specific sound events in multisource conditions is a latency-critical task, and AI at the Edge can be very fast and effective at recognizing an audio event among numerous overlapping sound sources. Recognizing an approaching car or truck, or screeching brakes, can be a lifesaver.
At the same time, human speech analysis and understanding is a key feature for advanced Human-Machine Interaction, and research is providing increasingly precise solutions in this area. Artificial Neural Networks are contributing, too. Natural Language Processing (NLP), however, is a complex task that can be approached in various ways.
- One way, which uses limited resources, is Keyword Recognition. This approach uses a limited vocabulary of activating words that are useful to the application. A lamp, for example, does not need to know much more than “on,” “off,” “brighter,” and “dimmer” to be useful.
- Text To Speech (TTS) and Speech to Text (STT) are two examples of complex tasks in which AI and DL are used to bring these functionalities to the Edge. Examples are hands-free text read and write functions in automotive applications, where drivers can keep their attention on the main task (driving the car) while interacting with the infotainment system.
- Finally, DL-based Speech Recognition is used in Conversational User Interfaces (CUI), where the abilities of NLP are drastically augmented by allowing, for example, a chatbot to hold a human-grade dialogue with a person.
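To make the Keyword Recognition case concrete, here is a minimal sketch of the lamp example. It assumes a small on-device model (not shown) that outputs a confidence score per keyword; the `pick_keyword` helper, the `Lamp` class, the 0.7 threshold, and the 1 to 10 brightness scale are illustrative choices, not part of any particular product.

```python
# The whole vocabulary the lamp needs to understand.
KEYWORDS = ("on", "off", "brighter", "dimmer")


def pick_keyword(scores, threshold=0.7):
    """Return the best-scoring keyword, or None if nothing is confident enough.

    `scores` maps keyword -> confidence from a (hypothetical) keyword model.
    """
    best = max(KEYWORDS, key=lambda k: scores.get(k, 0.0))
    return best if scores.get(best, 0.0) >= threshold else None


class Lamp:
    def __init__(self):
        self.is_on = False
        self.level = 5  # brightness on an assumed 1..10 scale

    def apply(self, keyword):
        """Update the lamp state; unknown or missing keywords are ignored."""
        if keyword == "on":
            self.is_on = True
        elif keyword == "off":
            self.is_on = False
        elif keyword == "brighter":
            self.level = min(10, self.level + 1)
        elif keyword == "dimmer":
            self.level = max(1, self.level - 1)
```

Because the vocabulary is fixed and tiny, the decision logic stays trivial and low-confidence audio is simply ignored, which is part of why this approach fits resource-limited devices.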
Inertial Sensor/Environmental Sensor Analytics
Smartwatches and fitness bands, as well as smart buildings, homes, and factories, make extensive use of inertial and environmental sensors. Deep-learning-enabled processing on the edge allows quicker analysis of local situations and faster response. Some examples are:
- Predictive Maintenance in Factories: Sensors attached to a machine can measure vibration, temperature, and noise levels, and AI performed locally can infer the state of the equipment, potential anomalies, and early indications of failure. In this case, local Deep Learning could also communicate with cloud-based services to deliver data for specific analyses and corrective actions.
- Body Monitoring: Our wearable devices collect a lot of data about our activity, location, and heart rate, among other things. This information can be correlated with health, stress levels, and diet, and can alert wearers to a potential health issue before it becomes critical.
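The local anomaly detection mentioned for predictive maintenance can be illustrated with a deliberately simple stand-in. The sketch below uses a rolling z-score over recent vibration readings rather than a neural network, so it should be read as a statistical baseline only; the `VibrationMonitor` name, the window size, the warm-up length, and the threshold are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev


class VibrationMonitor:
    """Flag readings that deviate strongly from the recent normal range."""

    def __init__(self, window=20, z_threshold=3.0, warmup=5):
        self.history = deque(maxlen=window)  # recent normal readings
        self.z_threshold = z_threshold
        self.warmup = warmup                 # readings needed before judging

    def update(self, reading):
        """Return True if `reading` looks anomalous, False otherwise."""
        anomalous = False
        if len(self.history) >= self.warmup:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0 and abs(reading - mu) / sigma > self.z_threshold:
                anomalous = True
        # Only normal readings update the baseline, so a fault
        # does not gradually become the new "normal".
        if not anomalous:
            self.history.append(reading)
        return anomalous
```

In a deployment along the lines described above, a True result would be the point at which the edge node forwards data to a cloud service for deeper analysis.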
These are just a sample of the opportunities. Clearly, ANNs can be further exploited for multimodal context analysis: by receiving data from a variety of sources and applying specific neural-network models, they can recognize more than just audio, video, or sensor data in isolation, fusing all of it to better understand what is happening around the user and providing support to automate further actions.