Christian Doppler Laboratory for Embedded Machine Learning
At the forefront of a groundbreaking collaboration between the Christian Doppler Society and industrial partners, our focus lies in applied research to fine-tune artificial neural networks on embedded platforms. Leveraging state-of-the-art technologies, we optimize latency, throughput, resource usage, and power consumption while advancing sophisticated methods for dynamic online learning in embedded systems.
About us
A Christian Doppler Laboratory (CD-Lab) is funded in equal parts by the Christian Doppler Gesellschaft and by the industry partners. It combines basic and applied research to advance the field and leverages the strengths of the industrial partners to help them stay at the cutting edge in their domains. The planned duration of the CD-Lab is seven years, October 2019 - September 2026. Our scope is the process of mapping artificial Neural Networks (NNs) onto embedded platforms with tight resource constraints. We develop neither new NN architectures nor new hardware platforms or devices. Instead, we use state-of-the-art NNs and platforms and
- estimate latency, throughput, resource usage, and power consumption
- analyze and compare implementations of a given NN on various target platforms
- select NN variants and transform and optimize them for a given target platform
- develop methods for dynamic on-line learning in embedded systems
The CD-Lab is formed by five partners:
The Institute of Computer Technology (ICT) at TU Wien with Prof. Axel Jantsch, Prof. Hermann Kaindl and Prof. Thilo Sauter, has a strong research focus on embedded systems covering the full spectrum from analog hardware to embedded software. A. Jantsch’s group studies Systems on Chip (SoC) architecture and design methods and has gained international recognition for work on Network on Chip, self-aware SoC, on-chip resource management, and hardware security. ICT publishes about 80 peer-reviewed papers per year in top journals and conferences and is well established in the international community: its staff regularly organize special sessions, workshops and conferences and serve as reviewers, journal editors and guest editors.
Institute for Computer Graphics and Vision (ICG) at the TU Graz with Prof. Horst Bischof, Prof. Vincent Lepetit, Prof. Thomas Pock and Prof. Dieter Schmalstieg, is the only Austrian academic group with the charter to address both computer vision and computer graphics. The ICG is carefully nurturing a culture of digital visual information processing. Research at the ICG is focused on machine vision, machine learning, medical image analysis, object reconstruction and object recognition, computer graphics and visualization. The institute is home to 7 civil service positions and about another 70 soft money positions, making it one of the largest institutes at Graz University of Technology. During the most recent 5-year period, the institute was responsible for 85 diploma theses and the award of 49 doctorates. Over the last years, our researchers authored on average about 90 publications a year in scientific journals, book chapters and international conferences. The ICG has experience in national (FWF and FFG) as well as EU-funded projects.
Siemens AG is a technology company focused on industry, infrastructure, transport, and healthcare. From more resource-efficient factories, resilient supply chains, and smarter buildings and grids, to cleaner and more comfortable transportation as well as advanced healthcare, the company creates technology with purpose adding real value for customers. By combining the real and the digital worlds, Siemens empowers its customers to transform their industries and markets, to transform the everyday for billions of people. Founded in 1847 the company today has around 293,000 employees worldwide.
Mission Embedded develops and supplies high-reliability embedded systems for professional applications in the fields of transportation, industry, air traffic control and medical technology. This is our passion. Our particular focus lies in assistance systems, autonomous driving, autonomous machinery as well as AI and machine vision in safety and security applications. Our tailor-made solutions enable our clients to make their innovation projects a reality within a short period of time. Mission Embedded is a member of the Frequentis Group and as such builds on more than 70 years of expertise and innovation in mission-critical applications. Customers benefit from practical experience in a large variety of fields from railway and air traffic management to industry and medical technology. Mission Embedded experts provide support during all phases of a product’s life cycle, from concept through system design to production and maintenance.
AVL (Anstalt für Verbrennungskraftmaschinen List) is an Austria-based automotive consulting firm as well as an independent research institute. With more than 11,500 employees, AVL List GmbH is the world's largest independent company for the development, simulation and testing of all types of powertrain systems (hybrid, combustion engine, transmission, electric drive, batteries, fuel cell and control technology) and their integration into the vehicle, and it is increasingly taking on new tasks in the field of assisted and autonomous driving as well as data intelligence. The company was founded in 1948 and is headquartered in Graz, Austria. It provides industry-leading technologies and services based on the highest quality and innovation standards to help customers reduce complexity and add value. AVL’s mission is to provide leading technologies and superior services to our customers to create a better world by driving mobility trends of tomorrow.
The Christian Doppler Research Association (CDG) is named after the Austrian physicist and mathematician, Christian Andreas Doppler. He is primarily renowned for his discovery known as the "Doppler Effect". The universality of the "Doppler Effect" applies to a wide spectrum of uses in natural sciences and technology.
The non-profit association aims to promote development in the areas of natural sciences, technology and economy as well as their economic implementation and utilisation. It enables talented scientists in renowned research centres to achieve high-quality research and knowledge transfer in line with the demands and to the advantage of the CDG member companies.
The aim of the Federal Ministry for Digital and Economic Affairs (BMDW) is to drive the positive development of the business location further, to actively use the opportunities of digitization for economy and society and to promote entrepreneurship. Together with private companies, the BMDW sponsors research projects through the Christian Doppler Research Association.
Research
The Christian Doppler laboratory for Embedded Machine Learning conducts research on Deep Neural Network (DNN) based machine learning in resource constrained embedded devices. It studies the design space that is characterized by DNN architecture parameters, DNN optimization and transformations, various implementation platform configurations, and mapping options. This design space is huge, poorly understood, and rapidly evolving. Our focus is not DNN theory, but DNN implementation under tight cost and energy constraints. The CD lab is organized in three work packages:
- WP1, Embedded Platforms, assumes a given DNN and studies FPGA, GPU, and SoC platforms and their configuration. It focuses on platform dependent optimization and mapping.
- WP2, DNN Architecture and Optimization, studies DNN transformations for a given, fixed target platform. Its focus is on platform independent DNN optimization.
- WP3, Continuous Learning, studies continuous, in-device learning architectures and methods and their implementation and operation on resource constrained embedded devices.
The CD lab conducts world leading research on embedded machine learning in the application domains of computer vision for autonomous systems. For these applications the lab’s objective is to develop world leading architectures and methods with (1) the highest accuracy within a given energy budget, (2) the lowest energy consumption for a given target accuracy, and (3) the ability to do life-long learning in resource constrained environments.
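Objectives (1) and (2) can be read as two constrained selection problems over the same set of candidate implementations. The following is a minimal sketch of that reading, not the lab's tooling: the `Candidate` type, the function names, and all accuracy and energy figures are made-up assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical DNN/platform combination with measured or estimated metrics."""
    name: str
    accuracy: float   # e.g. a detection score in [0, 1]
    energy_mj: float  # energy per inference in millijoules


def best_accuracy_within_budget(
    cands: Sequence[Candidate], budget_mj: float
) -> Optional[Candidate]:
    """Objective (1): highest accuracy among candidates within the energy budget."""
    feasible = [c for c in cands if c.energy_mj <= budget_mj]
    return max(feasible, key=lambda c: c.accuracy, default=None)


def lowest_energy_for_accuracy(
    cands: Sequence[Candidate], target_accuracy: float
) -> Optional[Candidate]:
    """Objective (2): lowest energy among candidates that reach the target accuracy."""
    feasible = [c for c in cands if c.accuracy >= target_accuracy]
    return min(feasible, key=lambda c: c.energy_mj, default=None)


# Illustrative numbers only; they do not come from any measurement.
candidates = [
    Candidate("small", accuracy=0.61, energy_mj=12.0),
    Candidate("medium", accuracy=0.68, energy_mj=30.0),
    Candidate("large", accuracy=0.72, energy_mj=95.0),
]

print(best_accuracy_within_budget(candidates, budget_mj=40.0))
print(lowest_energy_for_accuracy(candidates, target_accuracy=0.65))
```

Both functions return `None` when no candidate satisfies the constraint, which makes an infeasible budget or target explicit instead of silently picking a violating candidate.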
Description
Modules
The CD Laboratory for Embedded Machine Learning is divided into two modules: Module 1 and Module 2. Module 1 focuses on hardware configuration optimization, hardware-aware neural network optimization, and the selection of suitable hardware and network combinations for specific applications. Module 2 focuses on continuous learning in autonomous driving and the general improvement of object detection, for instance, in bad weather conditions.
During the first two years, 2020-2021, we built up strong competence in estimation methods, benchmarking and assessment across different platforms, and platform aware network optimizations. Perhaps most importantly, we have developed an infrastructure of tools, flows, scripts and guides that will facilitate further experiments and research. Currently, we are further strengthening these aspects, since this is where we see significant added value for our partners. In addition, we plan to broaden our work area in two directions:
- More focus on time series and spatio-temporal data and the corresponding applications. The analysis of time series and spatio-temporal data also requires us to broaden our scope of networks to include RNNs, Autoencoder Networks (AEs) and other network types.
- Distributed implementation scenario: Smart cameras and other sensors generate large amounts of data to be processed and analyzed. For reasons of communication overhead, privacy or security, it is often attractive to perform data analysis and inference in or close to the sensor. However, sensor nodes usually have limited compute and energy resources. Partitioning the processing pipeline and distributing its implementation across the sensor node, an access point and a cloud server is a challenging design trade-off that depends on the sensor node, the communication link, the access point, and the application requirements. Partitioning DNN based inference is particularly intricate because, unlike in traditional data processing pipelines, the data volume increases during processing and remains at a very high level until the very end of the DNN pipeline. Thus, any naive cut in the middle will lead to a large volume of data to be transmitted over the communication link.
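The growth in data volume inside a DNN can be illustrated by counting the values produced after each layer. The sketch below assumes a hypothetical VGG-like stack with "same" padding, where each layer is described only by its output channel count and stride; the layer list and the function name `activation_sizes` are illustrative assumptions, not one of the lab's networks or tools.

```python
from typing import List, Sequence, Tuple

# (layer name, output channels, spatial stride); hypothetical VGG-like front end
Layer = Tuple[str, int, int]


def activation_sizes(
    input_hw: Tuple[int, int], input_ch: int, layers: Sequence[Layer]
) -> List[Tuple[str, int]]:
    """Return the number of values at the input and after each layer.

    Assumes "same" padding, so only the stride changes the spatial size,
    and spatial dimensions that are divisible by every stride.
    """
    h, w = input_hw
    sizes = [("input", h * w * input_ch)]
    for name, out_ch, stride in layers:
        h, w = h // stride, w // stride
        sizes.append((name, h * w * out_ch))
    return sizes


layers = [
    ("conv1", 64, 1), ("pool1", 64, 2),
    ("conv2", 128, 1), ("pool2", 128, 2),
    ("conv3", 256, 1), ("pool3", 256, 2),
]
sizes = activation_sizes((224, 224), 3, layers)

for name, n_values in sizes:
    print(f"cut after {name:>6}: {n_values:>9} values")
```

In this example every intermediate cut carries more values than the raw 224 x 224 x 3 input, so a naive split after any of these layers would increase rather than reduce the traffic on the communication link. The count ignores numeric precision; activations stored at higher precision than the input would widen the gap further.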
Work Package 1 - Embedded Platforms
In WP1 we study methods for evaluating, selecting, and configuring a platform and mapping a given, trained DNN onto the selected platform. It covers the hardware specific topics. The embedded platform is designed and optimized for inference, not for training. For estimation, we consider and build upon existing analytic models and layer-wise energy estimation. In cases where analytic models do not provide sufficient fidelity and accuracy, we apply a data-driven approach. This work package’s research question is: Given a network, what is the best platform to use?
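As an illustration of what a layer-wise analytic model can look like, the sketch below uses a simple roofline-style bound: each convolution layer takes the longer of its compute time and its memory transfer time, and the layer times are summed. This is a generic textbook-style model under stated assumptions (stride 1 with "same" padding, one byte per value, no overlap between layers); it is not the method used by the lab's estimators, and all function names and platform numbers are hypothetical.

```python
from typing import Sequence, Tuple

# (output height, output width, input channels, output channels, kernel size)
ConvLayer = Tuple[int, int, int, int, int]


def conv_layer_cost(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> Tuple[int, int]:
    """Return (multiply-accumulates, bytes moved) for one convolution layer.

    Assumes stride 1 and "same" padding, so input and output spatial sizes
    match, and 1 byte per value (e.g. int8 weights and activations).
    """
    macs = h_out * w_out * c_in * c_out * k * k
    bytes_moved = (
        h_out * w_out * c_in      # input activations read
        + k * k * c_in * c_out    # weights read
        + h_out * w_out * c_out   # output activations written
    )
    return macs, bytes_moved


def estimate_latency_s(
    layers: Sequence[ConvLayer], peak_macs_per_s: float, mem_bytes_per_s: float
) -> float:
    """Sum per-layer roofline bounds: max(compute time, memory time)."""
    total = 0.0
    for layer in layers:
        macs, nbytes = conv_layer_cost(*layer)
        total += max(macs / peak_macs_per_s, nbytes / mem_bytes_per_s)
    return total


# Hypothetical platform: 1 TMAC/s peak compute, 10 GB/s memory bandwidth.
network = [(56, 56, 64, 64, 3), (28, 28, 128, 128, 3)]
print(estimate_latency_s(network, peak_macs_per_s=1e12, mem_bytes_per_s=1e10))
```

Such a model is cheap to evaluate but ignores scheduling, caching, and accelerator utilization effects, which is one reason a data-driven approach may be needed when analytic fidelity is insufficient.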
Work Package 1.1 Estimation
Further development of the estimators ANNETTE and Blackthorn to include power estimation, to generalize them to more platforms, to make them more robust and mature, and to extend them to RNNs.
Work Package 1.2 Video
By working with a stream of images, performance of detection can be increased by considering temporal patterns. We extend processing on embedded hardware to Video Object Segmentation and Tracking (VOST) applications. It includes handling sparse 3D convolutions as well as efficient point cloud processing. In cooperation with WP3.2 and WP3.3, we will find ways to apply those networks on embedded platforms in an optimized way.
Work Package 1.3 Single Platform Mapping
We develop a toolbox for platform specific optimization and mapping for our target platforms. The techniques developed in phase 1 for mapping CNNs onto different platforms will be further developed to automate them to a higher degree and to cover more platform variants. Moreover, the toolbox will be extended to RNNs such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs).
Work Package 2 - DNN Architecture and Optimization
We study methods for evaluating, selecting, and optimizing deep neural networks to be mapped onto a given target platform for inference. We explore the possible architectural choices and the exploration and mapping process of networks. Based on estimation techniques, various DNN choices, derived by top-down and bottom-up approaches, can be evaluated for candidate target platforms. This work package’s research question is: Given a target platform and an application, what are the best neural network design and optimization strategies?
Focus is placed on the following challenges:
- We concentrate on combinations of optimization methods, rather than individual methods like pruning and quantization, as motivated in the introduction.
- We cover CNNs and RNNs, driven by application use cases. Time series applications are a new focus area in this project.
- In addition to single node platforms we target distributed implementations where the front-end processing pipeline is mapped to the embedded platform, and the back-end to a server.
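To make the idea of combining optimization methods concrete, the sketch below chains two textbook techniques on a flat list of weights: magnitude pruning followed by symmetric per-tensor int8 quantization. It is only an illustration of a combination, not the lab's optimization flow; the function names and the example weights are assumptions.

```python
from typing import List, Sequence, Tuple


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Set the `sparsity` fraction of weights with the smallest magnitude to zero."""
    n_drop = int(len(weights) * sparsity)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:n_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate floating-point weights."""
    return [v * scale for v in q]


# Illustrative weights only.
weights = [0.02, -0.5, 0.31, -0.01, 0.9, -0.07, 0.0, 0.4]
pruned = magnitude_prune(weights, sparsity=0.5)   # prune first ...
q, scale = quantize_int8(pruned)                  # ... then quantize the survivors
restored = dequantize(q, scale)

print(pruned)
print(q)
```

Because the quantization is symmetric, pruned weights map exactly to the integer code 0, so the sparsity introduced by pruning survives quantization. How the two steps interact with accuracy on a real network is precisely the kind of question that cannot be answered by looking at either method in isolation.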
The main platforms under consideration are Xilinx boards (e.g. Ultrascale and Versal AI Core Series with Vitis AI), Nvidia (Jetson), ARM based (e.g. Raspberry Pi, STM32 with ARM NN) and Intel (e.g. Myriad X). In addition, NXP’s i.MX 8M Plus has been identified as another platform of interest.
Work Package 2.1 Image Driven
The focus is on image and video driven hardware-aware optimization use cases considering appropriate DNNs.
Work Package 2.2 Time Series Driven
The focus is on time series driven hardware-aware optimization use cases considering appropriate DNNs.
Work Package 2.3 Distributed Mapping
This task considers hardware, algorithms and software in the TinyML range, capable of performing on-device sensor data analytics at extremely low power, enabling always-on use-cases and targeting battery operated devices. For the case that the sensor node is too limited to host the complete DNN, the DNN has to be split into a front-end, mapped onto the platform of the sensor node, and a back-end, mapped onto a server. We develop platform aware DNN transformation and partitioning methods under the constraints of the sensor node platform, the communication link and the application requirements.
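The front-end/back-end split can be sketched as a search over cut points, where each cut is scored by the time spent on the sensor node, on the link, and on the server. The example below considers latency only and uses made-up per-layer timings and cut sizes; a realistic partitioning method would also account for constraints such as the node's memory and energy budget. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_latencies(
    node_ms: Sequence[float],
    server_ms: Sequence[float],
    cut_bytes: Sequence[int],
    link_bytes_per_ms: float,
) -> List[float]:
    """End-to-end latency for every cut point k in 0..n.

    Layers [0, k) run on the sensor node, layers [k, n) on the server.
    cut_bytes[k] is the data sent over the link at cut k: cut_bytes[0] is the
    raw input and cut_bytes[n] is the final result.
    """
    n = len(node_ms)
    if len(server_ms) != n or len(cut_bytes) != n + 1:
        raise ValueError("need n layer timings per side and n + 1 cut sizes")
    return [
        sum(node_ms[:k]) + cut_bytes[k] / link_bytes_per_ms + sum(server_ms[k:])
        for k in range(n + 1)
    ]


def best_split(
    node_ms: Sequence[float],
    server_ms: Sequence[float],
    cut_bytes: Sequence[int],
    link_bytes_per_ms: float,
) -> Tuple[int, float]:
    """Return (cut index, latency in ms) of the lowest-latency partition."""
    lat = split_latencies(node_ms, server_ms, cut_bytes, link_bytes_per_ms)
    k = min(range(len(lat)), key=lat.__getitem__)
    return k, lat[k]


# Illustrative numbers only: a slow node, a fast server, three layers.
node_ms = [40.0, 30.0, 20.0]
server_ms = [4.0, 3.0, 2.0]
cut_bytes = [150_000, 800_000, 400_000, 1_000]

print(best_split(node_ms, server_ms, cut_bytes, link_bytes_per_ms=1_000))    # slow link
print(best_split(node_ms, server_ms, cut_bytes, link_bytes_per_ms=100_000))  # fast link
```

With these numbers the slow link favors running everything on the node and sending only the result, while the fast link favors sending the raw input to the server; the intermediate cuts lose in both cases because of their large activation volume.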
Work Package 2.4 Optimization Toolbox
We develop a toolbox for platform aware DNN optimization, partitioning and mapping methods for our target platforms.
Work Package 3 - Continuous Learning
Our focus is on continuous learning systems implementable on embedded systems, with autonomous driving as an important use case. A main scenario under consideration is as follows: Suppose you have a semi-autonomous car that uses a pre-trained network for object detection and recognition. The system was trained on a specific dataset, e.g. a dataset collected in the USA. While you drive, the car continuously collects data. The border cases, i.e. those where the recognition system fails, are of special interest. After you come back home, this data is used to re-train the system. Thereby the system gets better each day you take it out. We have chosen this scenario because it is:
- Close to on-line learning, but feasible to implement on the currently available hardware.
- A realistic application scenario for our industrial partner.
- Providing us with the freedom to experiment with different on-line methods.
Work Package 3.1 Demonstrator
For life-long learning we need a running system that we can use to collect data for constantly retraining and re-evaluating our models. The goal of the demonstrator is two-fold. First, it serves as a realistic data collection platform that we have access to, which is of eminent importance for our scientific goals. Second, it gives us a constant exchange platform with our industrial partner, so that we can easily transfer our research results; this becomes more important as the project progresses.
Work Package 3.2 Exploiting temporal information
Temporal information in 2D and 3D proves to be a valuable source for mining additional labels for improving existing detectors, for domain transfer, or for adaptation. Both short-term temporal information (e.g. scene flow) and long-term tracking information, as obtained from multiple object trackers (MOT), are important. In particular, we extend ongoing work that focuses on unsupervised domain adaptation through re-training with pseudo-labels.
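One way tracking information can help mine additional labels is to trust a detection because it belongs to a stable track, even when its own confidence is low. The sketch below shows this idea under simple assumptions: detections are dictionaries with hypothetical keys `frame`, `track_id`, `box`, `label` and `score`, and the thresholds are arbitrary. It is an illustrative heuristic, not the lab's pseudo-labeling method.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]
PseudoLabel = Tuple[int, Box, str]  # (frame index, box, class label)


def mine_pseudo_labels(
    detections: Sequence[Dict],
    min_track_len: int = 3,
    min_mean_score: float = 0.7,
) -> List[PseudoLabel]:
    """Keep all detections of tracks that are long and confident on average.

    Low-score detections inside an accepted track are kept as well, which is
    how the temporal context contributes labels a per-frame threshold would miss.
    """
    tracks: Dict[int, List[Dict]] = defaultdict(list)
    for det in detections:
        tracks[det["track_id"]].append(det)

    labels: List[PseudoLabel] = []
    for dets in tracks.values():
        mean_score = sum(d["score"] for d in dets) / len(dets)
        if len(dets) >= min_track_len and mean_score >= min_mean_score:
            labels.extend((d["frame"], d["box"], d["label"]) for d in dets)
    return sorted(labels)


# Illustrative detections only.
detections = [
    {"frame": 0, "track_id": 1, "box": (10, 10, 50, 50), "label": "car", "score": 0.9},
    {"frame": 1, "track_id": 1, "box": (12, 10, 52, 50), "label": "car", "score": 0.5},
    {"frame": 2, "track_id": 1, "box": (14, 10, 54, 50), "label": "car", "score": 0.9},
    {"frame": 1, "track_id": 2, "box": (80, 20, 95, 60), "label": "person", "score": 0.95},
]
labels = mine_pseudo_labels(detections)
print(labels)
```

Here the low-confidence detection in frame 1 is kept because its track is stable, while the single high-confidence detection of track 2 is dropped because there is no temporal support for it.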
Work Package 3.3 3D domain transfer learning
Unsupervised domain transfer is a hot research topic. The area of domain transfer for 3D data has hardly been addressed so far. Due to the shortage of labeled 3D data (as compared to images) this is a pressing problem. Our initial experiments show an extreme sensitivity of methods to changes in the LiDAR, mounting of the LiDAR, environment and even annotation style.
Work Package 3.4 Dynamic learning
There is a need to perform short-term learning or adaptation. Typical examples are changing weather conditions or fast illumination changes. We study methods that are able to perform this fast learning, i.e. learning from a single example (or short video sequences). A particular example is a detector that has been trained in good weather conditions but needs to adapt quickly to rainy conditions. Prior work shows that simple auxiliary tasks like image rotation can be used to retrain from a single image. In fact, we can formulate this problem as a feature re-weighting problem using auxiliary tasks. If this is successful, it will open a whole new research area.
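The image rotation auxiliary task needs no manual labels: a single image is rotated by 0, 90, 180 and 270 degrees, and the rotation index serves as the training target. The sketch below only shows how such self-labeled samples can be generated from one image, represented here as a plain list of rows; the function names are hypothetical, and the retraining or feature re-weighting step itself is not shown.

```python
from typing import List, Tuple

Image = List[List[int]]


def rot90(img: Image) -> Image:
    """Rotate a 2D image (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rotation_task(img: Image) -> List[Tuple[Image, int]]:
    """Build four (rotated image, rotation index) pairs from a single image.

    The rotation index k means the image was rotated by k * 90 degrees and is
    the target of the auxiliary classification task.
    """
    samples = []
    current = [list(row) for row in img]
    for k in range(4):
        samples.append((current, k))
        current = rot90(current)
    return samples


image = [[1, 2],
         [3, 4]]
samples = rotation_task(image)
for rotated, k in samples:
    print(k, rotated)
```

Since the targets come for free, a single new image (for example the first rainy frame) already yields a small supervised training set for the auxiliary task.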