Battery electric vehicles (BEVs) already offer high energy efficiency, but reinforcement learning (RL) presents a promising path to further optimization, especially in dual-motor configurations. BMW Group is exploring RL-based control strategies to unlock the remaining efficiency potential by dynamically optimizing torque distribution with data-driven algorithms. This software-centric approach minimizes energy consumption without any changes to the vehicle’s hardware. 
Figure 1: Reinforcement learning (RL) workflow: An agent learns to make decisions by trying things out and getting feedback.

Reinforcement learning (RL) is a subfield of artificial intelligence in which agents make decisions based on input data. RL originated in game-playing applications such as chess and Go and is now being applied to a broad range of domains. Based on the current state of the environment, the agent selects an action, which is executed in the next time step. The agent then receives a reward that indicates how well the chosen action meets the defined criteria. Through numerous interactions with the environment, the agent learns a policy that maximizes the long-term reward. RL agents are particularly advantageous when the relationship between state and optimal action is highly complex. In many scenarios, RL systems can achieve performance that surpasses that of human experts or conventional programming approaches.
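The loop described above can be made concrete with a deliberately tiny example that is unrelated to the vehicle application: a tabular Q-learning agent that learns to walk toward a goal state on a short line. All names and numbers are illustrative.

```python
import random

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a 5-state line; the goal is state 4."""
    rng = random.Random(seed)
    n_states, goal = 5, 4
    actions = (-1, +1)                     # step left / step right
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy selection: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, act)])
            s_next = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s_next == goal else -0.1    # feedback signal
            # Move the estimate toward reward plus discounted future value.
            best_next = max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q

q = train()
# Greedy policy after training: the learned action in each non-goal state.
policy = {s: max((-1, +1), key=lambda a: q[(s, a)]) for s in range(4)}
```

After training, the greedy policy steps toward the goal in every state, which is exactly the "policy that maximizes the long-term reward" from the text in its simplest possible form.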

Figure 2: RL setup in the real vehicle. RTMaps is used for implementing the RL agent. © BMW Group

Dual-Motor Electric Vehicles optimized by AI

As part of a BMW Group research project, RL agents are applied to powertrain operating strategies in electrified vehicles. The scenario involves a vehicle with two electric machines: one on the front axle and one on the rear axle. Under typical operating conditions, the total wheel torque requested by the driver can be set at the front axle, the rear axle, or distributed between them. The agent’s task is to determine a torque distribution for each operating point that minimizes the electrical powertrain’s energy consumption. 
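As a sketch of what such a distribution could look like downstream, a single scalar can encode how the requested wheel torque is split between the axles. The function below is a hypothetical illustration; the actual signal encoding used in the project is not public.

```python
def split_torque(total_torque_nm: float, front_share: float) -> tuple[float, float]:
    """Map a scalar distribution value to front/rear axle torques.

    front_share = 1.0 sends all torque to the front machine,
    front_share = 0.0 sends all of it to the rear machine.
    (Illustrative convention, not the project's actual encoding.)
    """
    front_share = min(max(front_share, 0.0), 1.0)   # clamp to valid range
    t_front = front_share * total_torque_nm
    t_rear = total_torque_nm - t_front               # remainder goes rearward
    return t_front, t_rear
```

For example, a requested 300 Nm with a 40 % front share yields 120 Nm at the front axle and 180 Nm at the rear axle.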

Figure 3: The XCP protocol enables read and write access to ECU memory, facilitating seamless in-vehicle deployment of new software.

Scalable, Efficient Deployment of RL Agents using Python

Most artificial intelligence (AI) applications are implemented in the Python programming language. A wide range of established RL algorithms (e.g., DDPG, TD3, SAC, PPO) is available as open-source implementations (see “Reinforcement Learning”). Python also enables convenient graphics processing unit (GPU) parallelization, which significantly accelerates agent training. To minimize development effort, it is advantageous to reuse these existing Python implementations. The algorithms should run on a compact Linux-based industrial computer with an ARM processor. This offers several benefits: 

  • The hardware can be integrated into the vehicle without complex modifications or additional safety measures. 
  • Low power consumption minimizes the load on the vehicle’s electrical network. 
  • A short boot time enables prompt data generation. 

Therefore, it is crucial to implement a lean process with minimal overhead. 
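One lean way to reuse existing open-source implementations while keeping the vehicle integration stable is a thin wrapper around whichever library is chosen. The sketch below assumes only a generic predict(state) -> action shape; it does not reflect a specific library or the project’s actual interfaces.

```python
from typing import Protocol, Sequence

class PolicyLike(Protocol):
    """Minimal assumed shape of any RL policy implementation."""
    def predict(self, state: Sequence[float]) -> float: ...

class TorqueSplitController:
    """Runs any policy object behind a stable, library-independent API,
    so the underlying algorithm (DDPG, TD3, SAC, ...) can be swapped."""

    def __init__(self, policy: PolicyLike):
        self._policy = policy

    def act(self, state: Sequence[float]) -> float:
        # Clamp the raw policy output to a valid distribution value.
        return min(max(self._policy.predict(state), 0.0), 1.0)

class ConstantPolicy:
    """Stand-in policy for testing without a trained network."""
    def __init__(self, value: float):
        self.value = value
    def predict(self, state):
        return self.value

controller = TorqueSplitController(ConstantPolicy(1.3))
action = controller.act([0.0] * 8)   # out-of-range output is clamped
```

Decoupling the vehicle-facing interface from the library keeps the embedded deployment unchanged when the training stack evolves.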
 

Learning Efficiency

A critical challenge in the development of robust RL agents is the appropriate definition of the state and action spaces as well as the formulation of the reward function. In this project, the state comprises vehicle speed, requested torque, battery voltage, steering angle, and several temperatures of the electric machines, among other signals. The action is a scalar value representing the desired percentage-based torque distribution between the vehicle’s electric motors. The reward is simply formulated as the negative electrical power input to the electric machines (since the reward is maximized in RL). We deliberately avoided using power loss as a metric, as this would require mechanical power calculations and wheel torque values that are not directly measured in production vehicles but only calculated. Using calculated values could lead agents to learn to exploit modeling errors, which would reduce the quality of the outcome. The large training data sets acquired during the learning phase contain many operating points that share the same state but differ in action, resulting in different rewards, i.e., power consumption values. The RL agent can discern these variations in power consumption and adapt the operating strategy accordingly. The approach remains feasible during recuperation phases, where the electrical power becomes negative.
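The reward formulation described above is simple enough to sketch directly. The signal names, units, and state layout below are illustrative assumptions, not the project’s actual definitions.

```python
from collections import namedtuple

# Illustrative state layout based on the signals listed in the text;
# the exact signal set and units used in the project are not public.
State = namedtuple(
    "State",
    "speed_kmh torque_req_nm batt_voltage_v steering_deg temp_front_c temp_rear_c",
)

def reward(p_el_front_w: float, p_el_rear_w: float) -> float:
    """Negated total electrical power drawn by the two machines.
    During recuperation the power is negative, so recovered braking
    energy shows up as a positive reward."""
    return -(p_el_front_w + p_el_rear_w)
```

A propulsion sample such as reward(12_000.0, 8_000.0) is penalized, while a recuperation sample such as reward(-5_000.0, -3_000.0) is rewarded, which is why the formulation carries over to braking phases without modification.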

 

Figure 4: Schematic implementation of a reinforcement learning (RL) agent in the RTMaps Python bridge framework. © BMW Group

Figure 5: Electric drivetrain of a BMW i4 with two electric machines. The torque distribution between the two machines is optimized by an AI-based strategy. © BMW Group

Consumption Optimization Across the Entire Drive Chain, Including Tire Losses

Choosing electrical power consumption as the reward provides the additional benefit that consumption is optimized across the entire drive chain, including tire losses. A good choice of torque distribution can not only operate the electric machines at beneficial operating points but also reduce tire slip and thus further reduce energy losses. For the sake of completeness, it should be mentioned that RL agents are able to optimize not only the reward for the current operating point but also the cumulative future reward. In the context of optimal torque distribution, this means that energy consumption can be minimized over an entire drive trajectory, not just at discrete points in time. For example, the agent can learn to keep the electric machine temperatures within an efficient operating range through strategic torque allocation in earlier time steps. 
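The cumulative future reward mentioned here is conventionally formalized as the discounted return, a standard textbook definition rather than a project-specific formula:

```latex
G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^2\, r_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k\, r_{t+k+1}, \qquad 0 \le \gamma < 1
```

A discount factor γ close to 1 makes the agent weigh consumption along the remaining drive trajectory almost as strongly as the instantaneous power, which is what allows behavior such as pre-conditioning machine temperatures in earlier time steps.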

Deploying RL Agents for Real-Time Control with RTMaps

The RL agent workflow is implemented using the RTMaps middleware from Intempora (a dSPACE Company). It provides a Python bridge that seamlessly integrates Python code into the signal processing pipeline. Within this framework, input arguments (state and reward) and output arguments (action) are defined, which can then be connected to other signal blocks via a graphical interface. The code structure executes a core function at a predetermined sampling rate, allowing input data to be processed with any standard Python library. With modest adaptations, the existing code can be integrated into the Python bridge. The RL agent receives the state and reward information and computes the corresponding action using its underlying neural networks. To ensure responsive torque distribution control, a sampling rate of 100 Hz was selected. Concurrently, the system logs data to the onboard computer for subsequent analysis and to facilitate training on more powerful cloud-based systems.

Communication with the drive control unit is established via the XCPoverCAN protocol supported in RTMaps. This integration is achieved by generating a configuration file with the dSPACE Interface Manager that defines the input and output signals to the control unit. The configuration file is then compiled into an RTMaps block and incorporated directly into the workflow. For deployment, RTMaps Runtime for embedded platforms enables execution on ARM architectures without requiring a graphical user interface, thereby reducing computational overhead. The workflow is initiated automatically upon system boot, creating a streamlined operational process. 
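The core-function pattern described above can be mimicked off-target in plain Python, which keeps the agent code testable without the middleware. The class below is only a stand-in for that structure; it is not the RTMaps Python API, whose method and signal names differ.

```python
class AgentComponent:
    """Plain-Python stand-in for the structure the RTMaps Python bridge
    imposes: a user-defined core function invoked once per sample that
    consumes the inputs (state, reward) and produces the output (action).
    Method and attribute names are illustrative, not the RTMaps API."""

    def __init__(self, policy, sample_rate_hz=100):
        self.policy = policy
        self.period_s = 1.0 / sample_rate_hz   # 100 Hz -> 10 ms cycle budget
        self.log = []                          # stands in for onboard logging

    def core(self, state, reward):
        # One cycle: compute the action, log the sample for later
        # cloud-based training, and return the action to the pipeline.
        action = self.policy(state)
        self.log.append((state, reward, action))
        return action

# Off-target smoke run with a trivial fixed policy.
comp = AgentComponent(policy=lambda state: 0.5)
out = [comp.core(s, r) for s, r in [((10.0, 200.0), -1.0),
                                    ((12.0, 180.0), -0.9)]]
```

Keeping the per-cycle logic in one function with explicit inputs and outputs is what makes the later move into the Python bridge a matter of modest adaptation rather than a rewrite.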

Validating RL Agents in Real Vehicle Scenarios

Generating a robust RL agent requires a large amount of training data. Thus, the vehicle is driven through a wide range of operating conditions while state, action, and reward are stored in a so-called replay buffer. Based on the replay buffer, the agents are trained with off-policy reinforcement learning algorithms that adapt their underlying neural networks. The fully trained agents are subsequently deployed to the vehicle and benchmarked against the conventional operating strategy at different operating points. The measurements confirm energy savings in the low single-digit percentage range, depending on the operating point. In some operating regions, the RL-derived strategy converges with the conventional approach, confirming the optimality of the existing control methodology. While the energy savings appear modest at first glance, they represent significant value, considering that they require only a software adaptation and no costly hardware modifications. These improvements are particularly noteworthy because they extract efficiency gains from an already highly optimized system, targeting the last remaining percentages of energy-saving potential that conventional methods have been unable to unlock. 
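A minimal version of the replay buffer mentioned above can be sketched in a few lines; capacity, field layout, and sampling strategy here are generic assumptions rather than project details.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal buffer of (state, action, reward, next_state) transitions,
    as used for off-policy training. Once full, the oldest transitions
    are dropped automatically by the bounded deque."""

    def __init__(self, capacity=100_000, seed=0):
        self._buf = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self._buf.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # inherent in continuously recorded drive data.
        return self._rng.sample(list(self._buf), batch_size)

    def __len__(self):
        return len(self._buf)

buf = ReplayBuffer(capacity=1000)
for t in range(50):
    buf.add(state=(float(t),), action=0.5, reward=-float(t), next_state=(float(t + 1),))
batch = buf.sample(8)
```

The same buffer can be filled in the vehicle and consumed by the cloud-based training jobs, which is what makes the off-policy setup practical here.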

Figure 6: Integration of the AI computer in a vehicle. © BMW Group

AI-Driven, Optimized Torque Distribution

The results demonstrate that RL agents can effectively identify complex correlations between multiple state variables – including driving demands, battery voltage, and electric machine temperatures – to determine optimal torque distributions that minimize power consumption in ways difficult to achieve through conventional control engineering methodologies. 

The RL agents’ outcomes can be integrated into production vehicles through different implementation pathways. The agent’s learned policies can serve purely as analytical tools, providing insights into complex physical correlations that inform enhancements to conventional control strategies – keeping the development process engineer-centric while leveraging AI-derived insights. Alternatively, the fully validated agent can be deployed directly to vehicle control units as a deterministic function that maps states to actions with consistent behavior, effectively transforming the AI methodology into an embedded product component. This deterministic nature ensures that for any given state, the agent will reliably produce identical control actions, maintaining the predictability required for automotive systems. 
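The deterministic state-to-action mapping described above amounts to a frozen feed-forward evaluation. The toy network below illustrates that determinism with made-up weights and sizes; the agent’s actual network architecture is not public.

```python
import math

def mlp_action(state, w1, b1, w2, b2):
    """Deterministic policy evaluation: a frozen two-layer network
    mapping a state vector to a single torque-split action in [0, 1].
    The weights would come from the trained agent; the sizes here
    are illustrative."""
    hidden = [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))      # sigmoid squash into [0, 1]

# Frozen (made-up) parameters: no randomness remains at inference time.
w1 = [[0.2, -0.1], [0.05, 0.3]]
b1 = [0.0, 0.1]
w2 = [0.7, -0.4]
b2 = 0.05
a1 = mlp_action([1.0, 2.0], w1, b1, w2, b2)
a2 = mlp_action([1.0, 2.0], w1, b1, w2, b2)
```

Because every operation is a fixed arithmetic expression, identical states always yield identical actions, which is the predictability property highlighted in the text.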
 

RTMaps as a Key Enabler for Embedded AI Deployment in Vehicles

The intuitive RTMaps middleware proved to be a key enabler for successfully implementing these agents on vehicle computers and managing the various data signals and streams. The platform allowed existing Python algorithms to be used with only minor adjustments. Its low hardware requirements kept deployment costs in the vehicle minimal, facilitating the rollout to additional development vehicles with different powertrain configurations. Data exchange with the vehicle via XCPoverCAN provided fast and reliable communication. Notably, at the start of the project, the XCPoverCAN and XCPoverEthernet interfaces were not yet available; however, they were implemented upon request within three months, allowing the project to proceed without delays. 

Next Steps: Extending AI Control to Thermal Management Systems

The RL agent approach extends beyond torque distribution to various operating strategies involving complex physical relationships. A promising application is thermal management system control, where components such as pumps, flaps, valves, and fans can be optimally controlled to minimize energy consumption while maintaining appropriate thermal conditions.  

 

 

Dr. Benjamin Schläpfer, BMW Group 

About the author

Dr. Benjamin Schläpfer is an AI engineer in the powertrain research department at BMW Group in Garching, Germany. 

 

dSPACE MAGAZINE, PUBLISHED DECEMBER 2025

At a Glance                                                                     

Task

Improve energy efficiency in battery electric vehicle (BEV) powertrains using AI-based control strategies. 

Challenge
  • Design reinforcement learning (RL) agents with meaningful state, action, and reward definitions. 
  • Ensure real-time performance on embedded hardware. 
  • Integrate seamlessly into existing vehicle systems. 
Solution
  • Use Python-based RL algorithms deployed via RTMaps on ARM-based vehicle computers. 
  • Train agents with real driving data using off-policy learning. 
  • Communicate with vehicle control units via XCPoverCAN using RTMaps. 
Benefit
  • Achieve measurable energy savings without hardware changes. 
  • Enable scalable deployment across vehicle platforms. 
  • Provide insights for enhancing conventional control strategies. 

Reinforcement Learning

Reinforcement learning (RL) is a machine learning methodology where an agent learns optimal policies through trial-and-error interaction with its environment. The agent receives numerical reward signals that indicate action quality, enabling it to iteratively improve its decision-making strategy to maximize cumulative reward. Examples of RL algorithms include: 

  • Deep Deterministic Policy Gradient (DDPG) 
  • Twin Delayed DDPG (TD3) 
  • Soft Actor Critic (SAC) 
  • Proximal Policy Optimization (PPO) 

The individual algorithms have different advantages and disadvantages. The choice of a suitable algorithm depends on the problem setup, more specifically on the environment with its state, action, and reward structure. A good and detailed overview of established RL algorithms is provided by the following source: 
Joshua Achiam: Spinning Up in Deep Reinforcement Learning, 2018. https://github.com/openai/spinningup
 
