Another research focus of the group lies in the integration of simulation-based optimization, the modeling of Markov decision processes, reinforcement learning, and policy extraction for explainable artificial intelligence. Combining these four approaches is particularly well suited to merging the strengths of modern machine learning methods with the need for transparency and lasting interpretability. It makes it possible to model and optimize complex, time-dependent systems that are often difficult to address with classical methods.

Markov Decision Processes (MDPs) are a fundamental concept for modeling decision problems whose outcomes are partly random and partly determined by the decisions made. An MDP consists of a set of states, a set of actions, transition probabilities, and a reward model. States represent the situations a system can be in, while actions represent the decisions available in each state. Transition probabilities describe how likely the system is to move from one state to another under a given action, and rewards quantify the gains or losses associated with these transitions. Modeling a decision problem as an MDP breaks complex decision-making down into a sequence of states and actions, making it amenable to systematic optimization.
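
As an illustration, the following sketch encodes a small, entirely hypothetical two-state MDP in Python; the states, actions, transition probabilities, and rewards are invented for demonstration and do not come from a concrete application.

```python
# A minimal, hypothetical two-state MDP. For each (state, action) pair,
# the model lists the possible successor states together with their
# transition probabilities and the associated rewards.
states = ["low_demand", "high_demand"]
actions = ["produce", "idle"]

# transitions[(state, action)] -> list of (next_state, probability, reward)
transitions = {
    ("low_demand", "produce"):  [("low_demand", 0.7, 1.0), ("high_demand", 0.3, 2.0)],
    ("low_demand", "idle"):     [("low_demand", 0.9, 0.0), ("high_demand", 0.1, 0.0)],
    ("high_demand", "produce"): [("low_demand", 0.4, 3.0), ("high_demand", 0.6, 4.0)],
    ("high_demand", "idle"):    [("low_demand", 0.2, -1.0), ("high_demand", 0.8, -2.0)],
}

# Expected one-step reward of an action, directly from the definition:
# the probability-weighted sum of rewards over all successor states.
def expected_reward(state, action):
    return sum(p * r for _, p, r in transitions[(state, action)])
```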

Simulation-based optimization is a research field focused on analyzing and improving complex systems. It uses simulation to explore different scenarios and identify good solutions: by replicating real-world processes in a controlled virtual environment, different parameters and strategies can be tested to gain insight into the structure of the underlying problem and its solutions. This approach enables the optimization of problem classes that are difficult or nearly impossible to address with classical optimization methods because of randomness and complex internal dependencies. In addition, simulation models can generate sufficient data for machine learning methods.
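
As a minimal sketch of the idea, the following hypothetical example simulates a toy inventory system with random daily demand and searches over a single policy parameter; the cost structure, demand distribution, and parameter range are assumptions chosen purely for illustration.

```python
import random

def simulate_inventory(reorder_point, n_days=365, seed=0):
    """Toy inventory simulation: stochastic daily demand under a fixed
    reorder policy. Returns the average daily cost (holding + stock-out)."""
    rng = random.Random(seed)
    stock, cost = 50, 0.0
    for _ in range(n_days):
        demand = rng.randint(0, 10)          # random daily demand
        cost += max(demand - stock, 0) * 5   # penalty for unmet demand
        stock = max(stock - demand, 0)
        if stock <= reorder_point:           # reorder policy under test
            stock += 40
        cost += stock * 0.1                  # holding cost
    return cost / n_days

# Optimization by searching over the policy parameter, averaging several
# simulation replications to damp the effect of randomness.
best = min(range(0, 41, 5),
           key=lambda r: sum(simulate_inventory(r, seed=s) for s in range(10)) / 10)
print("best reorder point:", best)
```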

Reinforcement Learning (RL) is a promising machine learning approach in which an agent interacts with its environment: the agent takes actions, and the environment responds with state changes and numerical feedback. Based on the experience it gathers, the agent learns to choose actions that maximize the cumulative feedback over the course of the learning task. With RL, optimal strategies for Markov Decision Processes can be found without explicit knowledge of the transition probabilities. Our research in RL focuses on developing robust algorithms that operate directly on discrete-event simulation models, enabling them to be used as solution methods for complex problems.
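
A minimal sketch of tabular Q-learning, reusing the toy MDP defined above: the agent only samples transitions, as it would from a simulation model, and never reads the transition probabilities directly. The hyperparameters are illustrative defaults, not tuned values.

```python
import random

def sample_step(state, action, rng):
    """Sample a successor state and reward from the toy MDP above;
    the agent observes only the outcome, not the probabilities."""
    u, cum = rng.random(), 0.0
    for next_state, p, reward in transitions[(state, action)]:
        cum += p
        if u <= cum:
            return next_state, reward
    return next_state, reward  # fallback against floating-point rounding

rng = random.Random(42)
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # learning rate, discount, exploration

state = "low_demand"
for _ in range(50_000):
    # epsilon-greedy action selection
    if rng.random() < epsilon:
        action = rng.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    next_state, reward = sample_step(state, action, rng)
    # Q-learning update: move the estimate toward the bootstrapped target
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

# Greedy policy induced by the learned Q-values
policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
print(policy)
```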

Policy Extraction and Explainable AI address the explainability of the developed models and strategies. While these algorithms often produce superior solutions, understanding those solutions remains a challenge. Policy extraction methods distill the decision strategies found by RL agents into understandable, traceable rules, making the solutions of AI systems transparent and interpretable. This is especially important in safety-critical and ethically sensitive application areas, where trust in the technology is of central importance.
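
One common way to extract a policy is to fit an interpretable surrogate model, such as a shallow decision tree, to state-action pairs produced by the trained agent. The following sketch uses scikit-learn on a hypothetical one-dimensional state feature; the threshold rule standing in for the learned RL policy is invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training data: state features seen in simulation and the
# action the trained RL agent chose in each of those states. Here a simple
# threshold rule stands in for the (black-box) learned policy.
rng = np.random.default_rng(0)
stock_level = rng.integers(0, 100, size=1000).reshape(-1, 1)
agent_action = (stock_level.ravel() < 30).astype(int)  # 1 = "produce", 0 = "idle"

# Distill the policy into a small, human-readable decision tree.
tree = DecisionTreeClassifier(max_depth=2).fit(stock_level, agent_action)
print(export_text(tree, feature_names=["stock_level"]))
```

In this toy case the printed tree reduces to a single threshold rule, exactly the kind of compact, verifiable description that makes a learned strategy auditable by domain experts.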

[Figure: Modeled Markov Decision Process. Image © freepik]

In this combined approach, a problem is first described and modeled as a Markov Decision Process. Based on this model, a simulation is built that approximates the problem sufficiently well. With the synthetic data points it provides, reinforcement learning is then applied to generate optimal strategies. Once suitable strategies have been found, they can be analyzed and interpreted.

The applications considered are highly diverse, ranging from the optimization of complex production systems to optimal resource allocation and medical applications in which improved treatment strategies are sought.