Multi-Objective Decision Making. Diederik M. Roijers
Synthesis Lectures on Artificial Intelligence and Machine Learning
href="#u3698952a-a64d-5556-94a7-7c552ae8928d">Chapter 6 MOVE multi-objective variable elimination Algorithm 4.5, Section 4.2.3 MOVI multi-objective value iteration Section 4.3.2 OLS optimistic linear support Algorithm 5.8, Section 5.3 OLS-R optimistic linear support with reuse Algorithm 5.11, Section 5.6 PMOVI Pareto multi-objective value iteration Section 4.3.2 PCS Pareto coverage set Definition 3.11, Section 3.2.4 PMOVE Pareto multi-objective variable elimination Section 4.2.3 POMDP partially observable Markov decision process Section 5.2.1 PF Pareto front Definition 3.10, Section 3.2.4 SODP single-objective decision problem Definition 2.1, Section 2.1 U undominated set Definition 3.4, Section 3.2 VE variable elimination Algorithm 4.4, Section 4.2.1 VELS variable elimination linear support Section 5.7 VI value iteration Section 4.3.1 Vπ value vector of a policy π Definition 2.2, Section 2.1 Π a set of allowed policies Definition 2.1, Section 2.1P Pareto dominance relation Definition 3.3, Section 3.1.2

      CHAPTER 1

       Introduction

      Many real-world decision problems are so complex that they cannot be solved by hand. In such cases, autonomous agents that reason about these problems automatically can provide the necessary support for human decision makers. An agent is “anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors” [Russell et al., 1995]. An artificial agent is typically a computer program—possibly embedded in specific hardware—that takes actions in an environment that changes as a result of these actions. Autonomous agents can act without human control or intervention, on a user’s behalf [Franklin and Graesser, 1997].

      Artificial autonomous agents can assist us in many ways. For example, agents can control manufacturing machines to produce products for a company [Monostori et al., 2006, Van Moergestel, 2014], drive a car in place of a human [Guizzo, 2011], trade goods or services on markets [Ketter et al., 2013, Pardoe, 2011], and help ensure security [Tambe, 2011]. As such, autonomous agents have enormous potential to improve our productivity and quality of life.

      In order to successfully complete tasks, autonomous agents require the capacity to reason about their environment and the consequences of their actions, as well as the desirability of those consequences. The field of decision theory uses probabilistic models of the environment, called decision problems, to formalize the tasks about which such agents reason. Decision problems can include the states the environment can be in, the possible actions that agents can perform in each state, and how the state is affected by these actions. Furthermore, the desirability of actions and their effects is modeled as numerical feedback signals. These feedback signals are typically referred to as reward, utility, payoff, or cost functions. Solving a decision problem consists of finding a policy, i.e., a rule specifying how to behave in each state, that is optimal in some sense with respect to these feedback signals.
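      To make these ingredients concrete, the following minimal sketch shows one way a small decision problem and a policy could be represented in code. It is only an illustrative sketch: the scenario (a server that can sleep or serve requests), the state and action names, and the helper `step` are assumptions of this example, not definitions from the book.

```python
import random

# Illustrative finite decision problem (all names are assumptions).
states = ["low_load", "high_load"]
actions = ["sleep", "serve"]

# transition[state][action] -> list of (next_state, probability):
# how the environment state is affected by each action.
transition = {
    "low_load":  {"sleep": [("low_load", 0.9), ("high_load", 0.1)],
                  "serve": [("low_load", 0.7), ("high_load", 0.3)]},
    "high_load": {"sleep": [("high_load", 0.8), ("low_load", 0.2)],
                  "serve": [("low_load", 0.6), ("high_load", 0.4)]},
}

# reward[state][action]: the scalar feedback signal for each action.
reward = {
    "low_load":  {"sleep": 0.0,  "serve": 1.0},
    "high_load": {"sleep": -1.0, "serve": 2.0},
}

# A deterministic policy: a rule prescribing an action in every state.
policy = {"low_load": "sleep", "high_load": "serve"}

def step(state, action):
    """Sample a next state according to the transition probabilities."""
    next_states, probs = zip(*transition[state][action])
    return random.choices(next_states, weights=probs)[0]
```

      Here `policy` plays the role of the rule described above: it prescribes an action for every state, and solving the decision problem amounts to choosing such a prescription optimally with respect to the reward signal.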

      In most research on planning and learning in decision problems, the desirability of actions and their effects is codified in a scalar reward function [Busoniu et al., 2008, Oliehoek, 2010, Thiébaux et al., 2006, Wiering and Van Otterlo, 2012]. In such scenarios, agents aim to maximize the expected (cumulative) reward over time.
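      For instance, in the widely used infinite-horizon discounted setting, this objective can be written as follows. This is one standard formulation rather than the book's formal definition (which appears later, in Section 2.1); the discount factor γ and per-step reward r_t are assumptions of this sketch.

```latex
\max_{\pi \in \Pi} V^{\pi},
\qquad
V^{\pi} \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \,\middle|\, \pi\right],
\qquad \gamma \in [0, 1)
```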

      However, many real-world decision problems have multiple objectives. For example, for a computer network we may want to maximize performance while minimizing power consumption [Tesauro et al., 2007]. Similarly, for traffic control, we may want to maximize throughput, minimize latency, maximize fairness to drivers, and minimize noise and pollution. In response to a query, we may want a search engine to provide a balanced list of documents that maximizes both the relevance to the query and the readability of the documents [Van Doorn et al., 2016]. In probabilistic planning, e.g., path planning for robots, we may want to maximize the probability of reaching a goal, while minimizing the expected cost of executing the plan [Bryce, 2008, Bryce et al., 2007]. Countless other real-world scenarios are naturally characterized by multiple objectives.

      In all the cases mentioned above, the problem is more naturally expressed using a vector-valued reward function. When the reward function is vector-valued, the value of a policy is also vector-valued. Typically, there is no single policy that maximizes the value for all objectives simultaneously. For example, in a computer network, we can often achieve higher performance by using more power. If we do not know the exact preferences of the user with respect to these objectives, or indeed if these preferences may change over time, it can be crucial to produce a set of policies that offer different trade-offs between the objectives, rather than a single optimal policy.
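      The following sketch makes the idea of a set of trade-off policies concrete: given the value vectors of several candidate policies, it keeps only those not Pareto-dominated by another, assuming all objectives are to be maximized. The function names and example vectors are illustrative; the underlying concepts, Pareto dominance and the Pareto front, are defined formally later in the book (Definitions 3.3 and 3.10).

```python
def dominates(u, v):
    """True if value vector u Pareto-dominates v: u is at least as good
    in every objective and strictly better in at least one."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))

def undominated(value_vectors):
    """Keep only the vectors not Pareto-dominated by any other vector."""
    return [u for u in value_vectors
            if not any(dominates(v, u) for v in value_vectors if v != u)]

# Example: (performance, -power) value vectors for four network policies;
# power consumption is negated so that both objectives are maximized.
vectors = [(10.0, -5.0), (8.0, -3.0), (6.0, -3.0), (9.0, -6.0)]
print(undominated(vectors))  # -> [(10.0, -5.0), (8.0, -3.0)]
```

      Returning such a set, rather than a single policy, is exactly what allows a user whose preferences are unknown or changing to select the trade-off they prefer.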

      The field of multi-objective decision making addresses how to formalize and solve decision problems with multiple objectives. This book provides an introductory overview of this field from the perspective of artificial intelligence. In addition to describing multi-objective decision problems and algorithms for solving them, we aim to make explicit the key assumptions that underlie work in this area. Such assumptions are often left implicit in the multi-objective literature, which can be a source of confusion, especially for readers new to the topic. We also aim to synthesize these assumptions and offer a coherent, holistic view of the field.

      We start by explicitly formulating the motivation for developing algorithms that are specific to multi-objective decision problems.

      The existence of multiple objectives in a decision problem does not automatically imply that we require specialized multi-objective methods to solve it.
