Multi-Objective Decision Making. Diederik M. Roijers. Synthesis Lectures on Artificial Intelligence and Machine Learning.
Definition 2.3 A multi-agent MODP is cooperative if and only if all agents get the same (team) value, Vπ, for executing a joint policy π ∈ Π, i.e., there are no individual rewards. A single-agent MODP is cooperative by default.

      This definition of cooperative is common in the field of decision theory, e.g., in multi-agent MDPs [Boutilier, 1996, Scharpff et al., 2016] and Dec-POMDPs [Oliehoek and Amato, 2016]. However, the term “cooperative” is used differently in cooperative game theory [Chalkiadakis et al., 2011, Igarashi and Roijers, 2017], in which agents form coalitions on the basis of their individual utilities. In this book, we consider only decision problems that are cooperative according to Definition 2.3.

      In an SODP, the value function provides a complete ordering on the joint policies, i.e., for each pair of policies π and π′, Vπ must be greater than, equal to, or less than Vπ′. By contrast, in an MODP, the presence of multiple objectives means that the value function Vπ is a vector rather than a scalar. Such value functions supply only a partial ordering. For example, it is possible that Vπ is higher than Vπ′ in one objective but lower in another. Consequently, unlike in an SODP, we can no longer determine which values are optimal without additional information about how to prioritize the objectives, i.e., about what the utility of the user is for different trade-offs between the objectives.
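
      To make this partial ordering concrete, the sketch below (the function name and the example vectors are hypothetical, not taken from the book) compares two value vectors and reports whether one Pareto-dominates the other or whether they are incomparable.

```python
import numpy as np

def compare_values(v1, v2):
    """Compare two multi-objective value vectors.

    Returns "equal", "dominates", "dominated", or "incomparable".
    One vector Pareto-dominates another if it is at least as good in
    every objective and strictly better in at least one.
    """
    v1, v2 = np.asarray(v1, dtype=float), np.asarray(v2, dtype=float)
    if np.array_equal(v1, v2):
        return "equal"
    if np.all(v1 >= v2):
        return "dominates"
    if np.all(v2 >= v1):
        return "dominated"
    return "incomparable"  # better in some objectives, worse in others

# Two policies: more gold but less silver vs. less gold but more silver.
print(compare_values([10.0, 2.0], [7.0, 5.0]))  # -> "incomparable"
```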

      In the unknown weights and decision support scenarios (Figure 1.1), the parameters w of the scalarization function, or even f itself, are unknown during the planning or learning phases. Therefore, an algorithm for solving an MODP should return a set of policies that contains an optimal policy for each possible w. Given such a solution set, the user can pick the policy that maximizes her utility in the selection phase. We want the solution set to contain at least one optimal policy for every possible scalarization (in order to guarantee optimality), but we also want it to be as small as possible, so that the selection phase is as efficient as possible. In Chapter 3, we discuss which solution sets are optimal, and how this can be derived from different assumptions about the scalarization function f (Definition 1.1) and the set of permitted policies Π in the MODP. In the rest of this section, we introduce two different MODP problem classes.
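
      To illustrate the unknown weights scenario, the sketch below assumes a linear scalarization f(Vπ, w) = w · Vπ with two objectives and keeps every candidate policy that is optimal for at least one weight vector on a coarse grid. The function name, the grid-based approach, and the example values are all illustrative; exact methods for computing optimal solution sets are the subject of Chapter 3.

```python
import numpy as np

def approximate_solution_set(value_vectors, num_weights=101):
    """Keep every policy that is optimal for some weight vector w.

    Assumes two objectives and a linear scalarization f(V, w) = w . V.
    A coarse grid over w is used purely for illustration.
    """
    V = np.asarray(value_vectors, dtype=float)  # shape: (num_policies, 2)
    kept = set()
    for w1 in np.linspace(0.0, 1.0, num_weights):
        w = np.array([w1, 1.0 - w1])            # weights sum to one
        kept.add(int(np.argmax(V @ w)))         # best policy for this w
    return sorted(kept)

# One (multi-objective) value vector per candidate policy.
values = [[10.0, 2.0], [7.0, 5.0], [3.0, 6.0], [4.0, 4.0]]
print(approximate_solution_set(values))  # the last policy is never selected
```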

      The first class of MODPs that we treat is the multi-objective coordination graph (MO-CoG). In a MO-CoG, multiple agents need to coordinate their actions in order to be effective. For example, in the mining problem of Figure 1.2, each agent represents a van with workers from a single village. Each of these vans can go to different mines within reasonable traveling distance, leading to a set of different possible actions for each agent. Each mine yields a different expected amount of gold (the first objective) and silver (the second objective). Because mining can be done more efficiently when more workers are present at a mine, it is vitally important that the different agents (i.e., vans) coordinate which mines they go to.

      Other examples of problems that can be modeled as a MO-CoG are: risk-sensitive combinatorial auctions, in which we want to maximize the total revenue, while minimizing the risk for the auctioneer [Marinescu, 2011], and maintenance scheduling for offices in which the energy consumption, costs, and overtime for the maintenance staff must all be minimized [Marinescu, 2011].

      Before we formally define MO-CoGs, we first define the corresponding single-objective problem, i.e., coordination graphs (CoGs) [Guestrin et al., 2002, Kok and Vlassis, 2004]. In the context of coordination graphs, the notion of reward is typically referred to as payoff in the literature. Payoff is usually denoted u (for utility). We adopt this terminology and notation. A CoG is a tuple 〈D, A, U〉, where:

      • D = {1,…, n} is the set of n agents,

      • A = A1 × … × An is the joint action space: the Cartesian product of the finite action spaces of all agents. A joint action is thus a tuple containing an action for each agent: a = 〈a1,…, an〉, and

      • U = {u1,…, uρ} is the set of ρ scalar local payoff functions, each of which has limited scope, i.e., it depends on only a subset of the agents. The total team payoff is the sum of the local payoffs: u(a) = ∑_{e=1}^{ρ} u^e(a_e) (see the sketch below).
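
      As a concrete illustration of this definition, the following minimal sketch (the data layout and payoff values are made up for illustration) represents a CoG with three agents and two local payoff functions, and computes the team payoff u(a) as the sum of the local payoffs.

```python
# A CoG with three agents and two local payoff functions of limited scope.
# The payoff values below are invented for illustration only.
actions = {1: [0, 1], 2: [0, 1], 3: [0, 1]}  # each agent's finite action set

# Each local payoff function is (scope, table): the table maps the local
# joint action (ordered as in the scope) to a scalar payoff.
local_payoffs = [
    ((1, 2), {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}),
    ((2, 3), {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 4.0}),
]

def team_payoff(joint_action):
    """u(a): the sum of the local payoffs, each evaluated only on its scope."""
    total = 0.0
    for scope, table in local_payoffs:
        local_action = tuple(joint_action[agent] for agent in scope)
        total += table[local_action]
    return total

print(team_payoff({1: 0, 2: 0, 3: 1}))  # 3.0 + 0.0 = 3.0
```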

      In order to ensure that the coordination graph is fully cooperative, all agents share the payoff function u(a). We abuse the notation e both to index a local payoff function ue and to denote the subset of agents in its scope; ae is thus a local joint action, i.e., a joint action of this subset of agents. The decomposition of u(a) into local payoff functions can be represented as a factor graph [Bishop, 2006] (Figure 2.1): a bipartite graph containing two types of vertices, agents (variables) and local payoff functions (factors), with edges connecting local payoff functions to the agents in their scope.

      The main challenge in a CoG is that the size of the joint action space A grows exponentially with the number of agents. It thus quickly becomes intractable to enumerate all joint actions and their associated payoffs. Key to solving CoGs is therefore to exploit loose couplings between agents, i.e., each agent’s behavior directly affects only a subset of the other agents.

      Figure 2.1: A CoG with three agents and two local payoff functions. The factor graph illustrates the loose couplings that result from the decomposition into local payoff functions. In particular, each agent’s choice of action directly depends only on those of its immediate neighbors, e.g., once agent 1 knows agent 2’s action, it can choose its own action without considering agent 3.

      Figure 2.1 shows the factor graph of an example CoG in which the team payoff function decomposes into two local payoff functions, each with two agents in scope:

u(a) = u^1(a1, a2) + u^2(a2, a3)

      The local payoff functions are defined in Table 2.1. We use this CoG as a running example throughout this book. The local payoff functions, with their limited scopes, encode the loose couplings: each agent can directly affect another agent only when they share, i.e., are both in the scope of, a local payoff function. For example, if we fix the action of agent 2, then agents 1 and 3 can decide upon their optimal actions independently, as they do not directly affect each other.
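
      To illustrate this computationally, the sketch below (with invented payoff values that do not correspond to Table 2.1) fixes agent 2's action and lets agents 1 and 3 choose conditionally optimal responses from their own local payoff functions, recovering the best joint action by searching over agent 2's actions only.

```python
# Illustrative payoff tables for u^1(a1, a2) and u^2(a2, a3); the values
# are invented and do not correspond to Table 2.1.
u1 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}
u2 = {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 4.0}
A = [0, 1]  # each agent's action set

def best_joint_action():
    """Exploit the loose coupling: given a2, agents 1 and 3 act independently."""
    best_value, best_action = float("-inf"), None
    for a2 in A:
        # With agent 2's action fixed, agent 1 only needs u^1 and agent 3 only u^2.
        a1 = max(A, key=lambda a: u1[(a, a2)])
        a3 = max(A, key=lambda a: u2[(a2, a)])
        value = u1[(a1, a2)] + u2[(a2, a3)]
        if value > best_value:
            best_value, best_action = value, (a1, a2, a3)
    return best_action, best_value

print(best_joint_action())  # the optimal joint action and its team payoff
```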
