Intelligent Security Management and Control in the IoT. Mohamed-Aymen Chalouf

by reversing equations [2.1] and [2.3]. This gives a very noisy measurement, but one which is, nevertheless, useful for IoT blocking as we demonstrated in Bouzouita et al. (2019).

      The difficulty of observing the system state, described in section 2.4, has led us to consider strategies that can deduce the blocking factor even from very noisy measurements.

      For this reason, we relied on deep learning techniques, which have proved very effective at automatically extracting the characteristics (“features”) of a system from noisy or even incomplete data (Rolnick et al. 2017).

      Given the lack of data, we turned to the class of reinforcement learning techniques.

      More particularly, we considered the “Twin Delayed Deep Deterministic policy gradient” (TD3) algorithm, which can handle a continuous action space and which has been shown to learn faster and perform better than existing approaches (Fujimoto et al. 2018).

      In what follows, we formulate the problem of access in the IoT as a reinforcement learning problem, in which an agent iteratively finds a sub-optimal blocking factor that reduces access conflicts.

      2.5.1. Formulating the problem

      For the problem of controlling access to the IoT, we define a discrete MDP, in which the state, the action and the revenue (the reward, in standard reinforcement learning terminology) are defined as follows:

       – The state: given that the number of terminals attempting access at a given instant k is unavailable, the state we consider is based on measured estimates. Since a single measurement of this number is necessarily very noisy, we consider a set of several measurements, which better reveals the state of the network. The state sk is defined as the vector of the H most recent measurements, where H represents the measurement horizon.

       – The action: in each state, the agent must select the blocking factor p that the IoT objects will apply. In the problem that we consider, this value is continuous and deterministic, that is, the same state sk will always give the same action ak.

       – The revenue: this is a signal that the agent receives from the environment after executing an action. Thus, at step k, the agent obtains a revenue rk as a consequence of the action ak that it carried out in state sk. This revenue tells the agent how good the executed action was, the objective of the agent being to maximize this revenue.
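
      The state described above can be sketched as a fixed-length window over the most recent noisy estimates. This is only an illustration: the class name MeasurementWindow, the zero pre-fill and the sample values are assumptions made for the example, not part of the original formulation.

```python
from collections import deque


class MeasurementWindow:
    """Keeps the H most recent noisy estimates and exposes them as the state."""

    def __init__(self, horizon, fill=0.0):
        self.horizon = horizon
        # Pre-fill (an assumption) so the state has a fixed length from step one.
        self.buffer = deque([fill] * horizon, maxlen=horizon)

    def push(self, estimate):
        # Appending to a full deque silently drops the oldest measurement.
        self.buffer.append(float(estimate))
        return self.state()

    def state(self):
        # Oldest measurement first, most recent last.
        return list(self.buffer)


window = MeasurementWindow(horizon=3)
for estimate in [12, 18, 15, 21]:
    s_k = window.push(estimate)
print(s_k)  # [18.0, 15.0, 21.0]
```

      A larger horizon H averages out more noise but makes the state react more slowly to changes in the load.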

      The revenue is therefore maximal when the chosen action yields a number of devices attempting access that is equal to the optimum. However, since the measurement of this number is marred by noise, the measured revenue is affected as well.

      The objective of such a system is to find the blocking probability that maximizes the average revenue, which amounts to reducing the distance between the measured number of terminals attempting access and the optimum. To meet this objective, we rely on the TD3 algorithm.
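
      One simple way to express a revenue that grows as the measurement approaches the optimum is the negative absolute distance. The function below is a hypothetical sketch: the exact revenue used by the authors is not given in this passage, and the value 54 used as the optimum is only a placeholder.

```python
def revenue(measured_attempts, optimal_attempts):
    """Hypothetical revenue: 0 at the optimum, more negative further away."""
    return -abs(measured_attempts - optimal_attempts)


OPTIMUM = 54  # placeholder value, chosen only for the example

close = revenue(50, OPTIMUM)  # measurement near the optimum
far = revenue(70, OPTIMUM)    # measurement far from the optimum
print(close > far)  # True
```

      Because the measurement itself is noisy, the revenue computed this way is noisy too, which is why the learning algorithm must be robust to errors in its value estimates.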

      The TD3 algorithm is an actor-critic approach, where the actor is a neural network that decides which action to take in a given state, and the critic network estimates the value of being in a state and choosing a particular action. TD3 addresses the overestimation of this value (Thrun and Schwartz 1993) by introducing two critic networks and taking the minimum of their two estimates. This approach is particularly beneficial in our case because of the inherent presence of measurement errors.
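
      The role of the two critics can be illustrated with the target value that TD3 computes for each stored transition, following Fujimoto et al. (2018). In this sketch the actor and the critics are plain Python callables standing in for neural networks, and the hyperparameter defaults and the [0, 1] action range (the blocking factor being a probability) are illustrative assumptions.

```python
import numpy as np


def td3_target(reward, next_state, actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=0.0, action_high=1.0, rng=None):
    """Target value for one transition, using the minimum of two critics."""
    if rng is None:
        rng = np.random.default_rng()
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip)
    next_action = np.clip(actor_target(next_state) + noise, action_low, action_high)
    # Clipped double Q-learning: keep the smaller of the two critic estimates.
    q_min = min(critic1_target(next_state, next_action),
                critic2_target(next_state, next_action))
    return reward + gamma * q_min


# Toy stand-ins for the target networks (hypothetical constant functions).
actor = lambda s: 0.5
q1 = lambda s, a: 10.0
q2 = lambda s, a: 8.0

y = td3_target(reward=1.0, next_state=[0.0], actor_target=actor,
               critic1_target=q1, critic2_target=q2, gamma=0.9)
print(y)  # 1.0 + 0.9 * min(10.0, 8.0), i.e. about 8.2
```

      Taking the minimum means that an optimistic error in one critic does not propagate into the target unless the other critic makes the same error.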

      2.5.2. Regulation system for arrivals

      The diagram in Figure 2.6 describes the system that controls the number of access attempts from IoT objects. This system relies on broadcasting the blocking factor to the terminals through the System Information Blocks (SIBs), and more particularly through the Type14 SIB, which carries the access blocking parameters (ETSI 2019).

      After receiving the blocking factor, the terminals wishing to transmit execute the Access Class Barring (ACB) procedure, which lets them pass to the following stages with a probability p calculated by our TD3-based controller. The terminals that pass can then attempt access by choosing a preamble at random from among the available preambles. Knowing the state of the preambles, the gNodeB can estimate the number of attempts made. This measurement is very noisy, since the given model only makes it possible to estimate averages. We therefore smooth the estimate of the number of devices with a sliding average.
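
      The chain described above (ACB check, random preamble choice, estimation from the preamble states, sliding average) can be simulated in a few lines. This is a sketch under stated assumptions: the estimator inverts the expected number of idle preambles, E[idle] = M(1 - 1/M)^N, which is a standard occupancy argument and not necessarily the authors' equations; the function names, the 54 preambles, the 200 backlogged terminals and the window of 5 are all illustrative.

```python
import math
import random
from collections import Counter, deque


def simulate_rach_slot(n_backlogged, p, n_preambles, rng):
    """One access opportunity: ACB check, then random preamble choice."""
    # ACB: a terminal proceeds only if its uniform draw falls below p.
    passed = sum(1 for _ in range(n_backlogged) if rng.random() < p)
    # Each terminal that passed picks one preamble uniformly at random.
    choices = Counter(rng.randrange(n_preambles) for _ in range(passed))
    idle = n_preambles - len(choices)
    success = sum(1 for count in choices.values() if count == 1)
    collided = len(choices) - success
    return passed, idle, success, collided


def estimate_attempts(idle, n_preambles):
    """Invert E[idle] = M * (1 - 1/M)**N to get a noisy estimate of N."""
    idle = max(idle, 1)  # avoid log(0) when every preamble was used
    return math.log(idle / n_preambles) / math.log(1.0 - 1.0 / n_preambles)


rng = random.Random(0)
recent = deque(maxlen=5)  # sliding window over the last 5 estimates
for _ in range(20):
    passed, idle, success, collided = simulate_rach_slot(200, 0.25, 54, rng)
    recent.append(estimate_attempts(idle, 54))
    smoothed = sum(recent) / len(recent)

print(f"true attempts in last slot: {passed}, smoothed estimate: {smoothed:.1f}")
```

      The gNodeB never sees the true number of attempts directly; it only sees which preambles were idle, used once, or collided, which is why the single-slot estimate fluctuates and needs smoothing.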

      The controller that we have proposed receives these measurements, together with the revenue, at the end of each preamble. The revenue obtained tells it how good the actions taken were. These data are stored in a memory of past experiences. A random subset of this memory is then used to learn robustly and, subsequently, to choose a new action.

      These steps are repeated cyclically.
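
      The memory of past experiences and its random sampling can be sketched as follows. The class name ReplayMemory, the capacity and the batch size are assumptions made for the illustration; the passage does not specify them.

```python
import random
from collections import deque


class ReplayMemory:
    """Bounded store of (state, action, revenue, next_state) transitions."""

    def __init__(self, capacity, seed=None):
        # Once full, the oldest experiences are discarded first.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, state, action, revenue, next_state):
        self.buffer.append((state, action, revenue, next_state))

    def sample(self, batch_size):
        # A uniform random subset breaks the correlation between
        # consecutive steps, which makes learning more robust.
        size = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=8, seed=0)
for k in range(10):
    memory.store(state=[float(k)], action=0.5, revenue=-float(k),
                 next_state=[float(k + 1)])

batch = memory.sample(batch_size=4)
print(len(memory), len(batch))  # 8 4
```

      Each sampled transition would then be used to update the critics and, less frequently, the actor, before the controller selects the next blocking factor.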

      Figure 2.6. System for regulating arrivals
