The Big R-Book. Philippe J. S. De Brouwer


and hence the one to survive. This is not biological evolution, but guided evolution. We do not have to rely on a huge number of companies with random variations, but we can use data to see trends and react to them.

      The role of the data-analyst in any company cannot be overestimated. It falls to the reader of this book not only to read those patterns from the data, but also to convince decision makers to act on this fact-based insight.

      So far we have discussed the role of data scientists and the actions that they would take. But how does it look from the point of view of the data itself?

      When applying the scientific method to data-science, the most important thing is probably to make sure that one understands the data very well. Data in itself is meaningless. For example, 930 is just a number. It could be anything: from the age of Adam in Genesis, to the price of a chair, or the code to unlock your bike-chain. It could be a time, and 930 could mean “9:30” (assume “am” if your time-zone habits require so). Knowing that interpretation, the number becomes information, but we cannot understand this information till we know what it means (it could be the time I woke up after a long party, the time of a plane to catch, a meeting at work, etc.). We can only understand the data if we know, for example, that it comes from the schedule of the bus “843-my-route-to-work”. This understanding, together with the insight that this bus always runs 15 minutes late and my will to catch the bus, can lead to action: to go out, wait for that bus, and get on it.
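      The bus example can be traced in a few lines of R. This is only a minimal sketch: reading 930 as a clock time in HHMM form is the assumption made in the text, and the variable names (raw, scheduled, expected) are invented here for illustration.

```r
# Data: a bare number without context.
raw <- 930

# Information: ASSUMPTION - we read the number as a clock time in HHMM form.
hours   <- raw %/% 100   # integer division gives the hours
minutes <- raw %% 100    # the remainder gives the minutes
scheduled <- sprintf("%02d:%02d", hours, minutes)

# Insight: according to the text, this bus always runs 15 minutes late.
delay_min <- 15
total_min <- hours * 60 + minutes + delay_min
expected  <- sprintf("%02d:%02d", total_min %/% 60, total_min %% 60)

# Action: decide when to be at the bus stop.
cat("Scheduled:", scheduled, "- expect the bus around", expected, "\n")
```

      Under these assumptions the scheduled time is 09:30 and the expected arrival is 09:45. The number alone would not have told us when to leave the house.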

       data → information → insight → action

      This simple example shows us how the data cycle in any company or within any discipline should work. We first have a question, for example “to which customers can we lend money without pushing them into a debt-spiral?” Then one collects data (from the company's own systems or from a credit bureau). This data can then be used to create a model that allows us to reduce the complexity of all observations to the explanatory variables only: a projection into a space of lower dimension. That model helps us to get insight from the data and, once put in production, allows us to decide on the right action for each credit application.

      This institution will end up with a better credit approval process, in which fewer loss events occur. That is the role of data-science: to drive companies to the creation of more sustainable wealth, in a future where all have a place and plenty.
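      The lending example can be sketched in R as follows. Everything here is hypothetical: the data is simulated, the variable names (income, debt_ratio, noise_var), the choice of logistic regression, and the 20% threshold are assumptions made for illustration only, not the method of any real lender.

```r
set.seed(1)  # make the simulated data reproducible

# Data: SIMULATED observations for 500 hypothetical past customers.
n <- 500
income     <- rnorm(n, mean = 3000, sd = 800)  # hypothetical monthly income
debt_ratio <- runif(n, min = 0, max = 1)       # hypothetical debt-to-income ratio
noise_var  <- rnorm(n)                         # a variable unrelated to default

# Simulated "truth": default risk rises with debt_ratio and falls with income.
p_default <- plogis(-1 + 4 * debt_ratio - 0.001 * income)
default   <- rbinom(n, size = 1, prob = p_default)
d <- data.frame(default, income, debt_ratio, noise_var)

# Model: keep only the explanatory variables (noise_var is left out).
model <- glm(default ~ income + debt_ratio, data = d, family = binomial)

# Insight: estimated probability of default for a new, hypothetical applicant.
applicant <- data.frame(income = 2500, debt_ratio = 0.9)
p_hat <- predict(model, newdata = applicant, type = "response")

# Action: a hypothetical decision rule with an arbitrary 20% threshold.
decision <- if (p_hat < 0.2) "approve" else "review"
```

      The model reduces each application to two numbers and one estimated probability. This is the projection to a lower-dimensional space that the text describes.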

Figure: The role of data-science in a company is to take data and turn it into actionable insight. At every step, apart from technical issues that will be discussed in this book, it is of utmost importance to understand the context and limitations of data, business, regulations and customers.

      1 The term “singularity” refers to the point in time where an intelligent system would be able to produce an even more intelligent system, which in turn can create another system that is a certain percentage smarter in a time that is a certain percentage shorter. This inevitably leads to an exponentially increasing creation of better systems. This time series converges to one point in time, where the “intelligence” of the machine would hit its absolute limits. The first record of the subject is by Stanislaw Ulam in a discussion with John von Neumann in the 1950s, and an early and convincing publication is Good (1966). It is also elaborately explored in Kurzweil (2010).

      This book is formatted with LaTeX. People who know this markup language will have high expectations for the consistency and format of this book. As you can expect, there is

      1 a table of contents at the start;

      2 an index at the end, from page 1103;

      3 a bibliography on page 1088;

      4 as well as a list of all short-hands and symbols used on page 1117.

      # This is code
      1+pi
      ## [1] 4.141593
      Sys.getenv(c("EDITOR","USER","SHELL", "LC_NUMERIC"))
      ##        EDITOR          USER         SHELL    LC_NUMERIC
      ##          "vi"        "root"   "/bin/bash" "pl_PL.UTF-8"

      As you can see, the code is highlighted: not all things have the same colour, which makes it easier to read and understand what is going on. The first line is a “comment”, which means that R will not do anything with it; it is for human use only. The next line is a simple sum. In your R terminal, this is what you will type or copy after the > prompt. It will rather look like this:

      > # This is code
      > 1+pi
      [1] 4.141593
      > Sys.getenv(c("EDITOR","USER","SHELL","LC_NUMERIC"))
             EDITOR          USER         SHELL    LC_NUMERIC
               "vi"    "philippe"   "/bin/bash" "pl_PL.UTF-8"
      >
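      The behaviour of Sys.getenv() can be explored a little further with the sketch below. The variable names DEMO_VAR and NO_SUCH_VARIABLE_12345 are invented for this illustration, and the values on your system will differ from those shown in the book.

```r
# Without an argument, Sys.getenv() returns all environment variables.
all_vars <- Sys.getenv()
head(names(all_vars))          # the first few variable names on this system

# Set a hypothetical variable for this R session, then read it back.
Sys.setenv(DEMO_VAR = "hello")
demo <- Sys.getenv("DEMO_VAR")

# A variable that is not set yields an empty string by default ...
absent <- Sys.getenv("NO_SUCH_VARIABLE_12345")

# ... or NA when asked for explicitly via the 'unset' argument.
absent_na <- Sys.getenv("NO_SUCH_VARIABLE_12345", unset = NA)

# Asking for several variables at once returns a named character vector.
several <- Sys.getenv(c("DEMO_VAR", "NO_SUCH_VARIABLE_12345"))
```

      Using unset = NA makes it possible to distinguish a variable that is not set from one that is set to an empty string.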

      The function Sys.getenv() returns all environment variables if no parameter is given. If
