The Big R-Book. Philippe J. S. De Brouwer


book does a better job here than Bernard Marr's publication: Marr (2016): “Key Business Analytics, the 60+ business analysis tools every manager needs to know.” That book lists the terms that managers might use and explains what they mean, without any of the mathematics or programming behind them. I warmly recommend keeping it next to ours. Whenever someone comes up with a term like “customer churn analytics,” for example, you can use Bernard's book to find out what it actually means and then turn to ours to “get your hands dirty” and actually do it.

       If you are only interested in statistical learning and modelling, you will find the following books more focused: Hastie, Tibshirani, and Friedman (2009), or James, Witten, Hastie, and Tibshirani (2013), which also uses R.

       A more in-depth introduction to AI can be found in Russell and Norvig (2016).

       Data science is treated more elaborately in Baesens (2014) and in the recent book by Wickham and Grolemund (2016), which provides an excellent introduction to R and data science in general. This last book is a great complement to this one, as it focuses more on the data aspects (but less on statistical learning). We also focus more on the practical aspects and real data problems in a corporate environment.

      A book that comes close to ours in purpose is the one that my friend Professor Bart Baesens has compiled, “Analytics in a Big Data World, the Essential Guide to Data Science and its Applications”: Baesens (2014). If the mathematics, programming, and R in this book scare you, then Bart's book is for you. It covers various methods, and, above all, the reader only needs to be able to use a spreadsheet to do some basic calculations. Therefore, it will not help you to tackle big data or to program a neural network yourself, but you will understand very well what these things mean and how they work.

      Another book that might work well if the mathematics in this one is prohibitive for you is Provost and Fawcett (2013). It will give you some insight into what statistical learning is and how it works, but it will not prepare you to use it on real data.

      Summarizing, I suggest that, next to this book, you also buy Marr (2016) and Baesens (2014). This will provide you with a complete chain: from business and buzzwords (Bernard's book), through understanding what modelling is and what practical issues one will encounter (Bart's book), to implementing this in a corporate setting and solving the practical problems of a data scientist and modeller working on sizeable data (this book).

      In a nutshell, this book does it all: it is gentle on theoretical foundations and aims to be a one-stop shop that shows the big picture, teaches all those things, and lets you actually apply them. It aims to serve as a basis when later picking up more advanced books in certain narrow areas. This book will take you on a journey of working with data in a real company, and hence it will also discuss practical problems such as people filling in forms or extracting data from a SQL database.

      In some way, this book can also be seen as a celebration of FOSS (Free and Open Source Software). We proudly mention that no commercial software at all was used for this book. The operating system is Linux, the window manager Fluxbox (sometimes LXDE or KDE), Kile and vi helped the editing process, Okular displayed the PDF file, even the database servers and Hadoop/Spark are FOSS … and of course R and LaTeX provided the icing on the cake. FOSS makes this world a more inclusive place, as it makes technology more attainable in poorer places in the world.

      Hence, we extend warm thanks to all the people who spend so much time contributing to free software.

      Writing a book that is so eclectic and holds so much information would not have been possible without tremendous support from so many people: mentors, family, colleagues, and ex-colleagues at work or at universities. This book is, in the first place, a condensation of a few decades of interesting work in asset management and banking, and it mixes things that I have learned in C-level jobs and in more technical assignments.

      I thank the colleagues of the faculty of applied mathematics at the AGH University of Science and Technology, the faculty of mathematics of the Jagiellonian University of Krakow, and the colleagues of HSBC for the many stimulating discussions and shared insights into mathematical modelling and machine learning.

      To the MBA program of the Cracovian Business School, the University of Warsaw, and to the many leaders that marked my journey, I am indebted for the business insight, stakeholder management and commercial wit that make this book complete.

      Special thanks go to Piotr Kowalczyk, FRM, and Dr. Grzegorz Goryl, PRM, for reading large chunks of this book and providing detailed suggestions. I am also grateful for the general remarks and suggestions from Dr. Jerzy Dzieża, faculty of applied mathematics at the AGH University of Science and Technology of Krakow, and for the fruitful discussions with Dr. Tadeusz Czernik, from the University of Economics of Katowice and also Senior Manager at HSBC, Independent Model Review, Krakow.

      This book would not be what it is now without the many years of experience, the stimulating discussions with so many friends, and in particular my wife, Joanna De Brouwer, who encouraged me to move from London in order to work for HSBC in Krakow, Poland. Somehow, I feel that I should thank the city council and all the people for the wonderful and dynamic environment that attracts so many new service centres and makes the ones that had already selected Krakow grow their successful investments. This dynamic environment has certainly been an important stimulating factor in writing this book.

      However, nothing would have been possible without the devotion and support of my family: my wife Joanna and both our children, Amelia and Maximilian, were wonderful and are a constant source of inspiration and support.

      Finally, I would like to thank the thousands of people who contribute to free and open source software, people who spend thousands of hours creating and improving software that others can use for free. I profoundly believe that these selfless acts make this world a better and more inclusive place, because they make computers, software, and studying more accessible for the less fortunate.

      A special honorary mention should go to the people who have built Linux, LaTeX, R, and the ecosystems around each of them, as well as to the companies that contribute to those projects, such as Microsoft, which has embraced R, and RStudio, which enhances R and never fails to share the fruits of its efforts with the larger community.

PART I Introduction

      You have certainly heard the words “data is the new oil,” and you probably wondered: “Are we indeed on the verge of a new era of innovation and wealth creation, or … is this just hype and will it blow over soon enough?”

      Since our ancestors left the trees about 6 million years ago, we roamed the African steppes and evolved a more upright posture and limbs better suited for walking than climbing. However, for about 4 million years, these physiological changes did not include a larger brain. It is only in the last million years that we gradually evolved a more potent frontal lobe capable of abstract and logical thinking.

      The

Скачать книгу