Big Data in Practice. Marr Bernard
CERN and Big Data have evolved together: CERN was one of the primary catalysts in the development of the Internet, which brought about the Big Data age we live in today.
Distributed computing makes it possible to carry out tasks that are far beyond the capabilities of any one organization to complete alone.
3
NETFLIX
How Netflix Used Big Data To Give Us The Programmes We Want
The streaming movie and TV service Netflix are said to account for one-third of peak-time Internet traffic in the US, and the service now has 65 million members in over 50 countries enjoying more than 100 million hours of TV shows and movies a day. Data from these millions of subscribers is collected and monitored in an attempt to understand our viewing habits. But Netflix’s data isn’t just “big” in the literal sense. It is the combination of this data with cutting-edge analytical techniques that makes Netflix a true Big Data company.
Legendary Hollywood screenwriter William Goldman said: “Nobody, nobody – not now, not ever – knows the least goddam thing about what is or isn’t going to work at the box office.”
He was speaking before the arrival of the Internet and Big Data and, since then, Netflix have been determined to prove him wrong by building a business around predicting exactly what we’ll enjoy watching.
A quick glance at Netflix’s jobs page is enough to give you an idea of how seriously data and analytics are taken. Specialists are recruited into teams dedicated to applying analytics to particular business areas: personalization analytics, messaging analytics, content delivery analytics, device analytics … the list goes on. But although Big Data is used across every aspect of the Netflix business, the holy grail has always been to predict what customers will enjoy watching. Big Data analytics is the fuel that fires the “recommendation engines” designed to serve this purpose.
Efforts here began back in 2006, when the company were still primarily a DVD-mailing business (streaming began a year later). They launched the Netflix Prize, offering $1 million to the group that could come up with the best algorithm for predicting how their customers would rate a movie based on their previous ratings. The winning entry was finally announced in 2009 and, although the algorithms are constantly revised and added to, the principles are still a key element of the recommendation engine.
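The Prize task described above, predicting a customer’s star rating for a movie from their previous ratings, can be illustrated with latent-factor matrix factorization, a family of techniques that featured prominently among Prize entries. This is a minimal sketch, not the winning algorithm or Netflix’s production system; the function names, the toy ratings and the hyperparameters (`n_factors`, `lr`, `reg`, `epochs`) are illustrative assumptions. Each customer and each movie gets a short vector of learned numbers, and a rating is predicted from the global average, a customer bias, a movie bias and the dot product of the two vectors.

```python
import random

def train_mf(ratings, n_factors=2, lr=0.02, reg=0.05, epochs=300, seed=0):
    """Fit a biased matrix-factorization model with stochastic gradient descent.

    ratings: list of (customer_id, movie_id, stars) triples, stars on a 1-5 scale.
    Prediction = global mean + customer bias + movie bias + dot(p[customer], q[movie]).
    Hyperparameter defaults are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)        # global mean rating
    bu = {c: 0.0 for c, _, _ in ratings}                      # customer biases
    bi = {m: 0.0 for _, m, _ in ratings}                      # movie biases
    p = {c: [rng.gauss(0, 0.1) for _ in range(n_factors)] for c in bu}  # customer factors
    q = {m: [rng.gauss(0, 0.1) for _ in range(n_factors)] for m in bi}  # movie factors
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for c, m, r in data:
            err = r - (mu + bu[c] + bi[m] + sum(a * b for a, b in zip(p[c], q[m])))
            # Gradient step on squared error with L2 regularization.
            bu[c] += lr * (err - reg * bu[c])
            bi[m] += lr * (err - reg * bi[m])
            for k in range(n_factors):
                pk, qk = p[c][k], q[m][k]
                p[c][k] += lr * (err * qk - reg * pk)
                q[m][k] += lr * (err * pk - reg * qk)
    return mu, bu, bi, p, q

def predict(model, customer, movie):
    mu, bu, bi, p, q = model
    score = mu + bu.get(customer, 0.0) + bi.get(movie, 0.0)
    if customer in p and movie in q:
        score += sum(a * b for a, b in zip(p[customer], q[movie]))
    return min(5.0, max(1.0, score))  # clamp to the 1-5 star scale

def rmse(model, ratings):
    """Root-mean-square error between predicted and actual stars."""
    return (sum((r - predict(model, c, m)) ** 2 for c, m, r in ratings) / len(ratings)) ** 0.5

# Hypothetical toy data: alice and bob share tastes, as do carol and dave.
ratings = [
    ("alice", "m1", 5), ("alice", "m2", 4), ("alice", "m3", 1),
    ("bob", "m1", 4), ("bob", "m2", 5), ("bob", "m4", 1),
    ("carol", "m3", 5), ("carol", "m4", 4), ("carol", "m1", 2),
    ("dave", "m3", 4), ("dave", "m4", 5), ("dave", "m2", 1),
]
model = train_mf(ratings)
print(round(rmse(model, ratings), 3))
print(round(predict(model, "alice", "m4"), 2))  # alice has not rated m4
```

Because alice’s ratings resemble bob’s, and bob disliked m4, the model should predict a low score for the unseen pair (alice, m4). Accuracy is measured here as root-mean-square error (RMSE), the measure the Prize itself used to judge entries.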
At first, analysts were limited by the lack of information they had on their customers – only four data points (customer ID, movie ID, rating and the date the movie was watched) were available for analysis. As soon as streaming became the primary delivery method, many new data points on their customers became accessible. This new data enabled Netflix to build models aimed at the ideal outcome: customers being consistently served movies they will enjoy. Happy customers, after all, are far more likely to continue their subscriptions.
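The four-field record described above can be sketched as a small data type, which makes it clear how little there was to work with: a customer’s profile is essentially a dated list of star ratings. The class and function names, the toy records, and the idea of splitting by date so that a model is scored on later ratings than it was trained on are illustrative assumptions, not Netflix’s documented practice.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Rating:
    """One Prize-era record: the four data points named in the text."""
    customer_id: int
    movie_id: int
    stars: int          # rating on a 1-5 scale
    watched_on: date    # the date the movie was watched

def temporal_split(records, cutoff):
    """Train on records before `cutoff`, evaluate on records from `cutoff` onwards,
    so a model is always scored on behaviour it has not yet 'seen' (an assumed practice)."""
    train = [r for r in records if r.watched_on < cutoff]
    test = [r for r in records if r.watched_on >= cutoff]
    return train, test

def history(records, customer_id):
    """A customer's (movie_id, stars) history in date order: with only four fields,
    this is essentially the whole customer profile."""
    mine = sorted((r for r in records if r.customer_id == customer_id), key=lambda r: r.watched_on)
    return [(r.movie_id, r.stars) for r in mine]

# Hypothetical toy records; IDs and dates are made up.
records = [
    Rating(1, 10, 5, date(2005, 3, 1)),
    Rating(1, 11, 2, date(2005, 9, 14)),
    Rating(2, 10, 4, date(2005, 6, 30)),
    Rating(1, 12, 4, date(2006, 1, 5)),
]
train, test = temporal_split(records, cutoff=date(2006, 1, 1))
print(len(train), len(test))  # 3 1
print(history(records, 1))    # [(10, 5), (11, 2), (12, 4)]
```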
Another central element of Netflix’s attempt to give us films we will enjoy is tagging. The company pay people to watch movies and tag them with the elements they contain. Netflix then suggest you watch other productions that were tagged similarly to those you enjoyed. This is where the sometimes unusual (and slightly robotic-sounding) “suggestions” come from: “In the mood for wacky teen comedy featuring a strong female lead?” It’s also the reason the service will sometimes (in fact, in my experience, often!) recommend I watch films that have been rated with only one or two stars. This may seem to run counter to their objective of showing me films I will enjoy. What has happened, though, is that the low ratings have been outweighed by the prediction that the content of the movie will appeal. In fact, Netflix have effectively defined nearly 80,000 new “micro-genres” of movie based on our viewing habits!
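The tagging approach described above, and the way a strong content match can outweigh a low star rating, can be sketched as a simple content-based scorer. This is an illustration under stated assumptions, not Netflix’s method: Jaccard overlap between tag sets as the similarity measure, a linear blend with the title’s average rating, and the `content_weight` parameter and title names are all hypothetical choices.

```python
def jaccard(a, b):
    """Tag overlap between two titles: |a & b| / |a | b| (0.0 if both are untagged)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def score(candidate, liked, tags, avg_stars, content_weight):
    """Blend content appeal (best tag match with any liked title) with the average star rating."""
    affinity = max((jaccard(tags[candidate], tags[t]) for t in liked), default=0.0)
    stars = (avg_stars[candidate] - 1) / 4  # rescale 1-5 stars to 0-1
    return content_weight * affinity + (1 - content_weight) * stars

def recommend(liked, tags, avg_stars, k=2, content_weight=0.8):
    """Rank titles the viewer has not yet watched by blended score, highest first."""
    candidates = [t for t in tags if t not in liked]
    return sorted(candidates, key=lambda t: score(t, liked, tags, avg_stars, content_weight), reverse=True)[:k]

# Hypothetical catalogue: tags and average ratings are made up.
tags = {
    "title_a": {"teen", "comedy", "wacky", "strong female lead"},  # a title the viewer enjoyed
    "title_b": {"teen", "comedy", "wacky", "strong female lead"},  # same tags, poorly rated
    "title_c": {"war", "drama", "period"},                          # highly rated, different tags
}
avg_stars = {"title_a": 4.0, "title_b": 1.5, "title_c": 4.8}

print(recommend(["title_a"], tags, avg_stars))                      # ['title_b', 'title_c']
print(recommend(["title_a"], tags, avg_stars, content_weight=0.1))  # ['title_c', 'title_b']
```

With the content weight high, the 1.5-star `title_b` is recommended first because its tags match what the viewer enjoyed; lowering the weight lets the well-rated but dissimilar `title_c` win instead.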
More recently, Netflix have moved towards positioning themselves as a content creator, not just a distribution channel for movie studios and other networks. Their strategy here has also been firmly driven by their data – which showed that their subscribers had a voracious appetite for content directed by David Fincher and starring Kevin Spacey. After outbidding networks including HBO and ABC for the rights to House of Cards, they were so confident it fitted their predictive model for the “perfect TV show” that they bucked the convention of producing a pilot and immediately commissioned two seasons comprising 26 episodes. Every aspect of the production under Netflix’s control was informed by data – even the range of colours used on the cover image for the series was selected to draw viewers in.
The ultimate metric Netflix hope to improve is the number of hours customers spend using their service. You don’t really need statistics to tell you that viewers who don’t spend much time using the service are likely to feel they aren’t getting value for money, and so may cancel their subscriptions. To this end, Netflix closely monitor how various factors affect the “quality of experience” and build models to explore how that experience affects user behaviour. By collecting end-user data on how the physical location of the content affects the viewer’s experience, they can calculate where to place data so that as many homes as possible receive an optimal service.
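The placement calculation described above can be illustrated as a classic facility-location problem: given how well each region is served from each candidate site, choose where to put copies of the content so that demand-weighted delay is as low as possible. This is a minimal sketch assuming latency as a stand-in for viewing experience and a greedy heuristic for choosing sites; the site names, regions and numbers are hypothetical, and nothing here is Netflix’s actual method.

```python
def total_cost(sites, demand, latency):
    """Demand-weighted latency if each region streams from its nearest chosen site."""
    return sum(homes * min(latency[region][s] for s in sites) for region, homes in demand.items())

def greedy_placement(candidates, k, demand, latency):
    """Pick k sites one at a time, each time adding the site that most reduces total cost."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        best = min(remaining, key=lambda c: total_cost(chosen + [c], demand, latency))
        chosen.append(best)
    return chosen

# Hypothetical measurements: median start-up latency (ms) from each region to each candidate site.
latency = {
    "north": {"site_n": 10, "site_s": 80, "site_e": 50},
    "south": {"site_n": 80, "site_s": 10, "site_e": 50},
    "east":  {"site_n": 50, "site_s": 50, "site_e": 10},
}
demand = {"north": 500, "south": 300, "east": 100}  # homes streaming in each region
candidates = ["site_n", "site_s", "site_e"]

plan = greedy_placement(candidates, 2, demand, latency)
print(plan, total_cost(plan, demand, latency))  # ['site_n', 'site_s'] 13000
```

Greedy selection is not guaranteed to find the best possible set of sites, but it is simple and works well when each added site serves a distinct cluster of demand, as here.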
Netflix’s letter to shareholders in April 2015 shows their Big Data strategy was paying off. They added 4.9 million new subscribers in Q1 2015, compared to four million in the same period in 2014. Netflix put much of this success down to their “ever-improving content”, including House of Cards and Orange is the New Black. This original content is driving new member acquisition and customer retention. In fact, 90% of Netflix members have engaged with this original content. Obviously, their ability to predict what viewers will enjoy is a large part of this success.
And what about their ultimate metric: how many hours customers spend using the service? Well, in Q1 2015 alone, Netflix members streamed 10 billion hours of content. If Netflix’s Big Data strategy continues to evolve, that number is set to increase.
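The Q1 figures quoted above can be put in proportion with a little arithmetic; the only inputs are the numbers in the text. Subscriber additions grew by 22.5% year on year, and 10 billion hours over the 90 days of Q1 2015 works out to roughly 111 million hours a day, consistent with the “more than 100 million hours” a day quoted at the start of the chapter.

```python
from datetime import date

# Figures quoted in the text (Q1 = January to March).
adds_q1_2015 = 4.9e6
adds_q1_2014 = 4.0e6
hours_q1_2015 = 10e9

growth = (adds_q1_2015 - adds_q1_2014) / adds_q1_2014
days_in_q1 = (date(2015, 4, 1) - date(2015, 1, 1)).days
hours_per_day = hours_q1_2015 / days_in_q1

print(f"Year-on-year growth in Q1 subscriber additions: {growth:.1%}")        # 22.5%
print(f"Days in Q1 2015: {days_in_q1}")                                      # 90
print(f"Average hours streamed per day: {hours_per_day / 1e6:.0f} million")  # 111 million
```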
The recommendation algorithms and content decisions are fed by data on what titles customers watch, what time of day movies are watched, time spent selecting movies, how often playback is stopped (either by the user or owing to network limitations) and ratings given. In order to analyse quality of experience, Netflix collect data on delays caused by buffering (rebuffer rate) and bitrate (which affects the picture quality), as well as customer location.
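The quality-of-experience measures named above, rebuffer rate and bitrate broken down by customer location, can be sketched as computations over a per-session playback log. The log format, the field names and the exact definitions (rebuffer rate as the fraction of session time spent stalled, bitrate as a time-weighted average over playing time) are assumptions for illustration; the text does not say how Netflix define them.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """A stretch of one viewing session (hypothetical log format)."""
    seconds: float        # duration of the stretch
    bitrate_kbps: float   # streaming bitrate while playing (ignored while stalled)
    rebuffering: bool     # True if playback was stalled waiting for data

def rebuffer_ratio(segments):
    """Share of session time spent stalled (one common definition of rebuffer rate)."""
    total = sum(s.seconds for s in segments)
    stalled = sum(s.seconds for s in segments if s.rebuffering)
    return stalled / total if total else 0.0

def mean_bitrate(segments):
    """Time-weighted average bitrate over the time that was actually playing."""
    playing = [s for s in segments if not s.rebuffering]
    played = sum(s.seconds for s in playing)
    return sum(s.seconds * s.bitrate_kbps for s in playing) / played if played else 0.0

def rebuffer_by_location(sessions):
    """Average rebuffer ratio per customer location; sessions are (location, segments) pairs."""
    ratios = defaultdict(list)
    for location, segments in sessions:
        ratios[location].append(rebuffer_ratio(segments))
    return {loc: sum(r) / len(r) for loc, r in ratios.items()}

# Hypothetical sessions: one with a 30-second stall and a bitrate drop, one smooth.
session = [Segment(585, 3000, False), Segment(30, 0, True), Segment(585, 1000, False)]
smooth = [Segment(1200, 3000, False)]
print(rebuffer_ratio(session))  # 0.025
print(mean_bitrate(session))    # 2000.0
print(rebuffer_by_location([("region_a", session), ("region_a", smooth), ("region_b", smooth)]))
# {'region_a': 0.0125, 'region_b': 0.0}
```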
Although their vast catalogue of movies and TV shows is hosted in the cloud on Amazon Web Services (AWS), it is also mirrored