The Creativity Code. Marcus du Sautoy
Чтение книги онлайн.
Читать онлайн книгу The Creativity Code - Marcus du Sautoy страница 12
To be able to articulate clearly how an algorithm works you need a language that allows you to talk about a number without specifying what that number is. We already saw it at work in explaining how Euclid’s algorithm worked. We gave names to the numbers that we were trying to analyse: N and M. These variables can represent any number. The power of this new linguistic take on mathematics meant that it allowed mathematicians to understand the grammar that underlies the way that numbers work. You didn’t have to talk about particular examples where the method worked. This new language of algebra provided a way to explain the patterns that lie behind the behaviour of numbers. A bit like a code for running a program, it shows why it would work whatever numbers you chose, the third criterion in our conditions for a good algorithm.
Algorithms have become the currency of our era because they are perfect fodder for computers. An algorithm exploits the pattern underlying the way we solve a problem to guide us to a solution. The computer doesn’t need to think. It just follows the instructions encoded in the algorithm again and again, and, as if by magic, out pops the answer you were looking for.
Desert island algorithm
One of the most extraordinary algorithms of the modern age is the one that helps millions of us navigate the internet every day. If I were cast away on a desert island and could only take one algorithm with me, I’d probably choose the one that drives Google. (Not that it would be much use, as I’d be unlikely to have an internet connection.)
In the early days of the internet (we’re talking the early 1990s) there was a directory that listed all of the existing websites. In 1994 there were only 3000 of them. The internet was small enough for you to pretty easily thumb through and find what you were looking for. Things have changed quite a bit since then. When I started writing this paragraph there were 1,267,084,131 websites live on the internet. A few sentences later that number has gone up to 1,267,085,440. (You can check the current status here: http://www.internetlivestats.com/.)
How does Google figure out exactly which one of the billion websites to recommend? Mary Ashwood, an 86-year-old granny from Wigan, was careful to send her requests with a courteous ‘please’ and ‘thank you’, perhaps imagining an industrious group of interns on the other end sifting through the endless requests. When her grandson Ben opened her laptop and found ‘Please translate these roman numerals mcmxcviii thank you’, he couldn’t resist tweeting the world about his nan’s misconception. He got a shock when someone at Google replied with the following tweet:
Dearest Ben’s Nan.
Hope you’re well.
In a world of billions of Searches, yours made us smile.
Oh, and it’s 1998.
Thank YOU
Ben’s Nan brought out the human in Google on this occasion, but there is no way any company could respond personally to the million searches Google receives every fifteen seconds. So if it isn’t magic Google elves scouring the internet, how does Google succeed in so spectacularly locating the answers you want?
It all comes down to the power and beauty of the algorithm Larry Page and Sergey Brin cooked up in their dorm rooms at Stanford in 1996. They originally wanted to call their new algorithm ‘Backrub’, but eventually settled instead on ‘Google’, inspired by the mathematical number for one followed by 100 zeros, which is known as a googol. Their mission was to find a way to rank pages on the internet to help navigate this growing database, so a huge number seemed like a cool name.
It isn’t that there weren’t other algorithms out there being used to do the same thing, but these were pretty simple in their conception. If you wanted to find out more about ‘the polite granny and Google’, existing algorithms would have identified all of the pages with these words and listed them in order, putting the websites with the most occurrences of the search terms up at the top.
That’s OK but easily hackable: any florist who sticks into their webpage’s meta-data the words ‘Mother’s Day Flowers’ a thousand times will shoot to the top of every son or daughter’s search. You want a search engine that can’t easily be pushed around by savvy web designers. So how can you come up with an unbiased measure of the importance of a website? And how can you find out which sites you can ignore?
Page and Brin struck on the clever idea that if a website has many links pointing to it, then those other sites are signalling that it is worth visiting. The idea is to democratise the measure of a website’s worth by letting other websites vote for who they think is important. But, again, this could be hacked. I just need to set up a thousand artificial websites linking to my florist’s website and it will bump the site up the list. To prevent this, they decided to give more weight to a vote that came from a website that itself commanded respect.
This still left them with a challenge: how do you rank the importance of one site over another? Take this mini-network as an example.
We want to start by giving each site equal weight. Let’s think of the websites as buckets; we’ll give each site eight balls to indicate that they have equal rank. Now the websites have to give their balls to the sites they link to. If they link to more than one site, then they will share their balls equally. Since website A links to both website B and website C, for example, it will give 4 balls to each site. Website B, however, has decided to link only to website C, putting all eight of its balls into website C’s bucket.
After the first distribution, website C comes out looking very strong. But we need to keep repeating the process because website A will be boosted by the fact that it is being linked to by the now high-ranking website C. The table below shows how the balls move around as we iterate the process.
At the moment, this does not seem to be a particularly good algorithm. It appears not to stabilise and is rather inefficient, failing two of our criteria for the ideal algorithm. Page and Brin’s great insight was to realise that they needed to find a way to assign the balls by looking at the connectivity of the network. It turned out they’d been taught a clever trick in their university mathematics course that worked out the correct distribution in one step.
The trick starts by constructing a matrix which records the way that the balls are redistributed among the websites. The first column of the matrix records the proportion going from website A to the other websites. In this case 0.5 goes to website B and 0.5 to website C. The matrix of redistribution is therefore given by the following matrix: