Bots. Nick Monaco
The arc of bot usage and evolution in IRC is similar to that of Usenet. At first, bots played an infrastructural role; then tech-savvy users began building their own bots for fun, while nefarious users adopted bots as a disruptive tool; in response, annoyed server operators and white-hat bot-builders in the community built new bots to solve the problems the bad bots had created (Leonard, 1997; Ohno, 2018).
Just as with Usenet, early bots in IRC channels played an infrastructural role, helping with routine maintenance tasks. For instance, IRC's initial design required at least one user to be present in a channel for that channel to be available to join; if every user logged out, the channel would close and cease to exist. Eventually, "Eggdrop" bots were created to solve this problem. Users deployed these bots to stay logged into IRC channels at all times, keeping the channels open even when all other human users had logged off (such as at night, when they were sleeping). Bots were easy to build in the IRC framework, and users quickly began designing new bots with other purposes: bots that would say hello to newcomers in the chat, spellcheck typing, or give users an interface to play games like Jeopardy! or Hunt the Wumpus inside IRC.
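To make the mechanism concrete, here is a minimal sketch of the keep-alive idea described above. It is illustrative only and is not Eggdrop's actual code; the server, channel, and nickname are placeholder assumptions. The bot registers with an IRC server, joins a channel, answers the server's PINGs so the connection stays up (keeping the channel open), and greets anyone who joins.

```python
# Minimal keep-alive IRC bot sketch (assumed placeholders, not Eggdrop code).
import socket

SERVER = "irc.example.net"   # placeholder IRC server (assumption)
PORT = 6667                  # standard plaintext IRC port
CHANNEL = "#botdev"          # placeholder channel to keep open (assumption)
NICK = "keepalivebot"        # placeholder nickname (assumption)


def run():
    sock = socket.create_connection((SERVER, PORT))
    send = lambda line: sock.sendall((line + "\r\n").encode("utf-8"))

    # Register with the server, then join the channel we want to hold open.
    send(f"NICK {NICK}")
    send(f"USER {NICK} 0 * :channel keep-alive bot")
    send(f"JOIN {CHANNEL}")

    buffer = ""
    while True:
        data = sock.recv(4096)
        if not data:          # server closed the connection
            break
        buffer += data.decode("utf-8", errors="replace")
        while "\r\n" in buffer:
            line, buffer = buffer.split("\r\n", 1)
            if line.startswith("PING"):
                # Answering PING keeps the bot connected around the clock.
                send(line.replace("PING", "PONG", 1))
            elif " JOIN " in line and not line.startswith(f":{NICK}!"):
                # Someone else joined the channel: greet them.
                newcomer = line.split("!", 1)[0].lstrip(":")
                send(f"PRIVMSG {CHANNEL} :Welcome, {newcomer}!")


if __name__ == "__main__":
    run()
```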
Given the ease of developing bots in IRC and the technical skill of many early users, this environment was the perfect incubator for bot evolution. Good and bad IRC bots proliferated in the years that followed. For example, Eggdrop bots became more useful, not only keeping IRC channels open when no human users were logged in but also managing permissions on those channels. On the malicious side, hackers and troublemakers, often working in groups, used collidebots and clonebots to hijack IRC channels by knocking human users off of them, while annoybots flooded channels with text, making normal conversation impossible (Abu Rajab et al., 2006; Leonard, 1997). In response, other users designed channel-protection bots to defend IRC channels from annoybots. In IRC, bots were both heroic helpers and hacker villains – digital Lokis that played both roles. This dual nature of bots persists to this day on platforms like Reddit, where bots play both helpful and contested roles (Massanari, 2016).
Bots and online gaming in MUD environments
In addition to Usenet and IRC, computer games were also a hotbed of early bot development. From 1979 on, chatbots were relatively popular in online gaming environments known as MUDs ("multi-user domains" or "multi-user dungeons"). MUDs gained their name from the fact that multiple users could log into the same game server at the same time and play the same game. Unlike console games, MUDs were text-based and entirely without graphics,5 owing to early computers' limited memory and processing power, which made them an ideal environment for typed bot interaction. These games often had automated non-player characters (NPCs) that helped move gameplay along, providing players with necessary information and services. MUDs remained popular into the 1990s, and users increasingly programmed and forked their own bots as the genre matured (Abokhodair et al., 2015; Leonard, 1997).
ELIZA, the original chatbot from the 1960s, served as a prototype and inspiration for most MUD chatbots. One of the big 1990s breakthroughs for MUD bots was a chatbot named Julia, part of an entire family of bots called the Maas-Neotek family, written by Carnegie Mellon University graduate student Michael "Fuzzy" Mauldin for TinyMUD environments. Julia, a chatbot built on ELIZA's code, inspired MUD enthusiasts to take the publicly available Maas-Neotek code and hack together their own bot variants (Foner, 1993; Julia's Home Page, 1994; Leonard, 1997, pp. 40–42). Bots became legion in TinyMUDs – at one point PointMOOt, a popular TinyMUD that simulated a virtual city, had a population that was over 50 percent bots (Leonard, 1996) – and this abundance of bots was an essential part of the appeal for both players and developers.
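As an illustration of the kind of keyword matching that ELIZA-descended MUD chatbots relied on, here is a small, hypothetical sketch (not Julia's or the Maas-Neotek code): it scans a line of player input for keywords and answers from canned reply templates, echoing part of the input back where a template asks for it.

```python
# Toy ELIZA-style responder (illustrative only, not any historical bot's code).
import random
import re

# keyword pattern -> possible reply templates; "{rest}" echoes captured input
RULES = [
    (r"\bi am (.+)", ["Why do you say you are {rest}?", "How long have you been {rest}?"]),
    (r"\bi feel (.+)", ["What makes you feel {rest}?"]),
    (r"\bwhere\b", ["You can find most things near the town square."]),
    (r"\bhello\b|\bhi\b", ["Hello, traveler!", "Greetings!"]),
]

DEFAULTS = ["Tell me more.", "Interesting. Go on.", "Why do you say that?"]


def respond(text):
    """Return a canned, pattern-matched reply to one line of player input."""
    lowered = text.lower()
    for pattern, templates in RULES:
        match = re.search(pattern, lowered)
        if match:
            rest = match.group(1) if match.groups() else ""
            return random.choice(templates).format(rest=rest)
    return random.choice(DEFAULTS)


if __name__ == "__main__":
    while True:
        print(respond(input("> ")))
```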
Bots and the World Wide Web
As we have seen, early internet environments such as Usenet, IRC, and MUDs were the first wave of bot development, driving bot evolution from the 1970s through the 1990s. The next stage of bot advancement came with the advent of the World Wide Web in 1991.
Crawlers, web-indexing bots
The World Wide Web became widely available in the early 1990s, growing exponentially more complex and difficult to navigate as it gained more and more users. Gradually, people began to realize that there was simply too much information on the web for humans to navigate easily. It was clear to companies and researchers at the forefront of computer research that they needed to develop a tool to help humans make sense of the vast web. Bots came to fill this void, playing a new infrastructural role as an intermediary between humans and the internet itself. Computer programs were developed to move from webpage to webpage and analyze and organize the content (“indexing”) so that it was easily searchable. These bots were often called “crawlers” or “spiders,”6 since they “crawled” across the web to gather information. Without bots visiting sites on the internet and taking notes on their content, humans simply couldn’t know what websites were online. This fact is as true today as it was back then.
The basic logic that drives crawlers is very simple. At their base, websites are text files. These text files are written in hypertext markup language (HTML), a standardized format that is the primary base language of all websites.7 HTML documents are retrieved with an HTTP call; users submit one every time they type a webpage's URL into a browser and press enter, or click a link on the internet. One of the core features of HTML – the one that enables the World Wide Web to exist as a network of HTML pages – is the ability to embed hypertext, or "links," to outside documents within a webpage. Crawler bots work by accessing a website through an HTTP call, collecting the hyperlinks embedded within the website's HTML code, and then visiting those hyperlinks with further HTTP calls. This process is repeated over and over again to map and catalogue web content. Along the way, crawler bots can be programmed to download the HTML underlying every website they visit, or to process facts about those sites in real time (such as whether a site appears to be a news outlet or an e-commerce site).
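The loop described above can be sketched in a few dozen lines of Python using only the standard library. This is an illustrative toy rather than any real search engine's code: the seed URL and the in-memory dictionary standing in for an index are assumptions, and a production crawler would also honor the Robot Exclusion Standard discussed below, throttle its requests, and deduplicate URLs.

```python
# Toy breadth-first web crawler sketch (assumed seed URL, stand-in index).
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags found in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, max_pages=10):
    """Visit pages, record their HTML, then follow the links they contain."""
    queue = [seed_url]
    visited = set()
    index = {}  # URL -> raw HTML; a stand-in for a real search index

    while queue and len(visited) < max_pages:
        url = queue.pop(0)
        if url in visited:
            continue
        visited.add(url)
        try:
            with urlopen(url, timeout=10) as response:  # the "HTTP call"
                html = response.read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to load
        index[url] = html

        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute.startswith("http") and absolute not in visited:
                queue.append(absolute)

    return index


if __name__ == "__main__":
    pages = crawl("https://example.com")  # placeholder seed URL
    print(f"Indexed {len(pages)} pages")
```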
Initially, these bots crawled the web and took notes on all the URLs they visited, assembling this information in a database known as a "Web Directory" – a place users could visit to see what websites existed on the web and what they were about. Advertisers and investors quickly poured funds into these proto-search engines, realizing how many eyes would land on them each day as the internet continued to grow (Leonard, 1996).
Though Google eventually became the dominant search engine for navigating the web, the 1990s saw a host of corporate and individual search engine start-ups, all of which used bots to index the web. The first of these was Matthew Gray's World Wide Web Wanderer in 1993. The next year, Brian Pinkerton wrote WebCrawler, and Michael Mauldin created Lycos (Latin for "wolf spider"), both of which were even more powerful spiders than the World Wide Web Wanderer. Other search engines, like AltaVista and (later) Google, also employed bots to perfect the art of searching for8 and organizing information on the web9 (Indiana University Knowledge Base, 2020; Leonard, 1997, pp. 121–124). The indexable internet – that is, the set of publicly available websites on the World Wide Web that allow themselves to be visited by crawler bots and listed in search engine results – is known as the "clear web."10
Spambots and the development of the Robot Exclusion Standard
We have already seen that bots can be used for either good or bad ends, and World Wide Web bots were no different. Originally used as a solution to the problem of organizing and trawling through vast amounts of information on the World Wide Web, bots were quickly adapted for more devious purposes. As the 1990s went on and the World Wide Web (and other online communities like Usenet and IRC) continued to grow, entrepreneurial technologists realized that there was a captive audience on the other end of the terminal. This insight led to the birth of spam.