Bots. Nick Monaco
One of the very first spambots appeared on Usenet. In April 1994, two lawyers, Laurence Canter and Martha Siegel, contracted a programmer to help promote an advert for their law firm’s assistance in the US Green Card Lottery. The programmer decided to use automation to reach as many users as possible. His bot – considered the first spambot on the modern internet – posted the ad to 6,000 newsgroups in under ninety minutes. The incident drew a strongly negative reaction from the Usenet community and, in response, one user built a cancelbot that removed all of the spambot’s posts from the targeted newsgroups (Leonard, 1997, pp. 165–167).
Spam on Usenet was a precursor to more widespread spambot swarms on the internet at large, especially in email (Ohno, 2018). Incidents like the botwars on Usenet newsgroups and IRC servers had, by the late 1990s, made it all too clear that bots would not be solely a positive force on the internet. Negative uses of bots (spreading spam, crashing servers, denying content and services to humans, and posting irrelevant content en masse, to name just a few) could easily cause great harm. Perhaps the most damaging of these uses was crawling websites to gather private or sensitive information.
To solve the problem of bots crawling sensitive websites, a Dutch engineer named Martijn Koster developed the Robot Exclusion Standard (Koster, 1994, 1996). The Robot Exclusion Standard (RES) is a simple convention that functions as a digital “Do Not Enter” sign. A site that follows the convention publishes a “robots.txt” file that explains what content the site allows bots to access. Some sites allow bots to access any part of their domain, others allow access to some (but not all) parts of the website, and still others disallow bot access altogether. A site’s robots.txt file can be found by navigating to the website and adding “/robots.txt” directly after the domain name in the URL. For instance, you can access Facebook’s instructions for crawler bots at facebook.com/robots.txt. As you would expect, this file disallows nearly all forms of crawling on Facebook’s platform, since such crawling would violate users’ privacy, as well as the platform’s terms of service.
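As a rough illustration of how a crawler might read such a file, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the bot names `ExampleBot` and `BadBot`, and the example.com URLs are all invented for this example and do not reproduce any real site's rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site (an assumption,
# not a copy of any real site's file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A generic bot may read public pages but not the /private/ section.
print(parser.can_fetch("ExampleBot", "https://example.com/index.html"))         # True
print(parser.can_fetch("ExampleBot", "https://example.com/private/data.html"))  # False

# A bot singled out by name is asked to stay away from the whole site.
print(parser.can_fetch("BadBot", "https://example.com/index.html"))             # False
```

The three cases mirror the range described above: open access, partial access, and no access at all for a particular bot.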
The late 1990s saw several high-profile examples of controversial bots. Some followed these standards while arguably violating their intent, and others proudly flouted them. RoverBot, a crawler created in 1996, was one such bot. It retrieved a set of websites relating to a pre-specified topic and scraped email addresses from them. The company that built RoverBot then sold these lists of email addresses to paying customers, who used them to send out spam advertisements. While RoverBot certainly had its detractors, the firm behind it insisted that it followed rules (such as the RES) while scraping the web.
Other spambots did not even follow the letter of the law. For example, a bot known as ActiveAgent ignored the RES altogether, scraping any website it could find for email addresses, regardless of the site’s policies on bot access. The anonymous developer behind ActiveAgent had a different business model, though. Rather than selling the email addresses the bot collected, the developer sold its source code to aspiring spammers for $100. Buyers could then modify this code for their own purposes, sending out spam emails with whatever message or product they wanted (Leonard, 1997, pp. 140–144). Thanks in part to malicious developers like the one behind ActiveAgent, new spamming techniques quickly multiplied as the web grew. Today, spambots and spamming techniques are still evolving and thriving. Estimates vary greatly, but by some firms’ counts as much as 84 percent of all email was spam as of October 2020 (Cisco Talos Intelligence, 2020).
Clearly, the RES is not an absolute means of shutting down crawler bot activity online – it’s an honor system that presumes good faith on the part of bot developers, who must actively decide to make each bot honor the convention and encode these values into the bot’s programming. Despite these imperfections, the RES has been successful and, for that reason, it continues to underlie bot governance online to this day. It is an efficient way to let bot designers know when they are violating a site’s terms of service and possibly the law.
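The honor-system nature of the RES can be sketched in a few lines of Python. In this hypothetical example, the `honor_res` flag, the `will_fetch` helper, the bot names, and the example.com URL are all assumptions made for illustration. The point is that compliance is a choice written into the bot's own code, and nothing in the convention itself stops a bot that skips the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for an imaginary site that disallows all bot access.
RULES = """\
User-agent: *
Disallow: /
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def will_fetch(parser: RobotFileParser, user_agent: str, url: str, honor_res: bool) -> bool:
    """Return True if the bot will go ahead and request the URL."""
    if not honor_res:
        # The developer chose to ignore the convention; the rules are
        # never consulted, so they cannot block the request.
        return True
    return parser.can_fetch(user_agent, url)


rules = load_rules(RULES)
url = "https://example.com/contact.html"

print(will_fetch(rules, "PoliteBot", url, honor_res=True))   # False: the bot stays out
print(likely := will_fetch(rules, "RudeBot", url, honor_res=False))  # True: the sign is ignored
```

A site that wants to keep out bots of the second kind needs enforcement beyond robots.txt, since the file only states a request.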
Social media and the dawn of social bots
Social media supercharged bot evolution in the late 2000s. During this period, the cost of broadband internet declined, connectivity increased, and computing power grew. A growing number of people began to spend more and more time on social media sites, producing their own content. The entire web began to evolve, shifting from a slow, company-driven, rocky experience to a smoother, sleeker, and user-friendly one in which user-generated content took the foreground. This new user-centric version of the internet came to be known as the “web 2.0” (O’Reilly, 2005).
The user-friendly and user-centric web 2.0 had its own problems. Just as advertisers had realized in the 1990s that the World Wide Web was a new revolutionary opportunity for marketing (and sometimes spam), in the 2000s governments and activists began to realize that the new incarnation of the web was a powerful place to spread political messages. In this environment, political bots, astroturfing, and computational propaganda quickly proliferated, though it would take decades for the wider public to realize it (Zi et al., 2010). We’ll examine these dynamics in greater detail and depth in our chapters on political bots and commercial bots.
In every case, online environments that are welcoming to bot innovations – Usenet, IRC, or MUD-gaming platforms in the late 1980s and early 1990s, or Twitter in the late aughts – have consistently been strong drivers of bot evolution. The design of these environments, called their platform architecture, is just as important as their policies on bots. In MUD gaming environments, users could easily access and modify code to build bot characters in the game; in IRC and Usenet, bots were a necessary infrastructural part of interacting with the platform, and users often enjoyed building their own. Similarly, early 2000s virtual worlds like Second Life were designed in such a way that bot development became more accessible for average users (Lugrin et al., 2008). Now, perhaps most significantly for the era of social media, Twitter’s infrastructure is extremely welcoming to bots (and was even more so in the platform’s early days) (Ferrara et al., 2014; Zi et al., 2010). Twitter’s Application Programming Interface (API) makes building and connecting bots to the platform easy, and its infrastructure has arguably done more to democratize bot development and drive their evolution than any other platform or website in bot history.
Different Types of Bots
One problem with understanding bots is the term’s ambiguity: the word has several distinct (though often overlapping) meanings. This makes it particularly difficult for policymakers trying to write sensible technology legislation. Indeed, in the words of two communications scholars, the “multiple forms of ambiguity are responsible for much of the complexity underlying contemporary bot policy” (Gorwa & Guilbeault, 2018).
People have been trying to define what bots are since the 1990s, and multiple bot “typologies” have been proposed by journalists, researchers, and academic experts seeking to organize and categorize the profusion of different bots. These typologies vary from informal groupings to more formal taxonomies (Gorwa & Guilbeault, 2018; Leonard, 1997; Maus, 2017; Stieglitz et al., 2017), and some limit themselves to specific subtypes of bots, such as news bots or political bots (DiResta et al., 2017; Lokot & Diakopoulos, 2016). However, the rapid pace of bot evolution means that these taxonomies can quickly break or become out-of-date. Nonetheless, these efforts are extremely important and provide us with footholds with which to navigate the nascent and ever-evolving landscape of bots and their uses, capabilities, and characteristics.
Because the landscape of bot and disinformation research changes rapidly, the bot categories we discuss here are the ones most important at the time of writing. These categories have largely remained relevant for understanding and analyzing bot behavior over the past three decades. This is not an exhaustive list, but it is a useful introduction to the field. Armed with these categories, the reader will be able