Bots. Nick Monaco

and advertisements at scale.

      Usenet was a precursor to more widespread spambot swarms on the internet at large, especially email (Ohno, 2018). Incidents like the botwars on Usenet news groups and IRC servers had, by the late 1990s, made it all too clear that bots would not be only a positive force on the internet. Negative uses of bots (spreading spam, crashing servers, denying content and services to humans, and posting irrelevant content en masse, to name a few) could easily cause great harm. Perhaps the most damaging of these uses was crawling websites to gather private or sensitive information.

      To solve the problem of bots crawling sensitive websites, a Dutch engineer named Martijn Koster developed the Robot Exclusion Standard (Koster, 1994, 1996). The Robot Exclusion Standard (RES) is a simple convention that functions as a digital “Do Not Enter” sign. A site that follows the convention publishes a “robots.txt” file that explains what content the site allows bots to access. Some sites allow bots to access any part of their domain, others allow access to some (but not all) parts of the website, and still others disallow bot access altogether. A site’s robots.txt file, where one exists, can be found by navigating to the website and adding “/robots.txt” to the end of the URL. For instance, you can access Facebook’s instructions for crawler bots at facebook.com/robots.txt. As you would expect, this file disallows nearly all forms of crawling on Facebook’s platform, since such crawling would violate users’ privacy, as well as the platform’s terms of service.
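      As a rough illustration of how a crawler might read such a file, the sketch below uses Python's standard-library `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, and the bot names `ExampleCrawler` and `BadBot` are invented for this example and do not reproduce any real site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an imaginary site (an assumption for
# illustration; this is not any real site's file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A generic crawler may read public pages but not the /private/ section.
print(parser.can_fetch("ExampleCrawler", "https://example.com/index.html"))
print(parser.can_fetch("ExampleCrawler", "https://example.com/private/data.html"))

# A crawler named in its own rule group is shut out of the whole site.
print(parser.can_fetch("BadBot", "https://example.com/index.html"))
```

      In a real crawler the file would be downloaded from the site first, for example with the parser's `set_url()` and `read()` methods, rather than supplied as an inline string.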

      Other spambots did not even follow the letter of the law. For example, a bot known as ActiveAgent ignored the RES altogether, scraping any website it could find looking for email addresses, regardless of the site’s policies on bot access. The anonymous developer behind ActiveAgent had a different business model, though. Rather than selling the email addresses the bot collected, the developer sold its source code to aspiring spammers for $100. Buyers could then modify this code for their own purposes, sending out spam emails with whatever message or product they wanted (Leonard, 1997, pp. 140–144). Thanks in part to malicious developers like the one behind ActiveAgent, new spamming techniques quickly multiplied as the web grew. Today, spambots and spamming techniques are still evolving and thriving. Estimates vary greatly, but by some accounts as much as 84 percent of all email was spam as of October 2020 (Cisco Talos Intelligence, 2020).

      Clearly, the RES is not an absolute means of shutting down crawler bot activity online – it’s an honor system that presumes good faith on the part of bot developers, who must actively decide to make each bot honor the convention and encode these values into the bot’s programming. Despite these imperfections, the RES has seen success and, for that reason, it continues to underlie bot governance online to this day. It is an efficient way to let bot designers know when they are violating a site’s terms of service and possibly the law.
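      The honor-system point can be made concrete with a minimal sketch, again assuming Python's `urllib.robotparser`. The function `plan_crawl`, its `honor_robots` flag, the bot name, and the sample rules and URLs are all hypothetical. The sketch shows that compliance is a branch the developer chooses to write, and that the file itself enforces nothing.

```python
from urllib.robotparser import RobotFileParser


def plan_crawl(robots_txt, user_agent, urls, honor_robots=True):
    """Return the URLs a bot would visit (hypothetical helper for illustration)."""
    if not honor_robots:
        # Nothing technical prevents a developer from taking this branch;
        # the robots.txt file is advisory only.
        return list(urls)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # A well-behaved bot drops every URL the site asks it not to fetch.
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Invented rules and URLs for an imaginary site.
RULES = "User-agent: *\nDisallow: /members/\n"
URLS = ["https://example.com/", "https://example.com/members/list"]

polite = plan_crawl(RULES, "ExampleCrawler", URLS)
rude = plan_crawl(RULES, "ExampleCrawler", URLS, honor_robots=False)
print(polite)
print(rude)
```

      The polite plan omits the disallowed page, while the other plan includes it unchanged.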

      The user-friendly and user-centric web 2.0 had its own problems. Just as advertisers had realized in the 1990s that the World Wide Web was a revolutionary new opportunity for marketing (and sometimes spam), in the 2000s governments and activists began to realize that the new incarnation of the web was a powerful place to spread political messages. In this environment, political bots, astroturfing, and computational propaganda quickly proliferated, though it would take decades for the wider public to realize it (Zi et al., 2010). We’ll examine these dynamics in greater depth in our chapters on political bots and commercial bots.

      One problem with understanding bots is the term’s ambiguity: the word has several distinct (though often overlapping) meanings. This makes it particularly difficult for policymakers trying to write sensible technology legislation. Indeed, in the words of two communications scholars, the “multiple forms of ambiguity are responsible for much of the complexity underlying contemporary bot policy” (Gorwa & Guilbeault, 2018).

      People have been trying to define what bots are since the 1990s, and multiple bot “typologies” have been proposed by journalists, researchers, and academic experts seeking to organize and categorize the profusion of different bots. These typologies vary from informal groupings to more formal taxonomies (Gorwa & Guilbeault, 2018; Leonard, 1997; Maus, 2017; Stieglitz et al., 2017), and some limit themselves to specific subtypes of bots, such as news bots or political bots (DiResta et al., 2017; Lokot & Diakopoulos, 2016). However, the rapid pace of bot evolution means that these taxonomies can quickly break or become out of date. Nonetheless, these efforts are extremely important and give us footholds for navigating the nascent and ever-evolving landscape of bots and their uses, capabilities, and characteristics.
