Viegas et al., HICSS 2007

Created and maintained by Sachin Agarwal (User:Sachina)

Related page: Wikipedia - an always growing Social Media

Viegas FB, Wattenberg M, Kriss J, van Ham F., Talk Before You Type: Coordination in Wikipedia, Proceedings of the 40th Annual Hawaii International Conference on System Sciences, 2007.

Talk Before You Type: Coordination in Wikipedia

Wikipedia, the collaborative online encyclopedia, lets anyone on the internet edit its articles. It has become an important source of information for both online and offline users: news stories cite Wikipedia as a reference, and students sometimes turn to it for learning material. This extensive use, and the tendency to treat Wikipedia as an authoritative source, has attracted the attention of researchers. This paper investigates how Wikipedia has evolved in recent years. It compares its findings with an earlier study [ Viégas et al., 2004 ] and discusses three main results. First, the community maintains a strong resilience to malicious editing, despite tremendous growth and high traffic. Second, the fastest growing areas of Wikipedia are devoted to coordination and organization. Finally, the authors focus on a particular set of pages used to coordinate work, the “Talk” pages.

Compare and Contrast

Figure 1. History flow diagram showing edits made to the Abortion page until Aug. 2003. (Viegas et al, 2007)
Figure 2. History flow diagram showing edits made to the Abortion page until Oct. 2005. The edits shown in the 2003 image (on the left) are highlighted by a red ellipse here. (Viegas et al, 2007)

Data from 2005

The authors downloaded a file from the Wikipedia site containing all pages (except deleted pages), with full revision histories, from the October 2005 English-language Wikipedia. They refer to this full data set as FULL05. (The May 2003 data used in [ Viégas et al., 2004 ] is referred to as FULL03.) The data was imported into a MySQL database using tools provided by Wikipedia.

Figure 3. History flow diagram showing edits made to the Chocolate page until Aug. 2003. The presence of an edit war (the zigzag pattern on the right) is clearly visible in the diagram. (Viegas et al, 2007)

They created a 5% random sample of article pages, with full revision histories for each page in the sample. This slice of the data is referred to as SAMPLE05.
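The summary does not say how the sample was drawn. As a minimal sketch, assuming a uniform sample without replacement over page identifiers, a 5% slice could be produced as follows. The function name `sample_pages`, its parameters, and the fixed seed are illustrative assumptions, not the authors' procedure.

```python
import random


def sample_pages(page_ids, fraction=0.05, seed=0):
    """Return a uniform random subset holding about `fraction` of the pages.

    Hypothetical helper: the paper does not describe its sampling code.
    A fixed seed keeps the sample reproducible across runs.
    """
    ids = list(page_ids)
    rng = random.Random(seed)
    k = round(len(ids) * fraction)
    # random.Random.sample draws k distinct items without replacement.
    return sorted(rng.sample(ids, k))


# Example: sample 5% of 1000 hypothetical page ids.
sample05 = sample_pages(range(1000))
```

The full revision history of each sampled page would then be kept alongside its id, so that per-page statistics can be computed on the sample alone.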

History Flow

The authors applied the history flow visualization application, as in [ Viégas et al., 2004 ]. This tool produces a graphical view of the revision history of an individual page, plotting revision sequence on the x-axis and using the y-axis to show how the contributions of different authors are added, deleted, and rearranged over time. It thus provides an overview of a page's editing history. Figure 1 shows the diagram of the edits to the page on “Abortion” from the FULL03 data set. In contrast, Figure 2 shows the history flow diagram for the same page using the FULL05 data. The area circled in red corresponds to the data in Figure 1. Similarly, Figures 3 and 4 show the history flow diagrams for “Chocolate”. The dramatic change in scale is evident. The other important observations were as follows. In the FULL03 data set, vandalism was evident (e.g., the spike in the Chocolate page). Page size continued to show a general upward trend with the number of edits, with occasional sharp drops in size. One change was a drop in the frequency of “edit wars,” i.e., long back-and-forth sequences of editors undoing each other’s changes. The authors hypothesize that one possible reason is the voluntary adoption within the Wikipedia community of a “three revert rule”, which bars each member from making more than three reverts to a given page in a 24-hour period [ Three revert rule ].
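To make the three revert rule concrete, here is a minimal sketch of how one might check a single editor's reverts on a single page against it. It assumes the reverts are available as a list of timestamps and treats "a 24-hour period" as any sliding 24-hour window; the function name and parameters are hypothetical, and this is not how Wikipedia itself enforces the rule.

```python
from datetime import datetime, timedelta


def violates_three_revert_rule(revert_times, limit=3, window=timedelta(hours=24)):
    """Return True if more than `limit` reverts fall inside any `window`-long span.

    `revert_times` holds the timestamps of one editor's reverts to one page.
    """
    times = sorted(revert_times)
    start = 0
    for end, current in enumerate(times):
        # Drop reverts that are a full window or more older than the current one.
        while current - times[start] >= window:
            start += 1
        if end - start + 1 > limit:
            return True
    return False


# Four reverts within three hours exceed the limit of three.
base = datetime(2005, 10, 1, 12, 0)
burst = [base + timedelta(hours=h) for h in range(4)]
print(violates_three_revert_rule(burst))
```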

Vandalism

Figure 5. 2005 Statistics (SAMPLE05) (Viegas et al, 2007)
Figure 6. Recalculated 2003 Statistics (FULL03) (Viegas et al, 2007)

One of the main results of [ Viégas et al., 2004 ] was that Wikipedia showed resilience towards malicious edits. Such edits (known as “vandalism” in Wikipedian terminology) were often corrected rapidly by members of the community. Sometimes, however, the old repair mechanisms have not been sufficient. For example, in 2004 a page (“George W. Bush”) was protected from changes for the first time [14] because of an intense level of vandalism. At the time of writing, the authors report that this page usually remains in a “protected” mode, which allows changes only by users who have been registered for a certain length of time. (In the FULL05 data, a small minority of pages, 0.09%, were marked as protected.) In 2006, light restrictions were placed on the creation of pages to hold vandals in check. The analysis of the SAMPLE05 data shows that the basic fast-repair characteristics of Wikipedia remain strong. Figure 5 shows the results in a table. The median time to revert a “mass deletion” of a page was 2.9 minutes, and an “obscene” mass deletion was reverted in a median of 2 minutes. Thus, for a large set of pages, the fast-repair mechanisms continue to function. These statistics are similar to the published results for FULL03 [ Viégas et al., 2004 ], where mass deletions were reverted in a median time of 2.8 minutes and obscene mass deletions in a median time of 1.7 minutes.
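The median revert times reported above can be illustrated with a short sketch. It assumes each vandalism incident has already been identified and reduced to a pair of timestamps (when the damaging edit was saved, when it was reverted); how the authors detected mass deletions is not covered here, and the function name and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def median_revert_minutes(incidents):
    """Median time, in minutes, between a damaging edit and its revert.

    `incidents` is an iterable of (damaged_at, reverted_at) datetime pairs.
    """
    durations = [
        (reverted_at - damaged_at).total_seconds() / 60
        for damaged_at, reverted_at in incidents
    ]
    return median(durations)


# Made-up incidents lasting 1 minute, 174 seconds, and 10 minutes.
t0 = datetime(2005, 10, 1, 9, 0)
incidents = [
    (t0, t0 + timedelta(minutes=1)),
    (t0, t0 + timedelta(seconds=174)),
    (t0, t0 + timedelta(minutes=10)),
]
print(median_revert_minutes(incidents))
```

The median is a natural choice for this kind of data because a few incidents that go unnoticed for days would otherwise dominate a mean.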

Figure 4. History flow diagram showing edits made to the Chocolate page until Oct. 2005. There have been so many edits since mid 2003 that the entire diagram shown in the previous image has become very small (circled in red), making it impossible to distinguish the pattern of the edit war. (Viegas et al, 2007)

The authors made two other comparisons with [ Viégas et al., 2004 ]. First, although the 2005 statistics were generally similar, they found a much higher median time between all edits on a given page: 726 minutes as opposed to 90 minutes. The reason for this difference is unclear to the authors. Second, they recalculated the same statistics for articles created before May 2003 that still existed in SAMPLE05. Despite the built-in bias towards articles that survived for years, the recalculated statistics were generally similar to the original ones (again, the median time between all edits was much higher).

How has Wikipedia grown?

Namespaces

Figure 7. List of all Wikipedia namespaces (Viegas et al, 2007)
Figure 8. Page Growth Factor per Namespace (Viegas et al, 2007)

When people think about Wikipedia, they tend to focus on the encyclopedia articles, but the site is much more than its encyclopedic content. Wikipedia is divided into 20 sections, called namespaces, each serving a special purpose (Figure 7). Each namespace has an associated talk namespace for discussion; for instance, the namespace “Image” has “Image Talk” associated with it. The authors discuss the namespaces numbered zero (Main) through seven (Image Talk). The Main namespace contains all encyclopedic articles, the “meat” of Wikipedia. “Talk” refers to discussion pages associated with these articles. The “User” namespace provides pages for registered users' personal presentation and auxiliary pages for personal use containing, for instance, bookmarks to favorite pages. “User Talk” refers to discussion pages associated with User pages. “Wikipedia” refers to pages that explain policies and guidelines and describe Wikipedia’s sister projects (e.g., Wiktionary, Wikibooks, Wikinews). “Wikipedia Talk” refers to discussion associated with pages in the Wikipedia namespace. Finally, “Image” is a namespace that provides information about images and sound clips, one page for each file, with a link to the image or sound clip itself. “Image Talk” is the discussion space associated with the Image namespace. Some namespaces hardly existed in the early days of the encyclopedia. Figure 8 compares the FULL03 and FULL05 data, showing that between 2003 and 2005 the fastest growing namespace was User Talk, followed by Wikipedia (guidelines).
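A page growth factor per namespace, as plotted in Figure 8, can be read as the ratio of page counts between the two snapshots. The sketch below assumes that definition, which this summary does not spell out. The page counts are placeholder values chosen only to make the example run; they are not figures from the paper.

```python
def growth_factors(counts_2003, counts_2005):
    """Return {namespace: pages_2005 / pages_2003}, largest growth first.

    Namespaces missing from 2005 or empty in 2003 are skipped, since the
    ratio is undefined for them.
    """
    factors = {}
    for namespace, before in counts_2003.items():
        after = counts_2005.get(namespace)
        if before > 0 and after is not None:
            factors[namespace] = after / before
    return dict(sorted(factors.items(), key=lambda item: item[1], reverse=True))


# Placeholder counts for illustration only (NOT data from the paper).
pages_2003 = {"Main": 100, "Talk": 20, "User Talk": 5}
pages_2005 = {"Main": 900, "Talk": 300, "User Talk": 200}
for namespace, factor in growth_factors(pages_2003, pages_2005).items():
    print(f"{namespace}: x{factor:g}")
```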

Talk pages

Figure 9. Distribution of postings on talk pages (Viegas et al, 2007)

[ Viégas et al., 2004 ] characterized Talk pages as places where conflict was resolved. Talk pages also play an important role in planning and other types of coordination. Editors discuss paragraphs that need reworking and sections that should be added or trimmed. They use Talk pages as a place for collective planning as well as a platform for dispute resolution. Non-empty Talk pages exist for 14.5% of the article pages in the FULL05 database. Heavily edited articles and Talk pages go hand in hand. While the average number of edits per page in Wikipedia is roughly 15 (median = 2), around 94% of the pages with more than 100 edits have related Talk pages. Articles with associated Talk pages have, on average, 5.8 times more edits and 4.8 times more users than articles without a discussion forum. Many different types of coordination take place within these pages, ranging from high-level discussion of the goals of an encyclopedia to discussions of the minutiae of etymology. The authors manually classified all user posts in a purposeful sample of 25 Talk pages from the Main namespace. The sample was chosen to include a variety of controversial and non-controversial topics and to span a spectrum from hard science to pop culture. To ensure the sample contained cases with thorny coordination issues, they selected some pages, such as “George W. Bush,” where page protection had been necessary. The selected Talk pages ranged in size from 12K to 128K. The number of posts on each page ranged from 5 in Color Theory to 205 in Gmail. Requests for coordination were the most common kind of posting, accounting for over half of the contributions on Talk pages. Contributors use Talk pages to discuss their editing activities in advance, to ask for help, and to explain why they think specific changes should be made. Figure 9 briefly summarizes these results.

References

Viégas, F., Wattenberg, M., & Dave, K. Studying Cooperation and Conflict between Authors with history flow Visualizations. In Proceedings of SIGCHI 2004.
