Almeida et al., ICWSM 2007
On the Evolution of Wikipedia
- Authors: Almeida R. B., Mozafari B. and Cho J.
- Conference: Proceedings of ICWSM, 2007.
- Link: http://www.icwsm.org/papers/2--Almeida-Mozafari-Cho.pdf
- Maintainer: Sameer Badaskar
While the paper by (Wilkinson et al., ACM 2007) relates the quality of Wikipedia articles to the number of editors and edits, this paper focuses on the temporal evolution of Wikipedia as a whole. Some of the questions addressed are: How does Wikipedia grow over time? What is the temporal trend of the authoring process?
The growth of Wikipedia could be measured in terms of the number of authors and the number of articles added over time. The number of authors is difficult to measure because of anonymous contributions (for which only the IP address of the remote machine is recorded), so growth is expressed in terms of the number of articles instead. Wikipedia shows an exponential growth trend given by N(t) = C * exp(a * t). Here N(t) is the number of articles at time t and C is a constant. The exponent a is found experimentally to be 2.31 x 10^-8. The user base of Wikipedia is also shown to grow exponentially, but the paper does not address the difficulty noted above of counting authors when contributions are anonymous.
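To get a feel for what the exponent a implies, the growth law N(t) = C * exp(a * t) can be turned into a doubling time, ln(2) / a. The sketch below is only illustrative: the summary does not state the unit of t, so the code assumes that t is measured in seconds, and the helper names are hypothetical.

```python
import math

# Growth exponent quoted in the summary above.
# ASSUMPTION: t is measured in seconds; the unit is not stated in the summary.
A = 2.31e-8
SECONDS_PER_DAY = 86400


def doubling_time(a):
    """Time needed for N(t) = C * exp(a * t) to double: ln(2) / a."""
    return math.log(2) / a


def growth_factor(a, dt):
    """Factor by which N grows over an interval dt: N(t + dt) / N(t)."""
    return math.exp(a * dt)


doubling_days = doubling_time(A) / SECONDS_PER_DAY
yearly_factor = growth_factor(A, 365 * SECONDS_PER_DAY)

print(f"doubling time: {doubling_days:.0f} days")
print(f"growth over one year: x{yearly_factor:.2f}")
```

Under the seconds assumption, this corresponds to the number of articles roughly doubling each year. With a different time unit the same formulas apply, but the numbers change accordingly.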
Wikipedia Activity over Time
Activity can be quantified as the average number of articles created and edited per unit time. The graphs of article creations and article updates versus time show self-similar characteristics: if the time series is viewed at different time-scale resolutions, it shows similar trends. To assess the degree of self-similarity, the authors use the Hurst test (for more details, see the paper). The Hurst parameter k resulting from the test indicates whether a given time series is self-similar or not. A self-similar (fractal) time series is characterized by 0.5 < k < 1, which holds for Wikipedia (k = 0.77).
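As a rough illustration of how a Hurst parameter can be estimated, the sketch below uses the aggregated-variance method. This is a stand-in chosen for brevity and is not necessarily the test used in the paper. The idea is that for a self-similar series the variance of block means decays as m^(2k - 2) with block size m, so k can be read off the slope of a log-log fit. The function names, block sizes and synthetic input are all assumptions made for this example.

```python
import math
import random
from statistics import mean, pvariance


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def hurst_aggregated_variance(series, block_sizes=(2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter from the decay of block-mean variance.

    For block size m, Var(block means) ~ m^(2k - 2), so k = 1 + slope / 2,
    where slope is fitted on log(variance) versus log(m).
    """
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # not enough blocks to compute a variance
        block_means = [mean(series[i * m:(i + 1) * m]) for i in range(n_blocks)]
        variance = pvariance(block_means)
        if variance > 0:
            log_m.append(math.log(m))
            log_var.append(math.log(variance))
    if len(log_m) < 2:
        raise ValueError("series too short for the chosen block sizes")
    return 1.0 + fit_slope(log_m, log_var) / 2.0


# Synthetic input (NOT Wikipedia data): uncorrelated noise, for which the
# estimate should come out close to 0.5, i.e. no long-range dependence.
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
h_noise = hurst_aggregated_variance(white_noise)
print(f"estimated Hurst parameter for white noise: {h_noise:.2f}")
```

Applied to a series of per-interval edit or creation counts, an estimate between 0.5 and 1 would point to the kind of self-similarity described above. The aggregated-variance estimator is known to be noisy on short series, so it is best treated as a sanity check.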
Also, the distribution of edits across articles follows a Zipf distribution, which means that a small percentage of articles are heavily edited while a large proportion of articles receive few edits. This observation is consistent with the one made by (Wilkinson et al., ACM 2007). At the same time, on average, users spend most (70%) of their time updating existing articles rather than writing new ones.
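A Zipf-like distribution of edits can be checked by ranking articles by edit count and fitting a straight line to log(count) against log(rank). The slope then gives the Zipf exponent with its sign flipped. The following is a minimal sketch on synthetic counts, not on the paper's data; the function names and the 10% cutoff are hypothetical choices for illustration.

```python
import math
from statistics import mean


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def zipf_exponent(edit_counts):
    """Fit count ~ rank^(-s) on a log-log scale and return s."""
    ranked = sorted((c for c in edit_counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two articles with edits")
    log_rank = [math.log(r) for r in range(1, len(ranked) + 1)]
    log_count = [math.log(c) for c in ranked]
    return -fit_slope(log_rank, log_count)


def top_share(edit_counts, fraction=0.1):
    """Share of all edits received by the most-edited `fraction` of articles."""
    ranked = sorted(edit_counts, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Synthetic edit counts (NOT Wikipedia data) following an exact 1/rank law.
synthetic_counts = [1000.0 / rank for rank in range(1, 1001)]
exponent = zipf_exponent(synthetic_counts)
share = top_share(synthetic_counts)
print(f"fitted Zipf exponent: {exponent:.2f}")
print(f"share of edits in the top 10% of articles: {share:.0%}")
```

On real edit counts the fit is rarely this clean, and a least-squares fit on log-log axes is only a quick diagnostic. The top-share figure is a more direct way to see the "few heavily edited articles" pattern.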
Though this paper gives a picture of the global trends in Wikipedia, it does not account for automatic edits and automatic article creation. In fact, if automatically created or edited articles had been excluded, Figure 4(c) would have given a clearer picture of the amount of user activity over time.
The Wikipedia article database dump is accessible at http://en.wikipedia.org/wiki/Wikipedia:Database_download