Though Wikimedia projects like Wikipedia are clearly incredibly valuable to people worldwide (e.g., Wikipedia's status as the fifth most popular site worldwide), it has been harder to quantify other facets of this value. Anecdotally, content from communities like Wikipedia has been incredibly important in the development of natural language processing tools[supp 1] and search engines like Google,[supp 2] and it serves as an important resource for people making life decisions.[supp 3]
This OpenSym 2018 paper, "What is the Commons Worth? Estimating the Value of Wikimedia Imagery by Observing Downstream Use", attempts to quantify the monetary value of Wikimedia Commons, a peer-produced repository of free-use imagery and video that in part holds the images readers come across on Wikipedia. To do so, the authors pose a counterfactual question: how much revenue would licensing this content generate if Commons operated under a for-profit model such as that of Getty Images? They collect a random sample of 10,000 images from Commons and run a reverse image search on each to detect how often it is being used across the internet. The domain of each re-use is then evaluated to determine whether, for instance, it belongs to a commercial entity. Using Getty's licensing model of USD $175 for commercial use and USD $60 for non-commercial use, they extrapolate from how often, on average, each image is used (and where) to reach a total estimate of USD $28.9 billion for Wikimedia Commons.
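The arithmetic behind this kind of extrapolation can be illustrated with a minimal sketch. It is not the authors' code or their exact procedure: the `SampledImage` structure, the function names, the toy sample and the `repository_size` parameter are all hypothetical, and only the two per-use fees come from the figures quoted above. The sketch prices each detected re-use at the matching fee, averages over the sample, and scales the mean to the size of the repository.

```python
from dataclasses import dataclass
from typing import Sequence

# Per-use fees quoted in the review (USD); everything else here is hypothetical.
COMMERCIAL_FEE = 175
NONCOMMERCIAL_FEE = 60


@dataclass
class SampledImage:
    """Re-use counts found for one sampled image (assumed structure)."""
    commercial_uses: int
    noncommercial_uses: int


def image_value(img: SampledImage) -> int:
    """Counterfactual licensing revenue for a single image, in USD."""
    return (img.commercial_uses * COMMERCIAL_FEE
            + img.noncommercial_uses * NONCOMMERCIAL_FEE)


def estimate_total_value(sample: Sequence[SampledImage],
                         repository_size: int) -> float:
    """Scale the mean per-image value of the sample to the whole repository."""
    if not sample:
        raise ValueError("sample must not be empty")
    mean_value = sum(image_value(img) for img in sample) / len(sample)
    return mean_value * repository_size


if __name__ == "__main__":
    # Made-up toy numbers for illustration, not data from the paper.
    toy_sample = [
        SampledImage(commercial_uses=2, noncommercial_uses=1),
        SampledImage(commercial_uses=0, noncommercial_uses=3),
        SampledImage(commercial_uses=0, noncommercial_uses=0),
        SampledImage(commercial_uses=1, noncommercial_uses=0),
    ]
    print(estimate_total_value(toy_sample, repository_size=1_000))
```

A sketch like this treats every detected re-use as a paid license, which is precisely the kind of assumption that can be debated when interpreting the headline figure.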
While there are interesting discussions to be held about some of the methodological choices that led to their final estimate of USD $28.9 billion for the entirety of Commons – e.g., what is a more reasonable estimate of the proportion of images that would be paid for if under license – the general approach and motivation are sound and certainly raise important questions about how we value resources like Wikimedia Commons. This research complements previous estimates of the value of Commons.[supp 4] These are not easy questions, but I look forward to further research that adds to our understanding of the value of these communities' work.
Cf. earlier coverage: "Estimate for economic benefit of Wikipedia: $50 million by 2006 already"
See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.
From the abstract: "We use publically available statistics about the top-1000 most popular pages on each day to estimate the efficiency of caches for support of the platform. While the data volumes are moderate, the main goal of Wikipedia caches is to reduce access times for page views and edits. We study the impact of most popular pages on the achievable cache hit rate in comparison to Zipf request distributions and we include daily dynamics in popularity."
From the abstract: "At MIT, librarians, archivists, writing instructors, and local Wikipedians have collaborated to host several edit-a-thons with the common goals of addressing content gaps on Wikipedia and offering the public and the MIT community (including students, staff, alumni and faculty) new ways to engage with the institute's archives and special collections. [...] This article shares results from MIT's GLAM edit-a-thons, and argues that approaching projects from the perspective of Wikipedia's collaborative culture can enhance other kinds of academic collaboration."
From the abstract: "The described project that was started in 2015, was collaboratively designed by archivists and historians with the La Guardia & Wagner Archives ("the Archives") and LaGuardia Community College's faculty and librarians, and involves beginning college students in the production of a needed public history of the outbreak and impact of HIV/AIDS in New York City. [...] Utilization of a Wikipedia as a non-commercial, public, open access information source also succeeds in raising web traffic, visibility and accessibility for unique and valuable archival collections."
From the abstract: "This study compared language policies in Hebrew Wikipedia and the Hebrew Facebook translation app. Hebrew Wikipedia designed a strict linguistic guide that promotes a neutral Hebrew register, rejecting both colloquial and high registers, enforced by an algorithm post factum."
From the abstract: "The purpose of this paper is to examine the rather unsuccessful Wikiproject for Cambodia. Despite its lack of success, it is a case that can be used to draw lessons for dealing with the issue of geographical under-representation on Wikipedia as a whole. ... The author takes a broadly qualitative approach to the study of Wikipedia. For this study, the Cambodia Wikiproject main page, as well as the various talk page archives associated with it, was downloaded in November 2016 and subjected to a content analysis. Descriptive statistics are also used when necessary to build the argument. Findings: Wikiproject Cambodia has failed to appreciably improve the coverage of Cambodian topics. This is likely due to its inability to attract for a prolonged period of time a champion able to anchor the project and provide a sense that someone is listening. But the makeup of the project members also suggests that even if a champion could be found, the question of who gets to represent whom remains difficult to deal with. It is unlikely that Cambodia will anytime soon develop a strong community of Wikipedia editors given the economic and social constraints the country imposes on the most of its population."
From the abstract: "While the Wikipedia article on Manila cannot be classified as promotional, it is clear that much of the city remains invisible in this work. Such a puzzle becomes understandable when we examine the urban studies literature where we find that the spatial logic of the city itself helps conceal much from view, so that what we read on Wikipedia is a view from the islands of privilege rather than the oceans of marginalization that make up much of the city's spatial form. If such a spatial structure is to change, representations such as found on Wikipedia need to be challenged."
From the paper (translated): "Finally, the [talk page comments classified in] the category of personal attacks are remarkable because of their insignificant quantitative dimension. In the context of the White Rose, there was only a single incident of this kind. Against the backdrop of widespread hate attacks on the Internet this finding is notable, considering that the resistance against national socialism has never been uncontroversial."