Frontiers of Citizen Science

Since some colleagues and I recently submitted a proposal for a symposium on citizen science at a conference next year, I thought this would be a good time to write some more about citizen science and what people are doing with it. I previously gave a brief introduction to the “citizen science” phenomenon (also called “crowd science”, “crowd-sourced science”, “networked science”, “civic science”, “massively-collaborative science”, etc.) in an earlier post. The presence of massive online datasets and the availability of high-speed internet access and social media provide many opportunities for citizen scientists to work on projects analyzing and interpreting data for research.

Citizen science (CS) is an increasingly popular activity, it’s produced impressive achievements already, and it clearly has potential for more. (It also even has a meme!) You don’t have to look hard to see accomplishments of CS projects in the news. A quick online search brought up citizen scientists studying bumblebees, bird nests, weather events, plankton, and other projects. The growing phenomenon of CS has drawn the interest of social scientists as well, and I’ll say more about their research later in this post.

I’m particularly familiar with the Zooniverse, a platform that hosts projects in a variety of fields. It began in 2007 with the Galaxy Zoo project, which I’ll say more about below, and its other astronomy/astrophysics projects include Disk Detective, Planet Hunters, Moon Zoo, and Space Warps. To give other examples, outside of astronomy, there are projects in zoology, such as Snapshot Serengeti to study animals and their behavior with “camera trap” photos (the graph above describes herbivores they’ve cataloged, from a recent blog post); in biology/medicine, such as Cell Slider to identify cancer cells and aid research; and in climate science, there is Old Weather, which examines ship’s logs to study historical weather patterns. In addition, people at Adler Planetarium and elsewhere are working on producing educational resources and public outreach programs.

Galaxy Zoo (GZ) invites volunteers to visually classify the shapes and structures of galaxies seen in images from optical surveys. The project resulted in catalogs of hundreds of thousands of visually classified galaxies—much much better than anything achieved before—allowing for novel statistical analyses and the identification of rare objects and subtle trends. If you’re interested in my own research, I’m leading clustering and astrostatistical analyses of GZ catalogs to study the spatial distribution of galaxies and determine how their morphologies are related to the dark matter distribution and large-scale structure of the universe. For example, with more and better data than pre-GZ studies, my colleagues and I obtained statistically significant evidence that galaxies with stellar bars tend to reside in denser environments (see this paper). In the figure above, you can see examples of barred galaxies (lower panels) and unbarred ones (upper panels). In 2009, we used the impressive GZ datasets to disentangle the environmental dependence of galaxy color and morphology, since we tend to see redder and elliptical galaxies in denser regions (see this paper). Time permitting, I’d like to extend this work by using those results with detailed dark matter halo models, and we could potentially compare our results to galaxies in the Illustris simulation (which has been getting a lot of media attention and was misleadingly described as “the first realistic model of the universe“).

Galaxy Zoo scientists have many other achievements and interesting research. For example, a Dutch schoolteacher, Hanny van Arkel, discovered a unique image of a quasar light echo, which was dubbed “Hanny’s Voorwerp” (Lintott et al. 2009). GZ volunteers also identified galaxies that appeared to look like “green peas”, and most of them turned out to be small, compact, star-bursting galaxies (Cardamone et al. 2009). In addition, Laura Trouille is leading the Galaxy Zoo Quench project, in which participants contribute to the whole research process by classifying images, analyzing data, discussing results, and writing a paper about them.

Citizen science is related to “big data” and data-driven science (see also this article), and in particular to data mining and machine learning. According to a new astrostatistics book by Ivezic, Connolly, VanderPlas, & Gray, data mining is “a set of techniques for analyzing and describing structured data, for example, finding patterns in large data sets. Common methods include density estimation, unsupervised classification, clustering, principal component analysis, locally linear embedding, and projection pursuit.” Machine learning is a “term for a set of techniques for interpreting data by comparing them to models for data behavior (including the so-called nonparametric models), such as various regression methods, supervised classification methods, maximum likelihood estimators, and the Bayesian method.” Kaggle has data prediction competitions for machine learning, and their most recent one involved challenging people to develop automated algorithms to classify GZ galaxy morphologies like as well as the “crowd-sourced” classifications, and the winning codes performed rather well. Nothing beats numerous visual classifications, but there is clearly much to be learned along these lines.

Finally, sociologists, political scientists, economists and other social scientists have been studying CS, such as the organization and efficacy of CS projects, motivations of participants, and applications to industry and policy making. For example, Amy Freitag has written about how citizen science programs define “success” and their rigorous data collection. The sociologist Anne Holohan has written a book Community, Competition and Citizen Science on collaborative computing projects around the world. Eugenia Rodrigues is studying the views and experiences of participants in CS initiatives, and Hauke Riesch has written on this subject as well. (This is also related to the work by Galaxy Zoo scientists in Raddick et al. on participants’ motivations.)

In a recent interesting article, Chiara Franzoni & Henry Sauermann analyze the organizational features, dimensions of openness, and benefits of CS research. As case studies, they examine GZ, Foldit (an online computer game about protein folding), and Polymath (involving many mathematicians collectively solving problems). They argue that the open participation and open disclosure of inputs, which they mention is also characteristic of open source software, distinguish CS from traditional “Mertonian” science. (Robert Merton was a sociologist who emphasized—perhaps too much—social and cultural factors in science, such as scientists’ desire for peer recognition and career benefits, disputes between scientists, etc. I ended up not discussing him in my post on “paradigm shifts“.) They also discuss knowledge-related and motivational benefits, and they point out that CS projects that involve subjects less popular than astronomy or ornithology, for example, or that address very narrow and specific questions may face challenges in recruiting volunteers. Finally, they discuss organizational challenges, such as division of labor and the need for project leadership and infrastructure. If you’re interested, Bonney et al. in Science magazine is another shorter article about organizational challenges and developments in citizen science.