The Science of Citizen Science: Meetings in San Jose This Week

[This is adapted from a post I wrote on the Zooniverse blog.]

I’m excited about attending the Citizen Science Association (CSA) and American Association for the Advancement of Science (AAAS) meetings in San Jose, California this week, and I thought I’d tell you a bit about the citizen science-related events I’m looking forward to. I’ll write about other events and science news later, and in any case, check out the hashtags #CitSci2015, #AAASmtg and #AAAS2015 on Twitter for live updates.

As I mentioned in an earlier post last fall, we’ve organized an AAAS session titled “Citizen Science from the Zooniverse: Cutting-Edge Research with 1 Million Scientists,” which will take place on Friday afternoon. It fits well with the AAAS’s theme this year: “Innovations, Information, and Imaging.” Our excellent line-up includes Laura Whyte (Adler Planetarium) on Zooniverse, Brooke Simmons (Oxford) on Galaxy Zoo, Alexandra Swanson (U. of Minnesota) on Snapshot Serengeti, Kevin Wood (U. of Washington) on Old Weather, Paul Pharoah (Cambridge) on Cell Slider, and Phil Marshall (Stanford) on Space Warps. I’ll be chairing the session, but they’ll be doing all the hard work.

And in other recent news, citizen scientists from the Zooniverse’s Milky Way Project examined infrared images from NASA’s Spitzer Space Telescope and found lots of “yellow balls” in our galaxy. It turns out that these are indications of early stages of massive star formation, such that the new stars heat up the dust grains around them. Charles Kerton (Iowa State) and Grace Wolf-Chase (Adler) published the results last week in the Astrophysical Journal.

Courtesy: JPL


But let’s get back to the AAAS meeting. It looks like many other talks, sessions, and papers presented there involve citizen science too. David Baker (FoldIt) will give a plenary lecture on post-evolutionary biology and protein structures on Saturday afternoon. Jennifer Shirk (Cornell), Meg Domroese and others from CSA have a session Sunday morning, in which they will describe ways to use citizen science for public engagement. (See also this related session on science communication.) Then in a session Sunday afternoon, people from the European Commission and other institutions will speak about global earth observation systems and citizen scientists tackling urban environmental hazards.

Before all of that, we’re excited to attend the CSA’s pre-conference on Wednesday and Thursday. (See their online program.) Chris Filardi (Director of Pacific Programs, Center for Biodiversity and Conservation, American Museum of Natural History) and Amy Robinson (Executive Director of EyeWire, a game to map the neural circuits of the brain) will give the keynote addresses there. For the rest of the meeting, as with the AAAS, there will be parallel sessions.

The first day of the CSA meeting will include: many sessions on education and learning at multiple levels; sessions on diversity, inclusion, and broadening engagement; a session on defining and measuring engagement, participation, and motivations; a session on CO2 and air quality monitoring; a session on CS in biomedical research; and sessions on best practices for designing and implementing CS projects, including a talk by Chris Lintott on the Zooniverse and Nicole Gugliucci on CosmoQuest. The second day will bring many more talks and presentations along these and related themes, including one by Julie Feldt about educational interventions in Zooniverse projects and one by Laura Whyte about Chicago Wildlife Watch.

Furthermore, a couple of sessions include presentations that will interest southern Californians. Barbara Lloyd (Ocean Sanctuaries) will give a talk about “Identifying Sevengill Sharks in San Diego with Wildbook,” and Mark Chandler (Earthwatch Institute) will talk about “Engaging a Diversity of Citizen Scientists around Urban Trees in Greater Los Angeles.”

I also just heard that the Commons Lab at the Woodrow Wilson Center is releasing two new reports today, and hardcopies will be available at the CSA meeting. One report is by Muki Haklay (UCL) about “Citizen Science and Policy: A European Perspective” and the other is by Teresa Scassa & Haewon Chung (U. of Ottawa) about “Typology of Citizen Science Projects from an Intellectual Property Perspective.” Look here for more information.

AAAS Symposium in Feb. 2015: Cutting-Edge Research with 1 Million Citizen Scientists

[This is an expanded version of a post I wrote for the Galaxy Zoo blog.]

Some colleagues and I successfully proposed a symposium on citizen science at the annual meeting of the American Association for the Advancement of Science (AAAS) in San Jose, CA in February 2015. (The AAAS is the world’s largest scientific society and is the publisher of the journal Science.) Our session will be titled “Citizen Science from the Zooniverse: Cutting-Edge Research with 1 Million Scientists.” It refers to the more than one million volunteers participating in a variety of citizen science projects. This milestone was reached in February, and the Guardian and other news outlets reported on it.


“Citizen science” (CS) involves public participation and engagement in scientific research in a way that educates the participants, makes the research more democratic, and makes it possible to perform tasks that a small number of researchers could not accomplish alone. (See my recent post on new developments in citizen science.)


The Zooniverse began with Galaxy Zoo, which recently celebrated its seventh anniversary, and which turned out to be incredibly popular. (I’ve been heavily involved in Galaxy Zoo since 2008.) Galaxy Zoo participants produced numerous visual classifications of hundreds of thousands of galaxies, yielding excellent datasets for statistical analyses and for identifying rare objects. Its success led to the development of a variety of CS projects coordinated by the Zooniverse in a diverse range of fields. For example, they include: Snapshot Serengeti, where people classify different animals caught in millions of camera trap images; Cell Slider, where they classify images of cancerous and ordinary cells and contribute to cancer research; Old Weather, where participants transcribe weather data from log books of Arctic exploration and research ships at sea between 1850 and 1950, thus contributing to climate model projections; and Whale FM, where they categorize the recorded sounds made by killer and pilot whales. And of course, in addition to Galaxy Zoo, there are numerous astronomy-related projects, such as Disk Detective, Planet Hunters, the Milky Way Project, and Space Warps.


We haven’t confirmed the speakers for our AAAS session yet, but we plan to have six speakers from the US and UK who will introduce and present results from the Zooniverse, Galaxy Zoo, Snapshot Serengeti, Old Weather, Cell Slider, and Space Warps. I’m sure it will be exciting and we’re all looking forward to it! I’m also looking forward to the meeting of the Citizen Science Association, which will be a “pre-conference” preceding the AAAS meeting.

Frontiers of Citizen Science

Since some colleagues and I recently submitted a proposal for a symposium on citizen science at a conference next year, I thought this would be a good time to write some more about citizen science and what people are doing with it. I previously gave a brief introduction to the “citizen science” phenomenon (also called “crowd science”, “crowd-sourced science”, “networked science”, “civic science”, “massively-collaborative science”, etc.) in an earlier post. The presence of massive online datasets and the availability of high-speed internet access and social media provide many opportunities for citizen scientists to work on projects analyzing and interpreting data for research.

Citizen science (CS) is an increasingly popular activity. It has already produced impressive achievements, and it clearly has potential for more. (It even has a meme!) You don’t have to look hard to see accomplishments of CS projects in the news. A quick online search brought up citizen scientists studying bumblebees, bird nests, weather events, plankton, and other subjects. The growing phenomenon of CS has drawn the interest of social scientists as well, and I’ll say more about their research later in this post.


I’m particularly familiar with the Zooniverse, a platform that hosts projects in a variety of fields. It began in 2007 with the Galaxy Zoo project, which I’ll say more about below, and its other astronomy/astrophysics projects include Disk Detective, Planet Hunters, Moon Zoo, and Space Warps. To give other examples outside of astronomy, there are projects in zoology, such as Snapshot Serengeti to study animals and their behavior with “camera trap” photos (the graph above describes herbivores they’ve cataloged, from a recent blog post); in biology/medicine, such as Cell Slider to identify cancer cells and aid research; and in climate science, such as Old Weather, which examines ships’ logs to study historical weather patterns. In addition, people at Adler Planetarium and elsewhere are working on producing educational resources and public outreach programs.


Galaxy Zoo (GZ) invites volunteers to visually classify the shapes and structures of galaxies seen in images from optical surveys. The project resulted in catalogs of hundreds of thousands of visually classified galaxies (much better than anything achieved before), allowing for novel statistical analyses and the identification of rare objects and subtle trends. If you’re interested in my own research, I’m leading clustering and astrostatistical analyses of GZ catalogs to study the spatial distribution of galaxies and determine how their morphologies are related to the dark matter distribution and large-scale structure of the universe. For example, with more and better data than pre-GZ studies, my colleagues and I obtained statistically significant evidence that galaxies with stellar bars tend to reside in denser environments (see this paper). In the figure above, you can see examples of barred galaxies (lower panels) and unbarred ones (upper panels). In 2009, we used the impressive GZ datasets to disentangle the environmental dependence of galaxy color and morphology, since we tend to see redder and elliptical galaxies in denser regions (see this paper). Time permitting, I’d like to extend this work by using those results with detailed dark matter halo models, and we could potentially compare our results to galaxies in the Illustris simulation (which has been getting a lot of media attention and was misleadingly described as “the first realistic model of the universe”).

Galaxy Zoo scientists have many other achievements and interesting research. For example, a Dutch schoolteacher, Hanny van Arkel, discovered a unique image of a quasar light echo, which was dubbed “Hanny’s Voorwerp” (Lintott et al. 2009). GZ volunteers also identified galaxies that appeared to look like “green peas”, and most of them turned out to be small, compact, star-bursting galaxies (Cardamone et al. 2009). In addition, Laura Trouille is leading the Galaxy Zoo Quench project, in which participants contribute to the whole research process by classifying images, analyzing data, discussing results, and writing a paper about them.

Citizen science is related to “big data” and data-driven science (see also this article), and in particular to data mining and machine learning. According to a new astrostatistics book by Ivezic, Connolly, VanderPlas, & Gray, data mining is “a set of techniques for analyzing and describing structured data, for example, finding patterns in large data sets. Common methods include density estimation, unsupervised classification, clustering, principal component analysis, locally linear embedding, and projection pursuit.” Machine learning is a “term for a set of techniques for interpreting data by comparing them to models for data behavior (including the so-called nonparametric models), such as various regression methods, supervised classification methods, maximum likelihood estimators, and the Bayesian method.” Kaggle has data prediction competitions for machine learning, and their most recent one challenged people to develop automated algorithms that classify GZ galaxy morphologies as well as the “crowd-sourced” classifications do, and the winning codes performed rather well. Nothing beats numerous visual classifications, but there is clearly much to be learned along these lines.
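To make the comparison between automated and crowd-sourced classifications a bit more concrete, here is a minimal sketch of one way to score an algorithm against volunteer vote fractions. The root-mean-square error metric, the function name `rmse`, and all of the numbers are illustrative assumptions on my part, not the competition’s actual data or code.

```python
import math

def rmse(predicted, crowd):
    """Root-mean-square difference between model outputs and crowd vote fractions."""
    if len(predicted) != len(crowd) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - c) ** 2 for p, c in zip(predicted, crowd)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical fractions of volunteers who saw spiral arms in four galaxies
crowd_fractions = [0.90, 0.10, 0.55, 0.30]
# Hypothetical probabilities from an automated classifier for the same galaxies
model_outputs = [0.80, 0.20, 0.50, 0.40]

score = rmse(model_outputs, crowd_fractions)
print(f"RMSE against the crowd: {score:.3f}")
```

A lower score means the algorithm’s outputs sit closer to what the volunteers collectively reported; a score of zero would mean it reproduces the crowd exactly.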

Finally, sociologists, political scientists, economists and other social scientists have been studying CS, such as the organization and efficacy of CS projects, motivations of participants, and applications to industry and policy making. For example, Amy Freitag has written about how citizen science programs define “success” and their rigorous data collection. The sociologist Anne Holohan has written a book Community, Competition and Citizen Science on collaborative computing projects around the world. Eugenia Rodrigues is studying the views and experiences of participants in CS initiatives, and Hauke Riesch has written on this subject as well. (This is also related to the work by Galaxy Zoo scientists in Raddick et al. on participants’ motivations.)

In a recent interesting article, Chiara Franzoni & Henry Sauermann analyze the organizational features, dimensions of openness, and benefits of CS research. As case studies, they examine GZ, Foldit (an online computer game about protein folding), and Polymath (involving many mathematicians collectively solving problems). They argue that the open participation and open disclosure of inputs, which they mention is also characteristic of open source software, distinguish CS from traditional “Mertonian” science. (Robert Merton was a sociologist who emphasized, perhaps too much, social and cultural factors in science, such as scientists’ desire for peer recognition and career benefits, disputes between scientists, etc. I ended up not discussing him in my post on “paradigm shifts”.) They also discuss knowledge-related and motivational benefits, and they point out that CS projects that involve subjects less popular than astronomy or ornithology, for example, or that address very narrow and specific questions may face challenges in recruiting volunteers. Finally, they discuss organizational challenges, such as division of labor and the need for project leadership and infrastructure. If you’re interested, Bonney et al. in Science magazine is another, shorter article about organizational challenges and developments in citizen science.

Big Science and Big Data

I’d like to introduce the topic of “big science.” This is especially important as appropriations committees in Congress debate budgets for NASA and NSF in the US (see my previous post), and related debates occurred a couple of months ago in Europe over the budget of the European Space Agency (ESA).

“Big science” usually refers to large international collaborations on projects with big budgets and long time spans. According to Harry Collins in Gravity’s Shadow (2004),

small science is usually a private activity that can be rewarding to the scientists even when it does not bring immediate success. In contrast, big-spending science is usually a public activity for which orderly and timely success is the priority for the many parties involved and watching.

He goes on to point out that in a project like the Laser Interferometer Gravitational-Wave Observatory (LIGO), it’s possible to change from small science to big but it means a relative loss of autonomy and status for most of the scientists who live through the transition. Kevles & Hood (1992) distinguish between “‘centralized’ big science, such as the Manhattan Project and the Apollo program; ‘federal’ big science, which collects and organizes data from dispersed sites; and ‘mixed’ big science, which offers a big, centrally organized facility for the use of dispersed teams.”

In addition to LIGO, there are many other big science projects, such as the Large Hadron Collider (LHC, which discovered the Higgs boson), the International Thermonuclear Experimental Reactor (ITER), and in astronomy and astrophysics, the James Webb Space Telescope (JWST, the successor to Hubble), the Large Synoptic Survey Telescope (LSST, pictured below), and the Wide-Field InfraRed Survey Telescope (WFIRST), for example.


Note that some big science projects are primarily supported by government funding while others receive significant funding from industry or philanthropists. LSST and LIGO are supported by the NSF, JWST and WFIRST are supported by NASA, and LHC is supported by CERN, but all of these are international. The fusion reactor ITER (see diagram below), on which there was a recent detailed New Yorker article, has experienced many delays, has gone over its many-billion-dollar budget, and has had management problems as well. While budget and scheduling problems are common for big science projects, ITER is in a situation in which it needs to produce results in the near future and avoid additional delays. (The US is committing to about 9% of ITER’s total cost, but its current contribution is lower than last year’s and its future contributions may be reevaluated at later stages of the project.)

[Figure: ITER in-cryostat overview]

As scientists, we try to balance small-, mid-, and large-size projects. The large ones are larger than before, require decades of planning and large budgets, and often consist of collaborations with hundreds of people from many different countries. It’s important to be aware that relatively small- and mid-scale projects (such as TESS and IBEX in astronomy) are very important too for research, innovation, education, and outreach, and as they usually involve fewer risks, they can provide at least as much “bang for the buck” (in the parlance of our times).

In the context of “big science” projects these days, the concepts of “big data” and “data-driven science” are certainly relevant. Many people argue that we are now in an era of big data, in which we’re obtaining collections of datasets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications. Since the volume, velocity, and variety of data are rapidly increasing, it is increasingly important to develop and apply appropriate data mining techniques, machine learning, scalable algorithms, analytics, and other kinds of statistical tools, which often require more computational power than traditional data analyses. (For better or for worse, “big data” is also an important concept in the National Security Agency and related organizations, in government-funded research, and in commercial analyses of consumer behavior.)

In astronomy, this is relevant to LSST and other projects mentioned above. When LSST begins collecting data, each night for ten years it will obtain roughly as much data as the entire Sloan Digital Sky Survey, which was until recently the biggest survey of its kind, and it will obtain about 800 measurements each for about 20 billion sources. We will need new ways to store and analyze these vast datasets. This also highlights the importance of “astrostatistics” (including my own work) and of “citizen science” (which we introduced in a previous post) such as the Galaxy Zoo project. IT companies are becoming increasingly involved in citizen science as well, and the practice of citizen science itself is evolving with new technologies, datasets, and organizations.
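To get a feel for the scale of the LSST catalog figures quoted above, here is a rough back-of-the-envelope calculation. The two inputs come from the text; the storage size of 8 bytes per measurement is purely my assumption for illustration, and the estimate covers only catalog measurements, not the images themselves.

```python
# Figures quoted in the text
measurements_per_source = 800
number_of_sources = 20e9  # about 20 billion sources

total_measurements = measurements_per_source * number_of_sources

# ASSUMPTION (not from the text): each measurement is stored as one 8-byte number
bytes_per_measurement = 8
total_terabytes = total_measurements * bytes_per_measurement / 1e12

print(f"Total measurements: {total_measurements:.1e}")
print(f"Catalog size at 8 bytes each: {total_terabytes:.0f} TB")
```

Even under this bare-bones assumption, the measurements alone run to tens of trillions of numbers, which is why new storage and analysis methods are needed.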

I’ll end by making a point that was argued in a recent article in Science magazine: we should avoid “big data hubris,” the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.

More from the AAAS meeting

The second half of the AAAS meeting in Chicago was interesting too. (I wrote about the first half in my previous post.)


Probably the best and most popular event at the meeting was Alan Alda’s presentation. You’ll know Alan Alda as the actor from M*A*S*H (and recently, 30 Rock), but he’s also a visiting professor at the Alan Alda Center for Communicating Science at Stony Brook University. He gave an inspiring talk to a few thousand people about how to communicate science clearly and effectively in a way that people can understand. He talked about how one should avoid or be careful about using jargon. Interaction with the audience is important, and one can do that by telling a personalized story (with a hero, goal, and an obstacle, which develops an emotional connection), or by engaging with the audience so that they become participants. It’s also important to communicate what is most interesting or exciting or curiosity-piquing about the science, but in the end, the words you use don’t matter as much as your body language and tone of voice. It’s also good to develop improvisation skills, so when a particular explanation or analogy doesn’t appear to work well with the audience, you can adapt to the situation. He referred to the “curse of knowledge”: as scientists, we forget what it’s like not to be experts in our particular field of research. That can be an obstacle when interacting with most segments of the public, Congress members and other politicians (most of whom aren’t scientists or haven’t the time to become familiar with the science), and even with scientists in other fields. Most of all, one needs to be clear, engaged, and connected with one’s audience. Finally, Alda told us about the “flame challenge,” which challenges scientists to explain flames and other concepts in a way that 11-year-olds can understand. (The kids are also the judges of the competition.) If the video of Alda’s talk becomes available online, I’ll link to it here for you.

I attended an interesting session on climate change and whether/how it’s possible to reduce greenhouse gas emissions from energy by 80% by 2050. As pointed out by the chair, Jane Long (who is one of the authors of this report), our energy needs will likely double or even triple by then, while we must simultaneously reduce carbon emissions. Peter Loftus discussed this issue as well, and showed that primary energy demand as well as energy intensity (energy used per unit GDP) have been rapidly increasing over the past twenty years, partly due to China. But to obtain substantial carbon reductions, the intensity needs to drop below what we’ve had for the past 40 years! We need to massively add to power generation capacity (10 times more rapidly than our previous rates), and it might not be feasible to exclude both nuclear and “carbon capture” in the process. Karen Palmer gave an interesting talk about the importance of energy efficiency as part of the solution, but she says that one problem is that it’s still hard to evaluate which policies best promote energy efficiency and, ultimately, energy savings and carbon emission reductions. Richard Lester made strong arguments about the need for nuclear power, since renewables might not be up to the task of meeting rising energy demands in the near future. This was disputed by Mark Jacobson, who pointed out that nuclear power produces 9-25 times more pollution per kilowatt-hour than wind (due to mining and refining) and that it takes longer to construct a nuclear plant than the 2-5 years it takes to build wind or solar farms. Jacobson also discussed state-by-state plans: California benefits from many solar devices, for example, while some places in the northeast could use offshore wind farms. In addition, such offshore arrays could withstand and dissipate hurricanes (depending on their strength), and WWS (wind, water, solar) could generate about 1.5 million new jobs in the U.S. in construction alone. Different countries have very different economic situations and carbon footprints, so different solutions may be needed.
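The tension between an 80% emissions cut and doubling or tripling energy needs can be illustrated with some simple arithmetic. This is only a sketch under a simplifying assumption of mine: that total emissions equal energy use times emissions per unit of energy, with the cut measured relative to today’s level. The function name is hypothetical as well.

```python
def required_carbon_intensity_ratio(emissions_cut, demand_growth):
    """Allowed emissions per unit of energy, as a fraction of today's value.

    emissions_cut: fractional reduction in total emissions (0.80 for 80%)
    demand_growth: factor by which energy demand grows (2 for doubling)
    ASSUMPTION: emissions = energy use x emissions per unit of energy.
    """
    if not 0 <= emissions_cut <= 1:
        raise ValueError("emissions_cut must be between 0 and 1")
    if demand_growth <= 0:
        raise ValueError("demand_growth must be positive")
    return (1 - emissions_cut) / demand_growth

for growth in (2, 3):
    ratio = required_carbon_intensity_ratio(0.80, growth)
    print(f"Demand x{growth}: emissions per unit of energy must fall by {1 - ratio:.0%}")
```

Under that assumption, if energy demand doubles, each unit of energy can emit only about a tenth of what it does today, and if demand triples, only about a fifteenth, which helps explain why the speakers argued over which low-carbon sources can be scaled up quickly enough.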

I caught part of a session on “citizen science” (see my previous post). Chris Lintott spoke about the history of citizen science and about how the internet has allowed for unprecedented growth and breadth of projects, including the numerous Zooniverse projects. Caren Cooper discussed social benefits of citizen science, and Carsten Østerlund discussed what motivates the citizen scientists themselves and how they learn as they participate. Lastly, Stuart Lynn spoke about how the next generation of citizen science systems can be developed, so that they can accommodate larger communities and larger amounts of data and so that people can classify billions of galaxies with the upcoming Large Synoptic Survey Telescope, for example.

Finally, there was another interesting session on how scientists can work with Congress and on the challenges they face, but more on that later…

Citizen Science: a tool for education and outreach

I’ll write about a different kind of topic today. “Citizen science” is a relatively new term though the activity itself is not so new. One definition of citizen science is “the systematic collection and analysis of data; development of technology; testing of natural phenomena; and the dissemination of these activities by researchers on a primarily avocational basis.” It involves public participation and engagement in scientific research in a way that educates the participants, makes the research more democratic, and makes it possible to perform tasks that a small number of researchers could not accomplish alone. Volunteers simply need access to a computer (or smartphone) and an internet connection to become involved and assist scientific research.


Citizen science was popularized a few years ago by Galaxy Zoo, which involved visually classifying hundreds of thousands of galaxies into spirals, ellipticals, mergers, and finer classifications using the classification tree below. (I am a member of the Galaxy Zoo collaboration and have published a few papers with them.) As a result of “crowdsourcing” the work of more than 100,000 volunteers around the world, new scientific research can be done that was not previously possible with such large datasets, including studies of the handedness of spiral galaxies, analyses of the environmental dependence of barred galaxies, and the identification of rare objects such as a quasar light echo that was dubbed “Hanny’s Voorwerp”. Other citizen science projects include mapping the moon, mapping air pollution, counting birds with birdwatchers, classifying a variety of insects, and many other projects.
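As a rough illustration of how many volunteers’ clicks can be combined into a single classification, here is a minimal sketch that turns the votes for one galaxy into vote fractions and a consensus label. The function names, the 0.8 threshold, and the example votes are all hypothetical; real projects may also weight volunteers and correct for biases, which this sketch ignores.

```python
from collections import Counter

def vote_fractions(votes):
    """Convert a list of volunteer labels for one galaxy into per-class fractions."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    total = len(votes)
    return {label: count / total for label, count in counts.items()}

def consensus(votes, threshold=0.8):
    """Return the leading label if its vote fraction reaches the threshold, else None."""
    fractions = vote_fractions(votes)
    label, fraction = max(fractions.items(), key=lambda item: item[1])
    return label if fraction >= threshold else None

# Hypothetical classifications of a single galaxy by ten volunteers
votes = ["spiral"] * 8 + ["elliptical"] + ["merger"]
print(vote_fractions(votes))
print(consensus(votes))
```

Keeping the full vote fractions, rather than only the winning label, preserves information about how certain the crowd was, which is useful for the statistical analyses mentioned above.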


Citizen scientists have many motivations, but it appears that the primary one is the desire to make a contribution to scientific research (see this paper). In the process, by bringing together professional scientists and members of the general public and facilitating interactions between them, citizen science projects are important for outreach purposes, not just for research. In addition, by encouraging people to see a variety of images or photographs and to learn about how the research is done, citizen science is useful for education as well. Many valuable educational tools have been produced (such as by the Zooniverse projects). Citizen science projects are popular and proliferating because they give the opportunity for people at home or in the classroom to become actively involved in science. It has other advantages too, including raising awareness and stimulating interest in particular issues. Citizen science is continuing to evolve, and in the era of “big data” and social media, it has much potential and room for improvement.