iNat project | Yurong He

A mixed-methods study:

How iNaturalist (iNat) users collaboratively improve data quality (2015-2016)

Context

iNaturalist is a social network/crowdsourcing site where users can record any organisms they observe in nature, meet other nature lovers, and learn about the natural world.

As of 2016, iNat had over 1,300,000 publicly accessible observation records (biodiversity data) created by over 70,000 users. Dr. Andrea Wiggins and I were aware that people (e.g., scientists, program managers) talked a lot about two things:

using iNat “Research” grade data in their studies,
using iNat to manage data and recruit participants for biodiversity programs that involve members of the public and have no money to build their own information systems.

However, people had concerns about the data quality control processes and whether the platform was suitable for different types of biodiversity programs. So Andrea and I agreed that we should conduct a rigorous research study to help answer those questions.

I was a PhD researcher at the Human-Computer Interaction Lab at the University of Maryland. Dr. Andrea Wiggins was an assistant professor at the University of Maryland iSchool. Our collaborators for this study included program managers, scientists, educators, and professional photographers from different organizations.

Methods and Analysis

For understanding the data creation and validation processes

Participant observation in a place-based science educational project that utilized iNat as a data management platform
Qualitative analysis of field notes, photos, and online comments through inductive and deductive coding processes

For understanding the factors influencing validation processes

Regression analyses of observation record metadata exported using the site’s download tool (N=925 records)

Findings

Collaborative data validation behavior: for each record that reaches “Research” grade

iNat users who record observations:
- Provide a date and plausible geographic information
- Upload a good-quality media voucher (photo, audio, or video)
iNat users who verify observations:
- Agree/disagree with taxonomic ID
- Refine ID by suggesting different or more specific ID
- Leave comments, usually clarifying questions

Factors influencing data validation behavior:

Observation records captured on mobile devices had lower quality grades, less agreement, and less taxonomic specificity
Explicitly asking for ID help from other iNat users correlated with lower quality grades and less taxonomic specificity
When geographic position was less accurate, taxonomic specificity was better
Birds rule! Aves were more likely to reach research grade, had more ID agreement, and were usually ID’d to species (plants and all else, not so much)

Impact

For people who are considering using iNat data, this study provides assurance about how each observation record is collaboratively verified on the site
For people who are considering using iNat to manage data and recruit the public to participate in science, this study provides important insights for their program design and data management plan:
- The device used to record matters: mobile devices (e.g., smartphones) are easier to use, but lead to lower quality grade data
- Not all biodiversity programs get equal attention from iNat users who are willing to verify data: bird programs win (this may also explain why the iNat brand is a bird)

The full results of this study were published at a top-tier HCI conference:

Wiggins, A., & He, Y. (2016). Community-based data validation practices in citizen science. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (CSCW’16). ACM. https://dl.acm.org/citation.cfm?id=2820063 [Honorable Mention Award, top 5% of 571 submitted papers]
