Revolt: Collaborative Crowdsourcing for Labeling Machine Learning Datasets (2017)

Generating comprehensive labeling guidelines for crowdworkers can be challenging for complex datasets. Revolt harnesses crowd disagreements to identify ambiguous concepts in the data and coordinates the crowd to collaboratively create rich structures that requesters can use to make post-hoc decisions, removing the need for comprehensive guidelines and enabling dynamic label boundaries.

Work done during internship at Microsoft Research, Redmond.

Abstract

Crowdsourcing provides a scalable and efficient way to construct labeled datasets for training machine learning systems. However, creating comprehensive label guidelines for crowdworkers is often prohibitive even for seemingly simple concepts. Incomplete or ambiguous label guidelines can then result in differing interpretations of concepts and inconsistent labels. Existing approaches for improving label quality, such as worker screening or detection of poor work, are ineffective for this problem and can lead to rejection of honest work and a missed opportunity to capture rich interpretations about data. We introduce Revolt, a collaborative approach that brings ideas from expert annotation workflows to crowd-based labeling. Revolt eliminates the burden of creating detailed label guidelines by harnessing crowd disagreements to identify ambiguous concepts and create rich structures (groups of semantically related items) for post-hoc label decisions. Experiments comparing Revolt to traditional crowdsourced labeling show that Revolt produces high-quality labels without requiring label guidelines, in exchange for an increase in monetary cost. This upfront cost, however, is mitigated by Revolt’s ability to produce reusable structures that can accommodate a variety of label boundaries without requiring new data to be collected. Further comparisons of Revolt’s collaborative and non-collaborative variants show that collaboration reaches higher label accuracy with lower monetary cost.
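The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the data layout, and the example items and categories below are all hypothetical, and the sketch assumes that each ambiguous item has already been assigned a single crowd-provided category. It only shows how disagreement can flag ambiguous items, how those items can be grouped, and how a requester might then set different label boundaries over the same groups without collecting new data.

```python
from collections import defaultdict


def split_by_agreement(votes):
    """Separate items with unanimous labels from items with disagreement.

    votes: dict mapping item id -> list of labels from different workers
    (hypothetical layout). Items with no votes are treated as ambiguous.
    """
    certain, ambiguous = {}, []
    for item, labels in votes.items():
        if len(set(labels)) == 1:
            certain[item] = labels[0]
        else:
            ambiguous.append(item)
    return certain, ambiguous


def group_ambiguous(ambiguous, categories):
    """Group ambiguous items by a crowd-provided category (assumed given)."""
    groups = defaultdict(list)
    for item in ambiguous:
        groups[categories[item]].append(item)
    return dict(groups)


def apply_decisions(certain, groups, decisions):
    """Apply the requester's post-hoc decision for each group of items."""
    labels = dict(certain)
    for category, items in groups.items():
        for item in items:
            labels[item] = decisions[category]
    return labels


# Hypothetical example: labeling web pages as "cat" or "not_cat".
votes = {
    "page_a": ["cat", "cat", "cat"],
    "page_b": ["not_cat", "not_cat", "not_cat"],
    "page_c": ["cat", "not_cat", "cat"],
    "page_d": ["not_cat", "cat", "cat"],
    "page_e": ["cat", "not_cat", "not_cat"],
}
categories = {
    "page_c": "cartoon cats",
    "page_d": "cartoon cats",
    "page_e": "lions and tigers",
}

certain, ambiguous = split_by_agreement(votes)
groups = group_ambiguous(ambiguous, categories)

# Two different label boundaries drawn over the same reusable groups.
strict = apply_decisions(
    certain, groups, {"cartoon cats": "not_cat", "lions and tigers": "not_cat"}
)
broad = apply_decisions(
    certain, groups, {"cartoon cats": "cat", "lions and tigers": "cat"}
)
```

Here `strict` and `broad` differ only in the requester's per-group decisions; the crowd's votes and categories are reused unchanged, which is the sense in which the structures can accommodate more than one label boundary.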

Downloads

PDF Download
ACM Digital Library
Technical and Design Notes (draft)

Citation

Joseph Chee Chang, Saleema Amershi, and Ece Kamar. 2017.
Revolt: Collaborative Crowdsourcing for Labeling Machine Learning Datasets.
In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI '17).
ACM, New York, NY, USA, 3180-3191. DOI: http://dx.doi.org/10.1145/3025453.3026044

Bibtex

@inproceedings{Chang:2017:Revolt,
 author = {Chang, Joseph Chee and Amershi, Saleema and Kamar, Ece},
 title = {Revolt: Collaborative Crowdsourcing for Labeling Machine Learning Datasets},
 booktitle = {Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems},
 series = {CHI '17},
 year = {2017},
 url = {http://doi.acm.org/10.1145/3025453.3026044},
 doi = {10.1145/3025453.3026044},
 publisher = {ACM},
 address = {New York, NY, USA},
 pages = {3180--3191},
}

