Finding a Choice in a Haystack: Automatic Extraction of Opt-Out Statements from Privacy Policy Text WWW ’20, April 20–24, 2020, Taipei
REFERENCES
[1]
Aigbe Akhigbe and Ann Marie Whyte. 2004. The Gramm-Leach-Bliley Act of
1999: Risk implications for the financial services industry. Journal of Financial
Research 27, 3 (2004), 435–446.
[2]
Amazon Web Services, Inc. 2017. Alexa Top Sites. https://docs.aws.amazon.com/
AlexaTopSites/latest/index.html. (2017).
[3]
Rebecca Balebako, Pedro Leon, Richard Shay, Blase Ur, Yang Wang, and Lorrie
Faith Cranor. 2012. Measuring the Effectiveness of Privacy Tools for Limiting
Behavioral Advertising. In Proceedings of the Web 2.0 Security and Privacy
Workshop (W2SP).
[4]
Eric Baucom, Azade Sanjari, Xiaozhong Liu, and Miao Chen. 2013. Mirroring
the real world in social media: twitter, geolocation, and sentiment analysis. In
Proceedings of the 2013 international workshop on Mining unstructured big data
using natural language processing. ACM, 61–68.
[5] S. Behnel. 2005. lxml - XML and HTML with Python. https://lxml.de/. (2005).
[6]
Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing
with Python (1st ed.). O’Reilly Media, Inc.
[7]
Alexander Bleier and Maik Eisenbeiss. 2015. The Importance of Trust for Person-
alized Online Advertising. Journal of Retailing 91, 3 (2015), 390–409.
[8] Bloomberg Businessweek. 2000. Business Week/Harris Poll: A Growing Threat.
(2000), 96.
[9]
California State Legislature Website. 2018. SB-1121 California Consumer Privacy
Act of 2018. (2018). https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?
bill_id=201720180SB1121.
[10]
Fred H Cate. 2010. The limits of notice and choice. IEEE Security & Privacy 8, 2
(2010), 59–62.
[11]
Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and
MaxEnt discriminative reranking. In Proceedings of the 43rd annual meeting on
association for computational linguistics. Association for Computational Linguistics,
173–180.
[12]
Shan Chen, Dan Hong, and Vincent Shen. 2005. An Experimental Study on
Validation Problems with Existing HTML Webpages. 373–379.
[13]
Lorrie Faith Cranor, Joseph Reagle, and Mark S Ackerman. 1999. Beyond Concern:
Understanding Net Users’ Attitudes About Online Privacy. Technical Report. TR
99.4.1, AT&T Labs-Research.
[14]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT:
Pre-training of Deep Bidirectional Transformers for Language Understanding.
CoRR abs/1810.04805 (2018). arXiv:1810.04805 http://arxiv.org/abs/1810.04805
[15]
Digital Advertising Alliance. 2009. Self-Regulatory Principles for Online Behav-
ioral Advertising. (July 2009). http://digitaladvertisingalliance.org/principles.
[16]
Digital Advertising Alliance. 2019. Your AdChoices. (2019). https://youradchoices.
com/.
[17]
European Commission. 2016. REGULATION (EU) 2016/679 OF THE EUROPEAN
PARLIAMENT AND OF THE COUNCIL of 27 April 2016 on the protection
of natural persons with regard to the processing of personal data and on the
free movement of such data, and repealing Directive 95/46/EC (General Data
Protection Regulation). (2016). https://eur-lex.europa.eu/legal-content/EN/TXT/
PDF/?uri=CELEX:32016R0679.
[18]
Benjamin Fabian, Tatiana Ermakova, and Tino Lentz. 2017. Large-Scale Readabil-
ity Analysis of Privacy Policies. In Proceedings of the International Conference on
Web Intelligence (WI). 18–25.
[19]
Joshua Gluck, Florian Schaub, Amy Friedman, Hana Habib, Norman Sadeh, Lorrie
Faith Cranor, and Yuvraj Agarwal. 2016. How Short is Too Short? Implications
of Length and Framing on the Effectiveness of Privacy Notices. In Proceedings
of the Twelfth USENIX Conference on Usable Privacy and Security (SOUPS ’16).
USENIX Association, USA, 321–340.
[20]
Yoav Goldberg and Omer Levy. 2014. word2vec Explained: deriving Mikolov et
al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722
(2014).
[21]
Hana Habib, Sarah Pearman, Jiamin Wang, Yixin Zou, Alessandro Acquisti,
Lorrie Faith Cranor, Norman Sadeh, and Florian Schaub. 2020. “It’s a scavenger
hunt”: Usability of Websites’ Opt-Out and Data Deletion Choices. In CHI’20: ACM
CHI Conference on Human Factors in Computing Systems.
[22]
Hana Habib, Yixin Zou, Aditi Jannu, Neha Sridhar, Chelse Swoopes, Alessandro
Acquisti, Lorrie Faith Cranor, Norman Sadeh, and Florian Schaub. 2019. An em-
pirical analysis of data deletion and opt-out choices on 150 websites. In Fifteenth
Symposium on Usable Privacy and Security.
[23]
Hamza Harkous, Kassem Fawaz, Rémi Lebret, Florian Schaub, Kang G Shin, and
Karl Aberer. 2018. Polisis: Automated Analysis and Presentation of Privacy
Policies Using Deep Learning. arXiv preprint arXiv:1802.02561 (2018).
[24]
Jovanni Hernandez, Akshay Jagadeesh, and Jonathan Mayer. 2011. Tracking the
Trackers: The AdChoices Icon. (2011). http://cyberlaw.stanford.edu/blog/2011/
08/tracking-trackers-adchoices-icon.
[25]
Alex Holub, Pietro Perona, and Michael C Burl. 2008. Entropy-based active
learning for object recognition. In 2008 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition Workshops. IEEE, 1–8.
[26]
IAB Europe. 2011. EU Framework for Online Behavioural Advertising. (April
2011). https://www.edaa.eu/wp-content/uploads/2012/10/2013-11-11-IAB-
Europe-OBA-Framework_.pdf.
[27]
IAB Europe. 2019. GDPR Transparency and Consent Framework. (2019). https:
//iabtechlab.com/standards/gdpr-transparency-and-consent-framework/.
[28]
Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag
of Tricks for Efficient Text Classification. In Proceedings of the 15th Conference of
the European Chapter of the Association for Computational Linguistics: Volume 2,
Short Papers. Association for Computational Linguistics, 427–431.
[29]
Hyejin Kim and Jisu Huh. 2017. Perceived Relevance and Privacy Concern
Regarding Online Behavioral Advertising (OBA) and Their Role in Consumer
Responses. Journal of Current Issues & Research in Advertising 38, 1 (2017), 92–105.
[30]
Saranga Komanduri, Richard Shay, Greg Norcie, and Blase Ur. 2011. AdChoices?
Compliance with Online Behavioral Advertising Notice and Choice Requirements.
I/S: A Journal of Law and Policy for the Information Society 7 (2011).
[31]
Vinayshekhar Bannihatti Kumar, Abhilasha Ravichander, Peter Story, and Norman
Sadeh. 2019. Quantifying the effect of in-domain distributed word
representations: A study of privacy policies. In AAAI Spring Symposium on
Privacy-Enhancing Artificial Intelligence and Language Technologies.
[32]
Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim,
Chan Ho So, and Jaewoo Kang. 2019. BioBERT: pre-trained biomedical language
representation model for biomedical text mining. arXiv preprint arXiv:1901.08746
(2019).
[33]
Pedro Leon, Blase Ur, Richard Shay, Yang Wang, Rebecca Balebako, and Lorrie
Cranor. 2012. Why Johnny can’t opt out: a usability evaluation of tools to limit
online behavioral advertising. In Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems. ACM, 589–598.
[34]
Frederick Liu, Shomir Wilson, Peter Story, Sebastian Zimmeck, and Norman
Sadeh. 2018. Towards Automatic Classification of Privacy Policy Text. (2018).
[35]
Larry M Manevitz and Malik Yousef. 2001. One-class SVMs for document
classification. Journal of Machine Learning Research 2, Dec (2001), 139–154.
[36]
F Marotta-Wurgler. 2015. Does "notice and choice" disclosure regulation
work? An empirical study of privacy policies. Michigan Law: Law and
Economics Workshop (2015). https://www.law.umich.edu/centersandprograms/
lawandeconomics/workshops/Documents/Paper13.Marotta-Wurgler.Does%
20Notice%20and%20Choice%20Disclosure%20Work.pdf
[37]
Arunesh Mathur, Jessica Vitak, Arvind Narayanan, and Marshini Chetty. 2018.
Characterizing the use of browser-based blocking extensions to prevent online
tracking. In Proceedings of the Symposium on Usable Privacy and Security (SOUPS).
[38]
Aleecia M. McDonald and Lorrie F. Cranor. 2008. The Cost of Reading Privacy
Policies. I/S: A Journal of Law and Policy for the Information Society 4, 3 (2008),
540–565.
[39]
Aleecia M McDonald and Lorrie Faith Cranor. 2010. Americans’ Attitudes About
Internet Behavioral Advertising Practices. In Proceedings of the Workshop on
Privacy in the Electronic Society (WPES).
[40]
William Melicher, Mahmood Sharif, Joshua Tan, Lujo Bauer, Mihai Christodor-
escu, and Pedro Giovanni Leon. 2016. (Do Not) Track Me Sometimes: Users’
Contextual Preferences for Web Tracking. Proceedings on Privacy Enhancing
Technologies 2016, 2 (2016), 135–154.
[41]
Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Ar-
mand Joulin. 2017. Advances in pre-training distributed word representations.
arXiv preprint arXiv:1712.09405 (2017).
[42] Mozilla. 2019. Geckodriver. https://github.com/mozilla/geckodriver. (2019).
[43]
Kanthashree Mysore Sathyendra, Shomir Wilson, Florian Schaub, Sebastian
Zimmeck, and Norman Sadeh. 2017. Identifying the Provision of Choices in
Privacy Policy Text. In Proceedings of the 2017 Conference on Empirical Methods
in Natural Language Processing. Association for Computational Linguistics, 2774–
2779. https://doi.org/10.18653/v1/D17-1294
[44]
Network Advertising Initiative. 2018. NAI Code of Conduct. (2018). https:
//www.networkadvertising.org/sites/default/files/nai_code2018.pdf.
[45]
Network Advertising Initiative. 2019. Opt Out of Interested-Based Advertising.
(2019). http://optout.networkadvertising.org/.
[46]
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove:
Global vectors for word representation. In Proceedings of the 2014 conference on
empirical methods in natural language processing (EMNLP). 1532–1543.
[47]
Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher
Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word
representations. arXiv preprint arXiv:1802.05365 (2018).
[48]
Postlight Labs, LLC. 2019. Mercury Web Parser. https://mercury.postlight.com/
web-parser/. (2019).
[49]
Usable Privacy Policy Project. 2017. Usable Privacy Policy project website. https:
//usableprivacy.org/. (2017).
[50]
Enric Pujol, Oliver Hohlfeld, and Anja Feldmann. 2015. Annoyed Users: Ads
and Ad-Block Usage in the Wild. In Proceedings of the Internet Measurement
Conference.
[51]
Joel R Reidenberg, Travis Breaux, Lorrie Faith Cranor, Brian French, Amanda
Grannis, James T Graves, Fei Liu, Aleecia McDonald, Thomas B Norton, Ro-
han Ramanath, N. Cameron Russell, Norman Sadeh, and Florian Schaub. 2015.