Machine learning researcher and artist shares feminist data privacy practices
A Q&A with Caroline Sinders
We recently talked to Caroline Sinders, a machine learning researcher and artist examining how technology shapes digital conversations across politics, AI and more. She’s a former fellow at the Harvard Kennedy School, the Mozilla Foundation, Yerba Buena Center for the Arts, Eyebeam, STUDIO for Creative Inquiry, and the International Center of Photography. Her work has been supported by the Ford Foundation, Omidyar Network, the Open Technology Fund and the Knight Foundation.
Q: We’ve recently expanded our mission at Story Changes Culture from elevating the stories of women across tech and media to including research about critical technology issues affecting women. We’re working now on the ethical development of AI and data privacy. Given your experiences, what’s your advice for women who want to ensure their data isn’t exploited to directly or indirectly harm them or others?
Oh, gosh, what a really big and important question. I think it depends on the kinds of apps you use, the data settings you have and also the country you live in. For example, I'm answering these questions based in the United Kingdom. Before I lived here, I lived in the European Union, in Germany. But I'm an American; I grew up in the US and spent most of my life there. The US has less data protection than the EU and the UK. In the UK, there are data protection laws and privacy laws that we just don't have in the United States. So one of the big things to keep in mind is that there may not be much we can do as individuals. We need much bigger systemic changes. And some of those systemic changes are demanding more transparency from large platforms like Google, Facebook, Amazon, etc., around how they use our data, how they are targeting us, and how they're misrepresenting or misunderstanding our data.
One thing I think is really important is to support investigative journalists who are looking into this. One thing you can do is donate to The Markup. Another thing you can do is look at your phone and check all of the default settings. Do you have your location services turned on? Look at your software and see how much is already turned on by default so you’re sharing as much as possible. I would argue you should start changing that. Don't let certain apps track your location. Don't share your location on images. Some of the things I'm referring to are what we can think of as digital hygiene: privacy practices that help you be safer online.
We also live in a system of surveillance capitalism, so no matter how good our individual practices are, we still live within it. I think we should fight that system. And this is where you can elect politicians who are trying to hold these companies to account, and you can support civil society organizations like my organization Convocation, the Mozilla Foundation and The Markup.
Q: Can you share an example of the misuse of AI as it relates to women and an example of the socially responsible use of AI as it relates to women? We’d like to illustrate to our community how AI can be used for both good and bad purposes so we might be inspired to get involved in the right kinds of work and collaborations.
Generally, any time you use AI, it's making different kinds of assumptions about you. There's been an amazing project by Joy Buolamwini called Gender Shades. The project looked at how facial recognition understands or misunderstands race and gender. What Joy found was that state-of-the-art commercial facial recognition systems from companies like IBM and Microsoft had a really hard time with race and gender. They had a much easier time recognizing white male faces and a much harder time recognizing female faces and female faces of color. I think it's really important to think about how many different systems around us are using facial recognition when the software has been shown to be faulty.
Another thing to think about is how people capture data about women. I want to pause and say, a lot of the way we capture data about gender is extraordinarily binary, and that’s already a form of bias. Gender is not binary; there are so many different kinds of gender. So already the systems we exist in are using data to paint a picture of the world, and how they group, bucket and structure that data is a problem. If a system is trying to collect information about gender and it only has two gender options, that's already a bias. But it's also important, more importantly, to think about who makes these systems and what they're being used for. A lot of the time these systems are made by white men, so that already encodes bias into them.
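To make that concrete, here is a tiny, purely hypothetical sketch (not any real system's schema) of how a two-option gender field bakes bias into the data before a model ever sees it, compared with a structure that leaves room for self-description.

```python
# Purely hypothetical schemas, for illustration only.
# A form that offers exactly two gender options has already encoded a bias
# at the data-structure level, before any model is trained on the data.

BINARY_SCHEMA = {
    "gender": ["female", "male"],  # two buckets: the bias is baked in here
}

MORE_INCLUSIVE_SCHEMA = {
    "gender": [
        "woman",
        "man",
        "non-binary",
        "prefer to self-describe",
        "prefer not to say",
    ],
    "gender_self_description": "free text",  # lets people describe themselves
}
```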
Q: You include on your website the concept of “thoughtfully making machine learning.” Can you elaborate on what that means?
What that means is: how can we insert human rights into machine learning? So instead of saying, “Oh, let's just randomly use AI” or, “Oh, I'm making a project and it's about technology and I want it to improve things,” let's say a community is trying to improve their street by ensuring it's paved. It's important to know whether they are building it with the community. Is it something the community was asking for?
Sometimes in making technology, every problem looks like a nail for your hammer, but that's not necessarily true. So in this case, if someone were to come to me and say, “I want to use AI to study and understand improvements for unpaved streets,” the first thing I would ask is whether they have interviewed any of the people who live on the street and whether they've figured out other underlying issues. That's what I mean by thoughtfully making with machine learning. It's trying to center a community's needs and values and advocating for them.
Q: What is the Feminist Data Set project and how does it work? How is it being used today? How can people get involved?
Feminist Data Set is a multi-year project where I'm using intersectional feminism as an investigatory framework to understand and critique machine learning. It asks, at every step of the machine learning pipeline, is this a form of feminist technology? With data collection, for example, instead of just randomly scraping data, we hold a series of workshops where we engage with people and ask: how would we structure our data set to reflect intersectional feminist values? How do we find feminist data? For example, the data set is all text-based, and it can be any text. We've removed citation requirements, meaning text that is submitted doesn't have to be published by particular publishing bodies, since women are under-published and under-cited, but it also means we think about the structure of the text.
So with an article on income inequality, if the article simply states that women are paid less than men, that's not intersectional and it can't be in the data set. But if an article talks about how Black women, Indigenous women, Latina women and trans women are all paid different amounts, that can be in the data set. The project is really a speculative research project, but it's looking at every step and asking: what's a feminist way to collect data? What's a feminist way to store that data? And then what's the manifestation of it?
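As a rough sketch only (the field names and the screening rule below are illustrative assumptions, not the project's actual schema or review process, which happens collaboratively in workshops rather than through an automated filter), one way to picture a text-based entry and an intersectionality check is:

```python
# Hypothetical sketch of a text-based data set entry and a screening step,
# illustrating the kind of criteria described above. Field names and the
# rule are assumptions, not the Feminist Data Set's actual structure.

from dataclasses import dataclass, field

@dataclass
class CandidateText:
    title: str
    body: str
    # Which groups the text explicitly accounts for, recorded by reviewers.
    groups_discussed: list[str] = field(default_factory=list)

def is_intersectional(entry: CandidateText) -> bool:
    """A candidate passes only if it looks beyond a single-axis framing
    (e.g. 'women vs. men') and accounts for more than one group."""
    return len(set(entry.groups_discussed)) >= 2

wage_article = CandidateText(
    title="Pay gaps across race and gender",
    body="...",
    groups_discussed=["Black women", "Indigenous women", "Latina women", "trans women"],
)
print(is_intersectional(wage_article))  # True: it can be considered for the data set
```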
Anyone can get involved in any way, shape or form. We hold workshops pretty frequently, so please come and enjoy one of our workshops.
Q: The first episode we produced for our docuseries the Chasing Grace Project was focused on the wage gap. We understand your Technically Responsible Knowledge Wage Calculator has been primarily used to understand wages for task-based work to inform AI systems for large companies. Has it been used to better understand and/or address the gender wage gap? If so, please tell us more!
The TRK (Technically Responsible Knowledge) wage calculator doesn't address the gender wage gap specifically, but it does address a wage gap more generally. The thing to keep in mind about the TRK wage calculator is that it's there to confront gig economy pricing and to try to create equitable pricing inside of the gig economy.
The TRK wage calculator looks at what are called microservice platforms. Microservice platforms are platforms like Amazon's Mechanical Turk, where people can sign up for an account and do a variety of really short tasks: maybe it's filling out a survey, maybe it's labeling an image, maybe it's labeling a receipt. This kind of labeling is used inside of AI; it's the backbone of a lot of AI systems. The name is a reference to the 18th-century Mechanical Turk, a supposed chess-playing automaton that actually had a human hidden inside. Amazon's Mechanical Turk is a nod to that, and it's very similar in the sense that it's humans doing this kind of blunt-force, seemingly automated action of labeling hundreds of images.
A lot of different researchers have been looking at the microservice platforms used in AI, and what they found is that people are radically underpaid. A few studies have found that the amount of money people make on Mechanical Turk can be somewhere between $2 and $7 an hour. So when we were investigating Mechanical Turk for the Feminist Data Set, a big thing we were thinking about was how people should work when they're labeling a data set. How would that feel? How do you make that equitable? If you're going to have a space that allows for all different kinds of workers, then your work style needs to reflect that.
If you play with the TRK wage calculator, it's a slider, and it takes time into account. One of the things we realized from interviewing people who put projects on Mechanical Turk is that they thought they were pricing things equitably, but they weren't accounting for how long it took to do something. Meaning, let's say I tell you I'll pay you 20 euros if you can help me move. Well, if it takes you five minutes to help me, that's a great deal. If it takes you six hours, that's not a good deal at all.
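To make the arithmetic concrete, here is a minimal sketch of the idea behind the calculator (an illustration only, not the actual TRK tool): the same flat payment turns into a very different effective hourly wage depending on how long the task really takes.

```python
# Hypothetical illustration of the idea behind the TRK wage calculator:
# a flat per-task payment yields very different effective hourly wages
# depending on how long the task actually takes.

def effective_hourly_wage(payment: float, minutes_per_task: float) -> float:
    """Convert a flat per-task payment into an effective hourly rate."""
    return payment * (60.0 / minutes_per_task)

# The moving example from above: a flat 20-euro payment.
print(effective_hourly_wage(20.0, 5))       # 5 minutes  -> 240.0 euros/hour: a great deal
print(effective_hourly_wage(20.0, 6 * 60))  # 6 hours    -> ~3.33 euros/hour: not a good deal at all

# A typical microtask: a few cents per labeled image.
print(effective_hourly_wage(0.05, 2))       # 2 minutes per label -> 1.50 dollars/hour
```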
Q: Can you tell us a little bit about your story? What sent you on a journey to merge art and technology? How did you make the transition from photography to working on machine learning and natural language processing at IBM? And what about that experience led you to build and design all the things?
I always really liked technology. I graduated high school in 2006, and I didn't realize at the time that you could get a job working in technology. My dream job was to research LiveJournal and work for LiveJournal. I wanted to figure out the new modes of behavior people were engaging in, and I didn't know that was a job, but I knew that being a photographer was a job. And I was good at photography, so that's why I chose that.
When I got into undergrad, that's when I realized there were all these different forms of research I could be doing related to technology. I still really liked photography, but I was also interested in how technology was affecting photography, and this was pre-Instagram. I was just really interested in how technological innovations affect photography as a medium and how that then affects us as an audience.
I then got a master's at the Interactive Telecommunications Program, which was in the same building and the same school as my undergrad photo program, and I knew it was a very interesting, very experimental technology space that was also thinking about things like the future of imaging. And when I was studying there, I became interested in user research. I was interested in how technologists were dealing with cutting-edge technology.
I worked in advertising for a year after I graduated, and then I got headhunted to join IBM Watson. When I joined IBM, I had already spent a few years studying feminist uses of social media, so I was interested in technology and, really, in human conversation. So I transitioned slightly from photographs to thinking about the future of human community spaces, which involves text and conversation. When I joined IBM Watson, the head of Watson Design at the time, Joe Meersman, asked me what I was interested in. I said I was interested in human conversation and the future of conversation, and that I thought I eventually wanted to get a Ph.D. related to this. He said, “Great, instead of you doing UX, we're going to put you into design research and you're going to work on natural language processing.” It was this fantastic man, Joe, the head of Watson Design, who really helped put me on my path, and I'm eternally grateful for that.
I think of myself at times as a translator of technology. That's why at Convocation we talk a lot about making complexities humanly readable and about thoughtfully making with machine learning. I'm very interested in how you reinsert humans, humanity, human rights and equity back into the technology process, and I think a lot of that is making the systems very legible. That means looking at all the different components of how you build something and why you build it that way, and then, at times, trying to propose new interventions and new ways to make things. I think that is very much a part of creating more human-centered and more human rights-centered design.