Ways with Words | Big Data || Radcliffe Institute


-Welcome back. So we have two panels
this afternoon– one on public discourse
and this one on big data. So I’ll ask the
panelists to come up. And I’m going to introduce the
moderator, Rebecca Lemov, who is a professor in the Department
of the History of Science, and she’s also the author of a
book called Database of Dreams, which is about using early
database technology to try to construct a large database
of social, psychological, anthropological data,
and she is currently working on a history of coercive interrogation
and brainwashing, which sounds fascinating. So I’ll turn it over to Rebecca.
-Thank you for the introduction, and thanks for the chance to be here. I’ve really enjoyed the day so
far, and I think our panel now will offer a shift
in perspective and also a shift in methodology. So I’ll try to frame that. Let’s see. So we live in what is sometimes
called the petabyte era, and this pronouncement
in turn causes much discussion of
the sheer size of data that is being produced
as well as the rapidity with which that’s happening. And I’m going to make
this statistic up, but you hear things
like more data has been created in
the last 10 years than in all of human
history combined– or maybe in the last 10 minutes. We’re not quite sure. But certainly it’s mounting up. And what is often forgotten, I
think, or temporarily put aside in such excited and
excitable discussions is how much of this
newly created stuff is made of and out of
personal data, the almost literal mining, tracking, and
harvesting of subjectivity of people’s lives,
in other words. And I think that point was
made earlier by Janet Mock in the keynote discussion,
where she said, “Our phones are extensions of ourselves. We carry ourselves on them.” And so this panel,
in a sense, looks at what happens when that
data circulates and is amassed in large amounts
or studied in other ways, and what we can
learn when it becomes the topic of social science. And so when introducing
the topic of big data, I often start with the 1973
Charlton Heston classic Soylent Green, which is set in the
sci-fi dystopian future of 2022 in case any of you
haven’t seen it recently, in which pollution,
overpopulation, depleted resources, dying oceans, and
assisted suicide are the norm, and the rations are this
stuff called soylent green. And, of course, the
famous tag line– I hope I’m not ruining
it for anyone– is that soylent green
turns out to be people. But you could also argue
that big data is people in a, perhaps, suggestive way. And today in the press,
in the business world and scholarly
journals, the question arises of what is
unique about big data? Is it hype? Will the phrase even
exist in 5 years? Often definitions of big
data are strangely circular, as in this one from the
Columbia Journalism Review– “Big data is a
catchall label that describes a new way of
understanding the world,” which basically boils down to big
data is data that is very big. I teach a whole
class on the history and present and
future of big data, so its ambiguity and
hype are, actually, quite useful for
discussing a range of topics about basically how technology
is changing our lives, and in that spirit
that’s one of the reasons we have titled this
panel Big Data. Many resort to
definitions of big data that stress the three V’s,
which are volume, velocity, and variety. And these, in fact, were
coined in 2001 by an industry analyst who was
trying to describe the problems with big
data, not necessarily its revolutionary
properties, but they’ve gone on to become the very
definition of big data. Others talk about its
transformational properties, and in a bold piece
for Wired magazine, the technology evangelist
Chris Anderson claimed the end of theory
had been reached and that so much data now exists
that it’s unnecessary to build a hypothesis to
test scientifically, but rather the data can,
if you properly handle it, speak for itself. And so I think we have
a couple of data scientists and an ethnographer of
identity in social media today who will speak to some
of these questions. But I thought I would
start by just reminding us that some of these
qualities of big data that are important
to keep in mind are precisely
where it comes from and its quality of being
generated continuously. So if you think about things
like the continuous recording of retail purchases,
digital devices that record the history
of their own use, like mobile phones, or
the logging of banking transactions, or clickstream data
recording people’s navigation through websites
or apps, or the stream of comments, likes, and clicks that we leave
in our online interactions. For example, in 2012
Walmart was generating more than 2.5 petabytes
of data relating to more than 1 million customer
transactions every hour. And Facebook reported
a couple of years ago that it was processing 2.5
billion pieces of content, including
links, comments, and likes– which
have now proliferated into those other
little icons– per day. And people upload 300
million photos per day. So today we have a chance to
listen to three researchers who are, in a sense, in the
trenches with this massive proliferation, and
trying to figure out what it means for
gender, in particular. The three people
we’re going to hear, I think it is fair
to say, have not taken for granted the personal
and gender implications of these new sources of data. And I should add each
is unique in how exactly they approach their subject,
but what they have in common is a deep concern with using
the unique properties of these burgeoning
data sources to study language’s formative
role in gender identity– both its maintenance
and reinforcement, and its deconstruction. So we’ll see how
gender identity can be both reinforced and
perhaps maintained, and also deconstructed. And so our first
participant is Ben Hookway. He is the CEO of
Relative Insight, which is a UK-based
marketing analysis firm. There are longer
bios in your program. He has a really
interesting background, which I just thought I would
mention, in law enforcement, pioneering the use
of language analysis to identify sexual predators. And more recently, his firm
is moving towards marketing. He also has a background
in several technology-based businesses such as Vidiactive,
Next Device, Critical Path in the UK and the USA. So I’m looking forward to
hearing about his work.
-So normally I present this
to venture capital firms and big brands and
marketers and agencies and so on, so it’s
probably fair to say you’re not my normal crowd. But I’m just going to tell you
what we do, where we came from, and since we’re
all about data, I’m going to show you
some data, as well. So language and
America are both dear to my heart. I actually used
to live in America. I’m from Scotland originally, and I moved to the
States when I was 24, and at my first ever dinner
party, one of the guests asked me at what age
you start getting taught English in Scottish schools. So I said, well,
8, but I’ve only just become fluent, which is
why I still have an accent. The accent’s been
kicked out of me somewhat after spending
6 years in the States doing various things
and then going back to the UK. So now I’m CEO of
Relative Insight, and Relative Insight is an
advanced language analytics business. And essentially what we do
is specialize in comparing sets of language. So we do two things– we
turn this subjective thing, this unstructured thing called
language– which people tend to be quite emotive about
and have high expectations of– into data. But then the second
critical thing we do is that we compare
sets of data, and we specialize in looking at
the differences between sets of language. And this is important
because it’s the differences which
tend to be important, not the language itself. So we actually
started out as a collaboration between government agencies
in the UK and academia, and we were developed
to catch bad people. And one of the examples
of catching bad people is online child protection. So in an open chat
forum, we can tell if you’re a 12-year-old
girl or if you’re a 30-year-old man doing a
very sophisticated impression of a 12-year-old girl. Now, the way we do that is by
taking known 12-year-old girl language, taking suspected
language, comparing them. And the two sets might
be 99% identical, but that’s not what
you’re interested in. What you’re interested
in is the 1% difference that’s significant. And unfortunately, if an
offender’s undertaken that kind of behavior, very often they’ll
have more than one persona in a chat room, so they
manipulate conversation by pretending to
be multiple people and trying to direct
behavior and so on. And we can detect when
that’s happening, as well. So we do all that good
stuff, but we make our money, actually, by using this
technology with brands. So we help brands
sell you stuff, because you will respond
better to messages if they’re in your own vernacular and
in the style of language that you tend to use. The trick, though,
is how to find out what that language
style is if you don’t know what you’re looking for. And so by comparing
two given sets of data, understanding the
critical differences, you can home in on
how different people speak. So it’s worth talking about. Earlier on in the sessions
this morning there was some discussion about
channels of communication, and Twitter was
brought up a lot. So we do analyze Twitter,
and we do analyze Facebook language a lot. However, we also
spend a lot of time analyzing forums
and review sites. So when you’re looking at
language as a data source, you’ve got to consider
the source of the data. And what we find is social,
and Twitter, especially– although it’s easy to
get because it’s API accessible and so on–
it’s very, very volatile, and it’s not necessarily a
very good medium or long-term reflection of what’s really
going on in the world. Forums, by contrast, tend to be
a lot more thoughtful; there’s more room to write, and people
aren’t reacting or retweeting anything. They’re giving a
considered opinion, so we spend quite a
lot of time on there. And we even do things like
call center transcripts and employee opinion surveys. So rather than having
to tick boxes from 1 to 5, from highly satisfied
to highly dissatisfied, you can actually write
free-form comments, and these can get analyzed
and tracked over time, and so on and so on. So when we talk about
language, what do we mean? Well, we mean two things,
really– the topics people talk about, but also style. So are they emotive? Are they particularly
expressive? Are they factual? Are there particular
language markers that they tend to use that
if you’re not in that group you wouldn’t know? So for example, in the UK,
one of the big mothers’ forums is called Mumsnet. And so if you did
some analysis on that and you look at
the language, you see things like DD, DS,
DH coming up all the time. And you just kind of
ignore these normally, but only by comparing
language sets and seeing these things appear as
something significant do you realize they’re
probably pretty important. It turns out they stand for
darling daughter, darling son, darling husband,
and so on and so on. And this represents the cultural
code of that group of people. So if you’re trying to
engage with these people and you don’t talk like them,
you’re instantly exposed. A really good
example of this: we work with Microsoft
Mobile as well, and they launched a phone
which had a great camera in it. Part of the ad campaign
was aimed at DSLR enthusiasts, proper photographers. “Use this phone and
take amazing pictures” was kind of the slogan. But if you do a comparison
between how DSLR enthusiasts talk and how smartphone
reviewers talk, you find, actually, that DSLR
people don’t take pictures. What they do is shoot images. And if you don’t
know this, you end up with what we call
the Dad-at-the-disco effect. So most people have probably
experienced– and I’m a Dad and I’ve done disco dancing–
that technically they’re doing all the right things,
but it’s just not cool, right? For this kind of unknown reason. And language has
the same effect. So that’s the kind
of stuff we do. This is just who we
sell our stuff to. So once you get it in
your head that language is nothing but data and the
best results come from comparing sets of data, you
can compare language across lots of really
interesting axes. So two groups of
mothers’ forums in the UK talk about the same topic
in very different ways. People will type and express
themselves differently on an iPhone, an
iPad, and a desktop. Tracking language over time
is fascinating– everything from how they’re
talking about your brand before the ad campaign compared
with after the ad campaign. The delta tells you the real
effect of the ad campaign, not what the ad agency tells you
the effect of the ad campaign is. But equally, forums have
been around so long now that you can go back in time
and look at societal change. So we’ve done
projects, for example, on how attitudes to facial
cosmetic surgery have changed over the last 8 years. And you can tell this by
looking at the language and tracking it. Obviously different people
in different locations, how different brands
speak to each other, and so on and so on. But increasingly we’re moving
beyond conventional demographics. This is the other key thing
we’re doing at the moment. So demographics are
a traditional way for marketing people
to use a proxy to identify an individual. And they’re really out of date. I’m 43 and I live in
Manchester in England, and I’ve got a house, and
I’ve got a car, and so what? People hold multiple personas. I hold multiple personas
during the day, in the morning, in the evening,
when I’m working, when I’m doing presentations,
and so on and so on. And so what we do is model
people with a commonality– either they’re all
members of a forum, or they all follow
a particular music artist on Twitter. You can grab all this
language, pull it together, and that’s a much
better way of targeting people. So that’s what we do. And I get to talk about
politics because I’m just a bemused observer of
your political process at the moment. I was in the States
when Bush Jr. got elected, so I’ve sort of
lived through it, and I do feel I have the right,
because I live in the UK, to mock you. So we’re going to
have a little quiz. What I’m showing you
here is candidate A, and this is candidate
A’s Twitter stream, and this is how they tend
to talk on Twitter compared with another candidate. So this is what
this candidate tends to over-index on
compared to candidate B. So they tend to use social media
mentions a lot; personal names, so they name people
in their Tweets; evaluation– good;
judgment of appearance– beautiful, so that’s beautiful,
awesome, gorgeous, and so on and so on. They use degree boosters, so
they’re quite expressive, and so on and so on. This doesn’t mean the
other candidate doesn’t use these attributes,
it just means that this candidate uses them
consistently and statistically significantly more. This is candidate A
compared to candidate C. And the green dots are
basically the same attributes which keep on appearing. So the point being
it doesn’t matter who you compare
this candidate with, they have these
significant attributes. Who wants to guess
who candidate A is? It is Trump. And if you were a marketer
and you were looking at this, you’d say this is great–
personal names, positive. You don’t know anything
else about their persona, but linguistically
this is a good brand. Candidate D– same thing. What candidate D says more
than candidate E. Sorry, just one more thing– this
is Trump against Sanders. That’s Trump against Clinton. This is what candidate
D says– same principle. And again, same
themes keep coming up. So this particular
candidate consistently over-indexes on certain
things, having to do stuff, discussing conflict,
mentioning family, health care, gun control, and
so on and so on. So guess who this is. That’s Clinton. The point is Hillary Clinton
sounds like Hillary Clinton no matter who you compare her to. You can compare her
to Carly Fiorina, and there’s no real difference
between comparing her to Carly Fiorina or comparing
her to Donald Trump. She’s continually over-indexing
on these things. Point being, from a pure
linguistic, blind point of view in the presidential race,
gender has got no effect at all on the language. So any time gender
comes up, it’s because of other
societal aspects. There we go. That was all I was going to say. I’m sure we’ll get
questions about that later, but there we go.
-Thanks, that was wonderful. Our next speaker
is Lyle Ungar who’s a professor and the Graduate
Group Chair of Computer and Information Science at
the University of Pennsylvania. He also teaches in the
Psychology Department. And he teaches an
array of courses from cognitive science to
artificial intelligence to machine learning and
data mining, crowdsourcing, management of
technology, and he mentioned that he is developing a course
on the singularity, which sounds enticing.
-So we heard incredibly
powerful personal narratives this morning. This afternoon we’re
talking about what we can see in aggregate
looking at language. I work with a group
of psychologists and computer scientists called
the World Well-being Project, trying to measure
well-being of people through the language they use. The theory is that measuring
stuff helps to drive change. If you measure GDP,
one focuses on GDP. If we can measure
well-being– whatever that is, that’s a hard question–
we can try and define it. So people ask a lot of
questions, mostly very ill-formed– do women
talk more than men? But we try, in psychology, to
take these sloppy questions– what does it mean to be nicer
or be less assertive– make them precise, measure them, and
get away from abstract debates to concrete measurement. One concrete
example– not my work, but before we get there–
Matthias Mehl measured a bunch of men and women,
put little audio recorders on their lapels,
counted the words. Who uses more
words, men or women? Well, the top two or three
talkers in this group were men, and the least talkative
were mostly men, too. On average, there’s
not a huge difference. OK, one can go back
and do similar studies in the workplace. It is true by and large–
men interrupt women a lot more than
women interrupt men, although it’s a lot different
for senior female managers, at least in the tech industry
where it’s been measured. A lot of these things can
be measured and quantified. Are you being interrupted? Yeah, if you’re female, you are. We don’t need to
speculate anymore. Cool. So I’m interested not just
in these quantitative big pictures. I want to look at the words that
people use and understand them, and the wonderful
thing for me is that people used to sit
around the campfire talking, and I couldn’t measure it,
and I’m a measurement guy, but now they’re all Tweeting
and posting to Facebook and sharing it
with me, and I can monitor all their private or
semi-private conversations. So what I want to do today is to
talk about language and gender, and then language,
gender, and personality. Let’s see what we can measure. Now, luckily for
me, 70,000 people agreed, each individually,
to share their Facebook posts with me. They said yes, you may
download all the Facebook posts I’ve made. This is opt-in, not stealing. They declared their gender
as male or female– sorry, this is 2 or 3 years old. This is from before
Facebook offered more choices, so everyone in this
group declared either male or female–
very old-fashioned, I know. We have their age to
control for statistically, and they took a very
standard personality test. I’ll say in a second
what the personality things are, but things like
extroversion, introversion. Cool. So we can look at
what words differ. Now, before I show you
these horrible stereotypes, which are just reality–
you can like them or not. They’re like Trump–
they’re there. I can’t help it,
but I can at least measure it and look at it. What I’m going to
show you are the words that most differ
between men and women. On average, words like “the,” “a,”
“it” are used roughly equally, but there are words that differ. So let’s see how
these words differ. One of the words most
indicative of being female– oh, it’s embarrassing. First of all, note that for us a
“word” can be a phrase– “love you” is a word, “so happy,” “so” with four
O’s, five O’s– very typical of females. The old guys out there, do you know what
that little less-than-3 means? Yeah, you do. You’re not an old guy. A heart, OK? And you see things like
“boyfriend,” and “I miss”– very cliche, sorry. What does it look
like to be male? Words most predictive of
being male– I’m a little bit embarrassed, since
I’m obviously male. Let me say it’s not all bad. Males do talk more about
engineering and politics, and interestingly, subtle
things– males talk more about “my wife,” whereas
females– and these are, again, averaged over everyone,
so this is predominantly heterosexual– females
talk more about “husband,” they talk about husband and
other people’s husbands, too. They’re a little more inclusive
in their discussion there. Cool. So I’m going to show
you a bunch of these. What I’ve done is
cluster together sets of words that tend to
co-occur, which is often a little more interpretable. So here’s the same set of female
words, but you can see topics. And I’ll call them topics–
the fancy technical name for the method is Latent Dirichlet Allocation,
but don’t worry about it. It’s words that
show up together– “happy birthday,” “boyfriends,”
“cute baby,” “adorable.” And similarly for
males, I can look at the topics that
show up. Again, I want to point out that
males do on average talk more about government
than females. So this is what people are
distinctively talking about. How accurate is this? Given someone’s
Facebook posts and the words that they use,
the computer can predict their self-declared
gender with 92% accuracy. We can also show these same
Facebook posts or Tweets to human raters and say,
is this a man or a woman? What do you think it is? Humans are pretty good at
looking at a Facebook post and saying, that’s a
girl, that’s a boy, but they make some mistakes. One of the many
definitions of stereotype is some mistaken view. I can look at the
words that most correlate with incorrect
labels of people. So people who are judged
to be women but are men are using words like “my” and
“love” and “hair” and “baby” and “happy.” People who are, in fact,
women but judged as men use words like “Ebola,”
“research history,” and “state.” So what’s happening? Interestingly, the human
raters are picking up on correct statistical
trends in the sense that women do talk more
about love than men, and men do talk more about
research and state than women, but they’re overweighting them. They see someone talking about
sports or about politics, they go, crap, must be a guy. Wrong. It gives some
statistical indication that it might be a guy, but it
doesn’t mean it must be a guy. So it’s interesting to watch
these systematic errors that people make. I want to shift gears now
and talk about personality. We have 70,000 people,
these same people who shared their Facebook posts
and took a standard five-factor model, or Big Five,
personality measure, and we can then look at the
words that correlate with each of these five factors. I’m going to talk
about two of them today, starting
with extroversion. What do extroverts talk about? And these are controlled
for age and sex, so this is as
balanced as I can get. And it’s pretty good. It’s “party,” and I
love the “cant wait.” These guys are in
such a hurry, they can’t even put the apostrophe
into “can’t,” right? This is amazing. This is fun. I wish I were an extrovert, but
most academics who do science tend to be a little more
on the introvert side. What do introverts, my
colleagues, look like? Apart from the angst, there’s
words like “apparently,” which are words of more
cognitive complexity, there’s more “reading”
and “drawing,” more books, more
zombies– I’m not sure. OK, so that’s
introversion, extroversion. We’re also going to talk
today about agreeableness. Agreeable Americans– these
are all American samples– talk a lot about religion,
actually– awesome. Disagreeable people– OK, sorry. I should have given
a warning, but I assume everyone here is adult
and able to deal with these. So rather different
words of agreeableness, disagreeableness,
introversion, extroversion. We have these for
a bunch of these. We can predict your
personality about as accurately as your
Facebook friends can. So it’s not great, but it’s
a fair amount of signal. But we know for each
person whether they say they were a man or
a woman, and again I apologize for the
people who don’t want to pick either
of those categories, but this is again old Facebook. This is 2 and 1/2 years old. So what are we going to do? We’re going to look at a classic
circumplex– fancy word again– looking to characterize people
on degrees of how assertive they are, how
dominant or assertive, versus how submissive,
the vertical axis, and something that’s called
affiliation, connection, agreeable– it’s not quite
the same as agreeableness, but how much they tie to
other people versus don’t. So we’re going to put people
onto this space here, and people sometimes do
intermediate versions of combinations of
these to say where they fit in the personality scale. But the important
thing for us is that this
vertical dominance axis and this horizontal
affiliation axis map roughly onto the extroversion
and agreeableness that I showed you before,
with a slight rotation. So these are correlated
but not the same, so extroverts tend to be
dominant and a little bit warm. Introverts tend to
be the opposite. So we can look at
personality and say something about– I
don’t know if women are nice or not, but I can see if they’re
warmer and more agreeable by their language. So we can map lots of topics. We have 2,000 Facebook topics. We can code them with how male
in blue or female in yellow they are, where they
fit in this space, and we can look at all the words,
which we don’t have time to look through, but I want to
show you two dimensions of this in some detail. So along the bottom axis,
which is a little hard to read, is male to female. On the vertical axis is hostile
to warm, the degree of affiliation. And every dot is a word topic. I’ve shown a few of those. I hope they can be seen.
example, in the bottom left, all those swearing topics
are hostile and male. Up on the right, “shopping,”
“Christmas,” “grocery” are warm and female, but
“shopping” is not particularly an affiliative binding thing. Up top, “family,” “friends,”
“wonderful,” “blessed,” “amazing”– strong
affiliation, strong female. And we can see in
general, women do talk– and I think they actually
believe– with more connection on average. And again I have to emphasize
everyone is different. It’s like talking
more or talking less, this is averaged
over 70,000 people. But in terms of characterizing
a broader American population, you can see that
women on average do seem much more connective. Makes sense? The other dimension–
and this is not a surprise in psychology–
that’s been disputed more is assertiveness,
the positive version. Dominance may be less positive. Where do men and women sit
in terms of how much they tend to be assertive or not? And there the connection
is much weaker. Again, along the
bottom, males are on the left and females on the right. And assertiveness, it’s
statistically significant, but not very big– actually,
we find women slightly more assertive than men. Now again, one has to
say this is on Facebook. May be different in the office. I’d love to have
lots of recordings of office conversations. Some of the companies
I talk to are starting to do this and
analyze those discussions. But you can see, again, these
sort of systematic differences where very male and
very non-assertive is “computers,”
“programming,” “Photoshop.” Very male and very assertive
is, again, swearing. But sort of in the middle, this
“happy birthday,” “wishing,” “sister”– quite
general neutral. So we don’t have time
to look at all of these. If you go to
WWBP.org, our website, you can go and play with these
and look at different versions. It’s all up there to play with. But there’s lots of insight to mull
over: what are people talking about, and how are they
talking about it? So what have we seen? Well, first of all, there
are systematic differences between males and
females in how they post on Facebook– not
surprising– and systematic errors that people
make in overweighting, for example, how likely
someone discussing sports is to be male. We’ve seen that words
show personality. The marketers love that, but
I love that also in the sense that I want to go and
customize hospital treatments. If you’re more conscientious
or less conscientious, you should probably be getting
different instructions. We’ve seen that assertiveness
is often misjudged. I haven’t given you
the literature on this, but there’s a lot of controversy. But I think there’s often a fair
amount of assertiveness from women, and if
you can quantify it, you get a better idea of what’s
happening or not happening. More broadly, I’m in the
business of building tools– tools for measuring things. And I’m in the business
of selling them, though not for money; we give them
away for free, all open source. I’m in the business of
selling these tools to help other researchers use them. Again, I want to
stress, for free. It’s relatively cheap and easy. I haven’t talked yet about
10 billion tweets, which I’ve got, which I
can map to US counties to look at variations
in personality, in inequality, in racism
across counties or communities. It’s a way to get at what
people are thinking and feeling as they talk to
their friends or post to their blogs or the
discussion boards. I want to end by noting
that what we’ve done here is very basic– male or female. We’re now looking
at a gender spectrum. People aren’t either talking
male or talking female. Everyone sits somewhere
on a spectrum, and maybe it’s even
two dimensions– there’s a degree of maleness
and a degree of femaleness to people’s language,
which is something we can explore when we have
these really large data sets. We can try to
understand– eventually, I hope– across very different
subgroups of self-definition– how their language varies. I think this would be a
great time going forward to get a picture in the
very large scale of what life looks like within
people’s very small worlds. And with that, I thank you.
-Thanks, that was a
really rich talk, and we’ll look forward
to some questions. But first we have
Alice Marwick, who is a social media researcher
based in New York City. She’s the Director
of the McGannon Center for
Communication Research and an assistant professor at
Fordham University’s Department of Communication
and Media Studies. Her first book, Status
Update: Celebrity, Publicity, and Branding in the
Social Media Age, came out with Yale a
couple of years ago, and is a really
fascinating ethnography of how people in the
San Francisco tech scene use social media to boost
their status and popularity, and how this affects
their offline lives. And her current
research has moved more towards the proliferation of
sexism and misogynistic speech online using such
topics as Gamergate and the Donglegate
Reddit controversies and online harassment as
places where she’s looking. So I look forward to this.
-Thank you. It’s great to be here. It’s wonderful to be at such
an interdisciplinary event. I think that when we’re tackling
these large, weighty topics like language and gender,
it’s great to see people from across the spectrum
addressing them. I’m an ethnographer,
and so I deal in what we might call
small data– interviews and observations
which often can help us to contextualize
and problematize the findings of big data. For the last 3 years,
I’ve been thinking a lot about online
sexism, and today I’m going to give a short
overview of the visibility of online sexism, particularly
with regard to language, and discuss the implications. Social media like Twitter,
Reddit, and forums enable collective action
and participation, but they also enable a
new type of organized, gamified harassment
that works to shut down the voices of women,
queer people, and people of color online. And I would like to give a
little bit of a trigger warning that there is some very sexist
speech in this presentation, and I think it’s
important to present it so you can see the magnitude of
what we’re dealing with online. So Zoe Quinn is a software
developer who became the target of an organized brigade
after her ex-boyfriend wrote a 10,000-word screed
about her on his blog. He claimed that she
had lied to him, that she cheated on
him with five guys, and most damagingly
he claimed that Quinn had slept with a writer for
Kotaku, a video game blog, to get favorable coverage for
her game Depression Quest. This was unequivocally
proven to be false, but under the pretense that they
were trying to reform ethics in video game journalism, a
group of gamers organizing through websites and
chat rooms inundated Ms. Quinn with thousands
of hateful messages. They distributed
nude photos of her, and her address and social
security number were revealed, and this is a process
called doxing, where people’s personal
information is revealed online for the purpose
of shaming and harassing. People called her parents,
they called her phone at all hours of the
night, they openly discussed raping
her, her weight, and the smell of her vagina. This brigade unified around
the Twitter hashtag Gamergate. As the harassment
escalated, Anita Sarkeesian, a feminist media critic
and favorite target of anti-feminist gamers
for several years, was also doxed and
forced to cancel an appearance at the University
of Utah due to a death threat. Another game
developer, Brianna Wu, posted anti-Gamergate
memes on Tumblr and Twitter and also received death threats. And actress Felicia Day, a
longtime gaming advocate, wrote an emotional blog
post about the effects that Gamergate was having on her
ability to trust male gamers, and she was promptly
doxed for her trouble. But Gamergate is only
the latest kerfuffle in a series of well-documented
and often high-profile incidents in which
women have been openly harassed, stalked, or otherwise
systematically targeted online for political and
feminist speech, for criticizing the video game
or technology industry, or just for existing. These high-profile cases
are the most extreme of a general atmosphere
of sexist speech, gendered harassment, and
misogynistic comments in many places online. Moreover, and what I think is
most interesting about this, is that these brigades
are organized. Sarkeesian refers to these
groups as cyber mobs. They’re collectives which engage
in cooperative competition to increase the harm
to their victims. They reinforce their
social dominance over marginalized groups. And what this means
is that people use social media
to work together to learn personal information
about their targets. They gain status
in their community as they escalate the harassment. And in this way, online
harassment becomes a game. Those of us who have ever
read YouTube comments or waded into forums like 4chan
or even some parts of Reddit have seen sexist
terms, jokes like women should be in the kitchen or make
me a sandwich, or men simply calling women fat,
ugly, and slutty– the triumvirate of uncreative
but expected gendered insults. Now, of course, this is
not restricted to sexism. The comment sections
of blogs and websites are often rife with
racist, sexist, homophobic, transphobic, and just plain
mean and profane language. The harassment of women
of color and queer women is often compounded
by intersectionality. When I appeared on television
to talk about online sexism, the channel posted
the video on YouTube and the comments on the
channel were terrible. So I posted my favorite
comment– I respect women but I hate feminists. I believe that feminists
should be gang raped. And Helen Lewis, a
British journalist, posted what she
calls Lewis’s law– the comments on any
article about feminism justify feminism. And unfortunately,
Lewis’s law has proven to be very much
true in my own experience. Women in certain online
spaces frequently find themselves the
subject of attacks that use sexual violence
and gendered speech acts as intimidation. I have heard about this from
political writers, economists, feminist bloggers,
technologists, music writers, and journalists. One friend moved her
family to a different city after she was stalked based
on her feminist writing. The blog Feministing had
a dedicated FBI agent to deal with death threats,
and a science blogger I know has been stalked for 9 years
by someone who follows her on every social media platform. And as Laurie Penny, a
columnist for The Independent, wrote, “An opinion, it
seems, is the short skirt of the internet. Having one and
flaunting it is somehow asking an amorphous mass of
almost entirely male keyboard bashers to tell you how
they’d like to rape, kill, and urinate on you.” So you might ask,
don’t both genders deal with harassment
and meanness online? And yes, this is
definitely true. And as you can see in this
graph from the Pew Research Center, which is a
nonprofit that researches online behavior,
both men and women experience online harassment,
meanness, and cruelty. But two interesting
things– first of all, you can see that the rates are
much higher for younger people, and this only looks
at people 18 and over, and so the youngest
groups of people are the ones who are
experiencing the most harassment. And if we had numbers
for 13- to 18-year-olds, we might also see that
they’re experiencing large amounts of harassment. And the other thing
I think that’s interesting about
this graph is that if you look at the share of men versus
women who are being stalked, sexually harassed, or subjected to
sustained harassment, it’s greater for
women than for men. But besides harassment,
the normalization of misogynistic
language online is something that
I’m also following and that I’m concerned about. And scholars have pointed to
the emergence of a Manosphere as a central contributor to
the spread of online sexism. And the Manosphere– this is
the men’s rights Subreddit. So Reddit is a
discussion forum and it has a lot of subsections,
and each Subreddit is dedicated to a
different topic, and this one is dedicated
to men’s rights. And the Manosphere consists
of websites and message boards where men discuss
men’s rights, pickup artist techniques,
the feminist agenda, and in more extreme cases,
bond over misogynistic ideals. And the Southern
Poverty Law Center characterizes the
Manosphere as “an underworld of so-called men’s rights groups
and individuals on the internet which is just
fraught with really hardline, anti-women misogyny.” Now, the Manosphere, to a
certain extent, is diverse. It’s spread across different
websites and technologies from mainstream sites like
Reddit and Hacker News to more obscure corners
of the internet. But these separate
subcultures are homophilous. In other words,
birds of a feather flock together. When people with
similar views spend most of their time
interacting with other people with the same views,
a polarizing effect happens in that their views
tend to become more extreme. Thus the development of these
male-centric communities may encourage sexist speech
and allow men to work together to systematically devalue
women’s contributions or even harass women. And I want to make a
note on demographics. The title of this talk,
“No Girls on the Internet,” refers to a maxim that’s thrown
around on male-centric sites like Reddit or in gamer
groups– oh, there’s no women on the internet. Every woman you meet on
the internet is just a guy pretending to be a woman. But actually we see in the
United States that about 86% of both men and women
are on the internet. There’s a gap in access in
terms of class and education, but there is not one
in terms of gender. But we also know that most
social media users are women. About 60% of Facebook
users are women. Most Twitter users are women. The same with Flickr, Tumblr,
Instagram, and Pinterest. On the other hand,
LinkedIn, 4chan, and Reddit are primarily used by men. Now, there may be a
participation gap. 87% of Wikipedia
contributors are male. Male students are more
likely than female students to create, edit, and
distribute digital video over YouTube or Facebook. But this doesn’t hold
true for content creation over Tumblr or Instagram. So there are plenty of girls
and women on the internet, but they may not be hanging
out in the same spaces as men, and this homophily may be
compounded by a relative absence of female contributions
in some spaces. And this combines
with what we really see as an emerging troll
culture on spaces like 4chan in which extreme racist, sexist,
homophobic, and ableist insults are used to insult
people, to shock them, and to produce a certain
type of affect in the people that these insults are aimed at. Now, what this means is that
egregiously sexist slurs online are becoming normalized,
and this is especially true for young people. This is a graphic
that I love. MRA stands for men’s
rights activist, surrounded by misandry. And many of the
Manosphere’s supporters are speaking about
sexism and gender extensively, but in
a bizarro-world way. Now, don’t get me wrong.
hurts men just as much as it does women,
and I think that men should talk about
masculinity and changes in masculinity and
things like that. But that’s not what’s
going on in these spaces. Let me give you an example
from another Subreddit, which is called The Red Pill, which
is a pickup artist community. And The Red Pill
summarizes itself as “discussion of sexual
strategy in a culture increasingly lacking a
positive identity for men.” And it refers to the
moment in the sci-fi movie The Matrix where the
protagonist, Neo, takes the red pill and has his eyes
opened to the dystopian reality of his world– what
1970s feminists would call consciousness-raising,
and what people today mean by being “woke.” To Red Pillers, the red
pill reveals the reality of a world where the deck
is stacked against men and feminized
behavior is rewarded. And I could spend the rest
of this lecture just talking about The Red Pill, but let
me give you an example of one of their theories. “Women are very
emotional, and most women make their decisions based
on their emotional state at the time. The rationalization
hamster is an analogy for the thought processes used
by women to turn bad behavior and bad decisions
into acceptable ones to herself and her friends. When a woman makes
a bad decision, the hamster spins in its
wheel– the woman’s thinking– and creates some type
of acceptable reason for that bad decision. The crazier the decision,
the faster the hamster must spin in order to
successfully rationalize away the insanity.” So that’s the type
of deep thinking about gender and sexism that
is going on in these spaces. Now, Subreddits
like The Red Pill and groups like men’s
rights activists do not agree on
everything, but they do share a common language. They have been highly
successful in popularizing their own vocabulary, including
terms like misandry (their word for feminism, the
male equivalent of misogyny), defined as dislike for, contempt
for, or ingrained prejudice against men; white
knighting, which refers to men who espouse feminist
views or speak up for women, the idea being that
they’re doing so in order to appear heroic
and appeal to women; and social justice
warriors, which is a pejorative
term for anyone who engages in feminist, anti-racist,
or pro-queer activism or discourse. And they often co-opt activist
and social justice rhetoric. So not only do we have an
increase in sexist slurs, we have an increase in
these rhetorical moves to discuss feminism using
intrinsically anti-feminist words like misandry. And you’ll see this
throughout the internet in a wide variety of
spaces, from Twitter to comments on tech blogs,
to places like Facebook. And this graph is
from Google Trends, and it shows how
misandry was basically nowhere before
2007 and has gradually increased in popularity in
terms of people Googling for it, talking about it, and
speaking about it. So I want to talk about
a couple of implications in my last few minutes. First, I want to
think about the idea that internet culture encourages
participation and creativity. Websites like 4chan or
the men’s rights Subreddit are generative communities
of online participation and activism. In many ways they are what we
might call social movements or communities of practice. They often resemble
organized fandom in their desire to learn
everything about and obsess over their targets, but they
are anti-fans in that they hate and despise their subjects. And I think we need
to stop categorizing creative participation as, by
definition, a positive thing. The second implication is the
chilling effects of harassment. Kathy Sierra, a prominent tech
blogger who wrote about user experience– hardly a
controversial topic– was so badly harassed
that she stopped blogging and stepped away from
her former career as a public speaker and
visible tech expert. In effect, online harassment discouraged her
from earning a living. Now, this is very
understandable given how scary and problematic
harassment can be, but it also compounds
the problem. If women stop voicing opinions
in traditionally male realms like economics, politics, and
technology, the women who do so will continue to be
seen as outliers. But I think that
what is more likely is that rather than
stop participating, women will move to the
more female friendly parts of the internet. Now, there’s nothing wrong
with fashion blogging, mommy blogging, or
Pinterest, but they tend to focus on very
traditional views of women and have a somewhat conservative
norm of femininity– looking pretty,
being a great cook, or making your own baby food. I interviewed fashion
bloggers during my postdoc and what they all said
to me is that the norm of their community was that
you must be nice online, that you couldn’t openly
criticize other people. So this means that the safe
corners of the internet may box women into
expressing themselves through very traditional, almost
stereotypically feminine ways. And as we all know,
women have plenty to offer the world that is
not about fashion, cooking, parenting, or homemaking. It should not be the case that
the only way women can reliably avoid online harassment is to
adhere to stereotypical gender roles. But I worry that that’s where
these instances are heading. So let’s think about solutions. Gendered harassment
and online sexism is a very complicated problem. It goes far beyond Gamergate
or even the technology industry itself. But unfortunately
there is as of yet no good legal or
policy solution. Internet harassment is
difficult to address through legal means. There are issues of jurisdiction
and most importantly free speech. The police are
often unsympathetic or do not understand
the technical issues. They often tell victims to
simply stay off the internet. Moderation is very expensive. Companies like
Twitter are finally beginning to build tools
in to combat harassment, but they have been slow to
adopt proactive solutions. And I will say that
as companies build in things like being
able to flag posts or being able to turn people
in for various violations, those are frequently
used against women, against feminists, against queer
people as a form of harassment by these sorts of brigades. Every day, new online
communities and apps are launched with
the assumptions that their users will be good
actors, without any protection for those who may be harassed. Online sexism, however,
is not something that we merely have to tolerate. There are already many
women organizing themselves to combat it on blogs,
Tumblr, Twitter, and Reddit, whether by technical
means or support networks. If you are the victim of
harassment or sexist speech, I urge you to seek out social
support from other people going through similar things. And I also urge
everyone in this room to speak out against
sexist language when you see it online with
the goal of, if not eliminating online harassment
and sexism, making it less common, less acceptable,
and less a part of everyday life. Thank you. -Thank you. That’s an excellent
contribution. And now we’re going to
have about 15 or 20 minutes of discussion among
the panel, and maybe we can talk about the differences
in their approaches and aims for their research. And then we’ll have
time for some questions from the audience, and
we’ll set up a mic shortly. So I was really struck
by some commonalities among these presentations
in the sense that there’s a kind of
overwhelming feeling that a lot of data produced
from the online world, from the sources that
you’ve all worked with, tends to retrench or reinforce
preexisting stereotypes and norms, despite the rhetoric
we often hear about disruption, revolution, new forms arising. And so I wonder if you could
each comment on that paradox. -I think it goes
both directions. So on the one hand,
there is this huge weight of reinforcement
of common things because people form
these homophilic groups and do things. On the other hand, we haven’t
talked about social change. And one thing we’ve
been looking at is tracking, for
example, changes in how domestic partner violence
is discussed. The names for it have changed
many times in the last decade. You can see the use
of mainstream events that are reported in the
mainstream media being picked up by social media
and in turn driving policy. You can see this tremendous
groundswell of organizing. So I think that on the one hand,
there’s this reinforcement. On the other hand,
it is a great medium for social activism,
and you can actually see public policy starting
to be affected more at the local level. It’s harder at the national
level to have this effect, but I live in Philadelphia,
a reasonably big city, and it’s still the
case that organizers can get enough of a groundswell
to influence public policy. -So perhaps there’s
some backlash, but also evidence of– -Also organizing and
people mobilizing. -I think that it depends
which way you cut the data. Take your
assessment of how women tend to talk online compared
with men on Facebook. Interestingly, we
ran something– it’s very, very unusual
for a client of ours to ask for a gender split. That almost never happens,
which is a good thing. They’re more interested in the
persona in general of people. But we were asked to compare
self-validated successful people on Twitter, that is, people
who describe themselves in their Twitter bio
as successful, and then split them male/female. And women talk about
politics and issues more than men in that group. And on the whole, actually,
women use more pronouns, they’re more conversational,
and so on and so on. So on the whole,
I think, men still kind of resort to
numbers, and sport, and all this kind of stuff. That’s generally true. But I think when you split away
from just the straight male female split and you split and
compare things in other ways, then I think other truths
reveal themselves. -One of the things I
think is most interesting about social media
is that it’s used by a very wide,
diverse group of people with a lot of
different opinions. And often this means
that conversations that have been going on for
years among different groups and communities get bubbled
up, and other people can see them: people who aren’t
necessarily part of those conversations or don’t understand their
context. I think this speaks
to what Janet Mock was saying in her keynote about
the title of her memoir. What we see on
Twitter especially is that often there’ll
be a group like, say, young African Americans
talking about something with a particular
hashtag, and then that becomes visible to other
groups who don’t necessarily understand the context of that. And then you can get this harsh,
quick backlash against it. Like, you’ll see people making
a bunch of racist comments because a conversation is
visible to them that has always gone on, but they have
not been able to see it until social media
has made that visible. And so I think that
when we’re looking at how social media both
contributes to reinforcing and resisting these
norms, I think we can see that a lot of social
activism, as Lyle said, has gone on through
social media. You just have to look at
the Black Lives Matter hashtag, you have to look
at the contemporary moment in popular culture around
feminism. For, like, a
decade there was lots of feminist stuff going
on, but it really hadn’t made it into the mainstream. And now it’s in pop
culture, and I think a lot of that has been due to
the work of young feminists on different platforms,
doing that work, and then that work being
able to disseminate into the mainstream, as well. So for all of my terrifying
rhetoric in my talk about how terrible these
clashes are, I think that there’s something
quite empowering, especially for
young people, to be able to have these conversations
and have them impact the mainstream in a way that
wasn’t possible before they had access to social
media technologies and other ways of broadcasting
their thoughts to a larger audience. -Yeah, I think what
strikes me is that there’s a kind of dynamic of this rapid
emergence and proliferation of voices. I love the comment
in our keynote, we can now broadcast
our lives, and this idea of multi-layering, the idea of
the proliferation of genders and even the way
they’re obviously reflected in Facebook. But at the same time there’s a
kind of retrenchment and fear that gets triggered by
these kinds of things, and so you see other
forces arising. And so this reinforces,
I guess, the idea that this is a very
rich area for research. And I wondered
what you all think are the advantages
and disadvantages of the methods you’ve chosen? I mean, Lyle and
Ben work largely with quantitative methods
and Alice with qualitative, and I wondered if you could
comment on the advantages and disadvantages of each. -We do actually both
qualitative and quantitative. I think the key thing is the
amount of data you can collect. I don’t care about
the number, but just get a flavor of
different subgroups– what are they
talking about when? What language are they using? What do they care about? To go out and run
interviews– really expensive. To do telephone calls,
surveys– Gallup recently finished
their millionth survey. I have 10 billion tweets. I can’t afford the
budget of Gallup, but I can run more
questionnaires than they can. I can look at changes at a finer
spatial and temporal resolution than they can. The world is changing from
a research standpoint– we can pinpoint
what’s happening in Cambridge versus Boston over this debate. And I think this
level of exposure is, in fact,
tremendously powerful. -Yeah, I think we
have a similar view in that until
comparatively recently, you had the choice
between quant– that is, number-driven stuff. But in language on the
internet, that basically gave you something like
sentiment analysis, so is it 34% positive? A, it’s not particularly
accurate, and B, even if it was really accurate,
what does that even mean? The other way is with qual,
you get a lot of nuance, a lot of complex
answers and drilling in, but of course, you get to ask
20 people in a focus group who never tell you the truth anyway
because you’re asking them the question so you
bias the answer. So we are getting
to the point now where you can get qual-type
insights at quant scale. From our perspective,
doing comparisons is always the key thing. The hard bit is always what
comparisons do you want to run? Because there’s
so much to compare and so much to draw
out, it’s almost too powerful a technique. -As somebody who
mostly does, as I said, small data work in
that I interview people, I often spend large amounts
of time in the communities that I’m studying, I spend a
lot of time talking to people and trying to
understand how they make meaning of their
world and their lives, I think one of the most
important issues in data analysis right now
is that of ethics. Just because you can collect
and look at something doesn’t mean that you should
collect and look at it. And a lot of colleagues–
well, not a lot. There have been several
cases in my field in which, say somebody was doing
an ethnography of, like, Tumblr feminists or something,
and one of the research subjects posts a
Tumblr post saying, researchers are
not welcome here, I don’t want you looking at
my stuff, this is not for you, this is not your space. And we see lots of
indicators by which people are trying to target
their online communication to a particular audience. They’re using particular
linguistic codes, they’re marking their
speech in some way, saying it’s for a
particular peer group. And I have another
whole research area on online privacy, which is
one of my other main research areas, and what we see
is that people often– even if they know that
someone can see something that they post on Facebook
or Twitter or whatnot, they assume that
people will know what audience it’s for based
on the language that they use. And the problem with big
data research methods is that that’s all lost. It all gets scooped
up and analyzed and there’s no IRB,
Institutional Review Board, review usually needed
for big data whereas– -No, everything we do is IRB. -Oh, everything is IRB? -I have a full-time
person doing IRBs. Every person who shares their
Facebook consents both to me, something approved by IRB– -No, your work– 100%. You said you have seven– -Everything is IRB. I can’t say how much
time I spend making sure that everybody signs off. -All right, I’m sorry. I was not meaning
to single out the two gentlemen to the right of me. [INTERPOSING VOICES] -I know I sound defensive,
but it’s important. -There’s a lot of conferences
where people are presenting work that is not as rigorous
as Lyle’s in terms of the data collection they– -And most companies
don’t have IRBs– -And most companies
don’t have IRB. —is a much worse problem. -And one of the things that
we see that’s also problematic is that companies like Facebook
have enormous internal research departments and
they don’t publish a lot of their research, and we
don’t know what they’re doing, and we don’t know what
kind of experiments they’re doing on Facebook users. So first of all,
academics in general are at a disadvantage when
it comes to big social media organizations. And second of all,
there is this sense that there are
bodies of researchers who aren’t doing that kind of
ethical work that I’m doing, that Lyle is doing, that other
researchers around the country are doing. So I think these
are some of just the general ethical
questions that are brought up in the age of big
data, and I think we’re going to be
working those out in our various subfields
for years to come. They’re very hot
topics right now. -Just from my
experience teaching this class on big data, a
couple of the greatest areas of growth in
the use of big data are in predictive policing,
that is, use by police forces to target certain
areas where crimes may occur based on
patterns and data, and then send more police
there, patrol differently, or interview people who they
believe, at the age of 17, may commit a crime
when they’re 19. Things like that, which used
to be reserved for dystopian fiction, are actually programs
already underway in at least five cities. And also, as Alice
indicated, there’s quite an unregulated
use of data. When you sign your user
agreement with Facebook, I believe you agree
basically to become an experimental subject. And it’s interesting when you
look at the Facebook experiment that came out about 2 years
ago, which was published in a high-impact
journal, that this caused a huge uproar,
that it turned out this was quite a simple
experiment where it was shown that with something
like 700,000– many, many users, if you
change their feed, the news feed, if you move it
towards more positive stories, people would then start to
make more positive comments in their feed. And there was also
a negative result so you could apparently
emotionally engineer the Facebook audience. And this caused an outcry. But in fact, these
types of small changes are being tested all the time. It was more the fact
that it became public. So there’s actually an
increasing divide, perhaps, between the academic world
and the commercial world, or perhaps between data-driven
science and big data. So these are all areas
where a lot is going on, I guess we could say. One theme that I thought was
interesting that came out in all of your
presentations was– or I would be interested in
hearing your comments on this– was that there were
changes among younger users or in the data that you got
from younger people, or else as Alice said, that younger
people seem to experience greater levels of harassment. And I would imagine
that maybe also– or maybe you could
comment on this– is the language that younger
people are using showing up as different? -Yes, I didn’t show it. We can tell someone’s age to within 5
years from their Facebook language;
it’s extremely straightforward. Some of the signals are the use
of things like pronouns or “the” (old people
say “the”), or putting spaces after periods, but it goes down
to fine-grained features. People keep talking about “lol”
here with the [INAUDIBLE], but in fact the current “lol”
is mostly League of Legends. You old people are still
laughing out loud from, like, the ’90s, was that? So there’s huge
changes happening, and several people
have alluded to it, but there’s lots of
in-group signaling. And it really is the case
that you ship people. It’s a verb, not a noun. These things have
shifted massively in the last couple of years,
and you can see them stratified. The 14-year-olds
and the 16-year-olds look really different. Also, they’re shifting
in terms of moving away from old, slow platforms like
Twitter to Snapchat, Instagram, much more communication with
images with small amounts of text attached. You old people still type words. That’s very retro. I’m in favor of it, I’m a
word guy, I use words, too. But the whole notion
of how communication occurs– visual,
streaming, instant. If you missed it, you’ll get
something else the next hour. Why are you going to store
and look at yesterday’s email? That’s so antique. So I think the style
of communication is changing even more
than the content, and I think it’s something
to look at– the notion of how does communication occur? How does bonding occur? I’m old enough to
remember people talking on telephones
to each other, very quaint, slow
communication, not texting in the middle of the conference
to see what you just saw. -Yeah, so maybe just
comments from Ben and Alice and then we’ll have
some questions. -I actually had to explain
to my 9-year-old son last week what “LOL” means. By the way, David
Cameron thought it was “lots of love,” as
well, our Prime Minister. That shows how in touch the
UK is with the modern times. I don’t know about platforms,
but we’ve certainly seen significant differences
between the way younger age groups discuss a given
topic compared with older ones. And there’s a unifying theme. So we did this for a
handset manufacturer, and if you look at the reviews
of mobile phone handsets by age group, the younger
you are the more expressive you tend to be, so it’s
beautiful, gorgeous, awesome, sleek– that kind of stuff. Younger people talk a lot more
about the quality of the camera being important, they talk a
lot more about the battery life being important. But my favorite, I think,
is overuse of the word “definitely” because
you only know everything when you’re young. And your opinions are far more
polarized the younger you are. And so it’s beautiful,
gorgeous, fantastic, and I definitely recommend it. And I won’t go into what
old people talk about. That’s too embarrassing. But definitely big differences. -I want to echo Lyle’s comments
about platforms like Snapchat. And one of the projects
I’m working on right now is on privacy and
socioeconomic status, and we’re interviewing
16-to-26-year-olds. And one of the things that
I find so interesting about the 16-, 17-, 18-year-olds is
that within every high school there are often apps
that– well, first of all, apps and mobile devices,
they’re very trendy, so the kids will all use an app
for a specific amount of time and then they’ll move
on to another app. And often that is very localized
within a group of friends or a high school, and kids will
share the same media ideologies around a particular
communication platform and then they’ll move
on to something else usually once their parents
discover that it exists. So it’s not even
solely generational. Communication is extraordinarily
contextual in every single way. It differs by age, by
community, by class position, by access– whether
you’re mostly accessing the
internet on your phone or whether you have
access to a laptop or not. All of these things
are quite different. And so this is one
of the nice things, again, about qualitative work:
you can drill down into these very particular
contexts, and rather than trying to make
generalizations, you can really understand how
communication technologies affect young people’s sense
of identity expression and their relationships
with other people. And what we find
is these effects are often quite profound. -Thank you. So we’d love to take some
questions from the audience. -Hi, my name is Ashley. I am a graduate
student in psychology, and I just want to
put the disclaimer that I am heavily entrenched
in qualitative research, so that’s a big bias. I’m currently
involved in a study. We use personal narratives
of ethnic-minority, HIV-positive women. And if one were to
look at their language among these
100-something narratives, it would look really similar. They’re all from pretty much
the same socioeconomic status right outside of Chicago,
around the same age. But what we find is when we
look at these narratives, there are just vast
amounts of nuance, like you were speaking to,
and we’ve used this data– it’s longitudinal, so
we’ve used a lot of this to look at coping styles
in women and viral loads. And it has driven
policy change quite effectively. And that’s a little
bit of a tangent, but I guess I just see
this really big danger in clumping this
together in almost like an essentialist
notion, and I see some possibility of
it moving social justice movements backwards. And this may be a knee-jerk
reaction to seeing women being associated with using the
word “shopping” so much, but when you’re just
looking at their language, you’re not getting
their backstory, you’re not getting
the demographics, you’re not getting
the nuance, it would be easy to come up with
essentialist notions of what this is looking like. So I guess I’m
just wondering how do you keep it from
moving us backwards in the feminist movement,
the queer movement, so on? -I think the big data does offer
a huge benefit, which I did not talk about today, in actually
splitting rather than lumping. So when I look
at 20-something-year-old, poor African Americans,
they use Twitter really differently than 50-year-old
white men or women who are more professional,
or blacks of that age. There are huge differences. The thing is we
can actually look at detailed narratives of
people from Camden, New Jersey, different
parts of Philadelphia. We do collect,
actually, narratives. People volunteer and write
essays and share them with us. So I think it really is
important to contextualize, and I apologize if
I over-simplified. Really different to look
at work discussions, family discussions,
peer discussions. So I think different platforms–
all these things can be done. It’s really easy to just
average, and what do you get? You get straight,
you get middle-aged because that’s the average,
you get white because that’s the majority. So sure, if you
don’t split stuff, then you obscure
all the variation. The nice thing, certainly
if you’re Facebook or something of that size, you can
split it as finely as you want and look at the
really micro-communities and the variations. You can get the backstory. We’re now taking
people and getting them to share their
entire medical history and their full Facebook history. We know their age, their sex,
we know what sort of insurance they have. We know a lot about
these people– again, with their
informed consent. And you can go back. So I think it’s a
real danger that it’s easy to be facile and say, oh
yeah, girls say “shopping.” That’s nice but that’s
not true of all girls, and not all of them are girls. -I’m thinking of a couple of
examples where– much of this is about how you apply and
think through what you’re trying to communicate and who you’re
trying to communicate it to, and where the potential
for miscommunication lies. So there’s a medical example
where we’ve done work with some pharmaceutical
companies, and if you think about the
journey from the pharma company to the patient getting
a benefit, the way the pharma company
talks about a condition compared with how doctors
talk about the same condition compared with how patients talk
about the same condition subtly shifts to the point where this
word that the patient uses– they’re meaning the same
thing as the pharma company, but it all gets
lost in translation. And we’ve had some reasonably
interesting and successful results where
somebody has modified their style of communication
for the person they’re trying to target, and it’s
had very positive results. -I just want to add to
that: on social media, a lot of the
language that we see is quite performative in that
people are often attempting to manage the impression that
they’re making on other people. So they’re using affiliative
language practices in order to either– the
woman who constantly is talking about cute
dogs and shopping– I mean, I talk about cute
dogs on Twitter– quite a bit, I might add. But she might also be
trying to affiliate herself in certain ways or
trying to perform a particular type of fem
identity for a particular end. And I think that
when we’re thinking about the visible identity
cues that people are making on social media, we also
have to think about the fact that many people see social
media as somewhere where they have a particular persona
that they’re furthering. And one of the other things
that I’m interested in is how a lot of the time
those personas get branded, that people start
thinking of themselves in terms of advertising and
marketing strategies when using social media and
reaching out to others, and I think that that’s
the kind of thing where the qualitative
work that you’re doing, and that Lyle is talking
about, can really add to our understanding of
those types of social media linguistic practices. -Thank you. -Hello, my name is Kiera. I’m from Connecticut College. I’m a sophomore there. And I’m curious– this
question’s for Alice– what’s your opinion on Yik Yak? -Oh, so for those of you who don’t
know, Yik Yak is a mobile app. It is sort of like
anonymous Twitter, where you can say things
anonymously and it can get up or down voted by the
rest of your community. And it’s geolocated, meaning
that generally you’re seeing Yik Yaks from
people around you. So I work at Fordham
University, I often look at the Fordham Yik Yak. And also it’s been accused of
being used for bullying a lot, so Yik Yak has tried to put
all these really interesting technical affordances
to police user behavior, like you’re not supposed to
mention other people by name, and they geofence
off the areas around schools. So my apartment is close
to a secondary school, so I can’t use Yik Yak
within my apartment because it thinks I’m
a high school student. All right, whatever. I think anonymous internet
communication is interesting. I think that it affords
some very positive things. Quite frequently
there is evidence that people find it easier
to share with other people when they’re not having
their information tied back to a singular identity, tied
to their marketing profile. And I think it is
important for people to have places to
express things that they can’t express in other places. On the other hand,
I think it’s also a place where people
can express things that are very offensive
or problematic or might really
hurt somebody else. And the thing about Yik Yak is
they’ll give personal data over to law enforcement without
even thinking about it. That’s one of their things. If you, say, Yik Yak, “Oh,
I want to bomb the school,” they’ll give that over
to the cops and the cops will arrest that person and
that’s happened several times. So I go back and forth. I don’t believe that
technology is neutral at all. I believe that technology
can and does express its own politics and values. And the people
behind Yik Yak seem to have set out to create
something for self-expression. I don’t know, what’s
your opinion of Yik Yak? I would be curious. -I don’t have Yik Yak. -You don’t have Yik Yak? -I deleted it a while ago. -Does it have influence
on your campus? -Yes, but I try not to see it. -Yeah. But I do know that
there have been a lot of other similar, anonymous,
campus-based things in the past. There used to be something
called JuicyCampus, which was a gossip site. People like to gossip. College students love to gossip. People are going to use
social media sites to gossip. I’m not sure whether I would
say that the negative effects of anonymity outweigh
the positive effects, but I think it is
important for people to have anonymous spaces online. -As a researcher
I think it’s also important– people are
always performative. They’re putting on a
persona on Facebook, they’re putting
on one right now. As we talk to each other,
we’re doing a persona. And it’s interesting
to me to be able to see some of these people, who are
often fairly nasty on Yik Yak, and to have the chance to actually see
what they’re saying in a way that they would not
show me, as an authority figure from the establishment,
so I know what’s going on. So I don’t find much of the
discussion very pleasant, but I guess I do like to know
what people are talking about. And in the end,
that, I think, is an important
facet quite apart from the separate question of is
it beneficial or harmful, which is not so obvious. -Since we have four more
questions, let’s move on. -I’m writing a history of
the Equal Rights Amendment in graphic novel form, which
ends up really being a look at the citizenship
of women and men and the history of
it in this country. And one of the things you
notice early on is that there is a
very deliberate attempt to create spheres–
the women will operate in the domestic sphere
and the men will operate in the public sphere. And so it’s interesting to
look at how men talk publicly about politics and about
sports, and you often find, of course, they
talk about them similarly, as though they’re teams. And I’ve noticed
as someone who is a woman who talks publicly about
politics, I get policed by men. And I also get women who write
to me and say, oh my god, thank you for saying
that publicly. And then I get little
secret messages. And I have lots of
little background forums of women reassuring each other. And a friend of mine, a man,
wrote a piece supporting Hillary Clinton, which
went kind of viral, and he said he got lots of
messages from women saying, thank god I had this to share. I was afraid to say I supported
Hillary Clinton because I didn’t want to get yelled at,
but I could share this and be like, look, this guy likes her. So it’s funny to compare the
women talking about shopping publicly and then the
harassment that women get for talking publicly. How much of that is about
women talking in the background in ways that we don’t know about? How much are they
affecting each other and what they talk about? -Does anyone want
to tackle that one? -I think the different
platforms are very different, so communicating with
your friends over Facebook is very different from
publishing on a blog, and very different from being
in a gaming community, which is notoriously competitive,
to put it politely. So I think that there are many,
many different communities and each person participates
in them in different ways, and it’s very hard to
generalize across them. -One of the things that
we found in our research with young people is that
young people will often use a multiplexity of
different apps to communicate, and often
one of those apps will act as a sort of safety
valve, which is where they feel that they can be more honest. And quite often that’s
because either they’re using a pseudonym on
a site like Tumblr, or they’re communicating with
a smaller group of people, like they have a group text
on WhatsApp or something and that’s where they share
their more intimate feelings. But the thing about
that is that then you’re not participating
in what we would think of as the “public sphere.” And what we see historically
in the history of the internet is that men’s speech tended
to be more in the public sphere. The blogs
written by men were valued more than the online
journals written by women, and when people are
talking about who’s participating in the
kind of online punditry that we see on
sites like Medium, oftentimes it’s men who are
going into those spaces. So I think that there
is certainly a value placed on where people are
participating and talking about things. And I do think that
the fact that there is this systematic
discouragement of women speaking in these sort
of internet public spheres– I think that’s a
very significant problem. I really do. I think it’s a devaluing
of women’s contributions to the political sphere,
to the public sphere, and frankly I think it’s
a problem with democracy. -So since we’ve got
two more questions, I thought you could
ask them successively and then we’ll have
one or two comments. -Hello, my name is
Caitlin Hannigan. I’m a strategic
communication major at Bridgewater State University,
my first senior year. So Alice, I actually connected
with a lot of the stuff that you were talking
about because we talk a lot about that in my classes. And my question’s more
geared towards you, but anyone is welcome to
answer if they have an opinion. I don’t know if
you guys remember, but last spring, Curt
Schilling tweeted something out about his daughter
being accepted into a college on a
scholarship for softball. And immediately,
people were responding by sending her death
threats– I’m going to rape her, I’m going to do this,
we’re going to find her, all of that stuff. So my question is pretty much
I know you’ve done research on age and gender
and stuff, but I don’t know if there’s a
difference between celebrities and “normal people,” I guess. Not, like, normal people,
but you get what I’m saying. Just because we talk
about that all the time, but we talked earlier about
how we act differently in front of different
audiences, and celebrities do act differently
probably in the media than they do at their homes
or with their families or something. So I was wondering about that. -We’ll come back to
that, and let’s just hear the last question. -Hi, thanks. My name is Orrin
[? Sur ?] and I’m doing natural language
processing and big data. And I want to go back to
Rebecca’s quotation of Anderson’s famous claim that big data
is the end of theory and ask for your
respective takes on that. -Schilling or end
of theory first? Or does anyone want
to pick one of those? -I’m happy to talk
about the Schilling. So I think that,
first of all, we have a culture of
criticizing celebrity women. And many of us may not see
that as a very serious problem, like, who cares if
Kim Kardashian gets her feelings hurt? But I think the problem is that
the distinction between who is a celebrity and
who is not is changing with the era of the internet. So if everybody who puts
their pictures online is now susceptible
to the same type of severe, gendered criticism
that we put on famous women, then it becomes an issue of
who is more public than others? And if all of us are
making ourselves public, and often we’re forced
to make ourselves public, especially as young
people, as a condition of employment or education, then we’re all
susceptible to that same type of deeply critical language. And I think that’s a problem. The second thing is
I think that there is a continuum of
celebrity practice by which some people
are internet celebrities or Facebook famous or
whatever, and they actively are looking for
attention online. That’s what my first
book was all about: attention as a particular
type of social currency. And so I think that when we’re
talking about celebrities, it’s actually quite
important because celebrity as a model of a
commodified self that uses broadcast media
to publicize itself is something that’s
getting closer to the intrinsic
subjectivity of social media than we might like to think. -Thanks. Do you want to
tackle end of theory? -Well, the limitation
of big data is getting commercial people
to actually go and do something interesting and
meaningful and useful with it. I think that’s the choke point. -Speaking from the
standpoint of psychology, which has for the last
50 years been largely a theory-driven field,
I see in the last few years these big data are
pushing it to be still mostly theory-driven, but increasingly
hypothesis-generating. So the one thing that one
can do from these focus groups of 10 million people
is go in not with a theory, but to say, hey, I
want to know what people are saying about
trans people or about cell phones. Let me not come
in with a theory. Let me first observe empirically
what’s being talked about, and then tie that
back to underlying theory of personality, or
whichever theory structure you want. So I think the ability
to generate hypotheses is unbelievably powerful. It’s not just asking a
question to confirm or falsify a hypothesis. That’s a very
different style, one that’s taken over
in medicine but is fairly new to large parts
of the social sciences. -Great. So that’s a good note to end
on, that data cannot speak for itself. And I want to thank
all these panelists for presenting their research.