Category: Data Storytelling

On Assuming Mental Paralysis

A fellow graduate student recently asked me how I approach literature reviews. This question of how to find, read, and synthesize a body (or more) of research is central to producing good academic work. Yet it brings to mind Bellatrix Lestrange’s vault in Gringotts, where every paper you read yields six more until you’re neck deep with no foreseeable way out.

When I first started studying parents and social media use, I was content with Irwin Altman’s definition of privacy as controlling access to the self. Digging deeper, I learned to think of privacy as contextual integrity (thanks to Helen Nissenbaum) and as boundary management (thanks to Sandra Petronio). As I continued studying privacy over the years, I learned that lawyers, psychologists, communication scholars, economists, and computer scientists all conceptualize privacy in different ways. During my first year in the PhD program, I considered creating a disciplinary map of privacy for a class project but quickly realized that was a much bigger undertaking than I had imagined.

I’ve grown familiar with the feeling. I took a seminar with Jason Farman on “Place, Space, and Identity in the Digital Age,” and saw that entire careers can be (and have been) built around each of these concepts. Place isn’t just a label on a physical space; it’s objects and bodies and relationships and memories and information flows and more coming together in a particular arrangement at a particular moment. Identity isn’t just a list of demographic characteristics; it’s the facets, fragments, memories, experiences, beliefs, roles, imaginaries, and more that constantly intersect and intertwine into you. And this morning, while reading John Law’s “Objects and Spaces,” I realized that we can’t even take physical, 3-D, Euclidean space as a given.

Sigh.

It’s easy to see moments like these as overwhelming, paralyzing even. Especially when you do interdisciplinary research and plan to borrow theories and methods from other disciplines. Or to see these moments as challenges, as piles of reading to conquer so that you can one day claim the prize of “knowing” something.

But these moments keep happening. So the options are to feel constantly overwhelmed or to see grand quests pile up, neither of which is healthy (nor encouraging). I’ve come to an alternate response after starting a daily meditation practice: Let it go.

Let go of the overwhelm. Let go of the fear. Let go of the burden. Worried you don’t have time to read everything? Let it go. Concerned that you might overlook something? Let it go. Dreading the moment another scholar tells you, “Yeah, but what about [totally separate body of work that may or may not be relevant to your topic]?” Let it go.

It sounds simple, I know. But these three words, combined with the acknowledgement, acceptance, and even embrace of the vast, unimaginable, and ultimately unknowable amount of prior work out there, are freeing.

I spent all day brainstorming the verb for this post’s title. When I do literature reviews, and when I do research in general, I want to assume mental paralysis. By that, I mean I want to assume that I will experience moments of mental paralysis, of viewing the work ahead as a sheer, insurmountable rock wall I somehow have to climb, as a tangled thicket in a dark jungle through which I have to chop my way out.

But I also want to take up the mental paralysis, to wear it as a badge, to make it part of me. Because even after I climb this wall or chop through those vines, there will be another wall, another tangle. And by accepting that, I hope to take greater joy in those moments when I DO learn something, when a concept finally DOES click in my head, even if it falls apart again a moment later. By acknowledging and expecting the complexity, I release the sense that I need to master it, to someday “figure it out.”

And that, I suppose, is how I approach literature reviews.

(Oh, and for anyone who wants actual advice on how to do a literature review, Raul Pacheco-Vega has a series of relevant blog posts.)

 

Designing Resources to Help Kids Learn about Privacy Online @ IDC 2018

What types of educational resources would help elementary school-age children learn about privacy online? Below I share findings and recommendations from a paper I co-wrote with Jessica Vitak, Marshini Chetty, Tammy Clegg, Jonathan Yang, Brenna McNally, and Elizabeth Bonsignore. I’ll present this paper at the 2018 ACM Conference on Interaction Design and Children (IDC).

What did we do? Children spend hours going online at home and school, but they receive little to no education about how going online affects their privacy. We explored the power of games and storytelling as two mechanisms for teaching children about privacy online.

How did we do it? We held three co-design sessions with Kidsteam, a group of children and adults who meet regularly at the University of Maryland to design new technologies. In session 1, we reviewed existing privacy resources with children and elicited design ideas for new resources. In session 2, we iterated on a conceptual prototype of a mobile app inspired by the popular game Doodle Jump. Our version, which we called Privacy Doodle Jump, incorporated quiz questions related to privacy and security online. In session 3, children developed their own interactive Choose Your Own Adventure stories related to privacy online.

What did we find? We found that materials designed to teach children about privacy online often instruct children on “do’s and don’ts” rather than helping them develop the skills to navigate privacy online. Such straightforward guidelines can be useful when introducing children to complex subjects like privacy, or when working with younger children. However, focusing on lists of rules does little to equip children with the skills they need to make complex, privacy-related decisions online. If a resource presents children with scenarios that resonate with their everyday life, children may be more likely to understand and absorb its message. For example, a child might more easily absorb a privacy lesson from a story about another child who uses Instagram than from a game that uses a fictional character in an imaginary world.

What are the implications of this work?

  • First, educational resources related to privacy should use scenarios that relate to children’s everyday lives. For instance, our Privacy Doodle Jump game included a question that asked a child what they would do if they were playing Xbox and saw an advertisement pop up that asked them to buy something.
  • Second, educational resources should go beyond listing do’s and don’ts for online behavior and help children develop strategies for dealing with new and unexpected scenarios they may encounter. Because context is such an important part of privacy-related decision making, resources should facilitate discussion between parents or teachers and children rather than simply tell children how to behave.
  • Third, educational resources should showcase a variety of outcomes of different online behaviors instead of framing privacy as a black-and-white issue. For instance, privacy guidelines may instruct children never to turn on location services, but this decision might differ based on the app that is requesting the data. Turning on location services in Snapchat may pinpoint one’s house to others (a potential negative), but turning on location services in Google Maps may yield real-time navigation (a potential positive). Exposing children to a variety of positive and negative consequences of privacy-related decision making can help them develop the skills they need to navigate uncharted situations online.

Read the IDC 2018 paper for more details!

Citation: Kumar, P., Vitak, J., Chetty, M., Clegg, T.L., Yang, J., McNally, B., & Bonsignore, E. (2018). Co-Designing Online Privacy-Related Games and Stories with Children. In Proceedings of the 17th Annual ACM Conference on Interaction Design and Children (IDC’18). doi:10.1145/3202185.3202735

Parts of this entry were cross-posted on the Princeton HCI blog.

Kids and Privacy Online @ CSCW 2018

How do elementary school-aged children conceptualize privacy and security online? Below I share findings and recommendations from a paper I wrote with co-authors Shalmali Naik, Utkarsha Devkar, Marshini Chetty, Tammy Clegg, and Jessica Vitak. I’ll present this paper at the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW).

What did we do? Children under age 12 increasingly go online, but few studies examine how children perceive and address privacy and security concerns. Using a privacy framework known as contextual integrity to guide our analysis, we interviewed children and their parents to understand how children conceptualize privacy and security online, what strategies they use to address any risks they perceive, and how their parents support them when it comes to privacy and security online.

How did we do it? We interviewed 26 children ages 5-11 and 23 parents from 18 families in the Washington, DC metropolitan area. We also walked through a series of hypothetical scenarios with children, which we framed as a game. For example, we asked children how they imagined another child would respond when receiving a message from an unknown person online.

What did we find? Children recognized how some components of privacy and security play out online, but those ages 5-7 had gaps in their knowledge. For example, younger children did not seem to recognize that sharing information online makes it visible in ways that differ from sharing information face-to-face. Children largely relied on their parents for support, but parents generally did not feel their children were exposed to privacy and security concerns. They felt such concerns would arise when children were older, had their own smartphones, and spent more time on social media.

What are the implications of this work? As the lines between offline and online increasingly blur, it is important for everyone, including children, to recognize (and remember) that use of smartphones, tablets, laptops, and in-home digital assistants can raise privacy and security concerns. Children absorb some lessons through everyday use of these devices, but parents have an opportunity to scaffold their children’s learning. Younger children may also be more willing to accept advice from their parents compared to teenagers. Parents would benefit from the creation of educational resources or apps that focus on teaching these concepts to younger children. The paper explains how the contextual integrity framework can inform the development of such resources.

Read the CSCW 2018 paper for more details!

Citation: Kumar, P., Naik, S.M., Devkar, U.R., Chetty, M., Clegg, T.L., & Vitak, J. (2017). ‘No Telling Passcodes Out Because They’re Private’: Understanding Children’s Mental Models of Privacy and Security Online. Proceedings of the ACM on Human-Computer Interaction, 1(CSCW), Article 64, pp. 1-21. (CSCW ’18 Online First) doi:10.1145/3134699

Parts of this entry were cross-posted on the blogs of UMD’s Privacy Education and Research Laboratory (PEARL) and Princeton HCI.

Privacy Policies, PRISM, and Surveillance Capitalism in MaC

I recently published my first journal article in a special issue of Media and Communication (MaC) on Post-Snowden Internet Policy. (Unfortunately, the editors misgendered me in the editorial).

In my article, Corporate Privacy Policy Changes during PRISM and the Rise of Surveillance Capitalism, I analyzed the privacy policies of 10 internet companies to explore how company practices related to users’ privacy shifted over the past decade.

What did I do? The Snowden disclosures in 2013 re-ignited a public conversation about the extent to which governments should access data that people generate in the course of their daily lives. Disclosure of the PRISM program cast a spotlight on the role that major internet companies play in facilitating such surveillance. In this paper, I analyzed the privacy policies of the nine companies in PRISM, plus Twitter, to see how companies’ data management practices changed between their joining PRISM and the world learning about PRISM. I drew on my experience with the Ranking Digital Rights research initiative and specifically focused on changes related to the “life cycle” of user information — that is, the collection, use, sharing, and retention of user information.

How did I do it? I collected company privacy policies from four points in time: before and after the company joined PRISM and before and after the Snowden revelations. Google and Twitter provide archives of their policies on their websites; for the other companies, I used the Internet Archive’s Wayback Machine to locate the policies. I logged the changes in a spreadsheet and classified them as substantive or non-substantive. I then dug into the substantive changes and categorized them based on how they affected the life cycle of user information.
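The comparison above was done by hand in a spreadsheet. As a hedged illustration only, and not the method used in the article, Python’s standard difflib module could surface candidate sentence-level changes between two archived versions of a policy before a person classifies them. The naive sentence splitter, the function names, and the sample policy text below are hypothetical.

```python
import difflib
import re


def split_sentences(text):
    """Naively split policy text into sentences on ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def changed_sentences(old_text, new_text):
    """Return (removed, added) sentences between two versions of a policy."""
    old, new = split_sentences(old_text), split_sentences(new_text)
    removed, added = [], []
    for line in difflib.ndiff(old, new):
        # ndiff prefixes lines with "- " (only in old), "+ " (only in new),
        # "  " (unchanged), or "? " (intraline hints, ignored here).
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


# Hypothetical before/after snippets, not quotes from any real policy.
before = "We collect your name. We share data with partners."
after = "We collect your name. We share data with advertisers."
removed, added = changed_sentences(before, after)
print(removed)  # ['We share data with partners.']
print(added)    # ['We share data with advertisers.']
```

A tool like this only flags where wording changed. Deciding whether a change is substantive, and how it affects the life cycle of user information, still requires a human reader.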

What did I find? Seventy percent of the substantive changes fell into two areas: the management of user information, and data sharing and tracking. The changes related to management of user information provided additional detail about what companies collect and retain. The changes related to data sharing and tracking offered more information about companies’ targeted advertising practices, and these often appeared to give companies wider latitude to track users and share user information with advertisers. In other words, while these policy changes disclosed more details about company practices, the practices themselves appeared to subject users to greater tracking for advertising purposes.

What are the implications of this work? Collectively, these privacy policy changes offer evidence that suggests several of the world’s largest internet companies operate according to what business scholar Shoshana Zuboff calls the logic of surveillance capitalism. Participating in PRISM did not cause surveillance capitalism, but this analysis suggests that the PRISM companies further enmeshed themselves in it over the past decade. The burgeoning flow of user information into corporate servers and government databases exemplifies what legal scholar Joel Reidenberg calls the transparent citizenry, where people become visible to institutions, but those institutions’ use of their data remains obscure. This analysis serves as a reminder that public debates about people’s privacy rights in the wake of the Snowden disclosures must not ignore the role that companies themselves play in legitimizing surveillance activities under the auspices of creating market value.

Read the journal article (PDF) for more details!

Citation: Kumar, P. (2017). Corporate Privacy Policy Changes during PRISM and the Rise of Surveillance Capitalism. Media and Communication, 5(1), 63-75. doi:10.17645/mac.v5i1.813

My Mission


Image by Markus Spiske/Flickr

When I was 13 or 14, my parents gave me “The 7 Habits of Highly Effective Teens” by Sean Covey for Christmas. I devoured the book, re-reading it for the next several years. It was the first book in which I highlighted, dog-eared, and wrote notes directly on the pages.

Habit 2 encouraged readers to write a personal mission statement. I loved the idea but never wrote anything of consequence. Now, having accumulated several more years of life experience, I feel more equipped to write that statement.

The sentiment of my mission coalesced largely over the past six years. The transition from college to work to graduate school to now was difficult and enlightening. I finally have a sense of what I want to accomplish, yet I feel secure enough with myself to accept that it may evolve.

So, what’s my mission?

I examine the forces that shape our lives and share that knowledge with the public.

This mission highlights what fascinates me and what I want to do with that knowledge. I am a writer, researcher, and storyteller at heart, and I aspire to write a book one day. In the interest of focusing on systems rather than goals, I aim to write pieces that people can point to and say, “I learned something from that.”

My professional and amateur interests span astronomy, psychology, Internet studies, and history — disparate disciplines bound by a common thread of humanity.

Like many people, I’m struck with awe every time I look up at the night sky. So much exists out there, and while science has enabled us to learn a tremendous amount about what’s up there, it’s impossible (for now) to travel across light years or stand on the event horizon of a black hole. So, why does astronomy matter?

Because most of the matter that makes up every human being on the planet was forged in the stars in that sky. The universe began with hydrogen, helium, and a smidgen of lithium. Nearly all other elements in the periodic table, including the carbon that forms the basis of life as we know it, emerged from nuclear fusion in the cores of stars and in the aftermath of stellar explosions. Everything that’s inside you comes from up there.

What goes on inside us, particularly our brains, also captivates me. While we don’t have to think about telling our body to breathe air, pump blood, or digest food, our thoughts drive so much of our behavior. And while thought processes may feel automatic, they’re malleable and well within our control. Figuring out how to change the way we think and implementing those changes isn’t easy. But I take comfort in the paradoxical notion that while I can’t control anything outside my own mind, taking control of my own mind grants me boundless potential to construct a fulfilling life.

Nowadays, that life is not just experienced; it is increasingly documented by digital technology that creeps deeper into our daily lives. Personal and sensitive communications, ranging from text messages to financial transactions to data points about our physical activities, flow through privately owned networks and sit on servers operated by companies that have wide latitude to use that data as they see fit. We as individuals must ensure that this emerging ecosystem of networked digital technology benefits, rather than restricts, us.

To do so, I think it’s important to put this moment in historical context. The human race has advanced tremendously over its existence on this planet. Look around you. So much of what you see and feel was designed or affected by humans. Buildings, roads, cars, books, families, music, math, elections, and the disease-resistant tomatoes in your fridge are the result of human activity.

Even if you’re sitting in the middle of an ocean, forest, desert, or glacier, the device (or perhaps piece of paper) on which you’re reading these words was invented by humans. The language you’re reading right now, including the shapes of the letters and the grammatical rules that render these words meaningful, was developed by humans.

This point reverberated while I recently read Amsterdam: A History of the World’s Most Liberal City. As author Russell Shorto described how the philosopher Baruch Spinoza first posited that church and state could exist as separate entities, it hit me in my gut that values, principles, and norms change. That there was a time when people truly believed that dark-skinned humans were inferior. That 100 years ago, women in the United States had no right to vote. That the notion of “this is just how things are” is simply not true. History is not facts and timelines; history is about moments and people who seize those moments and make them matter. History is learning how people have harnessed their potential and applying those lessons to the present day.

As I move through life, I want to understand more about these forces, the physical, internal, societal, and historical forces that have brought me, you, and those around us to this particular moment in time. And if in that process, I say something that makes you go, “Hmm, I never thought of that,” well then, mission accomplished.

This post also appears on Medium.

Earn a Graduate Degree and Write a Thesis: Check

Last week, check marks sprouted next to two items on my bucket list: earn a graduate degree and complete an individual thesis.

Before embarking on both journeys, I knew I loved to research and write. I felt like my mind, fascinated by such topics as journalism, astronomy, neuroscience, and colonial-era U.S. history, embodied the aphorism that a journalist’s expertise is a mile wide and an inch deep. Two years after becoming a student at the University of Michigan School of Information, I have discovered where I want to go deep.

I want to understand how digital technology affects our relationships with ourselves, our significant others, our kids, our parents, our friends (and Friends), our governments, our devices, and the companies that manufacture those devices and harvest the data they so dutifully collect.

I’m a Millennial. I hand-wrote book reports in elementary school and made science projects out of cardboard and foam. My family bought a computer when I was nine years old, and I began typing my school assignments because tapping the keys was more fun than scrawling the pencil across the page. As a high schooler I conversed with friends over AIM; as a college student I was among the first generation to latch my social life to Facebook. I studied journalism as an undergraduate and watched digital technology pull the rug out from under that industry right as I graduated and faced “the real world.”

I cannot imagine my life without digital technology. But I also wonder whether and how it is changing the way we live. Excited by our ability to capture, store, and disseminate large amounts of data, I designed my own curriculum in data storytelling to learn the basics of programming and design and apply those skills to the art of storytelling. The idea that people could use data to discover personal information (e.g., someone’s pregnancy) captivated me.

This became the basis for my thesis research in which I interviewed new mothers about their decisions to post baby pictures on Facebook. I had begun seeing baby pictures on my own Facebook News Feed, and I was curious whether the question of what to post and not post online entered new mothers’ minds.

As I was wrapping up one research interview a few months ago, the participant asked what I was studying.

“Data storytelling,” I replied, launching into my well-rehearsed, 30-second definition of this field of study.

“I feel like Facebook is the definition of data storytelling,” she said. “I am telling my life story in the way that I want to,…And it’s all data…That’s, like, the perfect thesis for what you’re studying.”

Her statement comforted me because I, for some reason, had equated data storytelling with crunching numbers. But data is data, whether words or numbers. My thesis distilled more than 400 pages of interview transcripts into a story about what types of pictures new mothers do and don’t post online, as well as what factors influence their decisions.

The most rewarding aspect of completing this degree and this thesis has been hearing people’s enthusiasm and encouragement when I tell them what I’m doing. It is so exciting to believe you’re helping to make sense of what feels like a rapidly changing world, but also to realize that while the circumstances in which you’re asking the questions may be changing, the questions themselves are timeless. In the case of my thesis, taking baby pictures is nothing new, but broadcasting them to an audience of hundreds is.

One of my professors quoted a colleague of hers as saying, “Graduate school was when they stopped asking me the questions they already know the answers to.” In my time at UMSI, I’ve helped to answer some of those unanswered questions. I’m leaving campus with a better sense of what questions I want to ask of the world moving forward.

#MGoBlue  #MGoGrad

A Citation Network of Wikipedia’s Featured Articles

Type an unfamiliar term into Google, and chances are your quest for answers will cross paths with Wikipedia. With more than 470 million unique monthly visitors as of February 2012, the world’s free encyclopedia has become a popular source of information. Our team (Jackie Cohen, Priya Kumar, and Florence Lee) used network principles to explore where Wikipedia gets its information.

Our analysis suggests that Wikipedia’s best articles cite similar sources. Why is this important? Information about the most frequently cited domains may give Wikipedia editors a good starting point to improve articles that need additional references.

We reviewed the citation network of Wikipedia’s English-language featured articles to discover which categories of articles shared similar citation sources, or domains. Wikipedia organizes its more than 4,200 featured articles into 44 categories; we found that every pair of categories shares at least one domain, creating a completely connected network.

In the network graph (Figure 1), each category is a node. If two categories share at least one domain, an edge appears between them. Since every category pair shares at least one domain, each node shares an edge to every other node. The graph has 44 nodes and 946 edges.
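The structure of this network can be sketched in a few lines of Python. This is a minimal illustration, not code from the project repository. It assumes the citation data has already been reduced to a hypothetical mapping from each category name to the set of domains its featured articles cite. Because every category pair shares at least one domain, the result is a complete graph, and a complete graph on n nodes has n(n-1)/2 edges: 44 × 43 / 2 = 946.

```python
from itertools import combinations


def category_edges(domains_by_category):
    """Link two categories if their articles cite at least one common domain."""
    edges = set()
    for a, b in combinations(sorted(domains_by_category), 2):
        if domains_by_category[a] & domains_by_category[b]:
            edges.add((a, b))
    return edges


def complete_graph_edge_count(n):
    """Number of edges when every pair of n nodes is connected: n(n-1)/2."""
    return n * (n - 1) // 2


# Toy data (hypothetical): 44 categories that all cite one common domain.
toy = {f"category_{i}": {"nytimes.com"} for i in range(44)}
print(len(category_edges(toy)), complete_graph_edge_count(44))  # 946 946
```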

Figure 1: Citation Network of English-Language Wikipedia Featured Articles


But the mere existence of an edge doesn’t tell us about the strength of the relationship, or the number of shared domains, between two categories. The two categories could share one domain or hundreds. We assigned weights to the edges to determine which pairs share more domains than others.

First, we determined how many shared domains existed in the entire network. If a domain appeared in articles of at least two categories, we considered it a shared domain. For example, at least one Wikipedia article in the biology category cited an nytimes.com link, and at least one Wikipedia article in the law category also cited an nytimes.com link. So we added nytimes.com to the list of shared domains. Overall, we found 1,103 shared domains in the network.
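The definition of a shared domain translates directly into code. This sketch again assumes the hypothetical category-to-domains mapping. A domain counts as shared if articles in at least two categories cite it. The toy domains other than nytimes.com are invented placeholders.

```python
from collections import defaultdict


def shared_domains(domains_by_category):
    """Return domains cited by articles in at least two categories."""
    categories_by_domain = defaultdict(set)
    for category, domains in domains_by_category.items():
        for domain in domains:
            categories_by_domain[domain].add(category)
    return {d for d, cats in categories_by_domain.items() if len(cats) >= 2}


# Toy data: nytimes.com appears in biology and law, so it is shared;
# the placeholder domains each appear in a single category and are not.
toy = {
    "biology": {"nytimes.com", "bio.example"},
    "law": {"nytimes.com", "law.example"},
}
print(shared_domains(toy))  # {'nytimes.com'}
```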

We calculated edge weights by dividing the number of shared domains between a category pair by the total number of shared domains in the network. For example, biology and law shared 14 domains, so the pair’s edge weight was 0.0127 (14 divided by 1,103).
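The edge weight formula is a simple ratio. The sketch below again assumes the hypothetical category-to-domains mapping. It divides each pair’s count of common domains by the network-wide total of shared domains and reproduces the biology and law example: 14 / 1,103 ≈ 0.0127.

```python
from itertools import combinations


def edge_weights(domains_by_category, total_shared):
    """Weight each category pair by the fraction of all shared domains it has in common."""
    weights = {}
    for a, b in combinations(sorted(domains_by_category), 2):
        common = domains_by_category[a] & domains_by_category[b]
        if common:  # pairs with no common domain get no edge
            weights[(a, b)] = len(common) / total_shared
    return weights


# Hypothetical placeholder domains chosen so that biology and law have exactly
# 14 in common, matching the count reported above; 1,103 is the total from the post.
common = {f"shared{i}.example" for i in range(14)}
toy = {"biology": common | {"bio.example"}, "law": common | {"law.example"}}
w = edge_weights(toy, total_shared=1103)
print(round(w[("biology", "law")], 4))  # 0.0127
```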

On a linear scale, the distribution of edge weights has a long tail and looks like a power law distribution (Figure 2). But a true power law would appear as a straight line on a log-log scale, and graphing the distribution that way (Figure 3) shows a curved line. So despite its long tail, the distribution doesn’t appear to be a true power law.
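One informal way to test the “straight line on a log-log plot” idea in code is sketched below. This is an assumption, not the project’s own approach. It computes the empirical complementary cumulative distribution (the share of values at least as large as each value), takes logs of both axes, and compares least-squares slopes for the lower and upper halves of the data. A pure power law gives roughly equal slopes, while clearly different slopes indicate curvature. A formal maximum-likelihood fit would be more rigorous. The synthetic data below is generated for illustration only.

```python
import math


def loglog_ccdf(values):
    """Points (log10 x, log10 P(X >= x)) of the empirical CCDF.
    Exact for distinct values; ties are treated approximately."""
    xs = sorted(v for v in values if v > 0)  # logs need positive values
    n = len(xs)
    return [(math.log10(x), math.log10((n - i) / n)) for i, x in enumerate(xs)]


def ols_slope(points):
    """Ordinary least-squares slope of y on x."""
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den


def half_slopes(values):
    """Slopes of the lower and upper halves of the log-log CCDF."""
    pts = loglog_ccdf(values)
    mid = len(pts) // 2
    return ols_slope(pts[:mid]), ols_slope(pts[mid:])


n = 1000
# Synthetic Pareto quantiles (exponent 2): a straight line on log-log axes.
pareto = [((n - i) / n) ** -0.5 for i in range(n)]
# Synthetic shifted-exponential quantiles: a curve on log-log axes.
expo = [1 - math.log((n - i) / n) for i in range(n)]
print(half_slopes(pareto))  # both close to -2
print(half_slopes(expo))    # upper-half slope is much steeper
```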

Figure 2: Edge Weight Distribution – Linear Scale
Figure 3: Edge Weight Distribution – Log-Log Scale

We colored the edges along an RGB spectrum according to their weight. The vast majority of category pairs share fewer than five percent of the network’s shared domains, which is why thick cables of blue traverse the network graph in Figure 1. The occasional turquoise edges represent the pairs that share more than five percent of the shared domains.
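The color mapping can be sketched as a linear interpolation between two RGB endpoints. The blue and turquoise values below, and the choice of 0 as the minimum weight, are assumptions chosen to echo the description of Figure 1. They are not the exact colors or scale the team used.

```python
def weight_to_hex(weight, w_min, w_max, low=(0, 0, 255), high=(64, 224, 208)):
    """Map an edge weight to a hex color between `low` (blue) and `high` (turquoise).
    Endpoint colors are illustrative assumptions."""
    t = 0.0 if w_max == w_min else (weight - w_min) / (w_max - w_min)
    t = min(max(t, 0.0), 1.0)  # clamp weights outside [w_min, w_max]
    r, g, b = (round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
    return f"#{r:02x}{g:02x}{b:02x}"


# Weights from the post: biology and law (0.0127) vs. the heaviest pair (0.2022).
print(weight_to_hex(0.0127, 0.0, 0.2022))  # mostly blue
print(weight_to_hex(0.2022, 0.0, 0.2022))  # #40e0d0
```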

The pairs that share the most domains are:

  1. Politics and Government Biographies & Religion, Mysticism, and Mythology (223 shared domains; 0.2022 edge weight)
  2. Physics and Astronomy & Physics and Astronomy Biographies (159 shared domains; 0.1442 edge weight)
  3. Physics and Astronomy & Religion, Mysticism, and Mythology (150 shared domains; 0.1360 edge weight)
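Ranking pairs by shared-domain count and converting counts to weights takes only a few lines with the standard heapq module. The counts below are the ones reported in this post. The biology and law pair from the earlier example is included as a low-weight comparison. The dictionary layout is an assumption for illustration.

```python
import heapq

# Shared-domain counts for category pairs, as reported in the post.
shared_counts = {
    ("Politics and Government Biographies", "Religion, Mysticism, and Mythology"): 223,
    ("Physics and Astronomy", "Physics and Astronomy Biographies"): 159,
    ("Physics and Astronomy", "Religion, Mysticism, and Mythology"): 150,
    ("Biology", "Law"): 14,
}
TOTAL_SHARED = 1103  # shared domains in the whole network


def top_pairs(counts, total, k=3):
    """Return the k pairs with the most shared domains, with their edge weights."""
    best = heapq.nlargest(k, counts.items(), key=lambda item: item[1])
    return [(pair, n, round(n / total, 4)) for pair, n in best]


for pair, count, weight in top_pairs(shared_counts, TOTAL_SHARED):
    print(f"{' & '.join(pair)}: {count} shared domains, weight {weight:.4f}")
```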

The second pair feels intuitive. We scratched our heads at the first pair and found the third pair interesting, given that the two categories often appear on different sides of various public debates. Some of the domains shared by this pair, such as slate.com, jstor.org, and christianitytoday.com, were unsurprising, but we also noticed several unexpected ones, including brooklynvegan.com and vulture.com.

Figure 2 depicts an elbow around the edge weight of 4.6 percent. If we use this as a threshold to create the network, that is, only draw an edge if its weight is higher than 0.046, the network becomes far less connected (Figure 4).
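Thresholding only drops edges, so it amounts to a short filter over the weighted edge list. In this sketch, the 0.046 cutoff and the two pair weights come from the post. Representing the graph as a dictionary of pair weights, and the isolated-node helper, are assumptions.

```python
def threshold_edges(weights, cutoff=0.046):
    """Keep only edges whose weight is strictly greater than the cutoff."""
    return {pair: w for pair, w in weights.items() if w > cutoff}


def isolated_nodes(nodes, edges):
    """Nodes that no remaining edge touches."""
    touched = {node for pair in edges for node in pair}
    return set(nodes) - touched


weights = {
    ("Physics and Astronomy", "Physics and Astronomy Biographies"): 0.1442,
    ("Biology", "Law"): 0.0127,
}
kept = threshold_edges(weights)
print(list(kept))  # only the physics pair survives
print(isolated_nodes({"Biology", "Law", "Physics and Astronomy"}, kept))
```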

Figure 4: Citation Network with an Edge Weight Threshold of 4.6 Percent


We also examined the domains themselves. The three most popular shared domains were:

The widespread citation of these domains aligns with Wikipedia’s encyclopedic nature; these sites are gateways into vast swaths of digitally recorded information and knowledge.

Among the least popular domains, 601 were shared by only one category pair. Removing those domains from the graph deleted only four edges, since most category pairs share more than one domain. This suggests that edge weight is a better basis for thresholding this network than how widely each domain is shared.
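The pruning experiment can be sketched as follows, again assuming the hypothetical category-to-domains mapping. A domain shared by only one category pair is one cited by exactly two categories. Removing such domains and rebuilding the edge list shows which edges, if any, disappear. The toy categories and domains are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_domains(domains_by_category, exclude=frozenset()):
    """Map each category pair to the domains both cite, ignoring excluded domains."""
    result = {}
    for a, b in combinations(sorted(domains_by_category), 2):
        common = (domains_by_category[a] & domains_by_category[b]) - exclude
        if common:
            result[(a, b)] = common
    return result


def single_pair_domains(domains_by_category):
    """Domains cited by exactly two categories, i.e., shared by only one pair."""
    counts = Counter(d for domains in domains_by_category.values() for d in domains)
    return frozenset(d for d, c in counts.items() if c == 2)


def edges_lost_by_pruning(domains_by_category):
    """Edges that vanish once single-pair domains are removed."""
    before = pair_domains(domains_by_category)
    after = pair_domains(domains_by_category, single_pair_domains(domains_by_category))
    return set(before) - set(after)


toy = {
    "A": {"x.example", "y.example", "common.example"},
    "B": {"x.example", "common.example"},
    "C": {"common.example", "z.example"},
    "D": {"z.example"},
}
print(sorted(single_pair_domains(toy)))  # ['x.example', 'z.example']
print(edges_lost_by_pruning(toy))        # {('C', 'D')}
```

In the toy data, the A-B edge survives pruning because the pair also shares common.example. This mirrors the observation that most real category pairs share more than one domain.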

While typical network characteristics such as centrality measures, community structures, or diffusion were not relevant in the completely connected network, examining edge weights yielded interesting findings. Future work could examine network characteristics of the thresholded graph and consider whether patterns exist in the way various category pairs cite different domain types (e.g., journalism, scholarly publications, or personal blogs).

Project Code: Available here

The project can be replicated by running grab_data.py, manage_data.py, and data_counts.py. The first two files collect and parse the data we describe above. The data_counts.py file contains all the network manipulation and, if you download the entire repository, can be run immediately (the repository includes the results from the former two files). Its comments explain where in the code we determined different network metrics and examined aspects of our network, including where we implemented edge weight thresholding (Figure 1, Figure 4) and where we conducted Pythonic investigations into whether the edge weight distribution was a power law distribution.