Like many people, I enjoy having a routine. This summer, after moving from a cubicle into a shared office space, I began going to campus more routinely and working a similar schedule each day. The regular schedule plus the commute activated more natural boundaries around “work” and “home” time. On campus, I focused a bit more and got distracted a bit less. Most important, I felt anchored. I cherish the self-directed and flexible nature of PhD life, but it sometimes left me feeling like a dandelion blowing in the wind.
This new routine has done wonders for my sense of well-being. But it hasn’t done much for my time management skills. I used to think I was great at time management because I always met my deadlines and my expectations. After an exhausting first semester in the PhD program nearly two years ago, I realized I was terrible at time management. The only reason I met my deadlines (and satisfied my perfectionist tendencies) was that I let work take priority over everything else. If I didn’t feel like I had accomplished enough by 5:30, I’d keep working until 9, 10, or 11 pm. If I didn’t feel like I had gotten enough done by Friday evening, I’d let work consume Saturday and/or Sunday. This didn’t leave my body, my mind, or my husband very happy.
Since that realization, I’ve reframed my attitude toward work (it is an important part of my life, but not the most important) and changed my practices (going to campus regularly). The fall semester started this week, which means goodbye languid summer days, hello bustling campus and fuller schedule. I don’t like feeling overwhelmed by this, and I don’t want to spend the next four months waiting for winter break.
Various productivity systems, designed for academic life and beyond, suggest keeping a detailed schedule or assigning specific tasks to each day. I tried these approaches and found them rigid and stifling. So I’m going to adapt their principles into a system that works for me.
First, I commit to a consistent weekday wake-up and go-to-bed time. My alarm goes off at the same time every weekday, but I snooze it for 5 to 75 minutes. I’d like to limit the snoozing to about 10 minutes. To help with that, I intend to go to bed at a consistent time, and to begin my bedtime routine 30 minutes prior to that bedtime.
Second, I will go to campus on weekdays unless I have a scheduling reason to work from home. My experience this summer reminded me that it’s much easier to treat the PhD as a job when it involves a distinct workplace and a commute.
Third, I’ll restart a practice I followed when I worked full-time — tracking my hours. I was fortunate to have supervisors who let me take comp time if I ever worked more than 40 hours per week, so you bet I tracked my hours. I can get obsessive with practices like this, which is why I refrained from tracking my hours as a PhD student. But since I work on various projects, eagerly say yes to other projects, tend to fall into rabbit holes while working on any project, and am a recovering perfectionist, I think time tracking is essential to improving my time management skills. I keep things simple and do this in a spreadsheet.
Fourth, I’ve created a task management workflow to help me figure out what to work on when. I’ve written a month-by-month list of my commitments, deadlines, and events. At the end of each week, I’ll spend half an hour previewing the next week. I’ll create a to-do list with the tasks that need to be completed that week. I’ll then look at the calendar and schedule time blocks to work on those tasks. As I go through the week, I can move things around if needed. After a few weeks of this, I hope to have a better sense of how much I can accomplish in a typical 40-ish hour week and how much time to budget for certain tasks. This will (hopefully) help me let go of the perfectionist tendencies, resist the temptation of distractions (Twitter, I’m looking at you) and understand the “price” of saying yes to a given task.
Finally, I commit to keeping my campus desk tidy. Stalagmites of papers and books make my home desk an uncomfortable place to work, and looking at them unsettles my mind. Yes, I’d like to clean them off, but this is about baby steps. My campus desk is big enough that the two piles that have already sprouted aren’t in the way. I’d like to keep it that way.
So that’s my plan for this semester. Check with me in four months to see how it goes.
Type an unfamiliar term into Google, and chances are your quest for answers will cross paths with Wikipedia. With more than 470 million unique monthly visitors as of February 2012, the world’s free encyclopedia has become a popular source of information. Our team (Jackie Cohen, Priya Kumar, and Florence Lee) used network principles to explore where Wikipedia gets its information.
Our analysis suggests that Wikipedia’s best articles cite similar sources. Why is this important? Information about the most frequently cited domains may give Wikipedia editors a good starting point to improve articles that need additional references.
We reviewed the citation network of Wikipedia’s English-language featured articles to discover which categories of articles shared similar citation sources, or domains. Wikipedia organizes its more than 4,200 featured articles into 44 categories; we found that every pair of categories shares at least one domain, creating a completely connected network.
In the network graph (Figure 1), each category is a node. If two categories share at least one domain, an edge appears between them. Since every category pair shares at least one domain, each node shares an edge with every other node. The graph has 44 nodes and 946 edges.
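The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `category_domains` dictionary holds made-up toy data, and `build_edges` and `complete_graph_edge_count` are hypothetical helper names. The last line checks the arithmetic behind the edge count: a completely connected graph on 44 nodes has 44 × 43 / 2 = 946 edges.

```python
from itertools import combinations

# Hypothetical toy data: category -> set of cited domains.
# These are NOT the project's real citation lists.
category_domains = {
    "Biology": {"nytimes.com", "jstor.org", "nature.com"},
    "Law": {"nytimes.com", "supremecourt.gov"},
    "Music": {"nytimes.com", "jstor.org", "billboard.com"},
}


def build_edges(category_domains):
    """Map each category pair to the set of domains both categories cite.

    A pair only gets an edge if it shares at least one domain.
    """
    edges = {}
    for a, b in combinations(sorted(category_domains), 2):
        shared = category_domains[a] & category_domains[b]
        if shared:
            edges[(a, b)] = shared
    return edges


def complete_graph_edge_count(n):
    """Number of edges in a completely connected graph on n nodes."""
    return n * (n - 1) // 2


edges = build_edges(category_domains)
print(len(edges))                     # 3: every toy pair shares nytimes.com
print(complete_graph_edge_count(44))  # 946
```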
Figure 1: Citation Network of English-Language Wikipedia Featured Articles
But the mere existence of an edge doesn’t tell us about the strength of the relationship, or the number of shared domains, between two categories. The two categories could share one domain or hundreds. We assigned weights to the edges to determine which pairs share more domains than others.
First, we determined how many shared domains existed in the entire network. If a domain appeared in articles of at least two categories, we considered it a shared domain. For example, at least one Wikipedia article in the biology category cited an nytimes.com link, and at least one Wikipedia article in the law category also cited an nytimes.com link. So we added nytimes.com to the list of shared domains. Overall, we found 1,103 shared domains in the network.
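As a rough sketch of that rule, assuming the citations have already been grouped into one set of domains per category: count how many categories cite each domain, then keep the domains cited by at least two. The toy data and the `shared_domains` function name are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical toy data: category -> set of cited domains.
category_domains = {
    "Biology": {"nytimes.com", "nature.com"},
    "Law": {"nytimes.com", "supremecourt.gov"},
    "Music": {"billboard.com"},
}


def shared_domains(category_domains):
    """Return the domains cited by articles in at least two categories."""
    # Each category contributes a domain at most once, because the
    # per-category values are sets.
    category_counts = Counter(
        domain
        for domains in category_domains.values()
        for domain in domains
    )
    return {domain for domain, n in category_counts.items() if n >= 2}


print(shared_domains(category_domains))  # {'nytimes.com'}
```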
We calculated edge weights by dividing the number of shared domains between a category pair by the total number of shared domains in the network. For example, biology and law shared 14 domains, so the pair’s edge weight was 0.0127 (14 divided by 1,103).
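The calculation is a single division. The sketch below reproduces the biology and law example; `edge_weight` is a hypothetical helper name, while the counts (14 and 1,103) come from the text above.

```python
def edge_weight(pair_shared_count, total_shared_count):
    """Fraction of the network's shared domains that one category pair shares."""
    if total_shared_count <= 0:
        raise ValueError("total_shared_count must be positive")
    return pair_shared_count / total_shared_count


TOTAL_SHARED_DOMAINS = 1103  # shared domains in the whole network

biology_law = edge_weight(14, TOTAL_SHARED_DOMAINS)
print(round(biology_law, 4))  # 0.0127
```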
At first glance, the distribution of edge weights looks like a power law distribution (Figure 2). A true power law would plot as a straight line on a log-log scale, but graphing the distribution that way (Figure 3) shows a curved line. So despite the long tail visible on the linear scale, the distribution doesn’t appear to follow a true power law.
Figure 2: Edge Weight Distribution – Linear Scale
Figure 3: Edge Weight Distribution – Log-Log Scale
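One informal way to make the log-log comparison concrete is to fit a straight line to the logged values and see how well it fits. The sketch below is only an illustration on synthetic data, not the check the project ran: `loglog_fit` is a hypothetical helper, and a high R² on a log-log plot is suggestive rather than a rigorous test for a power law.

```python
import math


def loglog_fit(xs, ys):
    """Least-squares line through (log x, log y).

    Returns (slope, r_squared). All xs and ys must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    syy = sum((y - mean_y) ** 2 for y in log_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Synthetic data (assumptions for illustration only).
xs = list(range(1, 51))
power_law = [100 * x ** -2.0 for x in xs]             # straight on log-log
exponential = [100 * math.exp(-0.2 * x) for x in xs]  # curved on log-log

slope_pl, r2_pl = loglog_fit(xs, power_law)
slope_exp, r2_exp = loglog_fit(xs, exponential)

print(f"power law:   slope={slope_pl:.2f}, R^2={r2_pl:.4f}")
print(f"exponential: slope={slope_exp:.2f}, R^2={r2_exp:.4f}")
```

The power law series recovers its exponent as the slope with a near-perfect fit, while the exponential series, which also has a long tail on a linear scale, fits a straight line noticeably worse.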
We scaled the edge colors on an RGB spectrum. The vast majority of category pairs share fewer than five percent of the network’s shared domains, which is why thick cables of blue traverse the network graph in Figure 1. The occasional turquoise edges represent the pairs that share more than five percent of them.
The pairs that share the most domains are:
- Politics and Government Biographies & Religion, Mysticism, and Mythology (223 shared domains; 0.2022 edge weight)
- Physics and Astronomy & Physics and Astronomy Biographies (159 shared domains; 0.1442 edge weight)
- Physics and Astronomy & Religion, Mysticism, and Mythology (150 shared domains; 0.1360 edge weight)
The second pair feels intuitive. We scratched our heads at the first pair and found the third pair interesting, given that the two categories often appear on different sides of various public debates. Some of the shared domains between this pair, such as slate.com, jstor.org, and christianitytoday.com, were unsurprising, but we did notice several unexpected shared domains in this pair, including brooklynvegan.com and vulture.com.
Figure 2 depicts an elbow around the edge weight of 4.6 percent. If we use this as a threshold to create the network, that is, only draw an edge if its weight is higher than 0.046, the network becomes far less connected (Figure 4).
Figure 4: Citation Network with an Edge Weight Threshold of 4.6 Percent
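Thresholding amounts to filtering the weighted edge list. Here is a minimal sketch that uses the four edge weights quoted in this post; the `threshold_edges` function name and the dictionary layout are assumptions, not the project's actual data structures.

```python
THRESHOLD = 0.046  # the elbow in the edge weight distribution

# Edge weights quoted in this post, keyed by category pair.
weights = {
    ("Politics and Government Biographies",
     "Religion, Mysticism, and Mythology"): 0.2022,
    ("Physics and Astronomy", "Physics and Astronomy Biographies"): 0.1442,
    ("Physics and Astronomy", "Religion, Mysticism, and Mythology"): 0.1360,
    ("Biology", "Law"): 0.0127,
}


def threshold_edges(weights, threshold):
    """Keep only the edges whose weight is strictly above the threshold."""
    return {pair: w for pair, w in weights.items() if w > threshold}


kept = threshold_edges(weights, THRESHOLD)
print(len(kept))  # 3: the biology and law edge falls below the threshold
```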
We also examined the domains themselves. The three most popular shared domains were:
The widespread citation of these domains aligns with Wikipedia’s encyclopedic nature; these sites are gateways into vast swaths of digitally recorded information and knowledge.
Considering the least popular domains, 601 domains were shared by only one category pair. Removing those domains from the graph deleted only four edges, since most category pairs share more than one domain. This suggests that edge weight is a better threshold for examining the relationships in this network than domain distribution.
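To illustrate the idea on a small scale: a domain is shared by exactly one category pair when exactly two categories cite it, and an edge disappears only if every domain its pair shares gets removed. The toy data and helper names below are assumptions for illustration, not the project's code or results.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: category -> set of cited domains.
category_domains = {
    "Biology": {"nytimes.com", "jstor.org"},
    "Law": {"nytimes.com", "loc.gov"},
    "Music": {"nytimes.com", "loc.gov"},
    "Chess": {"chessbase.com", "jstor.org"},
}


def count_edges(category_domains):
    """Count the category pairs that share at least one domain."""
    return sum(
        1
        for a, b in combinations(category_domains, 2)
        if category_domains[a] & category_domains[b]
    )


def single_pair_domains(category_domains):
    """Domains cited by exactly two categories, i.e. shared by one pair."""
    counts = Counter(d for ds in category_domains.values() for d in ds)
    return {d for d, n in counts.items() if n == 2}


def drop_domains(category_domains, to_drop):
    """Return a copy of the data with the given domains removed."""
    return {cat: ds - to_drop for cat, ds in category_domains.items()}


rare = single_pair_domains(category_domains)
before = count_edges(category_domains)
after = count_edges(drop_domains(category_domains, rare))

print(sorted(rare))    # ['jstor.org', 'loc.gov']
print(before - after)  # 1: only the Biology and Chess edge disappears
```

In this toy example, removing two domains deletes a single edge, because the Law and Music pair still shares nytimes.com after loc.gov is dropped.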
While typical network characteristics such as centrality measures, community structures, or diffusion were not relevant in the completely connected network, examining edge weights yielded interesting findings. Future work could examine network characteristics of the thresholded graph as well as consider whether patterns exist in the way various category pairs cite different domain types (e.g., journalism, scholarly sources, personal blogs).
Project Code: Available here
The project can be replicated by running: grab_data.py, manage_data.py, data_counts.py. The first two files collect and parse the data we describe above. The data_counts.py file contains all the network manipulation, and, if you download the entire repository, can be run immediately (the repository includes the results from the former two files). This last file contains comments that explain where in the code we determined different network metrics and examined aspects of our network. This includes where we implemented edge weight thresholding (Figure 1, Figure 4) and where we conducted Pythonic investigations into whether the edge weight distribution was a power law distribution.