I was playing with the Reddit API, and specifically the Python wrapper (PRAW) which makes it straightforward to grab large numbers of post titles or comments. I don’t have any current research questions about it, but I thought I’d do some butterfly collecting with coordinators. So, I’m just curious: do subreddits use coordinators differently?
For each subreddit, I got the titles from a random 1000 “hot” posts in the subreddit’s history. “Hot” posts are dynamic and updated constantly. How “hot” a post is depends on a combination of high upvotes and a lot of current activity. (Other categories of posts are “top” and “new”, which are self-explanatory, and “controversial”, which also means a post with current activity, but with an even split of up- and down-votes.) I figure “hot” posts reflect content or ideology that the subreddit generally agrees with, and “hot” also means that I’m getting a snapshot of current and fresh content. For those 1000 titles, I just counted the instances of “and”, “but”, “or”, and “and/or”. This is what I got.
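For the curious, the counting step can be sketched roughly like this. It is a minimal sketch and not necessarily the exact script I ran: the function name, the regular expression, and the example titles are all made up for illustration, and fetching the titles through PRAW is left out so the snippet runs offline.

```python
import re
from collections import Counter

# "and/or" is listed first so it is counted once,
# rather than as a separate "and" plus "or".
COORDINATOR_PATTERN = re.compile(r"\b(and/or|and|but|or)\b", re.IGNORECASE)


def count_coordinators(titles):
    """Count whole-word coordinators across a list of post titles."""
    counts = Counter({"and": 0, "but": 0, "or": 0, "and/or": 0})
    for title in titles:
        for match in COORDINATOR_PATTERN.findall(title):
            counts[match.lower()] += 1
    return counts


# Hypothetical titles, not real data from any subreddit.
example_titles = [
    "Cats and dogs, but not birds",
    "Tea or coffee?",
    "Bring snacks and/or drinks",
    "Android for everyone",  # contains no coordinator as a whole word
]
example_counts = count_coordinators(example_titles)
```

The word-boundary check is what keeps “Android” and “for” from being counted as “and” and “or”.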
Right off the bat you see that for the most part, every sub follows the same pattern of use. “And” is the most common coordinator, followed by “but” and “or”, which usually occur at about 1/4 the rate of “and”. All the titles were in English, though that is not the case across Reddit. Even though these subs have very different content and the nature of their titles varies, this overall pattern persists. There is not much research on the corpus frequency of coordinators and whether or not it’s tied to any variable. I was only able to find a few works that discuss the sociolinguistics of coordinators, for example Mary Shapiro’s 1997 dissertation, which discusses the issue briefly; I think one finding was that increased use of “and” is correlated with increased perception of formality.
Globally, I was curious if there were any effects of “braininess” or even “assumed gender” on coordinator use. I don’t have a normalized ranking of subreddits by “impressions of braininess”, but Science and AskHistorians are typically considered very brainy subreddits. They are both outliers in their frequent use of “and”. The subreddit Funny is typically not considered very brainy, and it ranks very low in use of “and” and other coordinators. More brainy subreddits are characterized by complex sentences and extended argumentation, which generally requires increased coordinator use. If I were to speculate on why Fifthworldproblems is the highest user of “and”, it would be that it uses very odd and idiosyncratic language, and users are attempting to simulate surreality by stringing together sentences (although it remains to be seen why the seventhworld is inhabited by so few “ands”). For assumed gender, I coarsely chose two subs known for a subscriber population dominated by a particular gender: TwoXChromosomes, which deals with women’s issues, is assumed to be populated mostly by women; MensRights, which deals with social justice advocacy for men, is assumed to be populated mostly by men. These didn’t show much of a difference in use of coordinators. Another comparison could be made between the subreddits Christianity and Atheism, although the result isn’t very different: apparently Christians and Atheists do not vary much in their use of coordinators.
Besides those extremely coarse and assumed pseudo-sociolinguistic variables, the more easily definable content type of a subreddit seems to have some effect on coordinator use. I’ve identified three groups: question-oriented, surreal, and headline-oriented. The lines in fine dots are “question-oriented” subreddits: AskReddit and AskHistorians. Their titles are all, or for the most part, questions. For the question-oriented subreddits, you see an uptick in the use of “or” compared to other subs. The lines in wide dashes are “surreal” subreddits that intentionally use semantically obscure language to create a surreal world: Fifthworldproblems, VXJunkies, and seventhworldproblems. Even though the post titles are nonsense, they still follow the pattern of coordinator use. Fifthworldproblems even surpasses Science in its use of “and”. If someone really wanted to create surreal, irregular English, they would cut down the use of “and” in favor of the other coordinators. But this variation is below the level of consciousness for most. Lastly, there are “headline-oriented” subreddits, indicated by a line starting with a large dot: these are Science and Politics. The posts of these subs are for the most part simply pithy article or news headlines, which is definitely a separate type of language use from asking questions or playing with language in a surreal way. It doesn’t seem like a post title being a headline makes any difference, since these two subs seem to be at opposite ends of the spectrum in coordinator use.
The clustered bar graph allows you to compare more easily the relative proportions of coordinators for each subreddit.
Some new observations emerge. It is easier to see that most subreddits follow the pattern. In order of frequency: AND > BUT > OR > AND/OR (green > blue > yellow > dark green). The exceptions are AskReddit, Science, and AskHistorians (and marginally TwoXChromosomes and Politics). The three clear cases are all fairly brainy subs, and the two “Ask-” subreddits are question-oriented, showing an increased preference for disjunction. Setting these three aside: for most of the subreddits, BUT and OR appear about the same number of times. The exceptions are VXJunkies, SeventhworldProblems, and Fifthworldproblems, where BUT seems to be used way out of proportion, many more times than OR. I suppose this is one aspect where users succeed in using surreal or irregular English, in that they use too many BUTs.
Definitely looking for tips on visualization, but at this point I’m just exploring, so I’ll look at everything all at once.
All done! Went through about 6 different versions to arrive at this simple and clean letter-based logo. No mountains or city skylines this time, but that’s okay.
Now, if someone could teach me how to do Layer Blends in Overlay mode in a vector file, that’d be great. I made this using Photoshop and pseudo text vectors, but it’s still not a proper EPS or print format.
I’m pretty proud of myself for getting this experiment together, and it actually showed some kind of result. What exact result that is I don’t know, but there’s definitely a slope on this graph.
What’s this all about? I’m keeping it secret until I hear back from the conference! I’m just too excited not to share the visual result, though.
I’m reading Steinbeck’s Travels with Charley (in Search of America), which is this adorable tale of him driving cross-country with his French Poodle, Charley. He makes all sorts of pithy remarks on the people and places he visits.
This got me thinking about my own travels. I’m kind of obsessed with tracking everything I do – Strava for every little run and bike ride, Goodreads for every page read, Instagram for every fancy dinner I have (if that’s gamification, I fully submit myself to it). And I know people track their travels in maps like this.
The really fun thing is that this was done in LaTeX. I found a StackExchange post giving some details about how to do this, and I made a map showing my own travels. The darkest blue marks states that I’ve lived in (with lines showing big moves), and the lighter blues mark states I’ve spent at least a night in.
I’m seriously obsessed with Karl Ove Knausgaard’s My Struggle, the monumental 6-volume autobiographical fiction that took Norway by storm. I’m finishing up Volume 2, which mostly deals with his second and current marriage. Volume 1 is split between his teenage years and the period after his father’s death. Volume 3 is supposedly a lot about poop in his childhood, I’m told by my reputable source. Anyways, it’s full of angst and thoughts about getting to grips with masculinity, unimaginable writer’s block, being socially awkward, and trying to raise a family with a bipolar wife. Great stuff.
I was interested in how much readership changes over the course of the six volumes. Each one is at least 300 pages, and they’re not typical action-packed fare at all. It’s hard to sell it to people, honestly. But those I know who are reading through it (n=2, myself included) are enamored.
I just did a quick look at the number of Goodreads reviews and Amazon reviews across the six volumes. Note, only Volumes 1-5 are out in English, and Volume 5 is only out in hardcover right now. Goodreads seems to have many more, but that’s because it collects reviews in all languages, so presumably there are some Norwegians in there inflating the numbers. I looked at Amazon.com, which serves the US only, so there are fewer results. And yes, I know that the number of reviews doesn’t equal the number of readers, but I’m not sure I can ever get my hands on that real number.
But look – it’s interesting that the relative numbers of reviews are generally the same on both sites. I don’t think there’s an option to feed reviews on Amazon to Goodreads or vice versa (but there is for Goodreads to other social media like FB or Twitter), so that wouldn’t explain the covariance. I think the trend is basically as you would expect. Many people read the first one to see what it’s all about, a relatively small subset of those move on, and then it trails off. But I’m heartened to see that the downward slope across Volumes 2-5 isn’t that steep. I’m under the impression that those who move on to 2 intend to continue through 5. Volume 6, however, is a bit of a task…
Some more fun graphs. The first 5 volumes average 531.6 pages, but the last one is clearly a ridiculous outlier, clocking in at 1,119 pages (in the Norwegian edition). War and Peace (Oxford) is 1,392 pages; The Count of Monte Cristo (Penguin) is 1,276. As for overall page counts: Knausgaard totals 3,777 pages. Proust is 4,215. Ferrante is 1,682 (331, 471, 400, 480).
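As a quick sanity check on those figures, here is the arithmetic spelled out. This is just a sketch using the numbers quoted above; the variable names are my own, and the per-volume Knausgaard counts are not listed here, so the average is derived from the total.

```python
# Page counts as quoted in the post.
knausgaard_total = 3777
knausgaard_volume_6 = 1119

# Average of the first five volumes, derived from the total minus Volume 6.
first_five_average = (knausgaard_total - knausgaard_volume_6) / 5

# Ferrante's four volumes, summed.
ferrante_volumes = [331, 471, 400, 480]
ferrante_total = sum(ferrante_volumes)

# How much longer Volume 6 is than the average of the first five.
volume_6_ratio = knausgaard_volume_6 / first_five_average
```

The derived average matches the 531.6 quoted above, the Ferrante volumes do sum to 1,682, and Volume 6 comes out at a bit more than twice the length of an average earlier volume.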
The campus library has the Norwegian editions, so you can see the behemoth (and when they changed dewey decimal stickers) here. Those impressive black hardcovers are Volumes 2-6.
It’ll be some time before I make it through that, I’m sure.
Every once in a while an AskReddit thread will come up that just rakes in so much linguistic gold. This one asks for “funny ways to say yes”.
Users come up with hundreds of variants. Some of my favorites:
What’s interesting is how many are patently unfunny, though. One was even called out:
Why are the funny ones funny and the unfunny ones not? Most of the ones I thought were funny invoked some kind of cultural meme: things that are so obvious and well known that asking the question is stupid, which makes it funny. With the math one, there’s no “obvious” cultural meme aspect; it’s just a mere truism, or tautology.
I asked the students in my introductory linguistics class to participate in a small naming survey. They were shown a picture of a food and simply asked to type in what they would call that item if they were ordering it in a restaurant. For the 26 students who self-identified as Washingtonians, these were the numbers:
The first two don’t show any clear regional effects, although there was some consternation about the difference between the three options. I myself didn’t have as principled a distinction between “burger” and “hamburger” as some of the students did. For mussels and fries, students either never encountered that as a dish before (“other”), or couldn’t decide if they were clams or mussels.
The last two were really the object of study. My roommate, a native Washingtonian, has named quite a few things that were surprising to me (“Honey Bucket” for portapotty??) but none stood out quite as much as “jojos”. I wanted to see for sure whether it was a Washington thing or not, and it seems like it is. An informal survey of my friends and family back in Los Angeles definitely confirms my suspicion – no one there calls these thick potato wedges “jojos”, and no one could guess what they were.
Lastly, the popular drink characterized by tapioca balls and a thick straw – what’s it called? I really want to do a larger-scale study of this exact item. Lots of sources have told me that Southern California is the odd one out here – only there is the drink uniformly called “boba”. It seems this survey supports that, with most of the Washington students preferring to call it “bubble tea”.
Somebody on Reddit posted their store’s EOD report (probably not the best idea for them, but oh well) and the list of numbers was just begging to be plotted. So I did so.
I learned how to make a running totals column in Excel, and used a “Filled Radar” chart for the first time – I thought it showed the hourly fluctuations particularly well, since it’s an hourly report and the chart resembles a clock.
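For anyone who would rather do the running totals column outside of Excel, here is a minimal sketch in Python. The hourly figures are made-up placeholders, not the numbers from the posted report, and the variable names are just for illustration.

```python
from itertools import accumulate

# Hypothetical hourly sales figures (placeholders, not the real report).
hourly_sales = [120.0, 80.5, 200.0, 150.25]

# Each entry is the sum of all hourly values up to and including that hour,
# which is what a running totals column holds.
running_totals = list(accumulate(hourly_sales))
```

The last entry of the running totals is the end-of-day total, which is a handy check against the report’s own bottom line.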
I was talking with a student at office hours about Formal Languages, and the discussion dredged up all sorts of memories about old work I’ve done. In particular, I was reminiscing about this phase when I was really into the idea of automata, and their generalization across domains, not just natural language. Coffee was a target, so I put together a finite-state automaton that would generate all coffee and espresso drinks.
Look, I know automata aren’t flowcharts, and this isn’t rigorous, but if we think of the components of a latte as morphemes (milk, coffee, water), and they’re put together to create a word (drink)…
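To make the morpheme idea concrete, here is a toy version in code. It is a sketch under heavy assumptions: the states, ingredients, and transitions below are invented for illustration and are much smaller than the diagram I drew, and I have kept it acyclic so the set of generated “words” is finite.

```python
# Hypothetical transition table: state -> list of (ingredient, next_state).
TRANSITIONS = {
    "start": [("espresso", "shot")],
    "shot": [("hot water", "done"), ("steamed milk", "milky")],
    "milky": [("milk foam", "done")],
    "done": [],
}

# States where the machine may stop and hand over a finished drink.
ACCEPTING = {"shot", "milky", "done"}


def generate_drinks(state="start", ingredients=()):
    """Yield every ingredient sequence that ends in an accepting state."""
    if state in ACCEPTING:
        yield ingredients
    for ingredient, next_state in TRANSITIONS[state]:
        yield from generate_drinks(next_state, ingredients + (ingredient,))


drinks = list(generate_drinks())
```

Each generated tuple is a “word” built from ingredient “morphemes”: stopping right after the espresso gives a plain shot, while continuing through steamed milk and milk foam gives something milkier.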
Continuing my latest bandwagoning on the data train, using a method I learned in class which has turned out to be insanely applicable to everything I'm interested in (Pivot Tables in Excel are amazing), I did some organizing and preliminary analysis of my data for my reaction time experiment. Preliminary analysis is giving me the following bar plot:
Seeing that nice curve is intensely gratifying. Having been a totally dyed-in-the-wool theoretician for much of my career, I'm slowly beginning to experience the rewarding feeling of doing experiments: it's a really long and involved process to read the background lit on a domain of study, find a research question, design an experiment addressing that question, recruit participants, run the experiment on humans, get a bunch of data, analyze it, and get some awesome plots that look exactly how you want them to. I'm so excited -- not just because this means the design of the experiment was sound and I can safely proceed to Experiment 2 and finish this dang paper, but also because I guess this means I can officially add "experimental methods" to my skillset.