A little factoid gets thrown around a lot at election time: Democrats tend to win when voter turnout is high.
Obviously this isn’t really talked about much by Republicans.
It’s commonly believed in Democratic circles that this is why Republicans campaign for more restrictive voter registration laws.
Republicans obviously don’t agree, stating the laws are to prevent voter fraud.
So who’s right?
Well, it’s pretty tough to get hold of voter fraud data (studies are at best inconclusive), but getting election result data isn’t.
I’ve built a dataset with the FEC election results from the years 2000 - 2014 that’s designed to work really well with R’s tidyverse.
If you’re following along with the code, the dataset’s located in
The May release of Kafka 0.10 included a very cool new component: Kafka Streams, a stream processing library that directly integrates with Kafka. There are a number of things that Kafka Streams does differently from other stream processors, and the best way to learn is through example. To that end I’ve chosen a highly relevant example in the social networking space, something that I think any developer can put to use immediately.
Almost everyone in Congress has a Twitter account. Twitter even released a guide for politicians on using the platform effectively. For one thing, it’s another social channel to communicate with constituents (looking at you, @CoryBooker). More interestingly, it’s also used to promote awareness of certain political issues and push agendas. Most people use hashtags for pointless stuff or to shamelessly self-promote:
There’s been a lot of discussion about religion and violence in light of recent events, with some claiming that certain religions are inherently more violent than others. I thought it would be interesting to examine this assertion by looking at holy texts from six major religions (Buddhism, Christianity, Hinduism, Islam, Judaism, and the Church of Jesus Christ of Latter-day Saints) and “measuring” the violence in each.
OR: How the DCOS turned me into a cluster computing MacGyver.
Last night was the first Republican presidential debate of the 2016 race, and while many were thoroughly enjoying what I’m sure was a spectacular shitshow, I was hard at work monitoring the Twittersphere to collect the data everyone really wants to know about the candidates: who’s getting cussed at the most (spoiler: it’s Trump).
This is a follow-up to an incredibly insightful post by Charles Petzold. You really should read it first; it’s quite excellent.
Random walks are a very handy simulation tool.
They can be used to simulate financial data (e.g. stock prices), count data as a function of time, and even molecular motion.
They can also be used to traverse graphs.
This post explains the basics of the random walk in a couple of different scenarios (with pictures!) and shows some tips for implementing them in really nice ways.
The language of choice here is Julia, but the implementation details apply to any language with a decent random number generator and a
I’ve been spending a lot of time learning functional programming lately.
Scala got me into it first, but it wasn’t long before I shed my object-oriented background and started working in the land of expressions and immutable structures via Clojure.
After doing several problems on exercism I began to notice a pattern with a little function called reduce, which seemed to be popping up everywhere as a kind of Swiss Army knife for collections.
I started exploring deeper and it wasn’t long before I started seeing words like lambda calculus, monads, and combinators.
I don’t have a computer science background so I’ve been learning this as I go.
These are my thoughts on the basic practical uses of reduce.
I plan on going deeper into the theory as I learn more.
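To make the Swiss Army knife idea concrete, here is a minimal sketch of reduce folding a collection into three different kinds of result. It's written in Python with functools.reduce rather than the Clojure mentioned above, and the helper names (total, word_counts, my_map) are made up for illustration, not taken from the post.

```python
from functools import reduce


def total(numbers):
    # Fold a sequence of numbers into a single sum, starting from 0.
    return reduce(lambda acc, x: acc + x, numbers, 0)


def word_counts(words):
    # Fold a sequence of words into a frequency table.
    # Each step returns a new dict instead of mutating the accumulator.
    def step(counts, word):
        return {**counts, word: counts.get(word, 0) + 1}

    return reduce(step, words, {})


def my_map(f, items):
    # map can itself be expressed as a reduce that builds up a new list.
    return reduce(lambda acc, x: acc + [f(x)], items, [])


if __name__ == "__main__":
    print(total([1, 2, 3, 4]))
    print(word_counts(["a", "b", "a"]))
    print(my_map(lambda x: x * 2, [1, 2, 3]))
```

The shape is the same every time: a starting value, a collection, and a function that combines the accumulator with the next element. Only the accumulator type changes (a number, a dict, a list), which is a big part of why reduce keeps showing up.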
I mess around with Twitter data a lot. I like to poke around and ask random, mostly irrelevant questions about tweets and make some plots. Rather than working with huge datasets and doing hardcore machine learning, I prefer just to learn about the datasets and have a little fun, so when I fetch data from Twitter it’s usually a pretty modest amount.
The Gadfly package is pretty much the coolest thing since sliced bread when it comes to plotting in Julia. While it’s designed for plotting statistical graphics, having its roots in Hadley Wickham’s ggplot2 for R, it can definitely serve to make some really nice plots of any kind.
Last weekend I attended Data Day Texas here in Austin and it was awesome.
I decided it would be fun to look at the tweets for the day, hashtagged (real
These tweets were scraped around 9:00 PM on Jan 10, 2015 using the REST API.