December 17 2016

A factoid gets thrown around a lot at election time: Democrats tend to win when voter turnout is high. Obviously this isn’t talked about much by Republicans. It’s commonly believed in Democratic circles that this is why Republicans campaign for more restrictive voter registration laws. Republicans obviously don’t agree, stating that the laws are meant to prevent voter fraud. So who’s right? Well, it’s pretty tough to get ahold of voter fraud data (studies are at best inconclusive), but getting election result data isn’t. I’ve built a dataset of FEC election results for the years 2000 to 2014 that’s designed to work really well with R’s tidyverse. If you’re following along with the code, the dataset is located in fec-election-results/fec_tidy.csv.

August 11 2016

The May release of Kafka 0.10 included a very cool new component: Kafka Streams, a stream processing library that directly integrates with Kafka. There are a number of things that Kafka Streams does differently from other stream processors, and the best way to learn is through example. To that end I’ve chosen a highly relevant example in the social networking space, something that I think any developer can put to use immediately.

June 01 2016

Almost everyone in Congress has a Twitter account. Twitter even released a guide for politicians on using the platform effectively. For one thing, it’s another social channel for communicating with constituents (looking at you, @CoryBooker). More interestingly, it’s also used to promote awareness of certain political issues and push agendas. Most people use hashtags for pointless stuff or shameless self-promotion:

December 02 2015

There’s been a lot of discussion about religion and violence in light of recent events, with some claiming that certain religions are inherently more violent than others. I thought it would be interesting to examine this assertion by looking at holy texts from six major religions (Buddhism, Christianity, Hinduism, Islam, Judaism, and the Church of Jesus Christ of Latter-day Saints) and “measuring” the violence in each.

August 16 2015

OR: How the DCOS turned me into a cluster computing MacGyver.

August 15 2015

I thought it would be fun to see how the candidates ranked between the Profanity Power Index and the major polls. So I pulled some data from Real Clear Politics, converted the numbers to ranks, and compared them to the ranking by total profanity used during the debate.

August 07 2015

Last night was the first Republican presidential debate of the 2016 race, and while many were thoroughly enjoying what I’m sure was a spectacular shitshow, I was hard at work monitoring the Twittersphere to collect the data everyone really wants to know about the candidates: who’s getting cussed at the most (spoiler: it’s Trump).

July 12 2015

This is a follow-up to an incredibly insightful post by Charles Petzold. You really should read it first; it’s quite excellent.

May 17 2015

Random walks are a very handy simulation tool. They can be used to simulate financial data (e.g. stock prices), count data as a function of time, and even molecular motion. They can also be used to traverse graphs. This post explains the basics of the random walk in a couple of different scenarios (with pictures!) and shows some tips for implementing them in really nice ways. The language of choice here is Julia, but the implementation details apply to any language with a decent random number generator and a reduce function.
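
To give a feel for the idea, here is a minimal sketch of a one-dimensional random walk. It is written in Python rather than the post’s Julia, and the function names `random_walk` and `final_position` are made up for illustration; they are not taken from the post. The walk is just a running sum of random ±1 steps, and the final position is what a plain reduce over those same steps would give you.

```python
import random
from functools import reduce
from itertools import accumulate
from operator import add


def random_walk(n_steps, seed=None):
    """Return the positions visited by a simple +/-1 random walk.

    The walk starts at 0, so the result has n_steps + 1 entries.
    """
    rng = random.Random(seed)  # local generator so runs can be reproduced
    steps = [rng.choice((-1, 1)) for _ in range(n_steps)]
    # accumulate is a "scan": a reduce that keeps every intermediate sum
    return list(accumulate(steps, initial=0))


def final_position(steps):
    """Collapse a sequence of steps into the end point with a plain reduce."""
    return reduce(add, steps, 0)


if __name__ == "__main__":
    walk = random_walk(10, seed=1)
    print(walk)
```

The scan keeps the whole path, which is what you want for plotting; the reduce keeps only the end point, which is enough if you are just studying where walks tend to finish.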

April 04 2015

I’ve been spending a lot of time learning functional programming lately. Scala got me into it first, but it wasn’t long before I shed my object-oriented background and started working in the land of expressions and immutable structures via Clojure. After doing several problems on Exercism I began to notice a pattern with a little function called reduce, which seemed to be popping up everywhere as a kind of Swiss Army knife for collections. I started exploring deeper and it wasn’t long before I started seeing words like lambda calculus, monads, and combinators. I don’t have a computer science background, so I’ve been learning this as I go. These are my thoughts on the basic practical uses of reduce. I plan on going deeper into the theory as I learn more.
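
As a rough illustration of that Swiss Army knife quality, here is a small sketch in Python (not the Clojure used in the post) with three everyday jobs written as a single reduce each. The helper names `count_words`, `maximum`, and `map_with_reduce` are hypothetical and exist only for this example.

```python
from functools import reduce


def count_words(words):
    """Build a frequency table by folding each word into a dict."""
    def step(counts, word):
        # Return a new dict each time instead of mutating the accumulator
        return {**counts, word: counts.get(word, 0) + 1}
    return reduce(step, words, {})


def maximum(xs):
    """Find the largest element; with no initial value, the first item seeds the fold."""
    return reduce(lambda best, x: x if x > best else best, xs)


def map_with_reduce(f, xs):
    """Reimplement map: apply f to each item and append it to the accumulated list."""
    return reduce(lambda acc, x: acc + [f(x)], xs, [])


if __name__ == "__main__":
    print(count_words(["a", "b", "a"]))
    print(maximum([3, 9, 2]))
    print(map_with_reduce(lambda x: x * x, [1, 2, 3]))
```

In each case the shape is the same: an accumulator, a function that folds one more element into it, and a starting value. Only the accumulator type changes (a dict, a single value, a list).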

March 15 2015

I mess around with Twitter data a lot. I like to poke around and ask random, mostly irrelevant questions about tweets and make some plots. Rather than working with huge datasets and doing hardcore machine learning, I prefer just to learn about the datasets and have a little fun, so when I fetch data from Twitter it’s usually a pretty modest amount.

February 01 2015

The Gadfly package is pretty much the coolest thing since sliced bread when it comes to plotting in Julia. While it’s designed for plotting statistical graphics, having its roots in Hadley Wickham’s ggplot2 for R, it can definitely serve to make some really nice plots of any kind.

January 18 2015

Last weekend I attended Data Day Texas here in Austin and it was awesome. I decided it would be fun to look at the tweets for the day, hashtagged (real verb) with ddtx15. These tweets were scraped around 9:00 PM on Jan 10, 2015 using the REST API.