Archive for the ‘Data Panning’ Category
SXSW Music Data Pr0n, Take 2
This time I used IBM’s Many Eyes to visualize the data set:




I think I like Many Eyes’ visualizations better than Swivel. It’s wonderful that there’s more than one site for this kind of thing. I hope even more show up.
Swivel Dataset: SXSW Band Origins
I wrote a web scraper for the SXSW music site using scRUBYt and posted some of the results to Swivel:
By state:
and by city:
Many Eyes: Data Visualization is the New Porn
IBM’s Many Eyes is like Flickr for data visualizations. Hm. That phrase sounds familiar.
Many Eyes left a much better first impression on me than Swivel, though. Here are Causes of Death, and US government expenses 1962-2004:
Biblical Citations via Google
I wonder which Bible verses are cited most often on the web. What are the most popular pieces of Holy Wisdom, according to Google?
I seek to answer this question with some code I’ve written.
Here’s a chart of Google results for every verse in Genesis:

I’m writing an interactive browser for this dataset in Proce55ing (that chart is a screenshot actually), so stay tuned if you want to see the finished product.
There are some interesting problems with displaying a chart like this on a normal computer screen. For one thing, you’d need a screen 31,000 pixels wide in order to display it at a resolution of one pixel per verse. That’s one hell of a sparkline.
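One way to squeeze that many verses onto an ordinary screen is to collapse neighboring verses into one pixel column each. Here is a minimal sketch of that idea, assuming the per-verse result counts are already in a plain list. The function name, the choice of keeping each column's peak value, and the 1,280-pixel width are all illustrative assumptions, not what the browser actually does.

```python
def downsample_max(counts, width):
    """Collapse per-verse counts into at most `width` buckets.

    Each bucket keeps the largest count it covers, so a single very
    popular verse still shows up as a spike after shrinking.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    n = len(counts)
    if n == 0:
        return []
    width = min(width, n)  # never make more buckets than there are verses
    buckets = []
    for i in range(width):
        start = i * n // width       # first verse in this pixel column
        end = (i + 1) * n // width   # one past the last verse in it
        buckets.append(max(counts[start:end]))
    return buckets


if __name__ == "__main__":
    # Placeholder data: 31,000 made-up counts, not real Google results.
    fake_counts = [(i * 37) % 1000 for i in range(31000)]
    columns = downsample_max(fake_counts, 1280)
    print(len(columns))
```

Taking the maximum rather than the average is a design choice: averaging would smooth away exactly the outliers the chart is meant to find.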
I’ll probably borrow a few ideas from some of the Human Genome Project data browsers and visualizations out there on the net.
What I’m aiming for at this point is a way to seek out the most often cited verses, and see them in context of who quotes them and why.
It’s obvious that some verses are more popular than others, but why? From a textual analysis of the most popular verses, I’d also like to build a Markov process to generate new verses that might also be appealing.
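The Markov idea can be sketched in a few lines. This is only a rough word-level version, assuming the popular verses are available as plain strings. The function names are made up for illustration, and the two Genesis verses in the usage example are stand-ins for input, not a claim about which verses are actually the most cited.

```python
import random
from collections import defaultdict


def build_chain(verses, order=1):
    """Map each run of `order` words to the words seen right after it."""
    chain = defaultdict(list)
    for verse in verses:
        words = verse.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, start, max_words=20, rng=None):
    """Walk the chain from `start` (a tuple of words) until it dead-ends."""
    rng = rng or random.Random()
    state = tuple(start)
    out = list(state)
    while len(out) < max_words and state in chain:
        out.append(rng.choice(chain[state]))
        state = tuple(out[-len(state):])  # slide the window forward one word
    return " ".join(out)


if __name__ == "__main__":
    sample = [
        "In the beginning God created the heaven and the earth.",
        "And God said, Let there be light: and there was light.",
    ]
    chain = build_chain(sample)
    print(generate(chain, ("In",)))
```

With only a handful of verses the output mostly parrots the input. The chain needs a much larger pile of text before it starts producing anything that reads like a new verse.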
God damnit. Somebody already did it. I’m starting to get sick of this postmodern creative dilemma shit.
[Edit: Fucking shit. I hate my life. Just one original idea. That's all I fucking want.]
TV Sitcom Front Door Locations Via Swivel
The data set I mentioned yesterday (where the front door is on various sitcom sets), after I uploaded it to Swivel:
For some reason, front doors on sitcoms tend to be on the right side of the screen more often than on the left. Of course this is only true of the shows I remembered to include in this data set. I don’t know if even my OCD tendencies would allow me to take the time to make an exhaustive list of TV sitcom door locations.
Netflix Prize
I registered to compete for the Netflix Prize. I don’t have a plan yet but I’ve got some ideas.
Marc at O’Reilly Radar and Greg Linden point out that mashups are probably going to be a popular strategy. IMDb and Amazon are the obvious complementary datasets, but what about others?
For instance, I wonder what NetflixPrize + AOLDataLeak can produce.
Update: O’Reilly Radar points to a Google Tech Talk by Dan Frankowski on, get this, how to identify individuals from anonymized movie ratings (he mashes the ratings up with non-anonymous movie message board posts).
AOL Data Leak = Nonfiction Spoon River Anthology?
In reference to the AOL Data Leak, Ryan Meek pointed me to a book called Spoon River Anthology by Edgar Lee Masters. It’s a story of a town as told by the epitaphs of its citizens (emphasis mine):
The Spoon River Anthology
The original work was published as a serialized version in 1914-15.
In the Anthology, the dead in an Illinois graveyard relay, in matter-of-fact but haunting tones, details from their lives. The Anthology was original, provocative and influential. Its literary significance has been compared with Walt Whitman’s Leaves of Grass [published in 1855].
Masters wove a thread of partial reality throughout the Anthology. Many of the characters and their experiences can be identified with former residents of Lewistown and Petersburg, Illinois. Masters used his childhood experiences in these two communities as a basis for the poems.
Obviously the AOL users are (presumably mostly) still alive, and are very much from the non-fiction section. As well, the data covers only three months and was not intended to summarize each user’s life, as opposed to the epitaphs in Spoon River.
Deacon Taylor
I BELONGED to the church,
And to the party of prohibition;
And the villagers thought I died of eating watermelon.
In truth I had cirrhosis of the liver,
For every noon for thirty years,
I slipped behind the prescription partition
In Trainor’s drug store
And poured a generous drink
From the bottle marked “Spiritus frumenti.”
Compare to AOL anonid 39509 who searched for
- “games for church youth groups”
- “victory christian church”
- “religious contact tables for myspace”
- “pictures of jesus in the clouds”
- “starting a small bible study group”
…and
- “preteen nude pics”
- “midget porn”
- “free sex personals”
- “porn” immediately followed by a search for “youth games”
Clickable Imagemap from Beautiful Evidence
As I promised in my previous post: a Clickable Imagemap of “Cubism and Abstract Art”.
I wrote it on OSX/Firefox/Safari. I guess I’ll find out tomorrow at work if it breaks on Windows/IE and issue apologies and corrections as appropriate.
There were some other things I thought of doing but couldn’t justify spending time on:
- highlighting the selected movement in the diagram itself (the heading to the right already tells you what you’re looking at)
- more examples (collecting the images was the most time consuming part)
- google map mashup (there’s room below the diagram for a map but that just seems gratuitous)
I’m not even half-way done with the book (can you tell I’m in love with it?) and I have a half page of notes I plan to write about.
Texas Legislature Entity Relationships
This afternoon (between making chocolate bacon and working on the Export Atlas – whew. busy day.) I tried to install a Content Addressable Storage (yay, more storage buzzwords!) server app a friend of mine is working on. Since this app would nuke all the hard drive contents on this otherwise unused box, I checked to see if there was anything I wanted to salvage first.
There was!
Amongst a lot of viruses and spyware which rendered this particular machine useless many moons ago, I found a bunch of XML files and scraper code from a next-generation version of this thing I wrote a couple of years ago. So I pulled them off that PC and moved them to my Mac, where I am going to continue to develop the idea anew.
This was a standalone demonstration app that ran entirely within IE, using JavaScript and the Microsoft.XMLDOM COM object to generate HTML output via XSLT. Consider for a second that several of the source XML files are a couple of megabytes long, and you can understand that this was a slow piece of crap that went out to lunch every time you clicked on something. It did prove the concept though.
With it you could see a summary of any committee, which includes the most recent hearings, bills passed or allowed to die, and more interestingly, top campaign contributors. This is an idea I’d like to continue to develop so here’s the first step in its reincarnation as a server-based web app. Here’s the entity relationship diagram for the database:

I wrote new XSL docs to transform the source XML into SQL insert statements and imported all of the legislators and their committees into a new MySQL database this afternoon. As an indicator of progress, this means three of the fifteen boxes in the above diagram have been migrated from XML to MySQL so far.
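The actual migration was done with XSL, but the same XML-to-SQL step can be sketched in Python for anyone who would rather not write XSLT. Everything about the data shape here is an assumption for illustration: the `legislators` and `legislator` elements, the `id` attribute, the `name` and `chamber` fields, and the table and column names are hypothetical, not the real source files or schema.

```python
import xml.etree.ElementTree as ET


def sql_quote(value):
    """Quote a string as a SQL literal by doubling single quotes.

    This only handles single quotes. A real import should use the
    database driver's parameterized queries instead.
    """
    return "'" + value.replace("'", "''") + "'"


def legislators_to_inserts(xml_text):
    """Turn each <legislator> element into one INSERT statement."""
    root = ET.fromstring(xml_text)
    statements = []
    for node in root.findall("legislator"):
        leg_id = int(node.get("id"))
        name = node.findtext("name", default="")
        chamber = node.findtext("chamber", default="")
        statements.append(
            "INSERT INTO legislators (id, name, chamber) VALUES (%d, %s, %s);"
            % (leg_id, sql_quote(name), sql_quote(chamber))
        )
    return statements


if __name__ == "__main__":
    # Made-up sample record, not real legislature data.
    sample = """
    <legislators>
      <legislator id="7">
        <name>Pat O'Example</name>
        <chamber>House</chamber>
      </legislator>
    </legislators>
    """
    for statement in legislators_to_inserts(sample):
        print(statement)
```

Generating text statements like this mirrors what the XSL approach produces, which makes the output easy to eyeball before loading it into MySQL.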
I’ll probably alternate between migrating data and building the UI in PHP over the next few weeks.
I don’t have any good reason to be doing this, seeing as I make no money from it, and it is a considerable amount of work. I guess it beats collecting stamps.
Or dating.
