In class, I’ve mentioned K-Means a few times as a way of getting color data out of an image. It’s basically a grouping system that lets you find clusters in data. Here’s some info:


And some links I think you might find interesting:


I particularly like the hue histograms; they’re a completely different way of thinking about image representation.
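To make the K-Means idea concrete, here is a minimal sketch in plain C++ (no openFrameworks classes), assuming you have already copied an image’s pixels into a list of RGB values. It runs the standard Lloyd iteration: assign each pixel to its nearest center, move each center to the average of its pixels, and repeat. The seeding (evenly spaced pixels from the list) and the fixed iteration count are simplifying assumptions; real implementations usually seed randomly or with k-means++ and stop once the centers stop moving.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

// One RGB color; channels are doubles so centroids can hold averages.
using Color = std::array<double, 3>;

// Squared Euclidean distance in RGB space (no sqrt needed to compare distances).
double dist2(const Color& a, const Color& b) {
    double d = 0.0;
    for (int c = 0; c < 3; ++c) {
        double diff = a[c] - b[c];
        d += diff * diff;
    }
    return d;
}

// Lloyd-style k-means over a list of pixels; returns k cluster centers.
// Assumption: centers are seeded with evenly spaced pixels from the list.
std::vector<Color> kMeansColors(const std::vector<Color>& pixels, std::size_t k,
                                int iterations = 10) {
    std::vector<Color> centroids;
    if (pixels.empty() || k == 0) return centroids;
    for (std::size_t i = 0; i < k; ++i) {
        centroids.push_back(pixels[i * pixels.size() / k]);
    }
    std::vector<std::size_t> label(pixels.size(), 0);
    for (int it = 0; it < iterations; ++it) {
        // Assignment step: each pixel joins its nearest center.
        for (std::size_t p = 0; p < pixels.size(); ++p) {
            double best = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                double d = dist2(pixels[p], centroids[c]);
                if (d < best) { best = d; label[p] = c; }
            }
        }
        // Update step: move each center to the mean of its pixels.
        std::vector<Color> sum(k, Color{0.0, 0.0, 0.0});
        std::vector<std::size_t> count(k, 0);
        for (std::size_t p = 0; p < pixels.size(); ++p) {
            for (int ch = 0; ch < 3; ++ch) sum[label[p]][ch] += pixels[p][ch];
            ++count[label[p]];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (count[c] == 0) continue;  // empty cluster: keep the old center
            for (int ch = 0; ch < 3; ++ch) centroids[c][ch] = sum[c][ch] / count[c];
        }
    }
    return centroids;
}
```

The returned centers are the image’s k dominant colors, so calling kMeansColors(pixels, 5) would give a five-color palette you could draw as swatches.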


Using tools like curl, and working in groups, automate the download of a large quantity of data. Think of something massive: every animated GIF on Wikipedia, an image for every number from 0 to 1,000,000, the Google hit results for the 1000 most popular names, 10,000 pictures of Justin Bieber, etc. Again, try to imagine culturally significant searches, such as “family” + year, as I showed you (“family 1961”, “family 1921”, etc.), and consider a search string that makes it easy to check a bunch at once.

For example, with family, we can do:

for (int i = 1900; i < 2012; i++){
    string term = "family " + ofToString(i);  // ofToString turns the year into text, e.g. "family 1961"
    // …..
}
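The loop above only builds the search phrases. A possible next step, sketched here in standalone C++ (std::to_string stands in for openFrameworks’ ofToString), is to URL-encode each phrase and append it to a search URL. The base URL is a placeholder, and urlEncode and yearQueries are hypothetical helper names; check the actual query format of whatever site you scrape.

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>

// Encode a search phrase for a URL query string: letters, digits and -_.~
// pass through, spaces become '+', everything else becomes %XX.
std::string urlEncode(const std::string& s) {
    std::string out;
    for (unsigned char ch : s) {
        if (std::isalnum(ch) || ch == '-' || ch == '_' || ch == '.' || ch == '~') {
            out += static_cast<char>(ch);
        } else if (ch == ' ') {
            out += '+';
        } else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(ch));
            out += buf;
        }
    }
    return out;
}

// One query URL per year, e.g. "family 1961" -> baseUrl + "family+1961".
// baseUrl is a placeholder for the site you are actually querying.
std::vector<std::string> yearQueries(const std::string& word, int firstYear,
                                     int endYearExclusive, const std::string& baseUrl) {
    std::vector<std::string> urls;
    for (int year = firstYear; year < endYearExclusive; ++year) {
        std::string term = word + " " + std::to_string(year);
        urls.push_back(baseUrl + urlEncode(term));
    }
    return urls;
}
```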

This has some useful info about curl, such as how to add a user agent:


Some interesting resources:
http://www.propublica.org/nerds/item/doc-dollars-guides-collecting-the-data
https://scraperwiki.com

Once you have the data set, think about interesting or provocative ways to sort it. The goal of this project is lots of data. We don’t want to do these things by hand, but rather use the power of the computer. It’s great at doing things repeatedly.
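Once the data is on disk, sorting is mostly a matter of choosing the key. As a minimal sketch, assuming each downloaded file has already been reduced to a single number (brightness, hit count, file size, or anything else you measure), std::stable_sort with a comparison lambda puts the items in order. The Item struct, its fields, and sortByScoreDescending are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One downloaded item plus a number measured from it (hypothetical fields).
struct Item {
    std::string file;  // e.g. "family_1961.jpg"
    double score;      // e.g. average brightness, hit count, file size
};

// Highest score first. stable_sort keeps the original (download) order for
// ties, so items with equal scores stay in, say, chronological order.
void sortByScoreDescending(std::vector<Item>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const Item& a, const Item& b) { return a.score > b.score; });
}
```

Swapping the lambda for a different comparison is all it takes to try another ordering.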

Please do use me as a resource, too; I’m happy to help.



For curl, please try: curl -L -o outputname urlToDownload (-L follows redirects; -o saves the download under the name you give it).
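To download many things at once, one approach is to generate a batch of curl commands and run them as a shell script. In this sketch, -L follows redirects, -o saves each response under a numbered file name, and -A (--user-agent) sets the User-Agent header, which some sites check. The helper names, the file-naming scheme, and the user agent string are assumptions for illustration.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Wrap a string in single quotes for a POSIX shell, escaping embedded quotes.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// One curl command per URL:
//   -L  follow redirects
//   -A  send the given User-Agent header
//   -o  write the response to a numbered file (prefix + 00000 + ext)
std::vector<std::string> curlCommands(const std::vector<std::string>& urls,
                                      const std::string& prefix, const std::string& ext,
                                      const std::string& userAgent) {
    std::vector<std::string> cmds;
    for (std::size_t i = 0; i < urls.size(); ++i) {
        std::ostringstream name;
        name << prefix << std::setw(5) << std::setfill('0') << i << ext;
        cmds.push_back("curl -L -A " + shellQuote(userAgent) + " -o " +
                       shellQuote(name.str()) + " " + shellQuote(urls[i]));
    }
    return cmds;
}
```

You could print the commands one per line into a file (say, download.sh) and run it with sh download.sh; adding a pause between requests is kinder to the server.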

For more complicated URLs, you can use this as a good guide:



  • Write a regular expression to find some specific kind of text in a larger piece of text, and show an example of it. (You can use a tool like http://reggyapp.com/ or http://gskinner.com/RegExr/.)
  • Experiment with the command-line tool curl: use it to download things from the internet. Can you use it to download many of something? Bonus points if you scrape something massive.
  • Take a look at the “sorting” code I showed you and think about what sort of data you’d like to load and sort.
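Here is one small example of the regular-expression exercise, using C++’s std::regex: it pulls the URLs of GIF images out of a chunk of HTML, which could then feed a batch of curl downloads. The pattern and the findGifUrls name are assumptions chosen for illustration; the pattern only handles double-quoted src attributes and is not a substitute for a real HTML parser.

```cpp
#include <regex>
#include <string>
#include <vector>

// Find every src="...gif" attribute (case-insensitive) and return the URLs.
// Simplification: assumes double-quoted attributes; misses other quoting styles.
std::vector<std::string> findGifUrls(const std::string& html) {
    static const std::regex pattern(R"re(src="([^"]+\.gif)")re", std::regex::icase);
    std::vector<std::string> urls;
    for (std::sregex_iterator it(html.begin(), html.end(), pattern), end; it != end; ++it) {
        urls.push_back((*it)[1].str());  // capture group 1 is the URL itself
    }
    return urls;
}
```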

Come prepared with questions!


Pop culture visualization

Think about the intersection of popular culture and the language of charts and graphics. Create a visualization that’s funny, irreverent, or witty, that tells us something profound or meaningless, but that contains a significant quantity of data: think 50+ data points.





The ad at the beginning is pretty terrible, but this app is pretty fun.


With the mindset of Tufte’s graphical excellence, find a data set that shows change, i.e., data captured at two or more points in time. Visualize these data sets elegantly, and use visual form to help the data tell a story.

Think about a question, hunch, or intuition that leads you to collect data that tells you something about the human condition:

  • what it means to be alive
  • what the world is made out of
  • how we live now vs. how we lived before
  • what changes have occurred that tell us something meaningful

Data can be used to tell stories, so please try to find a good one. Think of this as the opposite of your first assignment, which was about fairly personal data.
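For the change-over-time assignment, a simple first pass is to line up the same measurements from two periods and rank them by how much they moved. This sketch assumes each period is stored as a map from a label to a number; the Change struct and comparePeriods are hypothetical names, and labels present in only one period are simply dropped, which may or may not suit your data.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// One label measured at two points in time.
struct Change {
    std::string key;
    double before;
    double after;
    double delta;  // after - before
};

// Keep labels present in both periods, then sort by the size of the change,
// largest first, so the biggest shifts float to the top.
std::vector<Change> comparePeriods(const std::map<std::string, double>& before,
                                   const std::map<std::string, double>& after) {
    std::vector<Change> out;
    for (const auto& entry : before) {
        auto it = after.find(entry.first);
        if (it == after.end()) continue;  // missing from the later period
        out.push_back({entry.first, entry.second, it->second, it->second - entry.second});
    }
    std::stable_sort(out.begin(), out.end(), [](const Change& a, const Change& b) {
        return std::fabs(a.delta) > std::fabs(b.delta);
    });
    return out;
}
```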

Watch this great set of videos, and be sure to take a look at the links:


You can listen to it as a podcast or watch it with the annotations. There are some real heroes in this video, and some good lessons too. I like the section on tools. Come prepared to talk about projects referenced here that stand out to you as particularly good.



Graphical Excellence

is that which gives the viewer the greatest number of ideas, in the shortest time, with the least ink, the smallest space, and which tells the truth about data.


— Edward Tufte