Search Query Visualization
Project 3 - Visualization of AOL Search Queries using VTK - CS 526
In this project, I chose to visualize search data that AOL released for the scientific community. The data set contains about 36 million (36,389,567) web queries collected from 657,427 AOL users over a period of three months (01 March - 31 May, 2006).
Reference: G. Pass, A. Chowdhury, C. Torgeson, "A Picture of Search" The First International Conference on Scalable Information Systems, Hong Kong, June, 2006.
I loaded all the data into a MySQL database, split the queries into words, and consolidated them into word vs. count pairs. This gives us the "Top Queries". The application displays these words, sizing the text of each word according to the number of times it was queried. (Time Taken: 3 days.)
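As a rough illustration of the word counting and text sizing described above, here is a minimal sketch in Python. It is not the original C# code: the function names, the whitespace-based splitting, the linear scaling and the point-size range are all assumptions made for the example.

```python
from collections import Counter


def count_words(queries):
    """Split each query on whitespace and count how often each word appears."""
    counts = Counter()
    for query in queries:
        counts.update(query.lower().split())
    return counts


def font_size(count, max_count, min_pt=10, max_pt=48):
    """Scale a word's count linearly into a text size (assumed range, in points)."""
    if max_count <= 0:
        return min_pt
    return min_pt + (max_pt - min_pt) * count / max_count


# Made-up sample queries, not taken from the AOL data.
sample = ["cheap flights", "flights to chicago", "chicago weather"]
counts = count_words(sample)
largest = max(counts.values())
for word, count in counts.most_common():
    print(word, count, round(font_size(count, largest), 1))
```

A linear scale is the simplest choice. With real query data, where a few words dominate, a logarithmic scale might keep the rarer words readable.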
When you click on a search term, the application loads all of that term's records from the database into memory and draws various charts and graphs, which help us infer a lot about the way people look for data.
The simplest and most direct way of looking at the data is to look at literally all the queries on one screen. Search Query Visualization plots each occurrence of the selected word in a query as a point on a graph, with Date and Time as the X and Y coordinates. Click on a point and it tells you what the search query was, when it was issued, the URL of the page that the user clicked on in the search results, and the page rank of that URL.
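The mapping from a query's timestamp to a point can be sketched as follows. This is a Python illustration, not the application's own code: the timestamp format, the `to_point` name, and the choice of days since 01 March 2006 for X and hours since midnight for Y are assumptions.

```python
from datetime import datetime

# First day covered by the data set; used as the origin of the X axis.
START = datetime(2006, 3, 1)


def to_point(query_time):
    """Map a 'YYYY-MM-DD HH:MM:SS' timestamp (assumed format) to (x, y).

    x is the number of whole days since START, y is the time of day in hours.
    """
    t = datetime.strptime(query_time, "%Y-%m-%d %H:%M:%S")
    x = (t.date() - START.date()).days
    y = t.hour + t.minute / 60 + t.second / 3600
    return x, y


print(to_point("2006-03-15 18:45:00"))
```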
The data can also be grouped by Day of the Month, Day of the Week, and Hour of the Day. Click on the images above to see a more detailed version of the sample visualizations.
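The three groupings amount to counting queries per bucket. A minimal Python sketch, assuming timestamps in 'YYYY-MM-DD HH:MM:SS' form (the `group_counts` name and the input format are hypothetical):

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def group_counts(timestamps):
    """Count queries per day of the month, day of the week and hour of the day."""
    by_day_of_month, by_day_of_week, by_hour = Counter(), Counter(), Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        by_day_of_month[t.day] += 1
        by_day_of_week[DAY_NAMES[t.weekday()]] += 1  # weekday(): Monday is 0
        by_hour[t.hour] += 1
    return by_day_of_month, by_day_of_week, by_hour


# Made-up timestamps for illustration.
dom, dow, hod = group_counts(["2006-03-01 07:17:12", "2006-03-08 07:45:00"])
print(dom, dow, hod)
```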
The more interesting part of the application is seeing relationships between words, that is, what a particular term was searched along with. This is beautifully portrayed by the sphere relationship images, which tell you how often another word was searched along with the current search term.
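The counts behind such a relationship view could be computed roughly like this. The Python sketch below only illustrates the idea of word co-occurrence within a query; the `co_occurring_words` name and the rule of counting each word once per query are assumptions, not a description of the original code.

```python
from collections import Counter


def co_occurring_words(queries, term):
    """Count how many queries containing `term` also contain each other word."""
    counts = Counter()
    for query in queries:
        words = set(query.lower().split())  # each word counted once per query
        if term in words:
            counts.update(words - {term})
    return counts


# Made-up sample queries.
sample = ["new york hotels", "york weather", "hotels in new york", "cheap hotels"]
print(co_occurring_words(sample, "york").most_common())
```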
An even better thing is to find out what people who searched for a particular word also searched for. Once you have selected a term to visualize, click on the big button that says "See what they also searched for". It will take a while to come up with the words, because it takes the people who searched for the current term, finds out what they also searched for, sorts it, sieves it and presents it to you in varying sizes of text based on count, in ascending order. Whew!
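The steps behind that button can be sketched as follows, in Python rather than the original C#. The `(user_id, query)` record shape and the `also_searched` name are assumptions, and the sketch counts every other word those users typed, including words from the same queries as the term, which may differ from what the application does.

```python
from collections import Counter


def also_searched(records, term):
    """Return (word, count) pairs for words used by people who searched `term`.

    `records` is an iterable of (user_id, query) pairs (assumed shape).
    The result is sorted by count in ascending order.
    """
    records = list(records)
    # Step 1: find the people who searched for the term.
    users = {user for user, query in records if term in query.lower().split()}
    # Step 2: count the other words in all of their queries.
    counts = Counter()
    for user, query in records:
        if user in users:
            counts.update(w for w in query.lower().split() if w != term)
    # Step 3: sort by count, smallest first.
    return sorted(counts.items(), key=lambda item: item[1])


# Made-up records for illustration.
sample = [(1, "york hotels"), (1, "cheap flights"), (2, "cheap cars"),
          (3, "york"), (3, "cheap tickets")]
print(also_searched(sample, "york"))
```

Doing this in memory over millions of rows is slow, which matches the wait described above. Pushing the first step into a database query would be one way to cut the work down.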
I used VTK with C# on the .NET Framework 1.1 to develop the application, and a MySQL connector for .NET to talk to the database. They worked pretty well with each other. The entire application, along with the dataset, is a couple of gigabytes.