Vancouver City Talks!
General description: A web application that computes and visualizes the quality of life in your neighborhood.
Implementation period: February 2013
Technologies: MongoDB, PHP, JS, & HTML
Event: International Open Data Hackathon.
Contributors: Kazem Jahanbakhsh, Priyanka Gupta, Ajay Sridharan, Amir Moghaddam, & AmirHossein Hajizadeh
This application was developed for the Vancouver Open Data Hackathon. We used several types of data from the City of Vancouver (e.g. crime rates, business licences, property taxes, parks, libraries, and schools) to compute an overall score for different parts of the city. The core of the application is our algorithm for predicting neighborhood quality by integrating these different types of data.
For our model layer we took the following steps:
1- Cleaning/Normalizing/Parsing Data: we parsed each dataset and normalized it into a uniform format that contains only the important information along with lat/lon.
a- Crime Data:
Value: tells you how safe your area is. It has a time dimension that should be taken into account.
Challenge: crime locations come in a truncated street format like "18XX SPYGLASS PL", so we need to use the Google or Yahoo API to get the lat/lon. Time granularity is month/year.
b- Property Tax Report:
Value: richness of different regions in the city.
Challenge: translating addresses to lat/lon.
c- Business Licences:
Value: we can use this data to find nearby restaurants. This shows how much you can enjoy your neighborhood and it's a notion of
d- Parks:
Value: how green your neighborhood is.
e- Libraries:
Value: another factor for the richness of different regions.
f- Schools:
Value: the density of schools around you for families.
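As an illustration of the cleaning step for the crime data, here is a minimal Python sketch that turns a truncated block address into a string a geocoding API could accept. The function name, the default city suffix, and the choice to replace the masked digits with zeros (the start of the block) are all assumptions for this example; the original app may have handled the masking differently.

```python
import re

def block_address_to_query(raw, city="Vancouver, BC, Canada"):
    """Turn a truncated address like '18XX SPYGLASS PL' into a geocodable string.

    Assumption: masked digits (X) are replaced by zeros, i.e. the block start.
    """
    # Collapse repeated whitespace and trim the ends.
    cleaned = " ".join(raw.split())
    # Match a house number whose trailing digits are masked with X characters.
    match = re.match(r"^(\d+)(X+)\s+(.+)$", cleaned, flags=re.IGNORECASE)
    if match is None:
        # Nothing is masked, so pass the address through unchanged.
        return f"{cleaned}, {city}"
    number = match.group(1) + "0" * len(match.group(2))
    street = match.group(3)
    return f"{number} {street}, {city}"
```

The returned string would then be sent to whichever geocoding service is used, and the resulting lat/lon stored with the record.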
2- Importing Data: We imported the cleaned version of each dataset into MongoDB. All of our queries are geospatial queries.
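To give a feel for what such a geospatial query can look like, the sketch below builds a MongoDB filter that matches documents inside a circle, using the `$geoWithin` and `$centerSphere` operators. The field name `loc`, the helper name, and the Earth radius constant are assumptions for this example, not details taken from the app.

```python
EARTH_RADIUS_KM = 6378.1  # approximate equatorial radius, used to convert km to radians

def within_circle_query(lon, lat, radius_km, field="loc"):
    """Build a MongoDB filter for documents within radius_km of (lon, lat).

    Assumption: each document stores its position under `field` as [lon, lat].
    $centerSphere expects the radius in radians, hence the division.
    """
    radius_radians = radius_km / EARTH_RADIUS_KM
    return {field: {"$geoWithin": {"$centerSphere": [[lon, lat], radius_radians]}}}

# Hypothetical usage with pymongo, assuming a `crimes` collection:
#   count = db.crimes.count_documents(within_circle_query(-123.12, 49.28, 1.0))
```

Counting the matching documents per dataset for each circle is one way to obtain the raw feature values for a region.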
3- Computing Score: We segmented Vancouver into regions (circles) parametrized by their centre and radius. For each region we computed a single score that aggregates all datasets and reflects the quality of life in that neighbourhood. We used a linear function to map the six features to the output (dependent) variable, as shown below:
score = c1*crime + c2*business + c3*tax + c4*libraries + c5*schools + c6*parks
The main challenge is computing the scores while making sure that the assumption of a linear relationship is valid.
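The linear score above can be sketched in a few lines of Python. This is only an illustration: the min-max normalization of each feature across regions and the coefficient values in the usage note (including a negative weight for crime) are assumptions, since the actual coefficients are not given here.

```python
FEATURES = ("crime", "business", "tax", "libraries", "schools", "parks")

def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All regions have the same value, so the feature carries no signal.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def score_regions(regions, coefficients):
    """Compute score = c1*crime + ... + c6*parks for every region.

    regions: list of dicts holding one raw value per feature name.
    coefficients: dict mapping each feature name to its weight (hypothetical values).
    """
    normalized = {f: min_max_normalize([r[f] for r in regions]) for f in FEATURES}
    return [
        sum(coefficients[f] * normalized[f][i] for f in FEATURES)
        for i in range(len(regions))
    ]
```

For example, calling `score_regions(regions, {"crime": -1, "business": 1, "tax": 1, "libraries": 1, "schools": 1, "parks": 1})` would penalize crime and reward the other five features equally. Checking whether a linear form is adequate would still require comparing these scores against some independent measure of neighborhood quality.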
For visualization we used the Google Maps Heatmap Layer.
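One possible way to hand region scores to a heatmap is to serialize each region centre with a weight. The sketch below rescales the scores to [0, 1] and emits JSON; the key names `lat`, `lng`, and `weight`, the `lon` key on the input regions, and the rescaling itself are assumptions for this example. The front end would still need to convert each entry into the weighted location objects the heatmap expects.

```python
import json

def heatmap_points(regions, scores):
    """Pair each region centre with a weight rescaled to [0, 1]."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return [
        {"lat": r["lat"], "lng": r["lon"], "weight": (s - lo) / span}
        for r, s in zip(regions, scores)
    ]

def heatmap_json(regions, scores):
    """Serialize the weighted points so a web page can load them."""
    return json.dumps(heatmap_points(regions, scores))
```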
The following image also shows our prototyping/planning steps for building the final app:
Another interesting approach would be to use ideas from statistical graphics to visualize the data so that users can easily judge the quality of different neighborhoods by looking at our data map. Edward Tufte, in his book "The Visual Display of Quantitative Information", lists a set of properties that statistical graphics should satisfy:
- show the data
- help the viewer to think about the substance, not the methodology
- avoid distorting what the data have to say
- present many numbers in a small space
- make large data sets coherent
- encourage the eye to compare different pieces of data
- reveal the data at several levels of detail, from a broad overview to the fine structure
- serve a reasonably clear purpose: description, exploration, tabulation, or decoration
- be closely integrated with the statistical and verbal descriptions of a data set
Considering the above requirements for data visualization, the main question is how we can design a visualization layer that consumes different kinds of data and presents them naturally, so that a viewer can find out the quality of life in different neighborhoods.
You can try the app here: VanCityTalks App!
You can download the source code for this project from GitHub: VanCityTalks Github Repo.
Me and the team in Open Data Hack!