An Open Letter to Mayor Bloomberg & the Future Head of the DOITT

I was part of a team that recently submitted an app for the NYC BigApps contest and therefore had the opportunity to play with a lot of the data sets in the NYC Data Mine.  In the spirit of continuous improvement, I wanted to suggest how the city can take a great resource and make it even better.

After combining nine data sets (and being unable to combine many others), we feel that we can speak to some of the challenges in working with the data and the overall opportunity to improve it.  Our main suggestion is that the City should think of itself as a platform that provides standardized data, and let end users like us add value by bringing together the different data sets and 3rd party services.  We can’t claim to have created the “government as a platform” idea, but we can tell you some of the issues New York City faces if this is the path you (hopefully) take.

With that, here are a few suggestions:

  1. Make all your data machine readable.  Some of your data is incredibly rich, but buried in formats that are hard to parse.  The best example is probably the community district demographic info: there are over 100 spreadsheet pages of rich data there, but it’s incredibly difficult to use in CSV form.  Ideally, this would all simply be XML and it could be reused in seconds.
  2. Open source formats please.  Some of your data is locked in proprietary formats, accessible only to users with multi-thousand-dollar software packages (a great example is the gdb files for ArcView shape data).  If you want random citizens like us to mash up your data, please provide the data exclusively in open source formats, as our software budget is $0.  (It looks like this is changing – I noticed that some of the gdb files at the Data Mine now include shape files as well.  Kudos.)
  3. Add lat/longs (the commercial standard for mapping services) to every address/point.  Some of your address data consists of addresses alone, without lat/longs.  Some of these addresses (e.g., certain school locations) are improperly formatted, so we can’t use commercial systems like the Yahoo Geocoding API to calculate lat/longs and put them on a map or assign them to a district/neighborhood.  Other files contain coordinates, but in a different coordinate system: this means that developers need to convert them into lat/longs in order to map them – and this process introduces errors and takes unnecessary time.
  4. More documentation.  Some of your files contain lots of data but lack the documentation necessary to make them usable.  The best example is the health info shape files: they’re full of statistical data for different areas, but there’s no way to figure out exactly what each of the statistical values means.  Similarly, the shape files for each community district contain its area – but there’s no unit of measurement (it’s something called ‘internal units squared’).
  5. More data.  You gave us some great data to play with, but there’s still a ton more.  What about Compstat for crime data?  School scores and their trends over time?  The location of different alarm boxes so we can see how response times vary by area?
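To make the "standardized data" point concrete, here is a minimal Python sketch of what combining two data sets looks like when both are machine readable and share a common key.  Everything in it is an assumption for illustration: the column names, district IDs and numbers are invented, and this is not the code behind our app.

```python
import csv
import io

# Hypothetical sample data: the column names, district IDs and values
# below are invented for illustration and are not from the NYC Data Mine.
DEMOGRAPHICS_CSV = """district_id,population
BK13,100000
MN01,60000
"""

RECYCLING_CSV = """district_id,recycling_rate
BK13,0.15
MN01,0.30
"""


def load_by_key(text, key):
    """Parse CSV text into a dict of rows, keyed by the given column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def join_datasets(left, right):
    """Merge rows that share a key; keys missing from either side are skipped."""
    return {k: {**left[k], **right[k]} for k in left if k in right}


demographics = load_by_key(DEMOGRAPHICS_CSV, "district_id")
recycling = load_by_key(RECYCLING_CSV, "district_id")
combined = join_datasets(demographics, recycling)
```

When every file carries the same district identifier, the join is a few lines.  When it doesn't, most of the work goes into cleaning and matching rather than into building anything useful.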

These suggestions are meant entirely as an opportunity to build off of a great start.  We – the people of New York – want to work with the city to help make it a better place.  However, the value we add is in creating tools and finding hidden relationships in the data – not in standardizing it; that’s where you can make the system work.

If you provide us with well-formed, machine-readable, standardized data, we’ll help build the services that citizens need and free up the city’s resources to focus where they’re needed most.  We’ll also find new relationships in the data that might help you rethink policy initiatives (see how your data suggests an interesting link between ‘education’ and ‘going green’; I’ve no idea if you knew this).

2009 was a great start for a more open NYC.gov.  Can’t wait to see what you do in 2010.

Uncovering UncoverYourCity.com

So last week, Wendy, Jill and I unveiled UncoverYourCity.  This is a site that we put together as part of the NYC Big Apps competition (you can vote for us here; you’ll need to create an account).  We want to share a bit of background on what the project is and why we did it.

Why?

There’s a nascent movement called Government 2.0 which seeks to apply the principles of the web to government and make it more open and efficient.  One of the first steps in governments becoming open is making their data available online for any citizens who want to use it.  The government has some of the most interesting data out there – everything from demographic info to building permit locations to school scores – and they’ve got more information than just about anyone else.

This movement has gained a lot of traction at the municipal level, and San Francisco, Vancouver, Toronto and New York are among the first cities to start putting municipal data online.  New York has gone a step further by creating the Big Apps competition to get people to showcase what could be done with the data.  We decided that we wanted to create an app to support the city and also learn what could be done with the data.

So What Is It?

We wanted to create an app that would compare the quality of life in different New York neighborhoods and help people find the neighborhood that was perfect for them.  If you know this city, you know that there are 8 million people who exhibit remarkable diversity.  It’s what makes the city magical but also makes it hard to grasp.  We wanted a tool to help people grasp it.

However, we quickly realized that this was way too hard to do (more on that in a future post) and that we weren’t comfortable placing a “quality of life” ranking on different areas.  Instead, we decided that the right thing to create was an app that would let people learn more about the neighborhoods they live in and compare them with others.

The result is UncoverYourCity.  We’ve combined almost a dozen different data sets (sounds easy, but it’s not) so that people can see how their neighborhood squares with others.  You can use it to discover the leafiest streets in NYC, compare the neighborhoods with the highest and lowest murder rates (bet you don’t guess either one)  or see interesting relationships like that between poverty and renting.

This isn’t a gimmick; we believe it’s got the potential to help you see the challenges facing the city in a new way.  Take the Mayor’s plan for making the city greener.  I’ve no idea how the city is thinking of going about it, but one hypothesis might be that if we increase population density we might be able to increase recycling rates (if you live in condos, etc. they usually have recycling designed into the building).  However, our stats suggest that there’s no relationship between recycling and population density:

There is, though, a pretty strong relationship between education levels (% population holding bachelor’s/graduate degree) and recycling rates (graph below). This suggests that making the city greener may need to include elements to improve education. It’s a similar story if you compare recycling rates with median household income or poverty rates.
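For readers who want to check this kind of relationship themselves, here is a rough Python sketch of one common way to measure it, the Pearson correlation coefficient.  We're not claiming this is exactly how the graphs on the site are computed, and the numbers are made up purely to show a strong relationship next to a weak one.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / (spread_x * spread_y)


# Made-up illustrative values for five hypothetical districts (not real data).
pct_with_degree = [10, 20, 30, 40, 50]     # % holding a bachelor's/graduate degree
population_density = [30, 80, 20, 90, 50]  # thousands of people per square mile
recycling_rate = [12, 15, 19, 22, 27]      # % of waste recycled

r_education = pearson(pct_with_degree, recycling_rate)
r_density = pearson(population_density, recycling_rate)
```

A value near 1 or -1 means the two measures move together closely; a value near 0 means there's little linear relationship.  Either way, a correlation only tells you where to look next, not what causes what.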

The tool can also show us outliers that may represent opportunities to learn new approaches to apply elsewhere in the city.  One of my favorites is the relationship between Median Household Income and Family Poverty.  There’s a big outlier in the bottom left of the graph: Brooklyn Community District 13 – if it were like other districts, based on its income it should have a poverty rate of about 28%, but instead it’s holding at 18%.

Is this due to the housing projects of Coney Island working as planned?  Maybe it’s the tight Russian community of Brighton Beach taking care of their own and making sure that everyone’s doing okay.  Or maybe Sea Gate’s population is so affluent that it skews the poverty level down.  I don’t know, but if I were trying to reduce poverty in the city I’d try to find out.
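If you’d like to hunt for outliers like this one yourself, here is a minimal Python sketch: fit a straight line through the districts and see which one sits furthest from it.  The district labels and figures are hypothetical, and a plain least-squares line is our assumption here, not necessarily what the site’s graphs use.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def largest_residual(points):
    """Given {name: (x, y)}, return the name furthest from the fitted line
    together with its residual (actual y minus predicted y)."""
    names = list(points)
    xs = [points[name][0] for name in names]
    ys = [points[name][1] for name in names]
    slope, intercept = fit_line(xs, ys)
    residuals = {
        name: points[name][1] - (slope * points[name][0] + intercept)
        for name in names
    }
    worst = max(residuals, key=lambda name: abs(residuals[name]))
    return worst, residuals[worst]


# Hypothetical districts: (median household income in $000s, family poverty %).
districts = {
    "District A": (20, 40),
    "District B": (30, 33),
    "District C": (40, 26),
    "District D": (50, 19),
    "District E": (60, 12),
    "District X": (35, 18),  # poverty well below what its income would suggest
}

name, residual = largest_residual(districts)
```

A negative residual means the district’s poverty rate is lower than the trend predicts, which is exactly the kind of case worth a closer look.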

So give the app a try.  It’s not perfect – the site’s a bit slow (we’re not great programmers) and the navigation can be awkward (we ran out of time to get it polished) – but there’s something there for everyone.  If you want to learn more about how we built it and why the Gov 2.0 movement is important, stay tuned to this blog (we’re also open sourcing all the code; stay tuned for links to code and data).  And, when you’ve got a moment free, vote for us.