11 3 / 2013

Creating an in-framework copy of PopIt instance(s) data

by edmundvonderburg

We’ve been thinking about how to access PopIt data and have decided that the original approach of “always query the API” is not the best one. Instead we are planning to make it really easy to copy (and keep synced) all the data from a PopIt instance to the data model used by your preferred framework.

Scenario: You are building a website in your favourite framework (like Rails, or Django, or whatever). You find that some of the data you need for your site is already available from an existing PopIt instance, or you decide to use PopIt for your person data (so you can share it easily).

Querying the API: This is seemingly the simplest, but has the following problems:

  1. Constantly querying the API for data. If it goes down or is slow your site suffers. Latency will always be higher than a local db.
  2. You will probably need to store something locally as well, so that you know what is in the API to serve.
  3. Can’t use the framework tools you are used to for complicated queries against other data you store. Running complicated queries against the API is tricky (or not possible).
  4. Using data from several PopIt instances is tricky: how do you know which one to query?
  5. You have a constant nagging feeling that you should have just created your own models, even if they will later prove to have issues.

Syncing data locally: You use tools that we provide to sync data from one or more PopIt instances locally.  Advantages are:

  1. Data is local so there is no network latency in accessing it.
  2. You can use the framework native tools to query and manipulate it (allows for complicated queries).
  3. Updating the data from the PopIt instances is simple - scripts provided.
  4. Data from several PopIt instances can be treated as one dataset.
  5. Add your own data that is not in the PopIt instances.

But there are some downsides:

  1. Data needs to be updated on the PopIt instance and then synced. We’re not planning to be able to sync back local changes to the PopIt instance.
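To make the syncing approach a bit more concrete, here is a minimal Python sketch. None of it is the real PopIt tooling: the `person` table, the `sync_people` function and the shape of the records (`id` and `name` keys) are assumptions made for this example, and the records are passed in directly rather than fetched from an API.

```python
import sqlite3


def sync_people(conn, instance_url, people):
    """Copy people from one PopIt instance into a local table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person ("
        " instance TEXT NOT NULL,"
        " remote_id TEXT NOT NULL,"
        " name TEXT,"
        " PRIMARY KEY (instance, remote_id))"
    )
    seen = set()
    with conn:  # commit everything together, or roll back on error
        for person in people:
            # Insert new rows and overwrite rows that were synced before.
            conn.execute(
                "INSERT OR REPLACE INTO person (instance, remote_id, name)"
                " VALUES (?, ?, ?)",
                (instance_url, person["id"], person["name"]),
            )
            seen.add(person["id"])
        # Drop local rows that no longer exist on this instance.
        existing = conn.execute(
            "SELECT remote_id FROM person WHERE instance = ?", (instance_url,)
        ).fetchall()
        for (remote_id,) in existing:
            if remote_id not in seen:
                conn.execute(
                    "DELETE FROM person WHERE instance = ? AND remote_id = ?",
                    (instance_url, remote_id),
                )


conn = sqlite3.connect(":memory:")
sync_people(conn, "https://one.example.org", [{"id": "1", "name": "John Doe"}])
sync_people(conn, "https://two.example.org", [{"id": "1", "name": "Susie"}])
# Both instances now live in one local table and can be queried together.
names = [row[0] for row in conn.execute("SELECT name FROM person ORDER BY name")]
```

Keying each row on the instance plus the remote id is what lets several instances share one table. The sync only flows one way, from PopIt to the local copy, which matches the downside listed above.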

We’ll be fleshing out the ideas more in future, but this is the general approach we’re going to pursue. Comments welcome.

20 7 / 2012

How to define the mapping for the migration tool

by dmoritz

Hi there. As you might remember, I am working on PopIt for my summer of code project. Today I want to show you the migration tool which lets you import an existing database into PopIt. I would really appreciate it if you could try it and let me know about any issues you come across or any feature requests you have.

The migration tool is intended to help you with your initial import into PopIt so that you can start right away. The migration process consists of 3 steps.

  1. Upload your CSV (Comma Separated Values) file on the migration page
  2. Define the mapping (the type of data that can be found in your csv)
  3. Let the tool do the import (this might take a few minutes, depending on the amount of data)

The definition of the mapping is definitely the most difficult part which is why I will go a little bit more into detail. Imagine you have a CSV file which looks like the following file.

ID,  Name,     Email,              Wikipedia,             Address,           "Party Name"
0, John Doe, jd@party.co.uk, www.wikipedia.org/jd, 123 Baker Street, Cucumber
1, Susie, susi@party2.co.uk, , 42 Star Street, Tomato

The migration tool won’t be able to figure out what you mean by the different column names, which is why you have to define a mapping and tell the importer what the different columns mean. For humans it’s pretty obvious that “Wikipedia” means a link to the Wikipedia article.

So in the above case you should tell the migration tool that “Wikipedia” is a link with the type “Wikipedia”.

In the image below you can see an example mapping.

To improve the results, keep these tips in mind:

  1. Always provide a full name, first name or last name.
  2. If you leave the schema group field empty, the corresponding column will not be imported.
  3. Use unique names for the database attribute.
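To illustrate what a mapping does, here is a small Python sketch that applies one to the example CSV above. It is not the migration tool's code: the `MAPPING` dictionary, the schema group names and the `apply_mapping` function are made up for this example. It does follow the second tip, so a column with an empty schema group is skipped.

```python
import csv
import io

SAMPLE_CSV = """ID, Name, Email, Wikipedia, Address, "Party Name"
0, John Doe, jd@party.co.uk, www.wikipedia.org/jd, 123 Baker Street, Cucumber
1, Susie, susi@party2.co.uk, , 42 Star Street, Tomato
"""

# Hypothetical mapping: CSV column -> (schema group, database attribute).
MAPPING = {
    "ID": ("", ""),  # empty schema group: this column is not imported
    "Name": ("name", "full name"),
    "Email": ("contact", "email"),
    "Wikipedia": ("link", "Wikipedia"),
    "Address": ("contact", "address"),
    "Party Name": ("position", "party"),
}


def apply_mapping(csv_text, mapping):
    """Turn each CSV row into a nested dict grouped by schema group."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    people = []
    for row in reader:
        person = {}
        for column, value in row.items():
            group, attribute = mapping.get(column, ("", ""))
            value = (value or "").strip()
            if not group or not value:
                continue  # unmapped column or empty cell
            person.setdefault(group, {})[attribute] = value
        people.append(person)
    return people


people = apply_mapping(SAMPLE_CSV, MAPPING)
```

Here the first row ends up with a `link` entry of type “Wikipedia”, while Susie’s row has none because her Wikipedia cell is empty.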

You want to try the migration tool with your own data? Great. Head to http://www.gsoc12-popit.devtation.de/migration and upload your example file. Let me know what you think in the comments.

In order to log in, use the following credentials:

  • user: test@test.co.uk
  • password: tJo1zBum

The database will be deleted every night at 5:30.

26 6 / 2012

PopIt Life Cycle

by edmundvonderburg

If you start a PopIt site and it becomes a huge success then we expect that you may leave our hosting platform. And we regard this as a good thing. This post explains what our ideal lifecycle for PopIt sites would be.

At the start you’ll just have the desire to run a transparency site. You may or may not have some data that you’ve already gathered. You create a PopIt site and start to enter in the details you have. 

You then share your site with others and they start to suggest details to add, like missing phone numbers or people. You start to give some of these people user accounts so that they can edit the data directly. Others start to build things using your data which is available over an API, which means that more people suggest new data and corrections.

You now have a site with data in, and a small community of people who are helping to maintain it. Perhaps at this point you decide that you’d like your own site so you create one. You access all the data via the PopIt API and use the PopIt web interface to edit and maintain it. This is much easier than creating a whole site from scratch.

Now that you have your own site you want to add more data that PopIt does not support. You continue to maintain some of the data through the PopIt web interface, but you provide some additional data by writing it using the API. Perhaps you also create other data stores that you access through an API.

You’d like your site to be faster, so you move the data and code onto your own server: you install the PopIt code base and we send you your database.

Finally you’ve made so many changes to the data that you’d rather talk to the database direct - you stop using the PopIt API completely and access the database directly. Perhaps you still run the PopIt code though - so that others can access your data through the API.

We don’t expect all sites to take all the steps above - many will not need to go all the way to fulfil their aims. But if you need to you can - there is no lock-in at all. Our intention is that there should be minimal barriers to getting started and making progress.

30 5 / 2012

Making PopIt sites discoverable

by edmundvonderburg

Even better than easily creating a transparency site using PopIt is finding one that has already been started and joining forces with them. To do this PopIt sites need to be discoverable.

This post is about making this possible.

The “Hosting Site” part of PopIt is at http://popit.mysociety.org/ and is currently just a way to create new sites. In time I’d like it to be where you go to browse all the PopIt instances. You should be able to search their names and descriptions, and see on a map the physical areas that they cover. This would include both the sites that we host and the ones run by others on their own servers.

Each site will make some metadata available about itself over the API. The details will include the site name, description, area covered, number of entries, number of recent additions/edits. The hosting site will poll this information occasionally and store it locally.

The clever bit will be that each individual site will be able to ping the hosting site and announce itself. The hosting site will then fetch information about it using the API. This will allow sites that others run to appear in the central registry (if they want to). This provides a central place to search for existing sites, something that is very hard to do currently.

If a site goes away the API call will fail and after a few fails it will be listed as missing in action.
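Here is a rough Python sketch of the announce, poll and “missing in action” logic. Everything in it is illustrative rather than part of PopIt: the `Registry` class, the `MAX_FAILURES` threshold of 3 and the `fetch_metadata` callable (which stands in for the API call and is assumed to return `None` when the call fails).

```python
MAX_FAILURES = 3  # assumption: "a few fails" is taken to mean three in a row


class Registry:
    """Hypothetical central list of PopIt sites kept by the hosting site."""

    def __init__(self, fetch_metadata):
        # fetch_metadata(url) returns a dict of metadata, or None on failure.
        self._fetch = fetch_metadata
        self.sites = {}

    def announce(self, url):
        """Called when a site pings the hosting site to announce itself."""
        self.sites.setdefault(url, {"metadata": None, "failures": 0})
        self.poll_site(url)

    def poll_site(self, url):
        site = self.sites[url]
        metadata = self._fetch(url)
        if metadata is None:
            site["failures"] += 1  # keep the last known metadata
        else:
            site["metadata"] = metadata
            site["failures"] = 0

    def poll_all(self):
        """The occasional poll of every known site."""
        for url in list(self.sites):
            self.poll_site(url)

    def status(self, url):
        if self.sites[url]["failures"] >= MAX_FAILURES:
            return "missing"
        return "listed"
```

Resetting the failure count on any successful fetch means a site that comes back is listed again without anyone having to re-announce it.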

We should even list sites that are not run on PopIt but that make information about themselves available in the same way. After all it is the data that matters, not how you store it.

21 5 / 2012

GSoC 2012

by dmoritz

Hello, my name is Dominik and I’m one of the two Summer of Code students who work on projects from mySociety, more precisely PopIt. I’d like to use this post to briefly describe what I will be doing.

My summer will be split into two parts with different focuses. In the first half I will work on PopIt and help to get it to a point from which I can start my second project. That project, for the second half, will be a website that collects information about university professors.

You may ask why university professors?

I think that university professors play a very important role in our societies. They teach future generations, give statements in the media and are considered to be generally credible. However, it is often very difficult to find background information that helps you understand statements. That is why I want to build a website that offers a place to make important information available.

Developing this platform will also help to validate the PopIt API and data model.

Open does not only mean availability, but also accessibility!

Okay, back to the first half. First, I will work on issue #95, which is a simple migration tool that lets you initially import data into PopIt. After that I will continue to work on PopIt, but I don’t know yet exactly what that work will be.

That being said, I am excited about this summer and will post updates soon.

18 5 / 2012

from_this toThis

by edmundvonderburg

I’m used to, and prefer, the Perl variable convention of names_like_this.

JavaScript convention is namesLikeThis.

PopIt is currently somewhat confused and I just tried switching over with some shell commands:

ack --type js -f             | xargs -n 1 perl -pi -e 's/(\w)_(\w)/$1 . uc($2)/eg;'
find */views/ -name '*.jade' | xargs -n 1 perl -pi -e 's/(\w)_(\w)/$1 . uc($2)/eg;'

Needless to say this was overzealous and broke lots of things. Notably the keys used in the database. We’ll have to implement the schema updaters before we can run this properly.
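For anyone who wants to see why, here is the same substitution as a small Python sketch (the `snake_to_camel` name is just for this example). The regex has no idea whether an underscore sits in a variable name or inside a string, so database keys written as string literals get renamed too.

```python
import re


def snake_to_camel(source):
    """Apply the same substitution as the Perl one-liner: x_y becomes xY."""
    return re.sub(r"(\w)_(\w)", lambda m: m.group(1) + m.group(2).upper(), source)


# A variable name is converted as intended.
variable = snake_to_camel("names_like_this")

# A database key inside a string literal is converted as well,
# so the code no longer matches the keys stored in the data.
lookup = snake_to_camel('doc["first_name"]')
```

After this runs, `variable` is `namesLikeThis`, which is what was wanted, but `lookup` is `doc["firstName"]`, which no longer finds the `first_name` key already in the database.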

But notice is hereby given that this change will occur at some point :)

18 5 / 2012

Blocks of work in our future

by edmundvonderburg

On Monday I’m joined by a couple of GSoC students on the PopIt project. Hurrah!

These are some of the big blocks of work that we’ll be dividing up and tackling:

Representing partial dates. In the past we’ve done this using django-date-extensions which is a bit of Django code that lets you leave the day or month field empty (written by our very own Matthew). There are some issues with this approach (eg how do you represent Q2 of 2008, and how do you sort easily asc and desc in the db). So another approach is needed and I’m thinking that an abstraction that has a start and an end date would probably work well. Perhaps. Needs more thought, but it is something that PopIt will need.
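A minimal Python sketch of the start-and-end idea, assuming nothing about how PopIt will eventually do it: the hypothetical `date_range` function turns a year, month, day or quarter into a pair of real dates, which can then be stored and sorted like any other dates.

```python
import calendar
from datetime import date


def date_range(year, month=None, day=None, quarter=None):
    """Return (start, end) dates covering a possibly partial date."""
    if quarter is not None:
        first_month = 3 * (quarter - 1) + 1
        last_month = first_month + 2
        first_day = 1
        last_day = calendar.monthrange(year, last_month)[1]
    elif month is None:
        # Only the year is known.
        first_month, last_month = 1, 12
        first_day, last_day = 1, 31
    elif day is None:
        # Year and month are known.
        first_month = last_month = month
        first_day = 1
        last_day = calendar.monthrange(year, month)[1]
    else:
        # A full date: start and end are the same day.
        first_month = last_month = month
        first_day = last_day = day
    return date(year, first_month, first_day), date(year, last_month, last_day)


q2_2008 = date_range(2008, quarter=2)
feb_2008 = date_range(2008, month=2)
# Sorting the (start, end) pairs orders by start date, then by end date.
ordered = sorted([q2_2008, feb_2008, date_range(2008)])
```

With this, Q2 of 2008 becomes 1 April 2008 to 30 June 2008, and sorting ascending or descending is an ordinary sort on the start or end date.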

Very simple image resizing proxy (should be in a new repo so we can release it as an npm package for others to use). See this previous post for the background to this. Check there isn’t one already.

Code that lets you edit the data client-side via the API. Probably based on Backbone. This is one of those easily described tasks that is actually huge. Will also probably involve quite a lot of interaction with mySoc’s designer Jed. I’ve got some ideas about how to make this a really elegant thing.

Schema updates. MongoDB is schema-less - but this doesn’t mean that our data is. We’ll need some way to update the data as we change the way that it is stored. I’m thinking that we’ll embed a schema version in the data and then can either lazily update it as we access it, or more likely have a script that runs over all the data updating it. We’ll want some sane way to store and manage these change sets. Oh for a MongoDB version of Django’s South :) Again probably releasable separately as an npm module I hope - perhaps there is one already?
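As a sketch of the embedded schema version idea, here it is in Python, with plain dicts standing in for MongoDB documents. The `schema_version` key, the `MIGRATIONS` table and the single rename migration are all assumptions made for this example, not an existing module.

```python
def rename_first_name(doc):
    """Hypothetical change set for version 0 -> 1: first_name becomes firstName."""
    doc = dict(doc)
    if "first_name" in doc:
        doc["firstName"] = doc.pop("first_name")
    return doc


# Maps a schema version to the function that upgrades a document
# from that version to the next one.
MIGRATIONS = {
    0: rename_first_name,
}


def upgrade(doc):
    """Bring one document up to the latest schema version."""
    doc = dict(doc)
    # Documents saved before versioning existed count as version 0.
    version = doc.get("schema_version", 0)
    while version in MIGRATIONS:
        doc = MIGRATIONS[version](doc)
        version += 1
    doc["schema_version"] = version
    return doc


old = {"first_name": "John"}
new = upgrade(old)
```

The same `upgrade` function would serve both options: call it lazily when a document is read, or from a script that loops over all the data. Running it on an already upgraded document changes nothing.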

Code documentation - I’ve not written nearly as much as I should have. Naughty me. I’m guessing we’ll use something like JSDoc - as suggested in Google’s JS style guide (actually we should probably try to apply most of that document). This is more of an ongoing task that will mostly fall to me for the existing code, but we should do it as we go for the new stuff. Just like the tests.

Migration Tool - this is Dominik’s first task. This is to help people get data they already have into PopIt. Should be easy enough for anyone to use - if you can program you should use the API instead.

Ops - deployments, backups, restores - that sort of thing. We have a sysadmin joining the mySoc team soon - no doubt he’ll have plenty to say on that. I’m really bad at sys admin.

17 5 / 2012

Proposal: Handling images in the API

by edmundvonderburg

In PopIt it will be possible to store images of people. And these images will be made available in the API. The details that I expect to be interesting are the meta data, and a URL that can be used to serve the image.

The meta data is easy - it’s just text strings that define the source, a caption, attribution, etc.

The tricky bit is the URL, or more specifically what should the URL point at? Should it be the original uploaded image? Or should there be several URLs for different sizes of images?

In YourNextMP.com (a site I ran for the 2010 UK elections) the API provided several URLs for each image, and the size of the image for each URL: 

http://www.yournextmp.com/candidates/gordon_brown?output=json

    .....
    "image": {
      "small": {
        "url": "http://yournextmp.s3.amazonaws.com/images/35/56/000023556-small.png",
        "width": "35",
        "height": "40"
      },
      "medium": {
        "url": "http://yournextmp.s3.amazonaws.com/images/35/56/000023556-medium.png",
        "width": "87",
        "height": "100"
      },
      "large": {
        "url": "http://yournextmp.s3.amazonaws.com/images/35/56/000023556-large.png",
        "width": "219",
        "height": "250"
      }
    },
    .....

These sizes were chosen to suit the design on my pages. So I was happy. And most people were happy with the defaults, at least no-one complained to me about them.

But these were the halcyon days of yesteryear, when we didn’t need to worry about responsive design or retina displays. Now my choices for pre-sized images would be almost useless.

And as the API is core to PopIt should I be making sizing decisions for the API users? I don’t think so.

So my answer is this: The API provides only one URL per image - the raw uploaded file. It’ll provide the pixel height and width as well for convenience. The idea is that this is the raw material and what you then do with it is up to you.

This URL can then be passed to a resizing proxy to manipulate it into exactly what you need for your purposes. There are free services on the web that allow for this (eg src.sencha.io). It’ll probably be worth building in a very simple proxy to PopIt as well - one that just resizes the image to the dimensions it is given. We’d use that in our own front end.
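The one calculation such a proxy cannot avoid is working out the output size. Here is a small Python sketch of that step only, under my own assumptions: the `fit_within` function is hypothetical, it keeps the aspect ratio, rounds down, and never scales an image up. It does no actual image processing.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit a box, keeping the aspect ratio."""
    if width <= max_width and height <= max_height:
        return width, height  # already small enough, never upscale
    # Compare max_width / width with max_height / height using
    # integer cross-multiplication to avoid floating point rounding.
    if max_width * height <= max_height * width:
        # Width is the limiting side.
        return max_width, max(1, height * max_width // width)
    # Height is the limiting side.
    return max(1, width * max_height // height), max_height


medium = fit_within(219, 250, 100, 100)
small = fit_within(219, 250, 40, 40)
```

For a 219x250 original this gives 87x100 for a 100x100 box and 35x40 for a 40x40 box, which happen to be the medium and small sizes in the YourNextMP output above.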

I think that this approach neatly sidesteps the issues - it gives the API user something simple and complete and lets them decide what to do with it.

Thoughts and suggestions about this very welcome.

15 5 / 2012

Don’t be selfish

by edmundvonderburg

Historically most things have been shareable. I could lend you my jumper, you could lend me a book. Some things however adapt themselves to the user - like shoes or fountain pens - these are less likely to be shared.

Recently things have started to be designed to be selfish - like the iPod: Your music in your pocket. This led to the iPhone and iPad also being selfish by design - as well as all the other brands. By this I mean that if I borrow your phone it has your contacts in it. If I play your game I might change your high score. If I read a book on your Kindle I’ll change your last page read position.

But all these devices have network access. It would have been possible to build in a login screen that let you borrow someone else’s hardware and make it yours briefly - the various contacts and apps and data being fetched from the network as required. The devices would have become shareable and would not be selfish anymore.

My point is that they were not shareable from the start and so now people don’t expect them to be. This expectation that forms about how something works is very hard to shift once it has taken hold. Hence it is vital to get it right early on.

GitHub deserves massive praise for the way that it has created the feel that your code is both yours and shared at the same time. Being able to publish a repo and control it, whilst at the same time making it possible for anyone to copy it, change their copy and then send you back the changes is amazing. And, most importantly, it is what people expect to happen. It has sharing at its very core.

I’d like the data in PopIt to be thought of in the same way. I’d like people to run sites and get credit for their own work. And I’d like for others to be able to contribute so easily that it is almost as if that data is theirs too.

We’ll start adding user accounts etc to PopIt soon and right from the start we’ll be thinking about how best to encourage sharing and collaboration.

11 5 / 2012

From two ports to one

by edmundvonderburg

Every now and then a seemingly small change leads to a huge commit. For me this is https://github.com/mysociety/popit/commit/22c9cf8f3673590b9865e81c224dcf8bd5a77f0c which changes the code so that instead of running the hosting app on one port and all the instances on another they now both run on a single port and the code takes care of deciding which app should handle the request.

Obviously this is better - for starters it will allow things like the custom varnish config to be scrapped - varnish now doesn’t need to split the traffic up. It also makes developing easier - there is just a single app to run.
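To give a feel for what “deciding which app should handle the request” can mean, here is a Python sketch. It is not taken from the commit: it assumes the decision is made on the request’s Host header, that instances live on subdomains of the hosting site, and it uses a made-up `HOSTING_DOMAIN` and `choose_app` function.

```python
HOSTING_DOMAIN = "popit.example.org"  # hypothetical hosting site domain


def choose_app(host_header):
    """Pick the app for a request based on its Host header (an assumption)."""
    # Drop any port number and normalise the case.
    host = host_header.split(":", 1)[0].lower()
    if host == HOSTING_DOMAIN:
        return ("hosting", None)
    suffix = "." + HOSTING_DOMAIN
    if host.endswith(suffix):
        # Everything before the hosting domain names the instance.
        return ("instance", host[: -len(suffix)])
    return ("unknown", None)


hosting = choose_app("popit.example.org:3000")
instance = choose_app("kenya.popit.example.org:3000")
```

Both requests arrive on the same port; only the host name differs, so nothing in front of the app needs to split the traffic.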

However it will break almost all the installations out there - they will now need to tweak how they are set up. Apologies for this. I tried to make the change backwards compatible but it just was not worth it.

I’ve also updated the install instructions (they are much shorter now).