One of the earliest challenges faced by TransportAPI was making bus timetable information available for every bus stop and service across the whole of the UK. There were two sides to the problem: which data sources would we need to provide the most comprehensive service possible, and how could we make the result easily consumable for developers building apps and services? It turned out that we needed quite a few source datasets. The Traveline National Dataset provides all of the UK bus schedules; the smaller National Coach Services Dataset (NCSD) contains long-distance coaches. Both are in TransXChange format, and between them they hold the best part of 20,000 files – one per bus service.
We also needed the National Public Transport Access Nodes (NaPTAN) dataset, which describes the physical bus stops (where they are and what they are called). The final piece of the jigsaw puzzle was the National Operator Code dataset, which tells us who runs which service. So we imported all of these into our databases and then had the small job of figuring out how it all plumbed together.
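To give a feel for what "plumbing together" means, here is a minimal sketch of linking one service to its stops and its operator. The class names, field names and the `link_service` helper are illustrative assumptions, not our actual importer or schema; the sketch only assumes that each service lists stop codes that should match NaPTAN entries and carries an operator code that should match the operator dataset.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Stop:
    """A physical bus stop, as described by a NaPTAN-style record (fields assumed)."""
    stop_code: str
    name: str
    lat: float
    lon: float


@dataclass(frozen=True)
class Service:
    """One bus service, as parsed from a single schedule file (fields assumed)."""
    service_id: str
    operator_code: str
    stop_codes: Tuple[str, ...]


def link_service(service: Service,
                 stops_by_code: Dict[str, Stop],
                 operators_by_code: Dict[str, str]) -> dict:
    """Resolve a service's stop and operator references against the other datasets."""
    known, missing = [], []
    for code in service.stop_codes:
        stop = stops_by_code.get(code)
        if stop is None:
            # Keep track of references that do not resolve instead of failing the import.
            missing.append(code)
        else:
            known.append(stop)
    return {
        "service_id": service.service_id,
        "operator": operators_by_code.get(service.operator_code, "unknown operator"),
        "stops": known,
        "missing_stop_codes": missing,
    }
```

Recording unresolved stop codes rather than raising an error is a design choice in this sketch: with thousands of files from many contributors, it seems more useful to import what resolves and report the rest.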
It turns out there’s a huge number of bus stops in the UK – we usually round it up to half a million, but if you are into your data then the number is 434,007 bus stops nationwide.
Every operator ultimately generates their own schedules. Through a long chain of organisations and transformations, these get released in the National dataset. So this source dataset has hundreds or thousands of contributors – an army of schedulers, each doing things slightly differently. Given our knowledge and experience, over time we’ve managed to find every quirk in how people represent their services, and accounted for it in our importer.
Then we needed to figure out the best way to make a huge dataset like this accessible to developers. And everything starts with Search, right? So we developed search features that allow you to find bus services and stops in different ways. You can search by proximity, by service or stop name, or by a combination of these. The search returns unique IDs for stops and services, which you then use to retrieve bus stop departures and bus service timetables. You can retrieve individual stop and service timetables, but you can also package up collections of these, such as all departures for the nearest 20 bus stops.
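As a rough illustration of combining a name filter with a proximity search, the sketch below ranks stops by great-circle (haversine) distance from a point and returns the nearest few. It is an assumption-laden toy, not the TransportAPI implementation: the `Stop` fields and the `find_stops` function are hypothetical, and a linear scan like this would be replaced by a spatial index for hundreds of thousands of stops.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass(frozen=True)
class Stop:
    stop_id: str  # hypothetical unique ID returned by search
    name: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_stops(stops: Iterable[Stop], lat: float, lon: float,
               name_contains: Optional[str] = None, limit: int = 20) -> List[Stop]:
    """Return up to `limit` stops nearest to (lat, lon), optionally filtered by name."""
    needle = name_contains.lower() if name_contains else None
    candidates = [s for s in stops if needle is None or needle in s.name.lower()]
    candidates.sort(key=lambda s: haversine_m(lat, lon, s.lat, s.lon))
    return candidates[:limit]
```

The IDs of the returned stops would then be used for follow-up requests, for example to fetch departures for each of the nearest 20 stops in one packaged response.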
We can serve this information out as pure schedule data, or we can combine it with live sources to provide Real Time Information predictions for stops, and service status information for bus services. Where operators want us to serve their schedules directly, avoiding the transformations associated with being merged into the National dataset, we can do that too, while making sure that coverage remains complete and free of duplicates.
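One simple way to picture the "complete, without duplicates" requirement is sketched below. It assumes, purely for illustration, that services are plain dictionaries with an `operator_code` key and that a direct feed replaces everything the National dataset holds for the same operator; the real matching rules may well be finer-grained than this.

```python
from typing import Dict, List


def merge_sources(national: List[Dict], direct: List[Dict]) -> List[Dict]:
    """Combine National dataset services with operator-supplied ones.

    Assumption for this sketch: if an operator supplies schedules directly,
    all of that operator's National dataset entries are dropped in favour of
    the direct ones, so no service is served twice.
    """
    direct_operators = {service["operator_code"] for service in direct}
    kept = [s for s in national if s["operator_code"] not in direct_operators]
    return kept + list(direct)
```

Operators without a direct feed keep their National dataset entries untouched, which is what keeps coverage complete.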
It's been a huge undertaking, and a process of constant improvement. Every week, when the National dataset is released and we hit that green button to run the full import, we all share a sense of pride at how much work has gone into representing this data accurately and in an easy-to-use form.