LMI For All

Feb 2014 Tech Report

Where are we?

Currently, our infrastructure is running on three servers, those being:

  • the data server, which stores the LMI data
  • the API server, which makes this data accessible to the public
  • and the collaboration server, which hosts this wiki, the website, and the developer community forum.

After our load tests towards the end of 2013 and the subsequent improvements to the data server, we are fairly confident that we can handle the query loads expected in the near future. We have deployed API key verification but are not currently enforcing it, and we are not yet tracking or limiting API usage in detail.

Data Considerations

Data import automation

Data import still requires some manual work at the moment, and this can be automated further. Most of the process is already automated, except for the initial raw import. We need to formalise the format of the data files that are fed to the automation system.
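To illustrate what a formalised file format could buy us, here is a minimal sketch of an import check. It assumes the raw files are CSV and uses an invented column specification (region_code, year, value); neither the file type nor the columns are taken from the actual LMI data files, and the name validate_rows is hypothetical.

```python
import csv
import io

# Hypothetical column specification: column name -> converter.
# These columns are invented for illustration, not the real file layout.
EXPECTED_COLUMNS = {
    "region_code": str,
    "year": int,
    "value": float,
}


def validate_rows(text):
    """Parse CSV text and return (rows, errors) checked against EXPECTED_COLUMNS."""
    reader = csv.DictReader(io.StringIO(text))
    # The header must match the specification exactly, including column order.
    if reader.fieldnames != list(EXPECTED_COLUMNS):
        return [], [f"unexpected header: {reader.fieldnames}"]
    rows, errors = [], []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append({name: conv(raw[name]) for name, conv in EXPECTED_COLUMNS.items()})
        except (TypeError, ValueError) as exc:
            errors.append(f"line {line_no}: {exc}")
    return rows, errors
```

With a check like this in front of the automation system, a malformed file would be rejected with a list of line-level errors instead of needing manual inspection during the raw import.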

Consolidation of metadata

Each data source has its own logic when it comes to metadata (regions, etc.). We want to consolidate this metadata more thoroughly in the data model. This work was postponed due to the contest.

Data Cubes

A data cube is a way of storing data so that it is aligned along several dimensions and can be sliced and diced arbitrarily. This is a much more dynamic and flexible way of accessing data than what we're doing currently. Data cubes have the following advantages:

  • Users can compose their own queries.
  • Data cubes facilitate the use of SPARQL query endpoints, which are popular with the open data community 1).
  • Data can be pivoted and sliced easily to potentially answer questions the API wasn't even built for.

They also have some disadvantages:

  • Data cube queries probably require more processing power than normal queries.
  • The flexibility of data cube queries means they must be written in a special query language that developers will have to learn.
  • Properly exposing a data cube to the public can be tricky and might require additional programming work.
  • It is unclear whether the ASHE estimations can be run from a data cube, so these might have to be precalculated in depth, which is space-consuming.
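To make the "slice and dice" idea more concrete, here is a toy sketch in plain Python. The dimensions (region, year, occupation), the measure (employment) and all figures are invented for illustration and do not come from the LMI data. A real data cube would live in a dedicated store and be queried through a language such as SPARQL rather than through helper functions like these.

```python
from collections import defaultdict

# Toy fact table: every record has a value for each dimension plus one measure.
# Dimension names and figures are invented for illustration only.
FACTS = [
    {"region": "North", "year": 2012, "occupation": "Nurses", "employment": 100},
    {"region": "North", "year": 2013, "occupation": "Nurses", "employment": 110},
    {"region": "South", "year": 2012, "occupation": "Nurses", "employment": 200},
    {"region": "South", "year": 2013, "occupation": "Teachers", "employment": 150},
]


def slice_cube(facts, **fixed):
    """Keep only the records whose dimensions match the fixed values."""
    return [f for f in facts if all(f[dim] == val for dim, val in fixed.items())]


def roll_up(facts, dims, measure="employment"):
    """Sum the measure over every dimension not listed in dims."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f[measure]
    return dict(totals)


# Slice on one dimension, then aggregate along another.
nurses_by_region = roll_up(slice_cube(FACTS, occupation="Nurses"), ["region"])
# The same data pivoted by year instead, without any new endpoint.
totals_by_year = roll_up(FACTS, ["year"])
```

The point is that both questions are answered by the same two generic operations, whereas our current API would need a purpose-built query for each.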

Technical Considerations

API Keys / Tracking

An API key is a piece of data that uniquely identifies the app on whose behalf a particular API request is made. This potentially allows things like:

  • tracking app usage in depth,
  • limiting apps to certain rates of queries over time,
  • payment systems where high-usage apps pay us per thousand queries.
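As a rough illustration of the tracking and rate-limiting points above, the following sketch counts requests per API key and applies a fixed-window limit. The class name UsageTracker, its parameters and the choice of a fixed window are assumptions made for the example; this is not how the API server currently behaves.

```python
import time
from collections import defaultdict


class UsageTracker:
    """Count accepted requests per API key and enforce a fixed-window rate limit."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock               # injectable so the logic can be tested
        self.totals = defaultdict(int)   # accepted requests per key (tracking, billing)
        self.windows = {}                # key -> (window start, requests in window)

    def allow(self, api_key):
        """Return True and record the request, or False if the key is over its limit."""
        now = self.clock()
        start, count = self.windows.get(api_key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0        # the previous window has expired
        if count >= self.limit:
            self.windows[api_key] = (start, count)
            return False
        self.windows[api_key] = (start, count + 1)
        self.totals[api_key] += 1
        return True
```

The totals dictionary is what a usage report or a per-thousand-queries payment scheme would read from, while the per-window counter is what would reject requests once limits are enforced.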

API keys are currently deployed and verified with every request, but no action is taken if verification fails. We are also not limiting query rates. Enforcing API keys has the disadvantage that they sometimes have to be cryptographically signed, which can be quite tricky for inexperienced developers to figure out.
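To give an idea of what signing involves for an app developer, here is a minimal sketch using HMAC-SHA256 from the Python standard library. The message layout (method, path and timestamp joined by newlines) and the function names are assumptions for the example and do not describe the scheme our API uses.

```python
import hashlib
import hmac


def sign_request(secret, method, path, timestamp):
    """Return a hex HMAC-SHA256 signature over the parts of a request."""
    # Hypothetical message layout: the real scheme may sign different fields.
    message = f"{method}\n{path}\n{timestamp}".encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_request(secret, method, path, timestamp, signature):
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(secret, method, path, timestamp)
    return hmac.compare_digest(expected, signature)
```

The client and the server have to build the signed message in exactly the same way, byte for byte, and this is where new developers typically get stuck.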

Separation of production and development

We recommend setting up a separate development environment next to the current three servers, which will be promoted to production. The development environment should consist of a database server and an API server. Both development servers can be lighter than their production counterparts, but should otherwise be the same.

Redundancy / Failovers

Redundancy in the context of an IT system means that it has spare capacity that can be activated in case of a spike in usage, or in case another part of the system fails. We currently have three servers, none of which is redundant. Having redundant system components allows the system to keep functioning normally from the user's point of view even when some parts of it are down for maintenance. If the budget allows, we recommend making at least the API server redundant and putting it behind a load balancer. This would allow API upgrades without downtime.
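The following sketch shows the basic idea behind a load balancer in front of redundant API servers: requests rotate over the backends, and any backend marked as down is skipped. The class name and the backend names used in the usage note are hypothetical, and in practice we would use an off-the-shelf load balancer rather than write our own.

```python
import itertools


class RoundRobinBalancer:
    """Rotate requests over backends, skipping any marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        """Take a backend out of rotation, e.g. during an upgrade."""
        self.down.add(backend)

    def mark_up(self, backend):
        """Put a backend back into rotation."""
        self.down.discard(backend)

    def pick(self):
        """Return the next healthy backend, or raise if none is available."""
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy backend available")
```

An upgrade without downtime then amounts to calling mark_down("api-1"), upgrading that server while "api-2" takes all traffic, calling mark_up("api-1"), and repeating the procedure for the other server.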

Backups and snapshots

We recommend storing regular snapshots of each of the production servers to allow for fast recovery. A snapshot is a full image of the virtual server that can be used to instantly recreate the server.

In addition, we recommend setting up a dedicated backup server to store incremental backups of all files and databases. We can set up our own backup schedules.
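To show how an incremental backup differs from a full snapshot, here is a small sketch that copies only the files that are new or have changed since the previous run. It detects changes by comparing SHA-256 digests against a manifest kept between runs. The function names and the manifest format are assumptions for the example, and a real setup would more likely rely on an established backup tool.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def incremental_backup(source_dir, backup_dir, manifest):
    """Copy files that are new or changed since the last run.

    manifest maps relative paths to the digest recorded at the previous backup
    and is updated in place. Returns the list of relative paths copied.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source_dir).as_posix()
        digest = file_digest(path)
        if manifest.get(rel) != digest:
            target = backup_dir / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            manifest[rel] = digest
            copied.append(rel)
    return copied
```

Note that this sketch overwrites the previous copy of a changed file and ignores deletions. A production backup would keep older versions as well, so that we can restore to a point in time.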

Service Level Agreement (SLA)

A Service Level Agreement is a written document that specifies the stability and availability guarantees we make to our “customers”. It's a good idea to have one, even if it specifies we make no guarantees at all. I'm not sure how legally relevant this is in the UK, so maybe Colin or Raymond have more expertise here.

For the three production servers (data, collab and API), we can buy Server Management services from our hosting partner. These include:

  • 24/7/365 Coverage
  • Reactive uptime monitoring
  • Comprehensive SNMP Data Logging and monitoring
  • Network Management with 100% uptime SLA
  • Security Audit, Software Patching and Kernel updates on demand
  • Seamless migration assistance into your servers
  • Best-Effort Third Party Software Installations and Support
  • 20 minute average response time SLA

Future

Our aim is to deliver a scalable and maintainable system at the end of the project, with a proper separation of development and production environments and disaster recovery through server snapshots and incremental backups.

1) but not necessarily easier to use for the average developer than what we currently do
tech/tech_report_feb_2014.txt · Last modified: 2014-02-26 13:37 by Raymond Elferink