
Cabinet document parser

The knowledge produced by the HARMD can be found within large archives of text-based documents. These documents consist of reports and recommendations produced by council officers, and the minutes and decisions of committee and cabinet1 meetings. These reports and minutes present a summary of activities undertaken and cover matters such as agreements on how residents should be consulted, the procurement of repairs and maintenance contracts, and planning decisions that determine whether buildings should be demolished or repaired.

Figure 1: Lambeth's calendar of meetings and decisions

In the borough of Lambeth, details of these meetings can be found within an online calendar which documents local government process as far back as 1999. While documents from before this period can be obtained via the public records office, the provision of a web-based interface allows the entire corpus of text to be downloaded automatically using simple scripts. This enables these documents to be programmatically searched, analyzed and interrogated for knowledge produced by the HARMD.

Unpacking the URL

Figure 2: URL breakdown with synonyms and techno-socio interpretations

The URL is the primary public means through which people and machines access documents that describe LBC meetings and events. The URL orders the distribution of knowledge via the familiar http://, a protocol and set of conventions which enable the transfer of documents from one machine, a web server, to others, such as a laptop or smart phone. The URL reveals specific geographic and material arrangements of machines and people involved with governmental process, administrative convention and technical negotiations.

As an example, registration of a gov.uk domain name requires the permission of a committee of people within the Government Digital Service. The subdomain moderngov.lambeth.gov.uk2 points to a physical machine addressed via the IP3 address 91.216.55.115. A geolocation service reports that this machine may be found at the Central London coordinates “lat:51.4964 lon:-0.1224”4. We also discover that the management of this machine, or its connectivity, involves a commercial entity, ‘Virgin Media Limited’.
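The lookup from hostname to IP address can be sketched in a few lines of Python. This is only an illustration and not the method used to produce the figures quoted here: the function name `resolve_ipv4` and its `resolver` argument are hypothetical, the address returned today may differ from the one reported above, and no geolocation step is attempted.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address a hostname currently resolves to, or None.

    The resolver argument is an illustrative hook so the function can be
    tried without network access; by default a real DNS lookup is made.
    """
    try:
        return resolver(hostname)
    except OSError:
        # socket.gaierror (raised for failed lookups) is a subclass of OSError
        return None


if __name__ == "__main__":
    # Performs a live lookup; the result depends on current DNS records.
    print(resolve_ipv4("moderngov.lambeth.gov.uk"))
```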

Figure 3: Obtaining geolocation information via the CLI

The next segment of the URL, mgCalendarMonthView.aspx, appears to indicate the name of the script that generates the HTML used to display the calendar within a browser window. The .aspx suffix suggests this script may be written in VB.NET or C#; however, the HTML source contradicts this and reveals that the website may have been developed using Drupal, an open source technology based on the PHP programming language, which uses .php files. It appears an underlying technology is being masked, potentially to connect an older technology to a newer interface. A quick search for “moderngov, drupal, aspx” returns www.moderngov.co.uk, who appear to be a consultancy providing web applications for a wide variety of local authorities across Britain. This opens the potential for writing a single script that can automatically download and analyze documents across a large number of local authorities.
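The same unpacking can be done mechanically with Python's standard library. In this sketch the full address is an assumption, reassembled from the individual segments named in the text (protocol, host, script name and query string), and the helper `unpack_url` and its dictionary keys are hypothetical labels chosen for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: full address reassembled from the segments named in the text.
CALENDAR_URL = "http://moderngov.lambeth.gov.uk/mgCalendarMonthView.aspx?M=3&DD=2013"


def unpack_url(url):
    """Split a URL into the segments discussed in this section."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "script": parts.path.lstrip("/"),
        # parse_qs returns a list per key; keep the first value of each
        "parameters": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }


if __name__ == "__main__":
    for segment, value in unpack_url(CALENDAR_URL).items():
        print(segment, value)
```

Any site built on the same platform would presumably expose the same script name and parameters under a different host, which is what makes a single cross-authority script plausible.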

7 Gigs and counting

Finally, the M=3&DD=2013 section of the URL specifies the month and year to display within the browser. Simply incrementing the month and year, then downloading the web page associated with each URL, provides access to a list of every LA meeting date, where each date links to a list of associated documents for that meeting. I wrote a script to download these documents, which I ran on my laptop while I worked. Over the course of a few hours 7.1 gigabytes of documents were downloaded. There may be a substantial number of additional documents still to download, as an error-checking script indicated that a significant number of pages were not parsed.
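A minimal sketch of the incrementing step is given below. It is not the script described above: the base address is assumed from the URL segments discussed earlier, `calendar_urls` is a hypothetical helper, and the sketch stops at producing the addresses rather than downloading or parsing anything.

```python
from urllib.parse import urlencode

# Assumption: base address reassembled from the host and script name in the text.
BASE_URL = "http://moderngov.lambeth.gov.uk/mgCalendarMonthView.aspx"


def calendar_urls(first_year, last_year):
    """Yield one calendar URL per month, for every year in the given range."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            # M carries the month and DD the year, as in M=3&DD=2013
            yield BASE_URL + "?" + urlencode({"M": month, "DD": year})


if __name__ == "__main__":
    for url in calendar_urls(2013, 2013):
        print(url)
```

Each address could then be fetched, for instance with `urllib.request.urlopen`, and any page that fails to download or parse recorded so that it can be retried later.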

Figure 4: A list of documents collated by year and month then ordered by file size

  1. The leader and cabinet model was introduced following the Local Government Act 2000, which was devised under Tony Blair's Labour Government. The elected leader of the council assigns a portfolio, such as ‘housing’ or ‘finance’, to each cabinet member. Decisions may be delegated to an individual member, or taken by the cabinet as a whole. Local councilors can also call upon scrutiny committees, whose role is to review the decision-making process.

  2. In this instance moderngov.lambeth.gov.uk is a subdomain, a subsidiary website that is hosted on different server equipment from the main lambeth.gov.uk website.

  3. Each domain name has an associated Internet Protocol (IP) address which resolves to a specific machine on the Internet. The way hostnames are resolved to their mapped IP address is called Domain Name Resolution.

  4. The coordinates provided are highly unlikely to be the true location of the server, as they are derived from information obtained from multiple sources such as Regional Internet Registries, details held by Internet Service Providers, or activity detected as originating from a specific IP.