Looking at the language used about, and around, data portals
Ask someone involved in the world of open government data what is meant by a ‘data portal’ and you might get an answer by example: “A platform like data.gov.uk” (or whichever Data-Dot is prominent in their context). Data portals are the data websites that hundreds of national and local governments launched in the 2010s, often as the flagship component of an open data project or announcement. At a minimum they provide a list of datasets, supporting data discovery and access. More developed, they might provide visualisations, apps and all manner of interactive features.
But, step outside the open government data field, and the term data portal might be attached to other kinds of data platform. Platforms with a family resemblance to the government Data-Dots, but sometimes built around quite different assumptions. And, if you look at the features and functions of any individual data portal, you might find similar features in another platform described under quite a different name: data repository, marketplace, catalogue or cloud.
Understanding how portal terminology has evolved in the open government space, and mapping the neighbouring terms that might be used to describe overlapping technical and organisational concepts, can help us to step back and understand the agendas driving portal development, and to draw on ideas from outside the relatively narrow mainstream of ‘open government data portal’ thinking.
Presence of selected terms in a corpus of books indexed by Google Books. Presented using the Google NGrams viewer. Created November 2021.
The diagram above, showing the changing use of key data portal related terms over time in published English language books (Source: Google Ngram Viewer), illustrates an interesting trend. Prior to 2000, ‘Data Catalogue’ was the term of art for a collection of metadata designed to support data discovery and access (albeit, the data catalogues of the 1980s were commonly thick printed books, containing tables of narrowly spaced opaque codes, and listing scientific observation datasets that scientists could write off to request). It’s only in the 2000s that the term ‘data portal’ gains traction, partly as a programming term of art, and partly through growing focus on scientific and geographic data portals.
A data catalogue from 1979: Selective Guide to Climatic Data Sources - Volume 55.
In the published literature, ‘Open Data Portal’ only overtakes ‘Data Catalogue’ sometime after 2010. Indeed, the Open Knowledge Foundation site now known as DataPortals.org, which lists over 500 of the things, was originally DataCatalogs.org. It’s also notable that, when data.gov first launched, it made a distinction between providing a data catalogue and providing data access tools.
Open data portal also quickly gains ground in our Ngram visualisation over a term that has been trundling along in niche usage since the late 1990s: Information Asset Register. Initially driven by public sector information reform agendas, and receiving a recent minor boost in relation to GDPR compliance, an Information Asset Register (IAR), unlike a catalogue produced for public consumption, is designed to capture all the data (and structured information) resources held by an organisation. It may be as much for internal use as to support external data access. However, when a number of data portals (including data.gov.uk) experimented with hosting details of as-yet-unreleased data alongside public data, the line between IAR and Open Data Portal looked like it could become blurred. The language of ‘register’ is also evident in portal-like resources like the International Aid Transparency Initiative’s IATI Registry and Amazon Web Services’ Open Data Registry, though more in the context of external parties ‘registering’ their datasets for others to use.
Interestingly, few of the software tools used to run data portals actually self-identify as such. CKAN is styled as “The world’s leading open source data management system” (see the CKAN designers’ description of a DMS here); Socrata as a “Data Platform”, “Finance Cloud” and “Data Insight” tool; DKAN an “Open data platform”; and Data.world and Magda.io both focus on their role as catalogues.
When looking at the ‘open data portals’ used by National Statistical Organisations (NSOs) in 2021, Paris21 and Open Data Watch find a quite different software stack in use: less generalist data catalogue, and more specialised interfaces for accessing particular statistical macro-data. This reflects the definition of data portal they adopt of “a web-based, interactive data and metadata platform with databases modelled for specific data types and domains such as microdata, macrodata or geospatial data”, which they acknowledge is different from the term’s use in other areas of the open government data world.
Example of a scientific data portal, highlighting individual records over datasets.
We see a similar framing in the scientific world, where data portals are more commonly centred on one particular kind of data point: such as the Natural History Museum’s data portal, which highlights the number of individual collection records (11.7m) it provides access to, or the European COVID-19 Data Portal, which aggregates data on over 4 million individual viral sequences. The COVID-19 Data Portal itself is fed by a ‘federated archive’ of sensitive patient research data, and a set of local data hubs, provided by the portal to local units in order to facilitate meta-data collection and provide local analytical capacities. This mixed model, of a central portal, aggregation of data from federated systems, and provision of tools back to specific localities or specialist communities, appears increasingly common. For example, the Global Biodiversity Information Facility is piloting creation of ‘hosted portals’ for partners which provide a curated window onto the overall GBIF collection of data. When it comes to managing datasets with different schema and structures, the research community is more likely to talk about data repositories: platforms that focus on storage, long-term preservation, meta-data capture and citation of datasets.
And if we turn to the private sector, we see that around the same time open government data portals were gaining traction, a number of firms launched ‘Data Marketplace’ offerings, including Microsoft’s Azure Data Marketplace in 2010, backed by the OData specification for common API access to data, and DataMarket, launched in 2008 and acquired by Qlik in 2014. Whilst some data markets, particularly around specific industries such as advertising, or related to specific kinds of sensor networks, might fully manage the trade in data, others ultimately focus on providing a catalogue of public and proprietary data, and then brokering negotiations between data owners and potential users.
Can we say then what a data portal essentially is? Probably not.
Screenshot from the launch version of data.gov.uk
However, in thinking about portals as platforms for citizen engagement it might be useful to look back at how data.gov.uk was described when it first launched. I’ve scoured the launch version of the site, and I can’t find the words portal, catalogue or register anywhere. Instead, it is described as “a way into the wealth of government data”, and opens with the statement “We’re very aware that there are more people like you outside of government who have the skills and abilities to make wonderful things out of public data. These are our first steps in building a collaborative relationship with you.” This framing, reflecting the emergence of UK open data work from the Power of Information Review, with its emphasis on both transparency and participation, points to the portal as opening more than data: providing a route for engagement with government. As we’ll see when we turn to the literature, that impression, that the portal is part of a bigger transformation, plays a big part in dissatisfaction with the dataset lists and catalogues that persist today.
Beyond the portal?
We point to lots of things as portals. But one person’s portal is another’s repository. Having access to the wider terminological landscape can help us to think about what we want particular portals to be: management system, analysis platform, catalogue, register, marketplace or data store?
Each of these terms also has its neighbours. Heading towards the portal as datastore might lead us to think about ‘data lakes’ or the ‘data mesh’, whilst looking at the data registry could lead us into the neighbourhood of enterprise metadata management.
In short, we need to think carefully about the ideas we invoke when talking of portals in order to make sure we’re not talking at cross-purposes, and to make sure we are drawing on learning and ideas from the widest field possible.
In the next post I focus on ‘portals as technology’, taking the set of software tools commonly used to deliver ‘open government data portals’, and offering a brief genealogy of their development over the last decade. In a later post I’ll be focussing on ‘portals as a vision’, exploring how academic writers in particular have constructed an idea of what the open government data portal could or should be (and how portals have generally failed to live up to this potential).