
Evidence and insights: other findings from research

The second post of a two-part literature review looking at open data portals

Published on Dec 10, 2021

This is the second of two posts looking at data portal-related research, seeking to draw out relevant findings for future portal development.

The research community has been extremely active in developing assessment methods, creating metrics, carrying out case studies, and otherwise scrutinising the development and realisation of open data portals. As an applied computing topic with readily available data and potential ‘impact’, that is perhaps unsurprising. More surprising is that, at an initial glance, relatively little of the research seems to have had a long-term impact on the portal landscape. Proposed metrics and prototype data management or presentation features rarely seem to have become part of the default portal stack, and I’ve not found evidence of significant university spin-outs driven by portal research.

However, there is a lot to be drawn from the literature if you are willing to go digging (and can bypass the paywalls around some of it). In this post I’ve tried to organise and summarise some of what I’ve discovered from two days spent with my literature shovel.

Portal features in focus

What features should a data portal have? Lnenicka and Nikiforova [2021] use the concept of ‘transparency-by-design’ to deliver a list of more than 30 features and functionalities a data portal should have, dividing responsibility for the successful operation of each feature between the portal (as code) and the data publisher (the data management, or organisational data practice, layer, to use the hourglass model introduced in my last post). Interestingly, whilst they highlight calls in the literature for portals to integrate visualisation and analysis tools that support non-experts in working directly with published data, and to develop feedback features in service of portals as transparency tools, when Lněnička et al. [2021] went on to ask domain experts to prioritise portal features, analysis, feedback and crowdsourcing tools were rated as the least important. As Zhu and Freeman [2019] put it, “To help users understand and engage with data, cities may decide if OGD portals should provide tools to advance data literacy and user education, or opt to leave these issues to intermediaries”, highlighting the tension between portals as comprehensive data access tools and portals as just one layer in a wider set of data access tools and services.

Both Máchová et al. [2018] and Osagie et al. [2017] offer usability frameworks for evaluating open data portals, describing features from the perspective of a user wanting to engage with data. Whilst Máchová et al. [2018] frame this in terms of portal features (e.g. “Portal provides visualization and analytics capabilities to gain information about a dataset”), Osagie et al. [2017] offer usability statements, such as “the data and charts are simple and easy to read”. Supported through the same Horizon 2020 programme as Osagie et al. [2017], Hogan et al. [2017] point to the need to understand which “barriers to accessing, understanding, and using open data” can be overcome through technology, and which need action at the level of organisational practice, or training and engagement. Ruijer et al. [2017] offer a set of user personas, and report on a persona- and scenario-based design approach to developing requirements, though again pointing to potential features of wider portal-linked tooling, rather than necessarily articulating requirements for core portal code.

Ultimately, the appropriate features for portals will depend on their envisioned role. OECD [2018] outline a number of different organisational models for thinking about portals, from data-centred portals that deliver ‘Data as a Platform’ (DaaP), through to a ‘Government as a Platform’ (GaaP) approach which adds the requirements that portals be data-driven (hosting data and supporting data crowdsourcing as well as providing access to datasets held elsewhere) and user-driven (portals acting as collaborative online spaces for the open data ecosystem and collective knowledge). Nikiforova and McBride [2021] remind us that companies, citizens and government all have different requirements: an API may be more important to the private sector, for example, whereas downloads matter more to citizens. Any prioritisation of features therefore involves a choice about the kinds of users to prioritise.

It is notable, however, that a ‘participation maturity model’ (see last post / Alexopoulos et al. [2017]) is inherent in most suggested feature lists and assessment frameworks for data portals. Zhu and Freeman [2019], for example, organise portal features in terms of their contribution to access, trust, understanding, engagement with or integration of data, and participation, following a common model of building towards participatory features. Overall, the value of the literature on portal features may lie less in the long lists of technical or content features it generates, and more in exploring the gap between idealised and actual portals that we encounter in the next section. This said, it is useful to capture a few specific critical observations from papers that address general and specific portal features and functions:

  • Lourenço [2015] highlights the short-lived SpendReports on Data.gov.uk, which detailed which government departments had provided up-to-date spending disclosures, as an example of how portals can be “designed to clearly identify ‘what should be reported’ and, by comparing with what is effectively disclosed, ‘what is not being reported’ and ‘by whom’”. This points to a particular role for portals as tools for managing the quality and coverage of domain-specific data.

  • Gebre and Morales [2020] argue that “Datasets do not ‘speak for themselves’ because they require context for analysis and interpretation.”, suggesting much more should be done in portal design and implementation to provide contextual descriptions and meta-data.

  • Weber and Yan [2017] argue that a greater focus on capturing feedback could help improve data quality - though their work points to an important distinction between feedback (to portal management or dataset owners) and discussion (users sharing their experience about a dataset).

  • Yang et al. [2015] and Pinto et al. [2018] explore the use of dataset categorisation across portals, suggesting that divergent use of categories can make it difficult for users to locate data both within and across portals. As well as taking up their recommendation to develop harmonised data category structures, we might also ask whether the category structures of data portals map to wider vocabularies used in government knowledge management - offering potential connection points between the management of government data and other forms of government information.

  • Seoane and Hornidge [2021], focussing on academic data portals, raise the question of who is in the room when portals are designed, and whether their features and orientation place undue emphasis on quantitative data. Notably, little literature appears to consider the integration of data portals within the wider public-facing government web estate, or to look at features that might integrate data management alongside other content management, or that might support data citation practices in government report-writing to deliver more contextualised datasets.

Benchmarking portal performance

A lot of papers develop and apply frameworks for assessing portals. In a recent systematic review of the literature, Šlibar et al. [2021] identify 86 portal-assessment studies from the last decade: 40 using primarily qualitative data collection (e.g. researcher assessment of portals), and 46 with mixed-method approaches, generally also drawing on portal meta-data to generate portal- and dataset-level metrics. They identify common recommendations from these papers, including a significant number calling for more action to ‘Integrate participation and collaboration elements’ (12 papers), to ‘Bring value to different stakeholders’ (15), and to improve the alignment of portal features and contents with ‘government programmes, open government data principles [and] standards’ (13 papers).

Portal benchmark studies may (a) offer metrics to judge whether portals are well-placed to deliver against some defined goal, such as supporting transparency, participation or innovation [Máchová and Lnénicka 2017; Lafortune and Ubaldi 2018; Lourenço 2015]; (b) prototype or provide methods and tools that might be used by portal managers to monitor the quality of (meta-)data on the portal, creating competitive pressure for better (meta-)data regardless of the end use of the data [Marmier and Mettler 2020; Bhandari et al. 2021; Kubler et al. 2016]; (c) identify portal features that are either under-developed or under-used, and which the authors suggest should be included or improved to support greater realisation of value from data [Thorsby et al. 2017]; and (d) seek external explanatory factors for the success or failure of portals [Sáez Martín et al. 2016; Chatfield and Reddick 2017; Thorsby et al. 2017]. Additionally, a number of wider open government data benchmarks incorporate an assessment of portal features or quality [Lnenicka et al. 2022; van Knippenberg 2020; Aquaro 2018].

There are signs that researchers are increasingly looking to develop more domain-specific evaluation tools, with Venkatachalapathy et al. [2020] offering a ‘Data Portal Evaluation Rubric’ for transportation agency-related data portals and datasets, and Wu et al. [2021] putting forward a model for evaluating health-related datasets and their meta-data, particularly in light of the data demand resulting from COVID-19. Nikiforova [2020] uses the COVID pandemic response to assess how quickly case data made it onto data portals, responding both to specific requirements around health data, and exploring the more general question of data timeliness.
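Timeliness metrics of this kind are simple to operationalise. As a minimal sketch (not Atz’s tau or Nikiforova’s actual method), one can compare each dataset’s last modification date against a window implied by its declared update frequency, and report the share of datasets that are up to date; the frequency labels and thresholds below are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Illustrative mapping from declared update frequency to the maximum
# acceptable age of a dataset; labels and thresholds are assumptions,
# not taken from Atz [2014] or any specific portal.
MAX_AGE = {
    "daily": timedelta(days=1),
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=31),
    "annual": timedelta(days=366),
}

def share_of_timely_datasets(datasets, now):
    """Share of assessable datasets whose last modification date falls
    within the window implied by their declared update frequency."""
    timely = assessable = 0
    for ds in datasets:
        max_age = MAX_AGE.get(ds.get("update_frequency"))
        last_modified = ds.get("last_modified")
        if max_age is None or last_modified is None:
            continue  # can't assess without both fields
        assessable += 1
        if now - last_modified <= max_age:
            timely += 1
    return timely / assessable if assessable else None

datasets = [
    {"update_frequency": "monthly", "last_modified": datetime(2021, 11, 20)},
    {"update_frequency": "daily", "last_modified": datetime(2021, 6, 1)},
]
print(share_of_timely_datasets(datasets, now=datetime(2021, 12, 10)))  # 0.5
```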

The overall impression from the portal evaluation literature is that of a portal landscape that has focussed on quantity over quality, and where the features that researchers argue are important for realising value from data are often lacking. The critique comes at a number of levels:

  • In their evaluation of German data portals, Wenige et al. [2021] highlight a lack of meaningful dataset descriptions. OECD [2018] picks up this theme, pointing to poor meta-data quality, particularly contextual meta-data on datasets (as opposed to file format or licensing information, for example), though also giving good-practice examples: the Canadian data portal links to reports, laws and articles related to catalogued datasets, and Spain’s portal provides links to specific laws that explain the legal context of datasets.

  • OECD [2018] also review the wider features available on OECD data portals - including assessing the prevalence of data request options (21 out of 34 governments), user feedback tools (25 out of 34, but only 15 that make comments visible, and just 2 that report how quickly feedback is responded to), and crowdsourcing tools (just France, Mexico and Austria are reported to offer ways for non-government actors to contribute data, though 10 countries overall have options for users to contribute visualisations).

  • Nikiforova and McBride [2021], in their 41-portal usability study, point to weak social features on portals, concluding that “the poorest aspects from a usability perspective were most commonly related to more social aspects of OGD portals, dissemination of OGD use cases, or interaction between OGD users and OGD providers”. They go on to recommend that “governments and OGD maintainers should focus as well on developing OGD ecosystems and interaction on their OGD portal”. A similar point is found in Máchová and Lnénicka [2017].

  • Zhu and Freeman [2019], in a ‘User Interaction Framework’ review of 34 US city-level portals, find that “cities lack a clear understanding of users’ interest in different types of municipal data”, and that portals were particularly weak with respect to participatory features.

Notably, these critiques are often much more about the processes around portals (management, content and engagement) than about portal technology: and the solution to them is likely to lie predominantly in people and processes, rather than portal tech.

What future for automated evaluation?

Considerable work has also taken place on the automated identification [Correa et al. 2020], assessment and benchmarking of portal contents [Nogueras-Iso et al. 2021; Neumaier et al. 2016], including data structure [Antony and Salian 2021] and timeliness [Neumaier and Umbrich 2016; Atz 2014] - and on creating re-usable research infrastructures for tracking data portals and metadata over time [Correa et al. 2020; Weber et al. 2020]. In theory, this should provide the basis for new tooling to support portal management, although it’s notable that integrated portal metrics like the ‘5 stars of open data’, or richer dataset-level assessments like Open Data Certificates, have struggled to achieve sustainable implementation.
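To illustrate the kind of tooling involved, here is a minimal sketch of an automated metadata completeness check against a CKAN-backed portal. CKAN’s package_search endpoint is real, but the portal URL is hypothetical, and the choice of fields to score is an assumption (the frameworks cited above each make different choices):

```python
import requests

# Which metadata fields to score is a judgement call; this selection
# is illustrative, not drawn from any one assessment framework.
FIELDS = ["title", "notes", "license_id", "author", "maintainer", "tags"]

def completeness_scores(portal_url, rows=100):
    """Fetch packages from a CKAN portal's package_search API and score
    each dataset by the share of selected metadata fields that are
    non-empty."""
    resp = requests.get(
        f"{portal_url}/api/3/action/package_search",
        params={"rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    packages = resp.json()["result"]["results"]
    return {
        pkg["name"]: sum(1 for f in FIELDS if pkg.get(f)) / len(FIELDS)
        for pkg in packages
    }

# Hypothetical usage; substitute a real CKAN instance's base URL.
# scores = completeness_scores("https://data.example.gov")
# print(sorted(scores.items(), key=lambda kv: kv[1])[:10])  # weakest ten
```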

The portal as an organisational process

A number of papers speak to the organisational dynamics of delivering and maintaining data portals and programmes, and portal features that can support this [e.g. Abella et al. 2019; Zuiderwijk et al. 2012; De Donato et al. 2018; Perez et al. 2020].

In 2012, Zuiderwijk et al. outlined a set of ‘socio-technical impediments of open data’ [Zuiderwijk et al. 2012], including a list of barriers to data being posted on data portals, ranging from low levels of public sector understanding, to a lack of mechanisms to ensure data is used by government, no processes for dealing with user input, and a focus on the publishing process, as opposed to considering data use. A number of studies [e.g. Parycek et al. 2014; Perez et al. 2020] document the practical process of implementing portals in particular localities, pointing to portal implementation as much more than a technical task. Sánchez-Nielsen et al. [2021] turn the focus back onto the technical architecture, arguing that because portals have “in many cases, originated as part of a politically driven open data initiative” they have not been developed to be sustainable, and that alternative frameworks are needed to ensure sustainable, ongoing and timely data publication.

At this point, it would be valuable to turn to the wider literature on (open) data initiatives, as opposed to open data portals, to explore the specific role portals may play alongside other interventions. Although doing this in full is beyond the scope of this rapid review, I’ve observed that the literature has increasingly turned to issues of data governance and quality in recent years - moving from the question of how to get data onto a portal, to the questions of how to make sure the data provided is fit-for-purpose, and how to use portals as tools to support internal data management reforms.

When Sabri et al. [2019] explored the published policies of 20 national open government data portals, they found that “while ensuring the quality of data shared on the portals is important, little attention is given in explicitly providing quality criteria in the portal for data contributors” in terms of data accuracy, trustworthiness or completeness. They suggest, then, that while portal policies to date have focussed on machine-readability, public access, timeliness, file formats and re-usability (licences), there is a need for greater commitment to quality at the content layer. Reis et al. [2018] propose that more needs to be done to build on established data governance frameworks in the management of portals, potentially by incorporating tools into common portal software to better support work on issues including quality and consistency; policies and standards; security and privacy; compliance; and retention and archiving - alongside access to data.

Perez et al. [2020] use the concept of an ‘urban data hub’ to examine data portals, and the organisational relationships behind them, in a number of cities, including London, Singapore, Geneva and Paris, looking at how they fit into a wider information management landscape. Drawing on interviews, they make a number of related recommendations, including: expand the enterprise-wide data warehousing capability of the organisation; invest in a Chief Information Officer; ensure business practices are aligned with technology design; update existing Client Relationship Management software; adopt more sophisticated Geographic Information System (GIS) software; invest in more internal data analytics capacity; and improve internal standards for data collection, storage and retrieval.

OECD [2018] provides insight into the practices OECD governments have used to manage their data portals, including data quality certification and automated data cleaning in Korea, and the ‘Data Squads’ taskforce model in Mexico, which engages with departments to improve priority datasets. They also highlight the importance of tooling to manage broken links (a minimal link-checking sketch follows below). In ‘From Repositories to Switchboards: Local Government as Open Data Facilitators’, Anastasiu et al. [2020] provide a vision of bi-directional exchange of information between local government, civil society, academia and business, where the focus is on matching data with use-cases through an ongoing ‘double-diamond’ iterative approach. They note, however, that “the realisation of such a model must be accompanied by a widespread shift in mindset: from local government as service provider towards collaborator”.
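Picking up that broken-link point: link checking is one of the easier portal management tasks to automate. A minimal sketch, iterating over resource URLs harvested from catalogue metadata (the commented-out example URL is hypothetical):

```python
import requests

def broken_links(resource_urls, timeout=10):
    """Return the resource URLs that no longer resolve, with the HTTP
    status code or the error encountered."""
    broken = []
    for url in resource_urls:
        try:
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            if resp.status_code >= 400:
                broken.append((url, resp.status_code))
        except requests.RequestException as exc:
            broken.append((url, type(exc).__name__))
    return broken

# Hypothetical resource URL, as would be harvested from a catalogue:
# print(broken_links(["https://data.example.gov/files/spend-2021.csv"]))
```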

Luthfi et al. [2020] highlight the importance of incentives for those whose data is to be made available through portals, and the need for knowledge-sharing about what works when publishing data. This hints at the need for portals to better serve internal data custodians: both with support to publish, and with feedback that can incentivise their work. There are useful questions to be asked here about the role of metrics both in maintaining public officials’ engagement with data portals, and in supporting investment in work to improve data quality. Degbelo et al. [2020] report on an approach to capture better usage data through a pilot use of API keys for dataset access, implementing access logging, and providing a dashboard of dataset use to government employees.
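Degbelo et al. don’t reproduce their implementation in the paper, but the core pattern of keyed access logging feeding a publisher-facing dashboard can be sketched simply; the CSV log schema below is an illustrative assumption, not theirs:

```python
import csv
from collections import Counter
from datetime import datetime

def log_access(logfile, api_key, dataset_id):
    """Append one access record (timestamp, key, dataset); this schema
    is an illustrative assumption."""
    with open(logfile, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.utcnow().isoformat(), api_key, dataset_id]
        )

def usage_summary(logfile):
    """Aggregate the log into per-dataset figures a publisher-facing
    dashboard could display: total requests and distinct consumers."""
    requests_per_dataset = Counter()
    consumers = {}
    with open(logfile, newline="") as f:
        for timestamp, api_key, dataset_id in csv.reader(f):
            requests_per_dataset[dataset_id] += 1
            consumers.setdefault(dataset_id, set()).add(api_key)
    return {
        ds: {"requests": n, "distinct_consumers": len(consumers[ds])}
        for ds, n in requests_per_dataset.items()
    }

log_access("access.csv", "key-123", "air-quality-2021")
log_access("access.csv", "key-456", "air-quality-2021")
print(usage_summary("access.csv"))
```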

There has also been interest in using the data available on portals to map opportunities for improved coordination around data inside government. For example, Adel Rezk et al. [2017] apply Named Entity Recognition (NER) and Natural Language Processing (NLP) to the content of Data.gov.ie to explore similarities between the data of different publishers on the platform, with a view to identifying opportunities for different departments or agencies to collaborate and advance cross-organisational standardisation of data.
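To make that idea concrete, a much simpler stand-in for their NER-based pipeline is to compare publishers by the textual similarity of their dataset descriptions. The sketch below uses TF-IDF rather than NER, and the publishers and text are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented stand-in for catalogue metadata grouped by publisher; the
# original study worked from Data.gov.ie's real catalogue, using NER.
publisher_texts = {
    "transport-dept": "bus routes rail timetables road traffic counts",
    "environment-agency": "river levels water quality air monitoring",
    "roads-authority": "road traffic counts road maintenance schedules",
}

names = list(publisher_texts)
matrix = TfidfVectorizer().fit_transform(publisher_texts[n] for n in names)
sims = cosine_similarity(matrix)

# For each publisher, report its nearest neighbour: a candidate partner
# for cross-organisational data standardisation.
for i, name in enumerate(names):
    j = max((k for k in range(len(names)) if k != i), key=lambda k: sims[i, k])
    print(f"{name} -> {names[j]} (similarity {sims[i, j]:.2f})")
```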

Technical innovation

This brings us to a fourth cluster of papers, primarily from computer science, that outline experimental technical improvements that might be made to portals, including to improve metadata quality [Bogdanović et al. 2021; Tygel et al. 2015], deliver better search results [Au et al. 2016] or data visualisation [Folmer et al. 2019; Osagie et al. 2017], aggregate and augment metadata for improved single- and cross-portal search [Pelucchi et al. 2018; Castelo et al. 2021], or provide real-time data quality information [Bhandari et al. 2021].

For example, the CommuniData project has prototyped alternative interfaces for data search, piloting chatbot-based search [Neumaier et al. 2017] and looking inside datasets to extract geographical locations [Neumaier et al. 2018; Heil and Neumaier 2018] to drive improved geographic data search. González-Mora et al. [2021] have demonstrated the potential of voice interfaces onto the European Data Portal to increase the accessibility of portals for those with visual impairments.
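Stripped to its essentials, the geo-labelling idea is to look inside tabular data for values that match a gazetteer and to tag the dataset with the corresponding regions. A toy sketch follows; the gazetteer entries, region codes and threshold are illustrative, and the published pipeline is considerably more sophisticated:

```python
# Toy gazetteer mapping place names to region codes; Neumaier et al.
# draw on full gazetteers (e.g. GeoNames), not a hand-picked sample.
GAZETTEER = {"wien": "AT-9", "graz": "AT-6", "linz": "AT-4"}

def geo_labels(rows, min_share=0.5):
    """Tag a tabular dataset (a list of dicts) with region codes when
    enough of the values in some column match known place names."""
    labels = set()
    if not rows:
        return labels
    for col in rows[0]:
        values = [str(r.get(col, "")).strip().lower() for r in rows]
        hits = [GAZETTEER[v] for v in values if v in GAZETTEER]
        if len(hits) / len(values) >= min_share:
            labels.update(hits)
    return labels

rows = [
    {"city": "Wien", "visitors": "1200"},
    {"city": "Graz", "visitors": "540"},
]
print(geo_labels(rows))  # {'AT-9', 'AT-6'} (set order may vary)
```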

Going back even before the earliest data.gov.x portals, there has been interest in applying semantic web technologies to enrich government data [Alani et al. 2007], and although the idea of Linked Open Government Data hasn’t broken through at scale, work to ‘lift’ open data portals and their contents into the semantic web remains ongoing [e.g. van der Waal et al. 2014; de Figueiredo et al. 2021; Aloufi 2019].

On a different track, one particularly interesting study [Koesten et al. 2020], from the Human Computer Interaction field, takes a critical look at the summary data provided on data.gov.uk, drawing on a diary study, a lab experiment and crowdsourced input to propose a template for better textual descriptions of datasets. The study points to work by Kacprzak et al. [2019] analysing search terms used against major open government data portals, which suggested that portal search is “currently used in an exploratory manner, rather than to retrieve a specific resource”. Related work from Machine Learning, around datasheets for datasets [Gebru et al. 2020] and Data Nutrition Labels [Chmielinski et al. 2020; Holland et al. 2018], may also be relevant when considering the design of data summarisation interfaces.
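Even a crude automated version of such summaries shows what portals could surface. Below is a sketch of a basic tabular summariser; the fields reported are loosely inspired by, not copied from, Koesten et al.’s template:

```python
import pandas as pd

def summarise(df, title):
    """A plain-text overview of a tabular dataset: shape, then per-column
    type, distinct values, and coverage."""
    lines = [f"{title}: {len(df)} rows x {df.shape[1]} columns."]
    for col in df.columns:
        filled = df[col].notna().mean()
        lines.append(
            f"- {col} ({df[col].dtype}): {df[col].nunique()} distinct "
            f"values, {filled:.0%} populated."
        )
    return "\n".join(lines)

df = pd.DataFrame({
    "station": ["A", "B", "A", None],
    "no2_ugm3": [41.0, 37.5, 44.2, 39.9],
})
print(summarise(df, "Air quality readings"))
```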

It is notable that data portals are much more likely to include ‘data explorer’ components with basic visualisation features than they are to provide text-based summarisation. We’ve not located significant research on the utility of these interfaces, although Osagie et al. [2017] point to some of the difficulties users had with pivot-chart interfaces built into a new prototype data portal that aimed to help users directly answer key questions. This takes us back, though, to the essential tension: should a portal provide data, or provide answers? And if it only provides data, how far should it offer a gateway to other support that builds the capacity of users to turn that data into the answers they need?

The engagement layer

The literature review strategy for this review (keyword searches on ‘open data portal’, plus following citations) didn’t turn up as much on approaches for engaging citizens with open data as might be relevant for plotting the future of data portals as a tool of civic engagement. This is a gap we’ll need to fill in future phases of work - undoubtedly there’s lots in the literature on participatory (open) data practice to draw upon, even if it does not directly reference portals. However, the key message for portal design appears to be the importance of putting data in context, with Degbelo et al. [2020] outlining the potential of contextual data use tutorials, and Gascó-Hernández et al. [2018] highlighting the importance of context-aware training. Generic automated visualisations or data tutorials appear less effective at realising data re-use than tailored support.

This does raise questions of whether it’s possible to identify particular ‘high (social) value’ datasets where investment in such contextualised presentation has a return, or whether the cost of contextualisation can be brought down by standardisation that allows context-specific interfaces to be shared between portals.

How can we better connect research and practice?

When I was running the Open Data in Developing Countries research network with the Web Foundation, we established the ‘Open Data Research Symposium’ as a fringe event of the International Open Data Conference (IODC) with the explicit goal of better interfacing social and mixed-methods research on open data with policy debates. I’ve noticed for a number of years the presence of open data tracks at computing conferences, but these appear to act more as a showcase of applied computing research, rather than a means to get findings into the future design of technology or initiatives. I’m left wondering whether there might be meaningful interventions to better translate the wealth of portal research into improved practice.

Bibliography

Abella, A., Ortiz-de-Urbina-Criado, M., and De-Pablos-Heredero, C. 2019. The process of open data publication and reuse. Journal of the Association for Information Science and Technology 70, 3, 296–300.

Adel Rezk, M., Ojo, A., and Hassan, I.A. 2017. Mining Governmental Collaboration Through Semantic Profiling of Open Data Catalogues and Publishers. In: Collaboration in a Data-Rich World. 253–264.

Alani, H., Dupplaw, D., Sheridan, J., et al. 2007. Unlocking the Potential of Public Sector Information with Semantic Web Technology. The Semantic Web, Springer, 708–721.

Alexopoulos, C., Diamantopoulou, V., and Charalabidis, Y. 2017. Tracking the Evolution of OGD Portals: A Maturity Model. Electronic Government, Springer International Publishing, 287–300.

Aloufi, K.S. 2019. Generating RDF resources from web open data portals. Indonesian Journal of Electrical Engineering and Computer Science 16, 3, 1521–1529.

Anastasiu, I., Foth, M., Schroeter, R., and Rittenbruch, M. 2020. From Repositories to Switchboards: Local Governments as Open Data Facilitators. In: S. Hawken, H. Han and C. Pettit, eds., Open Cities | Open Data: Collaborative Cities in the Information Era. Springer, Singapore, 331–358.

Antony, S. and Salian, D. 2021. Usability of Open Data Datasets. Conceptual Modeling, Springer International Publishing, 410–422.

Aquaro, V. 2018. United Nations E-Government Survey 2018: Gearing e-government to support transformation towards sustainable and resilient societies. United Nations, New York.

Atz, U. 2014. The tau of data: A new metric to assess the timeliness of data in catalogues. CeDEM14 Conference for E-Democracy and Open Government, 147–162.

Au, V., Thomas, P., and Jayasinghe, G.K. 2016. Query-Biased Summaries for Tabular Data. Proceedings of the 21st Australasian Document Computing Symposium, Association for Computing Machinery, 69–72.

Bhandari, S., Ranjan, N., Kim, Y.-C., et al. 2021. An Automatic Data Completeness Check Framework for Open Government Data. Applied Sciences 11, 19, 9270.

Bogdanović, M., Veljkovic, N., Frtunic Gligorijevic, M., Puflovic, D., and Stoimenov, L. 2021. On Revealing Shared Conceptualization Among Open Datasets. SSRN Electronic Journal.

Castelo, S., Rampin, R., Santos, A., Bessa, A., Chirigati, F., and Freire, J. 2021. Auctus: A Dataset Search Engine for Data Augmentation. arXiv:2102.05716 [cs].

Chatfield, A.T. and Reddick, C.G. 2017. A longitudinal cross-sector analysis of open data portal service capability: The case of Australian local governments. Government Information Quarterly 34, 2, 231–243.

Chmielinski, K.S., Newman, S., Taylor, M., et al. 2020. The Dataset Nutrition Label (2nd Gen): Leveraging Context to Mitigate Harms in Artificial Intelligence. NeurIPS 2020 Workshop on Dataset Curation and Security, 7.

Correa, A.S., Melo Jr., A., and Silva, F.S.C. da. 2020. A deep search method to survey data portals in the whole web: Toward a machine learning classification model. Government Information Quarterly 37, 4, 101510.

De Donato, R., Ferretti, G., Marciano, A., et al. 2018. Agile production of high quality open data. Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age, Association for Computing Machinery, 1–10.

de Figueiredo, G.B., de Faria Cordeiro, K., and Campos, M.L.M. 2021. LigADOS: Interlinking Datasets in Open Data Portal Platforms on the Semantic Web. Metadata and Semantic Research, Springer International Publishing, 73–84.

Degbelo, A., Granell, C., Trilles, S., Bhattacharya, D., and Wissing, J. 2020. Tell me how my open Data is re-used: Increasing transparency through the Open City Toolkit. In: Open Cities| Open Data. Springer, 311–330.

Folmer, E., Beek, W., Rietveld, L., Ronzhin, S., Geerling, R., and Haan, D. den. 2019. Enhancing the Usefulness of Open Governmental Data with Linked Data Viewing Techniques. Proceedings of the 52nd Hawaii International Conference on System Sciences, 2912–2921.

Gascó-Hernández, M., Martin, E.G., Reggi, L., Pyo, S., and Luna-Reyes, L.F. 2018. Promoting the use of open government data: Cases of training and engagement. Government Information Quarterly 35, 2, 233–242.

Gebre, E.H. and Morales, E. 2020. How “accessible” is open data? Analysis of context-related information and users’ comments in open datasets. Information and Learning Sciences 121, 1/2, 19–36.

Gebru, T., Morgenstern, J., Vecchione, B., et al. 2020. Datasheets for Datasets. arXiv:1803.09010 [cs].

González-Mora, C., Garrigós, I., Mazón, J.-N., Casteleyn, S., and Firmenich, S. 2021. Open Data Accessibility Based on Voice Commands. Web Engineering, Springer International Publishing, 456–463.

Heil, E. and Neumaier, S. 2018. Reboting.com: Towards Geo-search and Visualization of Austrian Open Data. ESWC 2018: The Semantic Web: ESWC 2018 Satellite Events.

Hogan, M., Ojo, A., Harney, O., et al. 2017. Governance, Transparency and the Collaborative Design of Open Data Collaboration Platforms: Understanding Barriers, Options, and Needs. In: A. Ojo and J. Millard, eds., Government 3.0 Next Generation Government Technology Infrastructure and Services: Roadmaps, Enabling Technologies & Challenges. Springer International Publishing, Cham, 299–332.

Holland, S., Hosny, A., Newman, S., Joseph, J., and Chmielinski, K. 2018. The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards. arXiv:1805.03677 [cs].

Kacprzak, E., Koesten, L., Ibáñez, L.-D., Blount, T., Tennison, J., and Simperl, E. 2019. Characterising dataset search: An analysis of search logs and data requests. Journal of Web Semantics 55, 37–55.

Koesten, L., Simperl, E., Blount, T., Kacprzak, E., and Tennison, J. 2020. Everything you always wanted to know about a dataset: Studies in data summarisation. International Journal of Human-Computer Studies 135, 102367.

Kubler, S., Robert, J., Le Traon, Y., Umbrich, J., and Neumaier, S. 2016. Open Data Portal Quality Comparison using AHP. Proceedings of the 17th International Digital Government Research Conference on Digital Government Research, Association for Computing Machinery, 397–407.

Lafortune, G. and Ubaldi, B. 2018. OECD 2017 OURdata index: Methodology and results.

Lnenicka, M., Luterek, M., and Nikiforova, A. 2022. Benchmarking open data efforts through indices and rankings: Assessing development and contexts of use. Telematics and Informatics 66, 101745.

Lnenicka, M. and Nikiforova, A. 2021. Transparency-by-design: What is the role of open data portals? Telematics and Informatics 61, 101605.

Lněnička, M., Machova, R., Volejníková, J., Linhartová, V., Knezackova, R., and Hub, M. 2021. Enhancing transparency through open government data: The case of data portals and their features and capabilities. Online Information Review 45, 6, 1021–1038.

Lourenço, R.P. 2015. An analysis of open government portals: A perspective of transparency for accountability. Government Information Quarterly 32, 3, 323–332.

Luthfi, A., Janssen, M., and Crompvoets, J. 2020. Stakeholder Tensions in Decision-Making for Opening Government Data. Business Modeling and Software Design, Springer International Publishing, 331–340.

Máchová, R., Hub, M., and Lnenicka, M. 2018. Usability evaluation of open data portals: Evaluating data discoverability, accessibility, and reusability from a stakeholders’ perspective. Aslib Journal of Information Management 70, 3, 252–268.

Máchová, R. and Lnénicka, M. 2017. Evaluating the Quality of Open Data Portals on the National Level. Journal of theoretical and applied electronic commerce research 12, 1, 21–41.

Marmier, A. and Mettler, T. 2020. Developing an index for measuring OGD publisher compliance to good practice standards: Insights from opendata.swiss. Information Polity 25, 1, 91–110.

Neumaier, S., Savenkov, V., and Polleres, A. 2018. Geo-Semantic Labelling of Open Data. Procedia Computer Science 137, 9–20.

Neumaier, S., Savenkov, V., and Vakulenko, S. 2017. Talking Open Data. https://arxiv.org/abs/1705.00894.

Neumaier, S. and Umbrich, J. 2016. Measures for Assessing the Data Freshness in Open Data Portals. 2016 2nd International Conference on Open and Big Data (OBD), 17–24.

Neumaier, S., Umbrich, J., and Polleres, A. 2016. Automated Quality Assessment of Metadata across Open Data Portals. Journal of Data and Information Quality 8, 1, 2:1–2:29.

Nikiforova, A. 2020. Timeliness of Open Data in Open Government Data Portals Through Pandemic-related Data: A long data way from the publisher to the user. 2020 Fourth International Conference on Multimedia Computing, Networking and Applications (MCNA), 131–138.

Nikiforova, A. and McBride, K. 2021. Open government data portal usability: A user-centred usability analysis of 41 open government data portals. Telematics and Informatics 58, 101539.

Nogueras-Iso, J., Lacasta, J., Ureña-Cámara, M.A., and Ariza-López, F.J. 2021. Quality of Metadata in Open Data Portals. IEEE Access 9, 60364–60382.

OECD. 2018. Open data portals: Enabling government as a platform. OECD, Paris.

Osagie, E., Waqar, M., Adebayo, S., Stasiewicz, A., Porwol, L., and Ojo, A. 2017. Usability Evaluation of an Open Data Platform. Proceedings of the 18th Annual International Conference on Digital Government Research, Association for Computing Machinery, 495–504.

Parycek, P., Hochtl, J., and Ginner, M. 2014. Open Government Data Implementation Evaluation. Journal of theoretical and applied electronic commerce research 9, 2, 80–99.

Pelucchi, M., Psaila, G., and Toccu, M. 2018. Enhanced Querying of Open Data Portals. Web Information Systems and Technologies, Springer International Publishing, 179–201.

Perez, P., Pettit, C., Barns, S., Doig, J., and Ticzon, C. 2020. An information management strategy for city data hubs: Open data strategies for large organisations. In: Open Cities| Open Data. Springer, 289–309.

Pinto, H. dos S., Bernardini, F., and Viterbo, J. 2018. How cities categorize datasets in their open data portals: An exploratory analysis. Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age, Association for Computing Machinery, 1–9.

Reis, J.R., Viterbo, J., and Bernardini, F. 2018. A rationale for data governance as an approach to tackle recurrent drawbacks in open data portals. Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age, ACM, 1–9.

Ruijer, E., Grimmelikhuijsen, S., Hogan, M., Enzerink, S., Ojo, A., and Meijer, A. 2017. Connecting societal issues, users and data. Scenario-based design of open data platforms. Government Information Quarterly 34, 3, 470–480.

Sabri, N.A.M., Emran, N.A., and Harum, N. 2019. Government Open Data Portals: A Measurement of Data Veracity Coverage. International Journal of Innovative Technology and Exploring Engineering 8, 12.

Sáez Martín, A., Rosario, A.H.D., and Pérez, M.D.C.C. 2016. An International Analysis of the Quality of Open Government Data Portals. Social Science Computer Review 34, 3, 298–311.

Sánchez-Nielsen, E., Morales, A., Mendo, O., and Chávez-Gutiérrez, F. 2021. SuDaMa: Sustainable Open Government Data Management Framework for Long-Term Publishing and Consumption. IEEE Access 9, 151841–151863.

Seoane, M.V. and Hornidge, A.-K. 2021. The Politics of Data Portals in Inter- And Transdisciplinary Research. Journal of Information Systems and Technology Management 17, 0.

Šlibar, B., Oreški, D., and Begičević Ređep, N. 2021. Importance of the Open Data Assessment: An Insight Into the (Meta) Data Quality Dimensions. SAGE Open 11, 2, 21582440211023178.

Thorsby, J., Stowers, G.N.L., Wolslegel, K., and Tumbuan, E. 2017. Understanding the content and features of open data portals in American cities. Government Information Quarterly 34, 1, 53–61.

Tygel, A., Auer, S., Debattista, J., Orlandi, F., and Campos, M.L.M. 2015. Towards Cleaning-up Open Data Portals: A Metadata Reconciliation Approach. arXiv.

van der Waal, S., Węcel, K., Ermilov, I., Janev, V., Milošević, U., and Wainwright, M. 2014. Lifting Open Data Portals to the Data Web. In: S. Auer, V. Bryl and S. Tramp, eds., Linked Open Data – Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project. Springer International Publishing, Cham, 175–195.

van Knippenberg, L. 2020. Open Data Maturity Report 2020. Publications Office of the European Union, Luxembourg.

Venkatachalapathy, A., Sharma, A., Knickerbocker, S., and Hawkins, N. 2020. A Rubric-Driven Evaluation of Open Data Portals and Their Data in Transportation. Journal of Big Data Analytics in Transportation 2, 2, 181–198.

Weber, N. and Yan, A. 2017. Integrating user feedback with open data quality models. Proceedings of the Association for Information Science and Technology 54, 1, 824–826.

Weber, T., Mitöhner, J., Neumaier, S., and Polleres, A. 2020. ODArchive: Creating an Archive for Structured Data from Open Data Portals. The Semantic Web: ISWC 2020, Springer International Publishing, 311–327.

Wenige, L., Stadler, C., Martin, M., Figura, R., Sauter, R., and Frank, C.W. 2021. Open Data and the Status Quo – A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany. arXiv:2106.09590 [cs].

Wu, D., Xu, H., Yongyi, W., and Zhu, H. 2021. Quality of government health data in COVID-19: Definition and testing of an open government health data quality evaluation framework. Library Hi Tech ahead-of-print, ahead-of-print.

Yang, H.-C., Lin, C.S., and Yu, P.-H. 2015. Toward Automatic Assessment of the Categorization Structure of Open Data Portals. Multidisciplinary Social Networks Research, Springer, 372–380.

Zhu, X. and Freeman, M.A. 2019. An evaluation of U.S. Municipal open data portals: A user interaction framework. Journal of the Association for Information Science and Technology 70, 1, 27–37.

Zuiderwijk, A., Janssen, M., Choenni, S., Meijer, R., and Alibaks, R.S. 2012. Socio-technical Impediments of Open Data. Electronic Journal of e-Government 10, 2, 156–172.

Comments
Ivan Begtin:

Hi!

Thanks a lot, I’ve found a lot of ideas to think about.

It would be even better if it were possible to analyze these ideas and suggestions as a feature matrix table, with products and their existing and proposed features.

Since I do both open data and commercial data products, I can say that the open data ecosystem is big, but the data science ecosystem evolves much, much faster. It’s not just about enterprise meta-data catalogues and portals; it’s about modern data stacks and hundreds of open source and commercial tools and services like dbt, Snowflake, Meltano, Trino, Hadoop and so on. A lot of innovation happens in this data science ecosystem.

A few examples:

  • Collibra’s data products include intelligence tools to determine data field types automatically

  • we are teaching our algorithms in the DataCrafter project (Russian language) to automatically document missing data

  • Dataiku provides a complex (expensive) collaboration space to improve data management

  • Bit.io helps to convert a dataset into a PostgreSQL database instance

There are also several meta-data and data standards outside the open data ecosystem:

  • Open Metadata - data discovery metadata for enterprise meta-data catalogs

  • data formats like Apache ORC, Parquet, Avro and many others that are commonly used by data scientists but not published on open data portals

And many more examples exist too.

If we don’t look closely at these tools and innovations, then most likely open data portals will slowly be replaced by developer portals and data catalogues that are just a small public part of the government/organisation data ecosystem.