ESRC – Operationalising Scaled Production and Sharing of Synthetic Data
Closing Date: 09/05/2023
Funding available to evaluate the use of low-fidelity synthetic versions of datasets.
The Economic and Social Research Council (ESRC) is providing funding to evaluate the use of low-fidelity synthetic versions of datasets held securely within:
- UK Data Service
- Office for National Statistics (ONS) Secure Research Service
- Other trusted research environments (TREs)
The successful team will need to:
- Identify a collection of low-fidelity synthetic versions of secure datasets that are currently available to researchers, for inclusion in the evaluation. Datasets for the analysis should include, but are not limited to, synthetic versions of:
  - Annual Survey of Hours and Earnings
  - Grading and Admissions Data for England
  - Ministry of Justice Data First datasets
  - National Pupil Database and Longitudinal Education Outcomes, when these become available
  - Hospital Episode Statistics

  Proposals may also include the creation of new synthetic data where applicants can justify the need for it in order to evaluate system-wide operationalisation more generally.
- Evaluate the broad range of costs to data owners and TREs of creating synthetic data, including initial and ongoing costs (for example, updates).
- Evaluate different models for sharing synthetic data, including the resourcing implications of sharing for data owners or data providers. This could include (but is not limited to):
  - data production
  - ingest and curation procedures
  - metadata sharing
  - discoverability through existing data catalogues
- Evaluate efficiencies for data owners and TREs arising from the availability of synthetic data, including but not limited to:
  - impact on TRE resources, for example time spent responding to researchers’ requests for information about the data
  - impact on secure environment usage, such as load and run times
  - uptake of different synthetic datasets by researchers, and the influence this has on demand for the real data
- Evaluate the impact of low-fidelity synthetic data on researchers’ experience of carrying out research using secure administrative or social survey data, including but not limited to:
  - utility of the synthetic data in helping users understand the data and scope research questions before applying for access to the real data
  - impact on the quality of applications to access data, for example the success rate of project applications submitted through the UK Statistics Authority Research Accreditation Service, project approval times, and any other impacts on the project accreditation process
  - utility of the synthetic data for developing and testing code outside the secure environment, either while waiting for access to the real data or after access has been granted
- Make recommendations for further scaled production and sharing of low-fidelity synthetic data that are acceptable to data owners and to the public, including identifying opportunities for automation to increase efficiency. Although the project should focus on low-fidelity synthetic data, the evaluation should also reflect on how the operationalisation of high-fidelity synthetic data might fare and what additional considerations might be needed. Additional research is not expected; rather, the team should provide an informed response based on the exploration undertaken for these objectives.
The successful team is expected to work with the ADR UK Strategic Hub (including its communication and engagement team and its programme management office team) and the ESRC data and infrastructure team. The team will communicate the work to the public and relevant stakeholders and facilitate meaningful engagement with relevant communities.
| Field | Details |
|---|---|
| Funding body | UK Research and Innovation (UKRI) |
| Category | Economic and Social Research |
| Fund or call | Fund |