Planning for Data Collection
What will this topic cover?
This topic forms part of a wider learning pathway and is designed to help you explore fundamental digital skills and review how you can use them to enhance your daily working practices and approaches. This learning topic, within the Intro to Data literacy pathway, focuses on the elements that you need to consider when planning to collect data.
This includes thinking about the information you need to gather, whether you are collecting new data or using an existing database, the methods and approaches available, and some best-practice guidance.
By the end of this topic, you will be able to:
• Understand the dangers of data silos.
• Understand the importance of planning for data collection.
• Identify and reflect on the storage and GDPR requirements of data.
• Understand the potential challenges with gathering data from external sources.

How to use this topic page
This topic page is split into different sections. Each section has a step and an activity to complete. These include scenarios and links to instructions so that you can try elements for yourself. Each learning unit also has a reflective section to help you think about how this will be used within your own practice.
Step 1: Data Silos
What is a Data Silo?
A data silo is a store of data that is isolated from centrally gathered data. This separate store of data may in some cases be useful, for example when gathering feedback on a service or gathering research information. However, there are instances where information is accidentally duplicated or can’t be found through central university systems. This can lead to inefficiencies, inconsistencies, and potential data integrity issues. To ensure effective data management and use, we recommend only collecting data for unique, targeted, and localised requirements.
Why are data silos challenging?
Data silos can be difficult to avoid and are often created because colleagues aren’t aware of the data that is already available via public or centralised resources. Data silos cause challenges with:
Collaboration:
Data silos make it harder for different departments to collaborate. If you keep data within your own area, others cannot access it and may miss out on valuable insights.
Inefficiency:
When data is siloed or separated, other departments across the University may store the same information separately. This can lead to duplicated effort and wasted resources and, depending on the data you are collecting, to the same group being asked for the same information by multiple sources. This can frustrate both students and staff.
Inconsistent data:
Keeping your own snapshot of data, or a local copy that is used for tracking, leads to discrepancies in data. This often means central systems and local systems hold different records, which can make it difficult to get a clear picture of what the data represents.
Limited insights:
Siloed data cannot easily be linked with wider data sources. Analysts may therefore find it difficult to make accurate predictions or set strategic direction, because they are missing the holistic overview.
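The inconsistency challenge can be illustrated with a short sketch. The Python below compares a local tracking copy against central records and lists every place where the two disagree. The student IDs, field names, and grades are invented for illustration and do not reflect any University system.

```python
def find_discrepancies(central, local):
    """Compare two sets of records keyed by ID.

    Returns a list of (record_id, field, central_value, local_value)
    tuples. A record that exists on only one side is reported with the
    field name "<record>".
    """
    issues = []
    for record_id in sorted(set(central) | set(local)):
        central_record = central.get(record_id)
        local_record = local.get(record_id)
        if central_record is None or local_record is None:
            issues.append((record_id, "<record>", central_record, local_record))
            continue
        for field in sorted(set(central_record) | set(local_record)):
            if central_record.get(field) != local_record.get(field):
                issues.append(
                    (record_id, field, central_record.get(field), local_record.get(field))
                )
    return issues


# Hypothetical example data (not real records).
central_records = {
    "S001": {"module": "DAT101", "grade": 62},
    "S002": {"module": "DAT101", "grade": 58},
}
local_snapshot = {
    "S001": {"module": "DAT101", "grade": 62},
    "S002": {"module": "DAT101", "grade": 48},  # out of date in the local copy
    "S003": {"module": "DAT101", "grade": 70},  # only exists in the local copy
}

for issue in find_discrepancies(central_records, local_snapshot):
    print(issue)
```

In this made-up example the local copy disagrees with the central record for one grade and holds one student the central data does not, which is the kind of drift that makes a silo hard to trust.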
Activity 1: Scenario
Look at this scenario below, which is based on common practices within the higher education sector.
Samaran works as an administrator and focusses on student progression through the University. To help with their job, Samaran has decided to download centralised data and build their own tracker to help them manage the student journey and its timeline. They use it to store grades, outcomes and tracking between modules. An academic accesses the student’s information through the centrally provisioned system and uses the information to inform the student of an upcoming change which impacts their progression.
What challenges will Samaran, the academic and the student potentially face with this approach?
Samaran, the academic, and the student may face several challenges with this approach. Data accuracy and integrity could be a significant issue: Samaran’s tracker is a downloaded copy, so it can fall out of step with the central system, and any errors in either source could lead to incorrect tracking and decision-making. Data privacy and security are also critical concerns, as sensitive student information must be protected from unauthorised access and breaches. Additionally, system integration challenges may arise if the central system does not seamlessly integrate with other university systems, potentially causing delays or inconsistencies. For the academic, interpretation of data might be a challenge if the system is not user-friendly or if there is a lack of training on how to use it effectively. Finally, the student might face communication issues if they are not adequately informed about how the system works or how changes in their progression are determined. Addressing these challenges requires robust data management practices, strong security measures, effective training programmes, and clear communication channels.
Step 2: Planning for Data Collection
Before gathering data, it’s important to think about the requirements for the data and the potential sources it may be gathered from. We will talk more about our internal data systems in Topic 3: Our Data systems. However, having an overview of what is available at the University is a vital first step to see if the data has already been collated.
Some key questions that you may want to consider when planning data collection:
- Does the data already exist within our own systems?
- What goals do I want to accomplish with this data collection?
- Exactly what data am I asking for?
- How am I going to analyse the data?
- What ethical considerations may be involved with collecting this data?
- What end point am I trying to achieve/demonstrate?
- Do I need to anonymise the data?
- Do I need to monitor impact or revisit this data at a later point in the year?
- If I need to revisit the data, how will I make sure that it is comparable and consistent over time?
- What approach do I need to consider for collecting data, and which will be my key target groups?
- If I am going to collect data, how am I going to protect and store that data (GDPR)?
Creating a data collection plan
We recommend creating a data collection and storage plan before collating information. This may still be needed even if you are gathering identifiable data from internal sources. We will break down some of the questions above to help you plan out your data collection.
What data do you need to collect and why?
This question is a good starting point. With multiple data sources often available, ask whether the data already exists, or whether existing data can be combined with a new question set to gather the information that you need. Identifying the purpose and intended output of your data collection will make it easier to gather the correct data. You usually only have one opportunity to collect data, so all the questions need to be appropriate to your intended output.
What requirements do you have?
In Topic 1: An Introduction to Data, we explored the differences between quantitative and qualitative data. Depending on your intended output, one type or a mixture of both may be more appropriate, so it’s worth thinking about which question types suit that output. Quantitative data is a lot easier to analyse but will only give you surface-level reflections, whereas qualitative data will give you wider insight but will take longer to analyse. Knowing what mix of questions you are aiming for, as well as the types of answers you are looking for, will help you plan the data analysis at a later stage.
What data collection method are you going to choose?
There are several ways to collect data, including surveys, interviews, and existing data sets. Each of these comes with its own challenges, which need to be considered.
Surveys – these are usually good for a quick touch point and can be done online or in person. Because they use a fixed question set, they don’t require much in-depth analysis.
Interviews – these often require collation of information and discussions with the interviewers to ensure that they don’t bias the feedback. They usually also require categorising and sorting through hours of data to come to a conclusion.
Existing data – Correlating existing data can be a way to save time and use data that has already been identified. However, we have to be careful that the data is used for appropriate reasons and that the data we are using is actually required for our particular need. We also have to check the accuracy of the data to ensure that we aren’t making links between elements that don’t exist or are misleading.
How are you going to store the data?
In line with General Data Protection Regulation (GDPR) legislation (web), we have to be careful when gathering data across the University. Some of the elements to consider are:
- How is the data being collected?
- Who is the data being collected from?
- What will the data be used for?
- Where will the data be stored?
- How long will the data be stored?
- Who will have access to it?
- When will it be deleted?
- Is it anonymised?
The data itself needs to be stored safely within the University environment and should only be accessed by people who have a need to access the data. This can be done within the Microsoft 365 environment using sharing permissions. We would recommend looking at the topic Collaborating Securely and Effectively within Microsoft 365 (web | lincoln.ac.uk) to get an overview of how this can be achieved.
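One way to make sure none of the storage questions above is missed is to write the answers down in a single structured record. The Python below is a minimal sketch of that idea. The class name, field names, storage location, people, and the 365-day retention period are all hypothetical; actual retention periods and access rules come from University policy, not from this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataStoragePlan:
    """Hypothetical record answering the storage questions listed above."""
    collection_method: str   # How is the data being collected?
    collected_from: str      # Who is the data being collected from?
    purpose: str             # What will the data be used for?
    storage_location: str    # Where will the data be stored?
    retention_days: int      # How long will the data be stored?
    access_list: list        # Who will have access to it?
    anonymised: bool         # Is it anonymised?
    collected_on: date

    def deletion_date(self):
        """When will it be deleted? Collection date plus retention period."""
        return self.collected_on + timedelta(days=self.retention_days)

    def is_due_for_deletion(self, today):
        return today >= self.deletion_date()

    def can_access(self, person):
        return person in self.access_list


# Illustrative values only.
plan = DataStoragePlan(
    collection_method="online survey",
    collected_from="staff using the service",
    purpose="service feedback",
    storage_location="restricted Microsoft 365 folder",
    retention_days=365,
    access_list=["project lead", "data analyst"],
    anonymised=True,
    collected_on=date(2025, 1, 15),
)

print(plan.deletion_date())
print(plan.can_access("data analyst"))
```

Recording the plan in one place like this makes it easier to check, at any later date, who should still have access and whether the data is due to be deleted.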
Activity 2: Scenario
Look at this scenario below, which is based on common practices within the higher education sector.
Calliope has decided to gather data on student grades and progression rates based on different backgrounds. They have sent out a survey to gather the information from a small handful of random students from each background. They have decided to map out the data to see if there is a difference between progression rates in different groups.
What concerns would you have over this approach?
There are several elements that Calliope must consider:
- Does this data already exist within the University, so is it being duplicated?
- How are they going to protect the data gathered and is it appropriate to ask for it?
- Can results from such a small number of students be generalised across a wider group?
- Did you think of any other wider concerns?
Realistically, this data should be held in central systems and should be available through elements such as a dashboard. Calliope shouldn’t need to go to students to ask for this personal information and, since this information is protected, doing so could cause issues with GDPR. They also didn’t consider how many students they would need for the results to be representative of the wider population. Data like this needs to be looked at on a larger, anonymised scale to draw conclusions more accurately. We have listed some of the concerns with this data collection approach, although there are many wider concerns that Calliope will need to address with this method.
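To give a sense of why a small handful of students is unlikely to be enough, the sketch below estimates how many responses a survey would need. It uses one common approach (Cochran's formula with a finite population correction) and assumes simple random sampling, a 95% confidence level, and a 5% margin of error. The function name and the group size of 1,000 are hypothetical, and this is only a rough guide rather than a recommended method for Calliope's case.

```python
import math
from statistics import NormalDist


def required_sample_size(population, margin_of_error=0.05,
                         confidence=0.95, proportion=0.5):
    """Estimate responses needed to describe a proportion in a group.

    Cochran's formula with a finite population correction.
    proportion=0.5 is the most cautious assumption (largest sample).
    """
    # z-score for a two-sided interval, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for a very large population
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Adjust downwards because the group being studied is finite
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


group_size = 1000  # hypothetical number of students in one group
print(required_sample_size(group_size))
```

Under these assumptions, a group of 1,000 students would need 278 responses. If groups are being compared with each other, each group needs enough responses in its own right, which is far more than a handful.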
Step 3: Reflection
What have I learnt from this learning topic?
This step is designed to help you think about what you have learned and how this applies to your own practice and context. Learning Activity 3 will ask you some questions to help you with this reflection.
Activity 3: Reflect
Use the following questions to help you think about your own practice.
- What are your current data practices and can they be improved?
- What steps do you need to take to start implementing any changes?
- Do the changes affect others? How will you get their buy-in?
- How will you communicate the changes to others?