
Ethical Data Collection in Research | Principles & Procedures

The research ethics of data collection in qualitative research can be tricky to navigate, so this article outlines what to keep in mind in order to collect data in an ethical manner.

Roehl Sybing

Content creator and qualitative data expert

Introduction
Understanding human subjects research
Why are there ethical implications in data collection?
Adhering to ethical data collection
Further reading

Introduction

Qualitative researchers should keep a host of ethical considerations in mind before any data is collected. These considerations aim to prevent harm to research participants and to ensure the well-being of the people researchers are responsible for.

While ethical data collection touches on physical welfare, ethical practices in social science research also have to do with protecting research participants' personally identifiable information and making sure their participation in research remains confidential if necessary.

This article details the rationale for ethical research as well as the ethical considerations specific to social science research so that researchers can stay informed about the need for ethical practices in data collection.

The ethics of data collection are required knowledge for all qualitative researchers.

Understanding human subjects research

Researchers in the social sciences have particular ethical considerations different from those in the physical, material, and natural sciences. Biologists have to contend with the ethics of taking animals away from their habitats and conducting experiments on them, while geologists and climate scientists try to study and collect data from natural environments without adverse effects. Social scientists work primarily with people, creating particular challenges that must be negotiated to uphold ethical principles regarding how data should be treated during the course of research.

Information about people is a particularly valuable resource, and the researcher, as the most important instrument of data collection, wields immense power when in possession of that information. When you work directly with people to gather data on their knowledge, opinions, and perspectives, you are essentially asking them to share information about themselves that could be used against them if the data is improperly handled. For their sake and for the sake of researchers who rely on the trust of research participants, the ethics of data collection should be comprehensively respected and upheld.

Why are there ethical implications in data collection?

There is a history of unethical practices in research that requires researchers to reflect critically on how they collect data. One of the most infamous examples of unethical research is the Tuskegee Experiment, where researchers not only failed to treat Black men suffering from syphilis but also directly misled them about their health, ultimately causing the deaths of more than a hundred research participants under their observation. This study is often cited alongside the Stanford prison experiment and the Milgram shock experiment as evidence that researchers must be bound by moral obligations when collecting data.

While the studies above involved direct intervention in participants' health or behavior, non-invasive social science research can also harm research participants if it is conducted without ethical principles. The risks of unethical data collection in observational research often have to do with the mishandling of sensitive information or improper interaction within the research context. Improper data usage has implications for research participants' mental health as well as for their reputation in society.

The consequences of mishandling qualitative data are especially profound with respect to research involving vulnerable populations. Critical research in particular focuses on power dynamics and inequities affecting marginalized and otherwise disadvantaged groups of people.

Research in anthropology, sociology, medical research, and education has included the study of migrants, people displaced by war, hospice patients, and members of the LGBTQ community. Needless to say, such people are likely to be reluctant to volunteer their voices in research for any number of sensitive reasons, and the risks involved in mishandling data from such potential research participants can be varied and devastating.

Even with studies that do not primarily collect data from vulnerable populations, there are various cultural and social considerations about the use of private information. Asking for a stranger's phone number or email address, for example, is frowned upon in many settings, let alone for the purposes of data collection.

Moreover, qualitative research often probes for more sensitive, more personally identifiable information such as medical or service details, where exposure could have detrimental effects on one's career prospects, reputation, or social standing.

Medical research today has strict ethical guidelines to prevent harm to participants. Photo by CDC.

Adhering to ethical data collection

Understanding data ethics is a matter of discussing how data is collected and what data collection practices can prove detrimental to research participants' well-being for any number of reasons.

Protection from harm

Keeping the welfare of research participants in mind is the most fundamental objective on which all other ethical concerns are founded. For as long as a researcher is collecting data from the field, they are responsible for the well-being of the research participants they are studying.

The scope of this responsibility starts with ensuring that your data collection practices do not put participants in direct jeopardy. In experimental research, this means avoiding influences that might harm one's physical or mental health such as adverse medical treatments, dangerous physical activity, and verbal abuse.

Research where this sort of harm might be considered necessary usually undergoes the most stringent ethics review procedures and cannot be conducted without thoroughly informed consent from research participants.

In naturalistic inquiry, harm can also arise from damage to reputation, breach of privacy, or other effects caused by the mishandling of data. A good example of this reputational harm can be found in organizational research, where employees speaking negatively about their bosses may face repercussions if their data is shared publicly with identifiable information such as names and addresses.

Data collection practices that put research participants in jeopardy should be carefully scrutinized. Photo by Hu Chen.

Informed consent

Needless to say, the potential for harm will make research participants less likely to participate in a research study. In the Tuskegee experiment, Black men had given their consent after being given assurances of access to what is understood now as inadequate medical treatment. Had they known about the nature of the treatment they would receive, they would have been less likely to participate.

As a result, ethical data collection requires research participants to do more than simply consent to participate in a research project. In the post-Tuskegee era of research, clear and informed consent involves an understanding of the procedures involved in data collection, data usage, and data sharing. Researchers are required to confirm that their participants are fully aware of the study and the risks involved.

Obtaining informed consent is thus one of the most important components of human subjects research. In most cases, the informed consent process involves having participants read through a detailed consent form that outlines their involvement in the study, the potentially adverse effects stemming from their participation, and the measures that researchers will take to protect the data collected in the study.

During the informed consent process in most human subjects research, the researcher explains the contents of the consent form and answers any questions that participants might have. Only when participants fully understand the study do they sign the consent form.

Informed consent is critical to primary data collection with research participants. Photo by Adeolu Eletu.

Data handling

Responsible data practices should be observed during the entire course of the study. Especially when data collection involves audio or video recording, researchers need to be mindful of what data will appear in published research and what data needs to be stripped of any personal details that might lead to the identification of research participants.

For researchers in the European Union, the General Data Protection Regulation (GDPR) provides research participants with extensive data protection, generally limiting researchers to the dissemination of anonymous data and requiring them to secure data through encryption, secure storage, and other measures that limit access. Even outside of the European Union, many institutions expect researchers to handle data along the same lines as those prescribed by the GDPR.

This includes other common-sense measures such as secure disposal of data (e.g., destroying data rather than simply throwing printed data or electronic media in the trash), keeping data on local storage as opposed to cloud-based servers, and working with data in places that are out of public view. These procedures might seem too mundane to mention in a research design, but rigorous ethics review often requires the researcher to acknowledge and detail these measures to ensure that data is being properly handled.
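
As one illustration of a measure that limits access, the following minimal Python sketch saves a transcript so that only the file's owner can read or write it. It assumes a POSIX system such as Linux or macOS; the function names and the file name interview_01.txt are hypothetical, and file permissions are a complement to encryption rather than a substitute for it.

```python
import os
import stat
import tempfile

def write_private(path, text):
    """Create a file readable and writable only by its owner (POSIX systems)."""
    # O_EXCL refuses to overwrite an existing file; 0o600 means owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)

def permissions(path):
    """Return the permission bits of a file as an integer."""
    return stat.S_IMODE(os.stat(path).st_mode)

# Demonstration in a throwaway folder (hypothetical file name).
with tempfile.TemporaryDirectory() as folder:
    transcript = os.path.join(folder, "interview_01.txt")
    write_private(transcript, "Transcript text goes here.")
    print(oct(permissions(transcript)))
```

In a real project the target folder would be the secure local storage named in the research design, and the same design would state who holds access to it.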

Researchers should be able to thoroughly explain how they properly handle data. Photo by Yura Fresh.

Privacy and confidentiality

Anonymous data and data privacy are core components of a research design that ensures research participants are protected from the adverse effects of data collection on reputation and status. In an era of strong passwords and online aliases, research participants generally expect that their information is protected and won't fall into the wrong hands.

In most cases, data ethics require researchers to anonymize data. In interview transcripts and surveys, any personal details such as names and addresses should be removed or modified to ensure the identities of research participants aren't revealed in research dissemination.
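
The removal of direct identifiers from a transcript can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the researcher supplies the list of names to remove, and the email and phone patterns are deliberately basic and will miss many real-world formats, so a manual read-through is still needed. The function name and placeholders are hypothetical.

```python
import re

# Deliberately simple patterns (assumptions, not exhaustive).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text, known_names):
    """Replace email addresses, phone numbers, and listed names with placeholders."""
    # Emails and phones go first so that a name inside an address
    # does not break the email match.
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    # Longer names first, so "Michael Smith" is replaced before "Michael".
    for name in sorted(known_names, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text, flags=re.IGNORECASE)
    return text

excerpt = "Michael said to email him at michael@example.org or call 555-123-4567."
print(redact(excerpt, ["Michael"]))
```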

The use of pseudonyms is a common practice in qualitative research to facilitate the conversion of raw data to anonymous data. Human subjects research may identify participants by a code number (e.g., "Participant 34") or a different name (e.g., "Mark" instead of "Michael"). These codes take the place of real names in data excerpts in research papers or presentations.

Where personal information is essential to following up with participants for further data collection, ethics review may require a research design to include a codebook that is stored separately from the data. This codebook matches pseudonyms to real names and other details and is protected with the highest security measures, so that the rest of the data is less likely to reveal identities if it is leaked.
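
The separation between a codebook and the shareable data can be sketched as follows. This is a minimal Python illustration rather than a prescribed procedure: the function names, the record fields, and the sample quotes are all hypothetical, and the secure storage of the codebook itself is left to the measures set out in the research design.

```python
import random

def build_codebook(real_names, seed=None):
    """Map each real name to a code such as 'Participant 3'.

    Codes are assigned in shuffled order so that a code number does not
    reveal the order in which people were recruited.
    """
    unique_names = sorted(set(real_names))
    rng = random.Random(seed)
    rng.shuffle(unique_names)
    return {name: f"Participant {i}" for i, name in enumerate(unique_names, start=1)}

def pseudonymize(records, codebook):
    """Return copies of the records with 'name' replaced by the participant code."""
    return [{**rec, "name": codebook[rec["name"]]} for rec in records]

# Hypothetical sample data for illustration only.
records = [
    {"name": "Michael", "quote": "I rarely speak up in meetings."},
    {"name": "Aisha", "quote": "My manager listens to feedback."},
]
codebook = build_codebook(rec["name"] for rec in records)
shareable = pseudonymize(records, codebook)
# `codebook` would be saved to a separate, access-restricted location;
# only `shareable` travels with the analysis files.
```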

Keep in mind that it is not always possible to ensure perfect anonymity, even with pseudonyms in place of direct identifiers. In ethnographic research studying small populations, for example, those with connections to the observed research participants may find it easy to determine the identities of those discussed in the data. Anyone who can access the anonymized data can make assumptions about research participants based on indirect identifiers such as workplace, salary, or age. The informed consent process should make this likelihood of identification clear to research participants to ensure a full awareness of the risks of participation in the study. Identifiers should also provide a certain level of ambiguity (e.g., a wide salary or age range) to mitigate the risk that audiences can identify specific people.
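
The idea of reporting indirect identifiers as ranges can be sketched in Python. The band widths, field names, and sample values below are assumptions chosen for illustration; suitable widths depend on how small the studied population is, and whole-number values are assumed.

```python
def to_range(value, width):
    """Return a label for the band of the given width that contains value."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"

def generalize(record, age_width=10, salary_width=20000):
    """Return a copy of the record with age and salary replaced by bands."""
    coarse = dict(record)
    coarse["age"] = to_range(record["age"], age_width)
    coarse["salary"] = to_range(record["salary"], salary_width)
    return coarse

# Hypothetical participant record.
print(generalize({"id": "Participant 34", "age": 47, "salary": 63500}))
```

Wider bands make it harder to single out one person but also blur the detail available for analysis, so the choice is a trade-off to be justified in the research design.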

Finally, researchers are expected not to share raw data outside of research contexts. Research participants who give informed consent trust that their information or answers won't be shared on social media or other public forums without their express permission.