Primary Document Families: An Essential Procedure for Data Exploration and Analysis
Author: Ricardo B. Contreras
In this article, I will discuss primary document family (PD family) organization in ATLAS.ti 7 for Windows. Appropriate document organization is crucial for an optimal data analysis process with ATLAS.ti. You will read about PD families as a concept, the process of creating families manually, the process of importing families from a spreadsheet, and the role of Super PD families.
Why should you pay attention to Primary Document Families?
In my experience as a trainer, I have noticed that people often go directly into coding, ignoring aspects related to document organization. Although coding is no doubt a fulfilling process, it is not always a good idea to rush into it without first paying attention to initial organizational tasks. Skipping these tasks unnecessarily restricts the potential for effective data exploration and analysis.
Primary document families: A definition
Primary document families are groups of a project’s documents that represent attributes relevant for analysis. For instance, if it is relevant to analyze the data across the age of participants, their gender and ethnicity, or according to the site of data collection, then it makes sense to group the project documents in families according to those attributes. Please note that one document can be a member of multiple families, and deleting a family does not affect the integrity of the documents that belong to it. While families can be created at any time during the analysis process, I recommend that you create PD families early in your project. They will allow you to start interrogating your data across documents early on, which is usually a good idea since qualitative data analysis normally follows an iterative rather than a linear process. Once you have created your PD families, you can interrogate your project in different ways, such as:
- Using the Query Tool: What did female residents of the Eternal Springtime neighborhood say about access to public transportation?
- Using the Code-PD Table tool: How much did male residents of the Sunnyside neighborhood say about access to public transportation, in comparison to male residents in the Eternal Springtime neighborhood?
- Using the Code Co-occurrence Tree and Table: In what context did the women interviewed in the Eternal Springtime neighborhood discuss access to public transportation? Did they associate access to public transportation with economics, culture, or politics? If so, what is the meaning behind those associations? How many times did they associate those concepts with each other?
Having a good PD family system facilitates the formulation of questions such as these because it allows data exploration across groups of participants and documents. A good primary document family system will allow you to set the stage for effective data exploration and analysis. Start early on, ideally as you set up your project. Although your documents can be organized in multiple ways according to their characterizing attributes, not all of those attributes might be relevant from an analytical point of view. That is why I recommend that you begin by determining what is and what is not relevant for you. A table like the one below can help.
Please note: In the name of the PD family, each attribute is separated from its value with a double colon (::), e.g. “Gender::Female.” This nomenclature facilitates the process of importing and exporting PD family tables from and to spreadsheets.
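The naming convention described above can be illustrated outside of ATLAS.ti with a short Python sketch. It is only an illustration of the “Attribute::Value” pattern; the function names `build_family_name` and `parse_family_name` are hypothetical helpers invented for this example and are not part of ATLAS.ti.

```python
from typing import Optional, Tuple


def build_family_name(attribute: str, value: str) -> str:
    """Join an attribute and its value with a double colon, e.g. 'Gender::Female'."""
    return f"{attribute.strip()}::{value.strip()}"


def parse_family_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'Attribute::Value' into its parts.

    A name without '::' is returned unchanged, with None as its value.
    """
    attribute, separator, value = name.partition("::")
    if not separator:
        return name, None
    return attribute, value


print(build_family_name("Site", "Sunnyside"))
print(parse_family_name("Gender::Female"))
```

Keeping the attribute and the value in one predictable pattern is what makes it possible to move between family names and spreadsheet columns without retyping them.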
Creating PD families manually
In the Primary Document Manager (PD Manager), select a set of documents and drag and drop them into the “Families” side panel. Once you drop the documents there, a window opens in which you type the name of the family, for example, “Site::Sunnyside.”
Once the PD families have been created, you can see them in the PD Manager. If you select one of the families in the side panel, the documents that belong to it will be shown on the right hand side, as follows:
Importing a primary document table from a spreadsheet
Instead of being created manually inside the ATLAS.ti project, PD families can be created by importing a spreadsheet containing the documents’ characterizing attributes. I recommend that you first add the documents to the project. Next, export the primary document structure, open the spreadsheet, and enter the document attributes into it. Finally, import the spreadsheet (complete with document attributes) back into ATLAS.ti. As a result, the project documents will be automatically grouped into families according to the attributes specified in the spreadsheet. See below the three steps to follow.
Step 1. Export the PD family table
After the documents have been added to the project, the primary document table can be exported as either a comma/semicolon-separated value (CSV) file or a tab-delimited (XLS) file, both of which can be read by Excel and OpenOffice Calc. See below.
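If you want to inspect an exported CSV file outside of a spreadsheet program, a minimal Python sketch such as the following can read it whether it uses commas or semicolons. The column names in the sample (`Document`, `#Gender`, `#Site`) are assumptions made for illustration; the actual layout of the file exported by ATLAS.ti may differ.

```python
import csv
import io


def read_pd_table(text):
    """Read a delimited table into a list of dictionaries, one per document.

    The delimiter is guessed from the header line: a semicolon if semicolons
    outnumber commas there, otherwise a comma.
    """
    lines = text.splitlines()
    if not lines:
        return []
    header = lines[0]
    delimiter = ";" if header.count(";") > header.count(",") else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Hypothetical sample content; not an actual ATLAS.ti export.
sample = "Document;#Gender;#Site\nP1;Female;Sunnyside\nP2;Male;Eternal Springtime\n"
rows = read_pd_table(sample)
print(rows[0]["#Gender"], rows[1]["#Site"])
```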
Step 2. Add the document attributes to the spreadsheet
Once you have exported your PD table, you can enter attributes and their values directly into the spreadsheet:
- Categorical variables will receive a hash (#) as the column header prefix, e.g. #Gender
- Yes/no variables (dichotomous) will receive a “1” for Yes/Presence and a “0” for No/Absence
- All missing values must have a 0 in the corresponding spreadsheet cell.
See the figure below.
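To make the conventions above concrete, the following Python sketch shows how rows of such a table could translate into family memberships. It is a model of the rules described in this article, not ATLAS.ti’s own import logic. The `Document` column name, the `Bilingual` column, and the idea that a dichotomous column yields a family named after its header are all assumptions made for this example.

```python
def families_from_rows(rows, id_column="Document"):
    """Group document names into families, following the table conventions.

    - A column whose header starts with '#' is categorical: each value
      becomes a family named 'Attribute::Value'.
    - Any other column is treated as dichotomous: a '1' puts the document
      into a family named after the column header (an assumption).
    - A '0' or an empty cell means missing/absent and creates no membership.
    """
    families = {}
    for row in rows:
        document = row[id_column]
        for column, cell in row.items():
            if column == id_column:
                continue
            value = (cell or "").strip()
            if value in ("", "0"):
                continue
            if column.startswith("#"):
                name = f"{column[1:]}::{value}"
            elif value == "1":
                name = column
            else:
                continue  # unexpected value in a dichotomous column
            families.setdefault(name, []).append(document)
    return families


# Hypothetical rows, as they might look after filling in the spreadsheet.
example_rows = [
    {"Document": "P1", "#Gender": "Female", "Bilingual": "1"},
    {"Document": "P2", "#Gender": "0", "Bilingual": "0"},
    {"Document": "P3", "#Gender": "Female", "Bilingual": "0"},
]
print(families_from_rows(example_rows))
```

In this example, document P2 has a missing gender value and therefore ends up in no gender family at all.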
Step 3. Import the PD family table
Next, import the table into ATLAS.ti. As a result, you will be able to see the PD families in the PD Manager. Note that if you need to import new families later, just export the table again, insert the additional data, and re-import it. The figures below show the menu to import the PD family table and the families already created and shown in the PD Manager.
Please note: This is not the only way to import PD families from spreadsheets. Instead of starting by exporting the original primary document structure, as I suggest in this article, you may decide to start by creating the spreadsheet from scratch and then importing it. However, I think that starting by exporting the spreadsheet with the PD family structure is easier and safer, since you avoid making mistakes in the way columns and documents are named. (Pages 245-248 of the ATLAS.ti manual describe the PD family table importation process.)
Working with Super PD families
Super PD families allow you to further group your documents by combining families using Boolean operators. For example, if you want to explore what women from the Sunnyside neighborhood had to say about accessing public transportation, you would combine the “Site::Sunnyside” family with the “Gender::Female” family using the AND Boolean operator. (The AND Boolean operator retrieves the data that lies at the intersection of two subsets, in this case “Sunnyside neighborhood” and “Women”). Because ATLAS.ti allows you to combine PD families, you only need to think about the most basic attributes that characterize your documents when creating them; do not try to create families that represent all possible combinations. Super document families can be created in two ways: in the PD Manager (using the Venn diagram symbol) and in the Super Document Family Manager.
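The reason you only need basic families can be sketched in Python by treating each family as a set of document names. Every AND combination of a site with a gender is then derived on demand as a set intersection. The document names and memberships below are invented for illustration, and `and_combinations` is a hypothetical helper, not an ATLAS.ti function.

```python
from itertools import product

# Four basic families (invented memberships).
base_families = {
    "Site::Sunnyside": {"P1", "P2"},
    "Site::Eternal Springtime": {"P3", "P4"},
    "Gender::Female": {"P2", "P3"},
    "Gender::Male": {"P1", "P4"},
}


def by_attribute(families, attribute):
    """Return only the families whose name starts with 'attribute::'."""
    prefix = attribute + "::"
    return {name: docs for name, docs in families.items() if name.startswith(prefix)}


def and_combinations(families, first_attribute, second_attribute):
    """Intersect every family of one attribute with every family of another."""
    first = by_attribute(families, first_attribute).items()
    second = by_attribute(families, second_attribute).items()
    return {
        f"{name_a} AND {name_b}": docs_a & docs_b
        for (name_a, docs_a), (name_b, docs_b) in product(first, second)
    }


combined = and_combinations(base_families, "Site", "Gender")
print(combined["Site::Sunnyside AND Gender::Female"])
```

Four basic families are enough to produce all four site-and-gender combinations, so there is no need to create and maintain each combination as a separate family.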
A. Creating Super PD families in the PD Manager
- Click on the Venn diagram in the PD Manager to change it from OR to AND (if you want an AND combination)
- Select two or more families that you want to combine while holding down the Ctrl key, then right-click and select “Create Super Family.” The resulting Super PD family will be available in the PD Manager and can be used to explore data across documents.
B. Creating Super PD families in the Super Document Family Manager
Alternatively, you may create super document families using the Super Family Tool, which can be accessed through the PD Family Manager. Through this tool, PD families can be combined using the OR, AND, XOR (exclusive OR), and NOT Boolean operators. The figures below depict the process of creating super document families in the Super Family Tool.
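The four operators correspond to standard set operations, which the following Python sketch illustrates on invented document names. It assumes that NOT means “all project documents that are not in the family”; that reading, like the `combine` helper itself, is an assumption made for this example rather than a description of ATLAS.ti’s internals.

```python
def combine(operator, all_documents, first, second=None):
    """Combine document sets with a Boolean operator.

    OR  -> documents in either family
    AND -> documents in both families
    XOR -> documents in exactly one of the two families
    NOT -> project documents that are not in the first family
    """
    if operator == "NOT":
        return all_documents - first
    if second is None:
        raise ValueError(f"{operator} needs two families")
    if operator == "OR":
        return first | second
    if operator == "AND":
        return first & second
    if operator == "XOR":
        return first ^ second
    raise ValueError(f"Unknown operator: {operator}")


# Invented project with five documents and two families.
all_docs = {"P1", "P2", "P3", "P4", "P5"}
sunnyside = {"P1", "P2", "P5"}
female = {"P2", "P3", "P5"}

for op in ("OR", "AND", "XOR"):
    print(op, sorted(combine(op, all_docs, sunnyside, female)))
print("NOT", sorted(combine("NOT", all_docs, sunnyside)))
```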
About the Author
Ricardo B. Contreras is an applied cultural anthropologist with a Licenciatura degree from the Universidad de Chile and master’s and doctoral degrees from the University of South Florida. He is the President of Ethnographica Sociocultural Research and director of the Training & Partnership Development division of ATLAS.ti Scientific Software Development GmbH, as well as director of the company’s Office for the Americas. Ricardo holds an adjunct research position in the Department of Anthropology at East Carolina University. His research lies at the intersection of migration and community health. His latest publications can be found in the edited book “(Mis) Managing Migration”, edited by David Griffith and published by SAR Press in 2014. Ricardo can be contacted at “ricardo.contreras at atlasti.com”.