How to Make the Best of Codes in ATLAS.ti
Author: Dr. Susanne Friese
In this article, I write about codes: all the things they can be, all the things they ought not to be, and the various blind alleys one can find oneself trapped in if coding simply happens without first planning how to approach it.
Codes are labels that are usually linked to selected pieces of data. You may also want to think of them as tags. How you name them and at which level of abstraction you apply them is up to you. ATLAS.ti gives no further guidance on this matter, nor is it intended to prescribe a certain way of coding. This is where methodological knowledge comes in, or at least some thought about how one wants to approach the analysis.
Do you want to code inductively or deductively, or is your intent to apply a mixture of both? Is your aim simply to describe the data, to retrieve code frequencies and coded segments per code, and perhaps to run a few group comparisons? Are you interested in how the discourse unfolds in your data? Do you want to apply a grounded theory approach? Is your study about people’s life stories, or do you want to conduct an actor-network analysis? The choices are plenty. What matters is that you make an informed decision and do not simply let software features guide your analysis (see also Woolf and Silver, 2017).
What does this have to do with the entity “code” as offered by ATLAS.ti? The code is simply the object you use to label segments in your data. Methodologically speaking, a code can be a category, a theme, an attribute, a property, a dimension, a sub code, or more generally something of a higher or lower order. For organizational purposes, it might be a placeholder. Some users also use codes as “memos”, because codes can be linked to each other using named relations, which memos cannot (Woolf, 2015). And so on. Codes can thus fulfill many purposes in an analysis, and it is best to apply them strategically rather than to discover at some point in your analysis that you have ended up in a blind alley. When this happens, users either abandon the software and return to paper and pencil; or they print out all coded segments and continue their analysis on paper; or they express the wish to export everything to Excel; or they write an email to me or another consultant and ask for help.
Blind Alley One: Too Many Codes
One example of a blind alley is generating too many codes, which in my book Qualitative Data Analysis with ATLAS.ti I refer to as the code swamp (Friese, 2014). Too many codes can mean anything from 500 codes generated while coding the first two interviews up to 8000 or more. In the current version of the software, a ‘red flag’ is raised when this happens, albeit unintentionally. The red flag takes the form of slow performance, e.g. when you want to group or merge codes. I am sure that the developers will soon solve this technically and that users will no longer notice any performance issues. This, however, does not mean that it is useful to generate that many codes. If you generate a lot of codes, this usually means that you apply each code only once or only a few times, and that each code label serves as a description of the data segment it codes. ATLAS.ti offers a better choice for this purpose, namely quotation names (cf. Maietta, 2009; Woolf, 2014). A quotation is a piece of data that you have marked, much like using a highlighter when reading an article. Often it is coded, but it does not need to be. It is an entity of its own. The default name consists of the first 60 characters when working with text documents, and of the data file name when analyzing other media formats. This name can be changed. Instead of applying a code whose label describes the quotation content, you can simply create a quotation and label it by renaming the default name. In ATLAS.ti Win you would do this by opening the Quotation Manager alongside the document. In the Mac version, you use the inspector for this task.
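To spot an emerging code swamp early, it can help to check how often each code has actually been applied. The following is a minimal sketch in plain Python, not an ATLAS.ti feature or API. It assumes that the codings are available as hypothetical (quotation id, code label) pairs; the sample labels and the helper names `rarely_used_codes` and `default_quotation_name` are invented for illustration.

```python
from collections import Counter

# Hypothetical data: one (quotation_id, code_label) pair per applied code.
codings = [
    ("q1", "feels tired after night shifts"),
    ("q2", "mentions lack of sleep"),
    ("q3", "WORKLOAD"),
    ("q4", "WORKLOAD"),
    ("q5", "WORKLOAD"),
    ("q6", "worries about making mistakes"),
]


def rarely_used_codes(codings, max_uses=1):
    """Return the code labels applied at most `max_uses` times, sorted by label."""
    counts = Counter(code for _, code in codings)
    return sorted(code for code, n in counts.items() if n <= max_uses)


def default_quotation_name(text, length=60):
    """Mimic the default name described above: the first 60 characters of the text."""
    return text[:length]


# Codes used only once read like descriptions of a single segment and are
# candidates for becoming quotation names instead of codes.
for label in rarely_used_codes(codings):
    print(label)
```

Labels that turn up in this list describe one segment each; renaming the quotation keeps the description without adding another entry to the code list.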
Blind Alley Two: Avoiding Overlapping Codes
Another blind alley is to avoid overlapping codes because you have read somewhere that codes need to be distinct. This is correct when it comes to defining codes. Each code needs to be different from every other code, and it should be clear how and when to apply it. This, however, does not prevent two or more codes from being applied to the same or to overlapping data segments. Only if this is the case can you run code co-occurrence analyses and see connections in your data. For instance, you may code for context and for actions conducted by different actors at a given point in time. This means applying codes from four different categories: one that has sub codes for all the relevant contexts that occur in the data, one that holds all mentioned actors, one whose sub codes describe the various actions, and one category holding the time dimension. This allows you, for instance, to relate actors and their various actions within a certain context or time. Or, as shown in Figure 3, to relate activities that were evaluated as either positive or negative within a given time.
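The idea behind a co-occurrence analysis can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how ATLAS.ti implements it: codings are represented as hypothetical (document, start, end, code) tuples with character offsets, the sample labels are invented, and two codings count as co-occurring when their segments overlap within the same document.

```python
from collections import Counter
from itertools import combinations

# Hypothetical codings: (document, start, end, code), with `end` exclusive.
codings = [
    ("doc1", 0, 50, "actor: teacher"),
    ("doc1", 10, 40, "action: praising"),
    ("doc1", 30, 80, "time: morning"),
    ("doc1", 100, 150, "actor: parent"),
    ("doc1", 100, 150, "action: praising"),
]


def overlaps(a, b):
    """True if two codings are in the same document and their segments overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def cooccurrences(codings):
    """Count, for each pair of distinct codes, how often their segments overlap."""
    counts = Counter()
    for a, b in combinations(codings, 2):
        if a[3] != b[3] and overlaps(a, b):
            counts[tuple(sorted((a[3], b[3])))] += 1
    return counts


for pair, n in sorted(cooccurrences(codings).items()):
    print(pair, n)
```

If codes were never allowed to overlap, every count would be zero and no relation between actors, actions, and time could be read from the data.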
Blind Alley Three: Developing a Category on the Wrong Level
Related to this is the issue of developing a category on the wrong level. Two indicators point to this problem. One is that your list of sub codes for a category grows longer and longer. If you have a category with 20 or so sub codes, it is worthwhile to take a closer look. Let us call this category FRUITS. At first, the sub codes were apples, pears, and bananas. Over time you added green apples, red apples, and yellow apples, the same for pears, and green and yellow for bananas; further, you added apples: soft texture, apples: firm texture, pears: soft texture, pears: firm texture, bananas: soft texture, and bananas: firm texture. ‘FRUITS’ then is no longer the best higher order category. A better solution would be:
More examples are provided in the paper by Friese (2016).
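As an illustration only, and not necessarily the solution the author has in mind, one possible restructuring splits the fruit example into three independent categories that are applied together to the same segment. The short Python sketch below compares how many codes each approach needs; the category names `fruit`, `color`, and `texture` are assumptions made for this example.

```python
from itertools import product

fruits = ["apples", "pears", "bananas"]
colors = ["green", "red", "yellow"]
textures = ["soft", "firm"]

# One long category: every combination needs its own sub code.
combined = [
    f"{fruit}: {color}, {texture} texture"
    for fruit, color, texture in product(fruits, colors, textures)
]

# Three separate categories: each aspect is coded on its own and the
# codes are applied together to the same data segment.
separate = (
    [f"fruit: {fruit}" for fruit in fruits]
    + [f"color: {color}" for color in colors]
    + [f"texture: {texture}" for texture in textures]
)

print(len(combined), len(separate))  # 18 8
```

The combined list grows multiplicatively with every new aspect, while the separate categories grow additively and can still be related to each other through overlapping codings.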
A second indicator of a category being on the wrong level is very long code names in which the last part of the label is repeated across several codes. Figure 5 shows an example. “fam: don’t have children” and “fam: have children” are not the best main categories to use here, as this results in repetition at the various sub code levels.
It is preferable to have a category for attribute codes such as having or not having children, being male or female, being married or single, etc. In addition, you apply a thematic code, i.e. a code that indicates what this person is talking or writing about. Note that attribute codes only need to be applied if there are multiple speakers or actors within a document, as is the case with focus group or social network data. In a coded document this looks as follows:
Blind Alley Four: Using Code Groups as Categories
In writing the text above, I have already introduced the use of codes for categories and sub codes. For reasons that I detail further below, the recommendation is to represent the various types and levels of codes within the code list itself (and not to use code groups, smart codes, or smart groups for this). A code can be used at various methodological levels: it can serve as a category, but also as a sub code, a dimension, and many other things. The way to differentiate codes is via their labels. The system I have developed, for instance, is to write all categories in capital letters and all sub codes in small letters starting with a prefix that references the category; all attribute codes start with a hashtag (#), and all codes that cannot (yet) be sorted into a category start with an asterisk (*). See the table below.
Table 1: Syntax for the meanings of tags on the various levels (Friese, 2016)
| Level | Syntax | Example |
|---|---|---|
| concept | Small letters, black | In group – outgroup |
| category | Capital letters, colored | WAR EXPERIENCE |
| sub-code | Small letters, colored like all other codes in the category | War experience: inconsistencies; War experience: killing; War experience: survival |
| concepts in developing a code schema | Small letters, prefixed by special character (*), black | *about the enemy; *about being drafted |
| dimension | Small letters, prefixed by special character, colored | |
| socio-demographics, i.e. if you code attributes of actors (group interviews, focus group data, comments of different people on a blog, comments on YouTube videos) | Small letters, prefixed by # or any other special character, grey | |
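Because the level of a code is carried by its label, the convention can be checked mechanically. The following Python sketch is one possible reading of the syntax in Table 1, not part of ATLAS.ti. It assumes that a colon separates the category prefix from the sub code, as in the table's examples, and it ignores color and the dimension level; the function name `code_level` and the labels `#gender: female` and `loyalty` are hypothetical.

```python
def code_level(label):
    """Guess the methodological level of a code from its label syntax."""
    if label.startswith("#"):
        return "attribute"   # socio-demographics, prefixed by a hashtag
    if label.startswith("*"):
        return "unsorted"    # concept not (yet) sorted into a category
    if label.isupper():
        return "category"    # written entirely in capital letters
    if ":" in label:
        return "sub code"    # prefix references the category
    return "concept"         # plain small letters, no prefix


for label in [
    "WAR EXPERIENCE",
    "War experience: killing",
    "*about the enemy",
    "#gender: female",
    "loyalty",
]:
    print(label, "->", code_level(label))
```

A check like this makes inconsistent labels easy to find before they cause trouble when querying the data.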
A common misconception is that code groups and smart groups can serve as categories or “codes” on a higher organizational level. This is not a good idea, as a code can be a member of multiple code groups (or smart groups), which defeats the purpose of developing distinct categories. Further, code groups cannot be linked to each other: because a code can be a member of multiple groups, circular and illogical relations would be likely to occur without this restriction (see Figure 7).
The purpose of code groups is to serve as filters. You may create a code group for each category and all its sub codes if you want a filter at the category level. But you may also want to create filters that hold sub codes of three different categories, because you need such a filter to explore one of your research questions. Thus, code groups, and smart groups as an easy way to combine already existing groups, are very useful tools throughout the analysis process. However, they are not the components of a hierarchical coding system. If you want to build a hierarchical coding system, the recommendation is to represent all levels of your hierarchy within the code list, making use of the code labels to mark the various levels. This also simplifies querying your data at later stages of the analysis process. I will follow up with more detail on how best to make use of code groups, smart groups, and smart codes in a future article. In the meantime, look at the Children & Happiness sample projects as a good practice example of how to build up a code system.
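The difference between a filter and a category can be made concrete with a small Python sketch. It is an illustration only and assumes that code groups can be modeled as named sets of code labels; the group names, the sample codings, and the helpers `groups_of` and `filter_codings` are hypothetical.

```python
# Hypothetical code groups: named sets of code labels.
code_groups = {
    "WAR EXPERIENCE (all)": {
        "WAR EXPERIENCE",
        "War experience: killing",
        "War experience: survival",
    },
    "RQ1 filter": {
        "War experience: survival",
        "*about the enemy",
    },
}

# Hypothetical codings: (quotation_id, code_label).
codings = [
    ("q1", "War experience: killing"),
    ("q2", "War experience: survival"),
    ("q3", "*about the enemy"),
    ("q4", "*about being drafted"),
]


def groups_of(code, groups):
    """Return the names of all groups that contain the given code."""
    return sorted(name for name, members in groups.items() if code in members)


def filter_codings(codings, groups, group_name):
    """Keep only the codings whose code belongs to the named group."""
    members = groups[group_name]
    return [(q, code) for q, code in codings if code in members]


print(groups_of("War experience: survival", code_groups))
print(filter_codings(codings, code_groups, "RQ1 filter"))
```

The same code sits in two groups at once, which is exactly what a filter needs and what a distinct category must not allow.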
About the Author
Dr. Susanne Friese started working with computer software for qualitative data analysis in 1992. Her initial contact with CAQDAS tools was from 1992 to 1994, when she was employed at QualisResearch in the USA. In the following years, she worked with the CAQDAS Project in England (1994 – 1996), where she taught classes on The Ethnograph and Nud*ist. Two additional software programs, MAXQDA and ATLAS.ti, followed shortly. Susanne has accompanied numerous projects around the world in a consulting capacity, authored didactic materials, and is one of the principal contributors to the ATLAS.ti User’s Manual, sample projects, and other documentation. In 2014 her book “Qualitative Data Analysis with ATLAS.ti” was published by SAGE Publications.
References
Friese, Susanne (2016). CAQDAS and Grounded Theory Analysis. Working Papers WP 16-07, October 2016. (MMG Working Papers Print).
Friese, Susanne (2014). Qualitative Data Analysis with ATLAS.ti. London: SAGE.
Maietta, Ray C. (2009). The life of an ATLAS.ti quotation. ATLAS.ti Library.
Woolf, Nicholas (2015). Different Ways to Write about larger Themes or Concepts in ATLAS.ti. ATLAS.ti Research Blog.
Woolf, Nicholas (2014). Using quotation names for coding: An illustration from Grounded Theory. ATLAS.ti Research Blog.
Woolf, Nicholas and Silver, Christina (2017, forthcoming). Qualitative Analysis Using ATLAS.ti: The Five-Level QDA℠ Method. London: Routledge.