Creating a Coding Scheme with ATLAS.ti by Susanne Friese
‘Coding means that we attach labels to segments of data that depict what each segment is about. Through coding, we raise analytic questions about our data […]. Coding distils data, sorts them, and gives us an analytic handle for making comparisons with other segments of data.’
(Charmaz, 2014: 4)
‘Coding is the strategy that moves data from diffuse and messy text to organized ideas about what is going on.’
(Richards and Morse, 2013: 167)
Coding is a core function in ATLAS.ti that lets you ‘tell’ the software where the interesting things in your data are. In a technical sense, coding simply means assigning a label to a data segment. A better-known term these days is tagging: the point of tagging is that you can later find everything you tagged by its tag name. ATLAS.ti uses the words ‘code’ and ‘coding’, as almost all other CAQDAS packages do. My guess is that this reflects the popularity of grounded theory at the time the first programs were developed in the late 1980s and early 1990s. Coding in CAQDAS, however, is very different from grounded theory coding in the methodological sense (see Friese, 2016 and 2019). If you are more comfortable with the idea of tagging, simply replace the terms ‘code’ and ‘coding’ in your mind with ‘tag’ and ‘tagging’ in what follows.
A code in ATLAS.ti can be a simple description, a concept, a category, a subcategory or a wildcard that modifies a link in a network. The software itself does not dictate how to use a code; it only provides this entity as one item in the toolbox. In this article I give you some guidance on how to use the ATLAS.ti toolbox to build a coding system that helps you advance your analysis. To get started, I would like to do a virtual jigsaw puzzle with you. You have probably done a puzzle at some point in your life, and you can apply the skills you learned doing so to coding qualitative data.
Imagine you are sitting at a table. On the table there are 1,000 parts of a jigsaw puzzle with the picture side up (Figure 1). Now it’s your turn. Your task is to put the puzzle together. How do you go about it?
Most people would answer that either they begin with the corners and the edges or they sort by colors or shapes. Let’s begin with the corners and edges. Why do you think that most people begin like that? These pieces are easy to recognize since they have at least one straight edge. When it comes to analyzing a project in ATLAS.ti, I likewise recommend you begin with what is easiest. In the second article of this series, I wrote about how to make use of already known characteristics, incorporating them into the file names for easy sorting and ordering of the documents within the project. I also showed you how to create document groups like male, female, location, age, etc. By starting to work on a project in this way, you have literally framed it (Figure 2).
As soon as the frame is laid, the next task is to examine the other pieces in the puzzle. You could try to find those parts that belong together. But that’s probably tedious. A better strategy is to sort the parts by color and similarity in terms of what is visible on them (Figure 3). The sample puzzle here depicts a castle with a forest around it and a lake in the upper-right-hand corner (I know this as I have seen the lid of the box!).
The next step is to take a closer look at one of the piles. Let’s look at the pieces of the puzzle that look like they belong to the castle. Approaching this strategically, one looks for puzzle pieces that belong to a part of the castle, like the roof, the towers, the windows or the battlements. In other words, one segments the castle into sub-units (Figure 4). This process is repeated for all pre-sorted stacks until everything can be put together to complete the puzzle.
Pre-sorting the puzzle pieces into piles of similar elements is like coding the data by major themes. Segmenting the piles into smaller sub-units is like building subcategories. Even if you are not a seasoned puzzle solver, simply by growing up you acquired everyday skills that you can apply to constructing categories and subcategories. You do it every day, and you learned the technique a long time ago when discovering the world as a child. You may first have realized that a certain animal is a dog and then used the word ‘dog’ for all kinds of dogs. Later you learned to distinguish beagles, boxers, golden retrievers, poodles, mongrels, etc. Developing subcategories for your data is not much different.
If you have collected lots of quotations under a common label, the next step is to look through them, as I did with the castle pieces. After reading or looking at a few quotations, you will quickly notice where the commonalities lie. Once the data are coded, the software makes it easy to retrieve all segments that belong to one topic and take a second, closer look. The aim is to develop subcategories so that you can bring some order to the pile and differentiate the various aspects of the topic area you are looking at.
Differences between doing a jigsaw puzzle and coding qualitative data
Having pointed out the similarities between doing a jigsaw puzzle and coding qualitative data, I should add that there are also some differences. Unlike a puzzle, qualitative data are not already broken down into pieces. In ATLAS.ti, this happens when you create quotations, which you can do while you code or as a distinct analytic step before coding. Creating quotations as a separate step is, for instance, very useful when working with video data.
When doing a puzzle, people often sort the individual pieces into larger clusters first. When coding data, the main topics may not be so obvious, and developing them is a process. Sometimes you code at the level of subcategories, or even lower, at the descriptive level, and only over time does it become clear which codes belong to a higher-order category.
If you generate lots of codes, i.e. if you quickly have 500 or more codes, be aware that you are basically only naming each piece of the puzzle. You are not yet sorting and organizing your data. If you notice that you are doing this, pause and stop coding new data. Analyze which of the codes can be aggregated so that more data segments can be collected under them. Technically, this means you will be merging codes. The goal is to sort and organize the data rather than just name each element of the puzzle.
When you do a normal jigsaw puzzle, the lid shows you what the finished puzzle should look like. In a qualitative research study, by comparison, we usually do not have a template that shows what the result should be. The researcher probably has some ideas based on the existing literature, but the answers to the research questions only emerge through the analysis process. There is a certain kind of puzzle that is similar in strategy to the qualitative analysis approach: the so-called WASGIJ puzzle (‘jigsaw’ backwards). The finished puzzle does not match the picture on the box. Instead, the player must take on the role of one of the people on the cover and imagine themselves in that person’s place. The final image corresponds to what this person sees from his or her perspective. The solution thus results from the process of putting the pieces together.
There are also jigsaw puzzles without a template, intended for experienced players. They are comparable to a project for which there is little existing literature or earlier research on the subject matter. In that case you may find it hard to develop detailed research questions based on previous knowledge, and the only option you have is to go into the field and start collecting data. Grounded theory studies are often like this. Like a puzzle without a template, such a project is considerably more difficult than one that is guided by research questions. First ideas for coding can be derived from research questions, from theories, from the literature or from the interview guide. In the grounded theory sense, ideas for coding can also emerge from the data, but this is not so easy for a beginning researcher. As Kelle and Kluge note:
novices in the field of social research have a particularly tough time following recommendations like: “let theoretical concepts emerge from your data material”. For them, such attempts might result in drowning in data material for months (2010: 19).
Therefore, if qualitative data analysis is new to you, I recommend that you do not start with the most difficult kind of puzzle (i.e. methodological approach). A thematic analysis, for example, is easier than a grounded theory study.
First steps in developing a code system
Unless you want to code deductively using an existing framework, keep an open mind when you begin to code your data, notice as many things as you can and collect them via coding. If you feel that it is important to read all the data first and to write down notes on a piece of paper before you create codes in ATLAS.ti, then this is a suitable way to proceed. If, however, after reading some of your data, you already have some ideas for codes, then go straight ahead and start coding in ATLAS.ti. Do whatever feels most natural to you.
At first, you will generate lots of new codes; in time, you will reuse more and more of the codes you already have and will rarely need to create new ones. At this point you have reached a first saturation point. In technical terms, this means you will mostly drag and drop existing codes from the Code Manager or navigation panel onto data segments. At this stage, you have roughly described the various elements in the data. As soon as you reach this point, that is, when you no longer add new codes (or only a few) and mostly use drag-and-drop coding, it is time to review your coding system. If you do it at a much later stage, it will take more work, because you will then have to go through all the documents again to apply newly developed subcategories and recheck all other codings. For this first phase, I recommend that you work on the documents in your data that differ most from each other. This makes it more likely that you will come across the full range of topics.
Let’s assume you have taken your first round of coding up to this point. Coders who naturally develop a mix of descriptive and abstract codes will have around 100 codes, depending on the project; smaller student projects may have around 50–70 codes. Cleaning up and restructuring a first code list is best done within the software. If you do it on paper, you need to apply the changes in ATLAS.ti in a second step.
This is also an appropriate time to export your project (see article 2 of this series). This preserves the original coding and allows you to compare it later with more advanced versions of your project. In this way you can, for example, describe in your method section how you got from A to B and C in your project.
If you have noticed a lot of things, say you already have 300 or more codes after coding a few interviews, your codes are probably very descriptive. Coders of this type are referred to in the literature as splitters (Guest et al., 2012; Bernard and Ryan, 2010). If you are a splitter, you need to stop coding new data at this point, review your coding and begin to merge your codes. As a splitter, you may find it difficult to let go of your codes through merging for fear of losing something. I can assure you that this is not going to happen. After merging and reorganizing, you will have a single code that might hold ten quotations in their original form. This is far better than ten codes that each summarize only one data segment (= one piece of the puzzle). The need to push codes from a descriptive to a conceptual level has also been described by Corbin and Strauss:
One of the mistakes beginning analysts make is to fail to differentiate between levels of concepts. They don’t start early in the analytic process differentiating lower-level explanatory concepts from the larger ideas or higher-level concepts that seem to unite them. … If an analyst does not begin to differentiate at this early stage of analysis, he or she is likely to end up with pages and pages of concepts and no idea how they fit together. (2008: 165)
This also applies to computer-aided analysis, even though the computer has no problem managing 1,000 or more codes. Rather than being conducive to your analysis, a high number of codes prevents further analysis. Creating too many codes is one of the dangers you will encounter on your journey through the qualitative data landscape.
Building categories and subcategories
The first categories you develop are likely to be provisional, as they are based on very little coding. With more coding, they are likely to change and develop further. I like Saldaña’s idea of first-cycle and second-cycle coding (Saldaña, 2013: 8). The idea of a cycle fits the nature of the N-C-T model, which, as you have seen, describes qualitative analysis as cyclical rather than linear. First-cycle coding, according to Saldaña (2009: 45), refers to the processes that happen during initial coding: the ideas you notice and collect when you begin the coding process. Second-cycle coding is the next step. From experience, I would add that there are at least a third and a fourth cycle of coding as well. When coding data in ATLAS.ti, the aim of this process is to develop a structured code list based on a subsample of your data. Once you have developed a first structure, you can apply the codes to the remaining data. You will likely continue to change the code list and refine its structure the more you code, and that is perfectly fine.
Other authors describe the coding process in a similar way (see, e.g., Bazeley, 2013; Bazeley and Richards, 2000; Charmaz, 2006; Fielding and Lee, 1998; Kuckartz, 1995; Richards, 2009; Richards and Morse, 2013; Silver and Lewins, 2014). Richards (2009), for example, refers to it as a catalog of codes and names speed, reliability and efficiency as the advantages of a well-sorted catalog. The problem, as I see repeatedly in my everyday work, is translating this process into mouse clicks and handling the technicalities of a software environment. Even users who know the technical aspects of coding and have read the useful tips often find it difficult to put the two together. It is not so difficult, but neither is it self-explanatory. Therefore, the following two video tutorials walk you through the process of building categories and subcategories.
Naming the codes in your code system to build a hierarchical structure
The code list in ATLAS.ti is linear and sorted alphabetically by default. Therefore, you need to tweak the code labels to structure the list. I usually add a prefix followed by an underscore or a colon. It is important that you separate the prefix that names the main category from the subcategory name. This way, all subcategories are automatically sorted under the main category name.
You see in Figure 6 that there are a few codes that could not be classified into a category, since it was too early to decide. The code ‘study critique’, for instance, may turn into a category of its own at some point.
Similarly, if you have a code for a third-person perspective, you will likely come across a first-person perspective too as you read on. To keep such codes from getting ‘lost’ among the categories, I add an asterisk (*) in front of the code name, a little trick that forces them to the top of the list because of the alphabetical ordering. This way, it is easier to spot codes that do not yet belong anywhere.
For main category names, I use capital letters to distinguish between types of codes such as categories and subcategories. Sometimes you will need to play around with the prefixes so that the codes appear in the order you want, with the main category name on top. To keep code labels from getting too long and taking up lots of screen space, I recommend using abbreviations (see Figure 6).
To distinguish codes by type or methodological level, I have developed a naming convention (see Table 1). Codes in upper case represent categories, codes in lower case with a prefix signify subcategories, codes with a hashtag are attribute codes, and so on. Writing categories in upper case also makes them stand out like headings.
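To see why the prefix and asterisk tricks work, here is a minimal Python sketch that sorts a few code labels the way an alphabetical code list might. It assumes a simple case-insensitive sort on character code points, in which ‘*’ comes before any letter and a shorter label comes before longer labels that begin with it. ATLAS.ti’s own sorting may follow different (for example, locale-aware) rules, so treat this as an illustration of the principle rather than a description of the software. The labels are modeled on the examples in this article.

```python
# Code labels modeled on the examples in this article (illustrative only).
labels = [
    "study critique",
    "EFFECTS POS",
    "Effects pos: personal growth",
    "*third-person perspective",
    "Effects pos: gives meaning",
]


def code_list_order(labels):
    """Sort labels alphabetically, ignoring case.

    Assumption: a plain code-point comparison after case folding. Under this
    rule '*' (U+002A) sorts before any letter, so asterisk-prefixed codes rise
    to the top, and 'EFFECTS POS' sorts directly before the longer labels
    that share its prefix.
    """
    return sorted(labels, key=str.casefold)


for label in code_list_order(labels):
    print(label)
```

Under this assumption the asterisk code comes first, followed by the category and its two subcategories, and then ‘study critique’. It also shows why the prefixes sometimes need adjusting: a subcategory abbreviated differently, such as ‘Eff pos: gives meaning’, would sort above ‘EFFECTS POS’ rather than below it, because the space after ‘Eff’ comes before the letter ‘e’.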
Table 1: Syntax for different types and levels of codes

| What | Naming in ATLAS.ti | Example | Level/type |
|---|---|---|---|
| Category | Upper-case letters / colored | EFFECTS POS | level |
| Subcategory | Lower-case letters / same color as category code | Effects pos: personal growth; Effects pos: gives meaning | |
| Unsorted concepts | Lower-case letters prefixed with an asterisk (*), no color | | |
| Dimension | Lower-case letters prefixed with a special character such as a forward slash, colored by category | | |
| Socio-demographics, names of speakers, actors, locations, organizations, etc., if they need to be coded | Lower-case letters prefixed with a hashtag (#), or @ for speakers; gray | | |
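Because the convention in Table 1 is rule-based, it can also be checked mechanically on a plain list of code labels. The following Python sketch is a hypothetical helper, not an ATLAS.ti feature: it guesses a code’s type from its label alone, using the markers from the table (asterisk, hashtag or @, forward slash, upper case, and a colon or underscore after a prefix). Colors are ignored, and the forward slash stands in for whichever special character you choose for dimensions.

```python
from collections import Counter


def code_type(label: str) -> str:
    """Guess a code's type from its label, following the markers in Table 1.

    Hypothetical helper: ATLAS.ti does not classify codes this way itself,
    and code colors are not taken into account.
    """
    if label.startswith("*"):
        return "unsorted concept"
    if label.startswith(("#", "@")):
        return "attribute"      # socio-demographics, speakers, etc.
    if label.startswith("/"):
        return "dimension"      # assumes '/' is the chosen special character
    if label.isupper():
        return "category"       # all cased letters are upper case
    if ":" in label or "_" in label:
        return "subcategory"    # prefix separated by a colon or underscore
    return "unclassified"


codes = [
    "EFFECTS POS",
    "Effects pos: personal growth",
    "Effects pos: gives meaning",
    "*third-person perspective",
    "study critique",
]

# Count codes per type, e.g. to see how many are still unsorted or unclassified.
print(Counter(code_type(c) for c in codes))
```

Labels such as ‘study critique’ that follow none of the rules end up as ‘unclassified’, which flags them as candidates for an asterisk or for a category of their own.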
Bazeley, Pat (2013). Qualitative Data Analysis: Practical Strategies. London: Sage.
Bazeley, Pat and Richards, Lyn (2000). The NVivo Qualitative Project Book. London: Sage.
Bernard, H. Russell and Ryan, Gery W. (2010). Analyzing Qualitative Data: Systematic Approaches. London: Sage.
Charmaz, Kathy (2006/2014). Constructing Grounded Theory: A Practical Guide Through Qualitative Analysis. London: Sage.
Corbin, Juliet and Strauss, Anselm (2008/2015). Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory (3rd and 4th ed.). Thousand Oaks, CA: Sage.
Fielding, Nigel G. and Lee, Raymond M. (1998). Computer Analysis and Qualitative Research. London: Sage.
Friese, Susanne (2016). CAQDAS and Grounded Theory Analysis. MMG Working Paper 16-07.
Friese, Susanne (2019). Grounded Theory Analysis and CAQDAS: A happy pairing or remodeling GT to QDA? In Tony Bryant and Kathy Charmaz (eds.), The SAGE Handbook of Grounded Theory (chapter 11). London: Sage.
Guest, Greg, MacQueen, Kathleen M. and Namey, Emily E. (2012). Applied Thematic Analysis. Los Angeles: Sage.
Kelle, Udo and Kluge, Susann (2010). Vom Einzelfall zum Typus: Fallvergleich und Fallkontrastierung in der qualitativen Sozialforschung. Wiesbaden: VS Verlag.
Kuckartz, Udo (1995). Case-oriented quantification. In Udo Kelle (ed.), Computer-Aided Qualitative Data Analysis: Theory, Methods and Practice (pp. 158–166). London: Sage.
Richards, Lyn (2009). Handling Qualitative Data: A Practical Guide (2nd ed.). London: Sage.
Richards, Lyn and Morse, Janice M. (2013). Readme First for a User’s Guide to Qualitative Methods (3rd ed.). Los Angeles: Sage.
Saldaña, Jonny (2009/2013/2015). The Coding Manual for Qualitative Researchers. London: Sage.
Silver, Christina and Lewins, Ann (2014). Using Software in Qualitative Research: A Step-by-Step Guide (2nd ed.). London: Sage.