CAQDAS and the New Data Challenge
Ricardo B. Contreras
Note from the editor: Although this article does not provide recommendations for the use of ATLAS.ti, which is the purpose of our Best Practices articles, we decided to publish it because it contributes to the conversation about how computer-assisted qualitative data analysis software (hereafter CAQDAS) faces the challenges posed by the emergence of new kinds of data, particularly data derived from online sources.
In this article we discuss the new data types that have become available in recent years and how CAQDAS packages, including ATLAS.ti, have changed in response. Because the new data are often larger in volume, we adapt Kahneman’s theory of fast and slow thinking as a framework for analyzing them, as well as conventional data types. We suggest a list of ATLAS.ti tools that can be used for ‘fast’ and ‘slow’ analysis, followed by a comparison with Big Data analysis. Despite the hype about big data, the quest for meaning and the need for human interpretation remain. In addition to providing tools for the analysis of larger data sets, CAQDAS still has a very important role to play in the analysis of small or thick data and in facilitating in-depth exploration of sociocultural processes.
New data, new tools?
CAQDAS was originally developed to support the process of qualitative data analysis. Given the time it takes to conduct a qualitative data analysis, sample sizes in qualitative projects have been rather small. Mason (2010), investigating 560 qualitative interview studies, found that sample sizes ranged from 1 to 95, with a median of 28 and a mean of 31.
In recent years, the data landscape has been changing. It may not even be necessary to go out and collect data for a research project, since the data are already available. They only need to be downloaded and added to a CAQDAS package for analysis. This applies to social media data such as Twitter posts, and to articles and contributions published online. Also new are the direct import options for open-ended questions from online surveys, data collected in and by reference managers, and data stored in Evernote. The upcoming new Windows and Mac versions of ATLAS.ti will also allow for the analysis of these data types.
What can be expected from these new features? For example, when importing documents from your favorite reference manager, such as EndNote, Mendeley or Zotero, you will be able to use ATLAS.ti to gain a better in-depth understanding of those articles that you have already identified as valuable for your literature review chapter. You can begin to write critical appraisals and compare and contrast the various theories used, references made, or results obtained by different researchers who published in different journals or at different times. ATLAS.ti will help you find quotes related to specific coded themes that you want to use in your writing, or it can give you an overview of topics by journal, by author or over time.
If you want to use some Twitter data as part of a project that contains data collected with different methods, such as interview data, observational field notes, photographs and archival research data, ATLAS.ti is a good choice, as it allows you to analyze all of these data in one single project (and hence facilitates triangulation). A stand-alone analysis of Twitter data with ATLAS.ti is of course also possible, especially if you want to go deeper in the exploration of the meanings and relationships represented in the Twitter posts.
An example of this is shown in Figure 2 below: a network view based on imported Twitter data that displays two automatically generated links, namely mentions and location. These types of links are already available in the data, and therefore ATLAS.ti can make use of them. Up until now, links always had to be established by the analyst, including the relation type that describes the link.
When importing Twitter data, you can choose to let ATLAS.ti code hashtags, “mentions”, locations and languages. In addition, code-code links for author-location and author-mentions can be created automatically. This allows you to gain a quick overview of the various hashtags used, by whom the messages were sent, from which geographical location, and how many and which tweets were retweeted. The Code Manager will give you a quick overview of the types of topics and their frequencies; the Code-Cooccurrence Explorer and Table show you which topics are frequently mentioned together, how various topics are spread geographically and by language, and which topics are mentioned more than others. The network function (see Figure 1) visualizes these data and thereby provides some first quick insights as well.
Based on this, you can go a step further and visit the quotations behind the various themes (hashtags) and begin to gain a deeper understanding of the discourse. A hashtag by itself does not reveal the meaning behind it. Even though the messages are short, their intention can be manifold. They may be informative and reflect what is already summarized by the hashtag. They may, however, also be full of irony, thereby reversing the meaning the hashtag conveys. We will only know when taking a closer look. Often Twitter posts refer to other material. By following up the link, we can add this material to ATLAS.ti as well and include it in the ongoing analysis – and soon we will be in the middle of a deeper analysis that will help us to better understand the first overview tables and networks that are generated by the software.
Given that more data are now available that have not been specifically collected by a researcher for a particular study, it will be quite useful to have some tools available that allow for a first quick inspection followed by further in-depth analysis (Friese, 2016). This is precisely what ATLAS.ti will allow with Twitter data.
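To make the idea of a first quick inspection more concrete, the following is a minimal sketch in Python of what automatic coding of hashtags and mentions, a co-occurrence count, and author-mention links could look like in principle. It is not how ATLAS.ti implements these features; the sample posts, the function name auto_code and all variable names are hypothetical and serve only as an illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample posts; real data would come from a Twitter export.
tweets = [
    {"author": "anna", "text": "Loving the new #CAQDAS features, thanks @atlasti! #qualitative"},
    {"author": "ben", "text": "#qualitative research is hard work #CAQDAS"},
    {"author": "anna", "text": "Oh great, another #bigdata dashboard... @ben"},
]

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def auto_code(tweet):
    """Return the hashtags and mentions found in one post, lower-cased."""
    text = tweet["text"]
    hashtags = {h.lower() for h in HASHTAG.findall(text)}
    mentions = {m.lower() for m in MENTION.findall(text)}
    return hashtags, mentions


code_counts = Counter()      # how often each hashtag "code" occurs
cooccurrence = Counter()     # how often two hashtags appear in the same post
author_mentions = Counter()  # author -> mentioned account links

for tweet in tweets:
    hashtags, mentions = auto_code(tweet)
    code_counts.update(hashtags)
    # Sort so that each pair is always stored in the same order.
    for pair in combinations(sorted(hashtags), 2):
        cooccurrence[pair] += 1
    for mentioned in mentions:
        author_mentions[(tweet["author"], mentioned)] += 1

print(code_counts.most_common())
print(cooccurrence.most_common())
print(author_mentions.most_common())
```

The counts give the fast overview. Whether the third sample post is meant ironically cannot be read from the numbers; that requires going back to the text itself.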
Fast and slow thinking – a new framework for new data
These two ways of approaching analysis (a first quick inspection and a “slow” in-depth follow-up) are inspired by the writings of Kahneman (2011), who described two modes of thinking – fast thinking (System 1) and slow thinking (System 2):
System 1 runs automatically and System 2 is normally in a comfortable low-effort mode…. System 1 continuously generates suggestions for System 2: impressions, intuitions, intentions, and feelings. …. When System 1 runs into difficulty, it calls on System 2 to support more detailed and specific processing that may solve the problem of the moment (p. 23).
As more and more data become available, it is impossible for System 2 to analyze it all. Therefore, there is a need to develop new tools that appeal to System 1’s way of thinking, namely being fast and intuitive. Based on many experimental studies, Kahneman and his colleagues found that System 1 decisions, judgments and evaluations are good enough in many situations, but are easily influenced by illusions and biases. System 2 is therefore necessary as a corrective, although it needs more incentives to begin to work, takes more effort and energy, and is much slower. As Anselm Strauss pointed out in Legewie et al. (2004), research is hard work; it is always a bit of suffering. We would like to add that it is also very rewarding and exciting if you put in some effort and stick to it. In the preface of the ATLAS.ti version 4 manual, Strauss wrote: ATLAS.ti will not “perform miracles for your research – you will have to have the ideas and the gifts to do exceptional research.” This still applies today. What ATLAS.ti can do for you is provide the tools that support you in dealing with the variety of data that has now become available.
The following are fast-thinking tools in ATLAS.ti:
- Code-Cooccurrence Table, can also be used in combination with filters
- Code-Document Table
- Word Cruncher, can also be used in combination with filters
- Auto coding
- Import neighbors option in networks: the imported neighbors can be quotations, codes, memos or groups; the option is available for codes, quotations, documents, memos and all groups
- Import co-occurring codes in networks
The following are tools that are coming soon and that fall within the ‘fast thinking’ perspective:
- word clouds and word lists for selected quotations, or word clouds for the content of one or more selected codes
- auto coded Twitter data
- networks based on Twitter data
- automatically grouped documents from reference managers
- and more – we will keep you updated in upcoming newsletters.
Slow-thinking tools in ATLAS.ti are:
- Follow-up analysis in the query tool / query option in the Quotation Manager (Mac) based on the results produced by the various tables
- Query tool / query option in the Quotation and Code Manager (Mac) – as these are researcher driven and the results are quotations that should be read and interpreted by the researcher
- reading the data behind the numbers provided by the tables
- Reading and checking through auto coded segments
- developing a coding system
- Working on the quotation level, writing interpretations into quotation comments, building hyperlinks
- Memo writing
- Building conceptual networks
New Data and Big Data
Before we leave you to go and play with the System 1 and System 2 tools in ATLAS.ti, we would like to juxtapose New Data with the buzzword Big Data. From time to time we observe that users want to add more and more data to an ATLAS.ti project in order to run machine-driven analyses that are better suited to tools from the area of Big Data Analytics.
According to the NIST Big Data Public Working Group (2015), Big Data “consists of extensive datasets-primarily in the characteristics of volume, variety, velocity, and/or variability-that require a scalable architecture for efficient storage, manipulation, and analysis” (page 5). eBay’s main data warehouse, for instance, includes more than two petabytes of user data. You have never heard of petabytes? To give you an idea, one petabyte is:
1,000,000,000,000,000 bytes = 10^15 bytes = 1,000 terabytes.
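The same arithmetic can be checked with a few lines of Python. This is only a small illustration using decimal (SI) units, as in the figure above; the function name petabytes_to_terabytes is made up for this example.

```python
# Decimal (SI) byte units, matching the figure quoted in the text.
TERABYTE = 10**12
PETABYTE = 10**15


def petabytes_to_terabytes(petabytes):
    """Convert a whole number of petabytes to terabytes."""
    return petabytes * PETABYTE // TERABYTE


print(f"{PETABYTE:,} bytes in one petabyte")
print(petabytes_to_terabytes(1), "terabytes in one petabyte")
print(petabytes_to_terabytes(2), "terabytes in two petabytes")
```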
Analyzing data of this magnitude requires parallel software running on tens, hundreds, or even thousands of servers. Software tools used for the analysis of such data include Hadoop, MapReduce, R and Spark, used by companies like Google, Facebook, AOL, Baidu, IBM and Yahoo. The outcome is numbers, and visualizations of those numbers. What is increasingly reported as missing in Big Data analysis is “the human meaning behind the numbers” (Noyes, 2015). Valerie Strauss (2016) recently called attention to the need to focus on what she refers to as ‘small data’. In discussing the political decisions that have been made based on PISA data, she points out that it is time to shed light on the educational processes as they happen in the classroom:
…to improve teaching and learning, it behooves reformers to pay more attention to small data – to the diversity and beauty that exists in every classroom – and the causation they reveal in the present. If we don’t start leading through small data, we might find out soon enough that we are being led by big data and spurious correlations.
Another interesting twist in terminology has been added by Wang (2013), with a wink to Geertz’s term “thick description” (Geertz, 1973). According to Wang, only Thick Data, through the act of collecting and analyzing stories, provide inspiration and produce insights. In that article, Wang gives examples of areas of collaboration between big data and thick data inquiries in business, a field in which qualitative inquiry has not traditionally been dominant but is now, in the time of ‘big data’, gaining acceptance. Among these are the cases of companies that have traditionally relied on market analysis for designing corporate strategy and generating insight, and are now relying more on ‘thick data’ to understand what consumers really want. Lindstrom (2016), in the book Small Data: The Tiny Clues That Uncover Huge Trends, also provides interesting case studies of the value of small data analysis in the business world.
Given the existence of online sources of data, CAQDAS projects can potentially manage large sets of data. Still, CAQDAS does not analyze ‘big data’. If we look at the magnitude and scale of Big Data, the difference becomes obvious in terms of input, throughput and output. What CAQDAS offers in the world of new data is the ability to support users in gaining a quick overview of larger (not big) data sets using System 1 tools. In addition, CAQDAS traditionally supports the closer look that the various authors have been calling for, whether they call it small data, thick data or the need for human interpretation that gives meaning to data. Since there are obvious limits to what a human interpreter can handle and manage, even with the support of software, big data analytics that count, structure, and visualize are needed as well.
Small data or thick data analysis, on the other hand, provides insights into that which text statistics cannot illuminate: in-depth understanding of units of analysis such as meanings, representations, motivations and expectations. We would argue that small or thick data inquiry brings us back to the person or community, and keeps us from falling into the trap of ‘datafication’ or ‘dataism’ (see Lohmeier 2014:77 for a discussion of the challenges of over-focusing on big data). This is precisely the natural domain of CAQDAS, and it is the area in which ATLAS.ti will keep on making key contributions to high-quality data analysis.
To understand human behavior at a time when new kinds of data are produced and collected in massive quantities through online tools (e.g., Twitter and the like), there is no substitute for the detailed, process oriented and in-depth exploration that is only possible through qualitative methodology. ATLAS.ti accepts and embraces its responsibility in the field of CAQDAS to continue assisting researchers as they engage in qualitative research, embracing new data types in seeking to understand experiences, meanings, and motivations of behavior and of sociocultural processes in general. As ATLAS.ti moves into the future, new tools will continue to be developed to help researchers shed light on all kinds of data that can be analyzed fast – utilizing system 1 – and slow – putting to use our system 2 way of thinking.
 Similar considerations apply to Twitter data. If you are interested in analyzing ten thousand or more Twitter posts, you may want to take a look at tools specifically created for the analysis of Twitter data, such as the Chorus Project.
 In 2002, Kahneman was awarded the Nobel Memorial Prize in Economic Sciences (shared with Vernon L. Smith).
Friese, Susanne (2016). Qualitative data analysis software: The state of the art. Special Issue: Qualitative Research in the Digital Humanities, Bosch, Reinoud (Ed.), KWALON, 61, 21(1), 34-45.
Geertz, Clifford (1973). The Interpretation of Cultures: Selected Essays by Clifford Geertz. New York: Basic Books.
Legewie, Heiner & Schervier-Legewie, Barbara (2004). “Forschung ist harte Arbeit, es ist immer ein Stück Leiden damit verbunden. Deshalb muss es auf der anderen Seite Spaß machen”. Anselm Strauss im Interview mit Heiner Legewie und Barbara Schervier-Legewie [90 Absätze]. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 5(3), Art. 22, http://nbn-resolving.de/urn:nbn:de:0114-fqs0403222.
Lindstrom, Martin (2016). Small Data: The Tiny Clues That Uncover Huge Trends. New York: St. Martin’s Press.
Lohmeier, Christine (2014). The Researcher and The Never-Ending Field: Reconsidering Big Data and Digital Ethnography. In Martin Hand and Sam Hillyard (eds.), Big Data? Qualitative Approaches to Digital Research (pp. 75-90). Bingley, UK: Emerald Group Publishing Limited.
Mason, Mark (2010). Sample Size and Saturation in PhD Studies Using Qualitative Interviews [63 paragraphs]. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 11(3), Art. 8, http://nbn-resolving.de/urn:nbn:de:0114-fqs100387.
Noyes, Katherine (2015). Why big data isn’t always the answer: Qualitative data can provide deeper insight into customers, behaviors and trends. http://www.pcworld.idg.com.au/article/582402/why-big-data-isn-t-always-answer/
Strauss, Valerie (2016). Answer Sheet. ‘Big data’ was supposed to fix education. It didn’t. It’s time for ‘small data.’ https://www.washingtonpost.com/news/answer-sheet/wp/2016/05/09/big-data-was-supposed-to-fix-education-it-didnt-its-time-for-small-data/
Wang, Tricia (2013). Big data needs thick data. Ethnography matters. http://ethnographymatters.net/blog/2013/05/13/big-data-needs-thick-data/
Ward, Jonathan Stuart and Barker, Adam (2013). Undefined by data: A survey of big data definitions. arXiv: 1309.5821v1 [cs.DB]. 20 Sept 2013. http://arxiv.org/pdf/1309.5821.pdf
National Institute of Standards and Technology. U.S. Department of Commerce (2015). Big Data Interoperability Framework: Volume 1, Definitions. http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-1.pdf
About the authors
Dr. Susanne Friese started working with computer software for qualitative data analysis in 1992. Her initial contact with CAQDAS tools was from 1992 to 1994, when she was employed at QualisResearch in the USA. In the following years, she worked with the CAQDAS Project in England (1994 – 1996), where she taught classes on The Ethnograph and Nud*ist. Two additional software programs, MAXQDA and ATLAS.ti, followed shortly. Susanne has accompanied numerous projects around the world in a consulting capacity, authored didactic materials and is one of the principal contributors to the ATLAS.ti User’s Manual, sample projects and other documentation. In 2012 / 2014 her book “Qualitative Data Analysis with ATLAS.ti” was published by SAGE Publications.
Dr. Ricardo Contreras is an international consultant on qualitative methods and computer-assisted qualitative data analysis, with an emphasis on ATLAS.ti. He has a doctoral degree in applied anthropology from the University of South Florida. He directs the training division and the office for the Americas of ATLAS.ti Scientific Software Development GmbH.