June 23, 2020

Integrating High Volume Psychographic Coding into Quantitative Studies

Written by: Dr. Mark Szabo

An age-old challenge with surveys is the limitation placed on the number of open-text questions that one can practically implement. As researchers, we often limit, consciously or otherwise, the number and scope of open-text responses in quantitative studies because we know how much effort it takes in the analysis phase.

With the ubiquity and ease of machine learning, those days may be over.

We recently ran a public engagement survey for a large municipal infrastructure project and got more than we bargained for. We had hoped for 3,000 responses and ended up with over 14,000. This article outlines how we tackled the challenge: using natural language processing to code a large volume of qualitative input and re-integrating it into the quantitative study for psychographic significance testing.

Here’s how we did it.

1. Plan out the psychographics as well as the demographics.

Running a qualitative approach without a framework is like running a quantitative study without a hypothesis. You can find nuanced insights by examining your survey data against psychographics as well as demographics. It's nice to know how different age groups answer a question, but it's also very handy to understand how people with a specific thought, feeling or attitude answered it as well.

For our infrastructure project, we were able to analyze what the public wanted from the project based on what the project meant to them as individuals. We like digging into "meaning" questions because that's a useful framework for garnering the psychographic input we want.

2. Teach the AI how to code for you.

We exported the qualitative data to the machine learning platform MonkeyLearn. We then trained the machine how we wanted to code and tag the data, using a sample of the overall data. For our project, we hand-coded about 400 answers, and the machine later applied that coding scheme to all 14,000. Training on a representative sample and iterating produces more consistent coding and tagging than working through the full set by hand.

As you train the machine, you're also refining your own coding model because MonkeyLearn allows you to change, test and iterate your coding as you go. This is important if you're using grounded theory to build the thematic analysis from scratch. When you get to the point that the machine can predict your coding approach, then you know that your tags are appropriate to the data. This process also lends itself to group coding (including client input!), for cases where you want to further reduce intersubjective variability.

We ran the model on all 14,000 responses, and the output tagged each one based on how we trained the machine. After a few iterations, we were able to get the predictive model to over 93% accuracy.
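MonkeyLearn handles the training behind the scenes, but the core workflow the article describes — fit a classifier on a hand-coded sample, check that it can predict your coding, then tag the rest — can be sketched with scikit-learn. The theme labels and example responses below are invented for illustration; they are not the project's actual codes.

```python
# Sketch of the train-on-a-sample, tag-the-rest workflow, assuming
# hypothetical themes ("convenience", "identity", "concern") and
# invented example answers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# In the real project, ~400 hand-coded (answer, theme) pairs went here.
coded = [
    ("The bridge means faster commutes for my family", "convenience"),
    ("It's about civic pride and our city's future", "identity"),
    ("I worry about construction noise and cost", "concern"),
    ("Getting to work without gridlock matters most", "convenience"),
    ("This project shows who we are as a community", "identity"),
    ("Taxes are already too high for megaprojects", "concern"),
] * 10  # padded so the split has enough rows per class

texts, tags = zip(*coded)
train_x, test_x, train_y, test_y = train_test_split(
    texts, tags, test_size=0.25, random_state=0, stratify=tags)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_x, train_y)

# Held-out accuracy is the "can the machine predict my coding?" check.
print(f"accuracy: {accuracy_score(test_y, model.predict(test_x)):.2f}")

# Once accuracy is acceptable, tag the full response set.
all_responses = ["Less traffic on my commute would be great"]
predicted_themes = model.predict(all_responses)
```

The iterate-as-you-go step maps onto retraining after each revision of the tag set: change a code, re-fit, and re-check held-out accuracy until the model stabilizes.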

3. Re-integrate the data into your survey.

The next trick was to re-integrate the coded qualitative data into the survey. We were using SurveyGizmo, but the process would be the same for other platforms. We created a new multi-select question in the survey, and the answers aligned with the themes that were developed in the AI coding. In this case, we had about 12 themes that the model helped create, so the new question had 12 possible answers. We then imported the data back into SurveyGizmo, making sure to match the responses to each respondent's ID tag, so we could tell who answered what.
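The re-import hinges on joining the machine's tags back to each respondent by ID and expanding the themes into a multi-select question. In SurveyGizmo this is an import-with-ID-matching step; the same join can be sketched in pandas (column names and values here are hypothetical):

```python
import pandas as pd

# Hypothetical exports: the original survey and the machine-coded
# themes, both keyed by the respondent's ID tag.
survey = pd.DataFrame({
    "respondent_id": [101, 102, 103],
    "age_group": ["18-34", "35-54", "55+"],
})
coded_themes = pd.DataFrame({
    "respondent_id": [101, 102, 103],
    "themes": [["convenience"], ["identity", "concern"], ["concern"]],
})

# One-hot the multi-select themes so each theme becomes its own column,
# mirroring the 12-answer multi-select question added to the survey.
theme_cols = coded_themes["themes"].str.join("|").str.get_dummies()
merged = survey.merge(
    pd.concat([coded_themes[["respondent_id"]], theme_cols], axis=1),
    on="respondent_id", how="left")
print(merged)
```

Matching on the respondent ID, as the article notes, is what preserves the link between a respondent's themes and the rest of their answers.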

4. Slice and dice with the psychographic themes.

The rest was the usual straightforward analysis. We were able to take the 12 psychographic themes and run significance tests against the rest of the survey, just as one would with demographic data.
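In practice that significance testing is a crosstab plus a chi-square test, the same mechanics used for a demographic cut. A minimal sketch with scipy, using invented data for a hypothetical "convenience" theme:

```python
# Invented example: does holding the "convenience" theme relate to
# supporting the project? Same test you'd run on an age-group cut.
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.DataFrame({
    "convenience_theme": [1, 1, 1, 0, 0, 0, 1, 0, 1, 0] * 5,
    "supports_project":  [1, 1, 0, 0, 1, 0, 1, 0, 1, 0] * 5,
})

crosstab = pd.crosstab(df["convenience_theme"], df["supports_project"])
chi2, p_value, dof, expected = chi2_contingency(crosstab)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
```

A small p-value here would indicate that respondents holding the theme answer the other question differently than those who don't, which is exactly the psychographic slice-and-dice the article describes.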

For situations where you have a large volume of qualitative data, this approach can be helpful. We would often prefer to do a cluster segmentation using latent class or k-means, but this is a viable alternative for generating insights. I predict this will become easier as survey platforms increase their ability to leverage AI for qualitative analysis.

In either case, the key is to have a robust psychographic framework, or you will end up doing a fishing expedition in your quantitative studies. We like using “meaning,” but there are many other approaches.

Happy hunting!

Dr. Mark Szabo
Director, Insights & Engagement