Processing free text falls under natural language processing, a branch of artificial intelligence that we nonprofit data scientists are learning now. The Python programming language has a package called Beautiful Soup, which parses websites, extracting the text within them and the HTML tags that mark each kind of content. The R language has a package called SentimentAnalysis, which assigns a sentiment score to words using the package’s dictionary, called a lexicon.
But we can do some of this analysis without specialized artificial intelligence software. Here are some steps for working your way into text parsing, starting with the easy stuff and moving toward the sophisticated.
- **Bag-of-words model:** This model involves splitting text into individual words, assigning a predetermined value to each parsed word, and then counting them up. Your version of this could be a word cloud, which we at Staupell have built using Excel and Tableau. Assigning meaning to the words requires a separate lexicon, which you can then use to add value to your word cloud.
- **Phrase parsing:** Called n-grams in some programs, parsing text into 1-, then 2-, then 3-word phrases lets you find the phrases that indicate the sentiment you’re looking for. I use WEKA to process text this way, but IBM’s Text Analytics software also identifies phrases intuitively, depending on the lexicon (dictionary) that you use. I have even used SQL to do the work. When I have worked with IBM’s product, I have set my lexicon for client satisfaction, but the product can also build a custom lexicon, so that a phrase like “promoted to” is marked with a “career” tag.
- **Looking for specific triggers:** This exercise can even be done in Excel using the SEARCH function (MATCH works when each word sits in its own cell). Words like “sold” or “gave,” or your organization’s name alongside a quotation, can be identified. If you are using Python or R, use regular expressions to find them, then flag them.
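The bag-of-words step above can be sketched in a few lines of Python. The tiny lexicon here is invented for illustration; a real one (like the dictionary shipped with R’s SentimentAnalysis package) contains thousands of scored words:

```python
import re
from collections import Counter

# Toy sentiment lexicon -- these words and scores are invented examples.
LEXICON = {"thrilled": 2, "happy": 1, "declined": -1, "upset": -2}

def bag_of_words(text):
    """Split text into lowercase words and count each one."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def sentiment_score(text):
    """Sum the lexicon values of every word that appears in the text."""
    counts = bag_of_words(text)
    return sum(LEXICON.get(word, 0) * n for word, n in counts.items())

report = "Donor was thrilled with the event but upset about parking."
print(bag_of_words(report).most_common(3))
print(sentiment_score(report))  # 2 + (-2) = 0
```

The word counts feed a word cloud directly; the lexicon lookup is what turns a plain count into a sentiment signal.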
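Phrase parsing can be sketched the same way. This is not IBM’s implementation, just a minimal n-gram generator with a hypothetical phrase-to-tag dictionary standing in for a custom lexicon:

```python
def ngrams(text, n):
    """Return the list of n-word phrases (n-grams) in the text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Hypothetical tag rules, invented for illustration: a custom lexicon
# maps a trigger phrase to the tag it should carry.
PHRASE_TAGS = {"promoted to": "career", "gave a gift": "giving"}

def tag_phrases(text):
    """Collect every tag whose trigger phrase appears in the text."""
    found = set()
    for n in (1, 2, 3):                      # 1-, 2-, then 3-word phrases
        for phrase in ngrams(text, n):
            if phrase in PHRASE_TAGS:
                found.add(PHRASE_TAGS[phrase])
    return found

note = "She was promoted to director and gave a gift last spring."
print(ngrams(note, 2)[:3])
print(tag_phrases(note))  # {'career', 'giving'}
```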
Try some of these tricks and see what you can glean from contact reports. And let us know what you find.