Voyant Tools



I stumbled upon Voyant Tools in a class on Digital Rhetoric. As an example of technology assisting analysis, we were tasked with uploading an entire book from Project Gutenberg and having the software analyze it. The word clouds generated from one of my favorite texts surprised me, and I now use the software whenever I’m stuck on what to analyze in a text.

For fun, I just uploaded the entirety of Henry James’s The Turn of the Screw into Voyant. I haven’t read this text, but it’s on my to-do list the moment I have spare time for reading. The interface for Voyant appears below with this text being analyzed.



The tools I’m most familiar with are the word cloud, Bubblelines, and Correlations, the last of which is basically a different kind of collocation analysis.

Word cloud:

The word cloud is just like the one that Facebook generates for users. It shows the most commonly used words in the largest type, with less common words shrinking in size. This visual is interesting because it allows the researcher to see the frequency in a different light. In this case, “little” is the focal word, but we can also see the other most frequent words in comparison by size of font.

The term bar underneath the word cloud controls how many words are included. If one slides the bar to the halfway point, about 250 terms will appear. The bar maxes out at 500. At this point for The Turn of the Screw, some terms in the cloud only appear 6 or 7 times, making this massive word count less useful. However, a larger window might be more useful for longer works.

Terms:

In the same toolbar as the Word Cloud, one can change the view to a term frequency list. This allows the researcher to see the entire list of words in the text by frequency. Words of equal frequency appear in alphabetical order.
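To make that ordering concrete, here is a minimal Python sketch of a term frequency list. The tokenizer pattern, the tiny stopword set, and the function name term_frequencies are my own assumptions for illustration; this is not how Voyant itself is implemented.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "was", "i"}

def term_frequencies(text, stopwords=STOPWORDS):
    """Return (term, count) pairs, most frequent first, ties in alphabetical order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    # Sort by descending count, then alphabetically for equal counts.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

sample = "The little boy and the little girl saw the face in the window."
print(term_frequencies(sample)[:3])
```

Slicing the result, as in the last line, plays the same role as the term bar under the word cloud: it decides how many of the top terms are shown.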

The term tool is also useful in that the researcher can click on multiple terms and compare their location within the text. Clicking on “Mrs.” and “Grose” shows an almost identical trend line, suggesting that the words are collocates. This can be interesting for related search terms, like “eyes” and “face”. While these lines stay similar for most of the graph, there is a point at the beginning of the text where “face” spikes as a term, while “eyes” drops. If I were writing about this, that would be a point of research.

Links:

The links tool is convenient for frequent collocates. “Little” is the most used word in the text. Hovering over that word highlights the lines that connect it to its common collocates in the text.
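For anyone curious what a collocate count looks like underneath, here is a rough Python sketch. The window size, the tokenizer, and the function name collocates are assumptions of mine, and Voyant may well weigh or filter collocates differently.

```python
import re
from collections import Counter

def collocates(text, term, window=2):
    """Count the words that appear within `window` words of each occurrence of `term`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            # Words just before and just after the keyword.
            neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbors)
    return counts.most_common()

sample = "Little Flora and little Miles; little Flora smiled."
print(collocates(sample, "little", window=1))
```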

Summary:

In the summary section, the researcher can find interesting information about the hard data of the text. In this case, the data shows that there are 42,827 words in the text, with 4,496 of those being unique word forms. A unique word form is a distinct word counted once, no matter how many times it recurs. Alongside that count, Voyant by default filters common articles, conjunctions, and other basic function words out of its frequency displays, which is one of the main reasons I use it so often. Running The Turn of the Screw through AntConc will show the word “the” being used over 1,000 times, but this is of no interest to the researcher. “The” is going to be found constantly in works written in the English language.

Also of interest in the Summary section are the average number of words per sentence, which is 17, and the vocabulary density, which is the ratio of unique word forms to the total number of words. A higher density can hint at a more demanding read, though the ratio also tends to drop as a text gets longer.

As an example, The Turn of the Screw has a vocabulary density of 0.105, while the notorious Finnegans Wake has a density of 0.448, suggesting that Finnegans Wake is the more difficult of the two books.
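The density figure is simple enough to check by hand. The short Python sketch below redoes the arithmetic using the two counts reported in the Summary; the function name vocabulary_density is just a label I made up.

```python
def vocabulary_density(unique_forms, total_words):
    """Ratio of unique word forms to total words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return unique_forms / total_words

# Counts reported in the Summary for The Turn of the Screw.
density = vocabulary_density(4496, 42827)
print(round(density, 3))  # 0.105
```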

Phrases:

This section returns the most commonly used phrases in the text using a search term. So, for “know” there are two instances of “know he”, while there are four for “know that”, meaning that the latter is a more commonly used phrase in the text. This is interesting because it can allow the researcher to see frequent phrases including search terms, but unfortunately the search term is required.
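A simplified version of this idea can be sketched in Python by counting two-word phrases that begin with the search term. Treat the details as assumptions: the function name phrases_with is mine, it only looks at phrases that start with the term, and it ignores sentence boundaries, none of which is necessarily how Voyant does it.

```python
import re
from collections import Counter

def phrases_with(text, term, n=2):
    """Count n-word phrases that start with `term`, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide an n-word window over the tokens (this crosses sentence boundaries).
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    counts = Counter(" ".join(g) for g in grams if g[0] == term)
    return counts.most_common()

sample = "I know that she saw it. I know that he did too. Did you know he left?"
print(phrases_with(sample, "know"))  # [('know that', 2), ('know he', 1)]
```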

Contexts:

This shows the search term with the surrounding sentence, in order from the beginning of the text. This works very similarly to AntConc’s tool.
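A keyword-in-context listing like this is easy to sketch in Python. Note the assumptions: this version shows a fixed number of words on each side rather than the full surrounding sentence, and the function name contexts and the width parameter are my own inventions for illustration.

```python
import re

def contexts(text, term, width=3):
    """Return (left, keyword, right) tuples for each occurrence of `term`, in text order."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == term.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

sample = "Her face was pale. I saw the face again at the window."
for left, key, right in contexts(sample, "face", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>10} | {key} | {right}")
```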

Bubblelines:

This is very similar to the concordance plot in AntConc. A search term is isolated in the text through a graph, but instead of lines, Voyant Tools uses bubbles. This is still useful, because it shows the word’s density within the text better than the line graphs from AntConc.
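The data behind a plot like this can be approximated by cutting the text into slices and counting the search term in each one. The Python sketch below does that and prints a crude text version of the bubbles; the slice count, the tokenizer, and the function name segment_counts are assumptions of mine rather than anything taken from Voyant.

```python
import re

def segment_counts(text, term, segments=10):
    """Count occurrences of `term` in each of `segments` roughly equal slices of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = [0] * segments
    for i, tok in enumerate(tokens):
        if tok == term:
            # Map the token's position to the slice it falls in.
            counts[i * segments // len(tokens)] += 1
    return counts

sample = "face a b c d e face face"
for slice_no, count in enumerate(segment_counts(sample, "face", segments=4)):
    # One "o" per occurrence stands in for the size of the bubble.
    print(slice_no, "o" * count)
```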

Correlations:

This is somewhat similar to collocation, but with a significant difference. Rather than looking at which words sit next to each other, it compares how the frequencies of the first term and second term rise and fall across the text. Terms with a correlation close to or equal to one tend to rise and fall together, terms with a correlation approaching negative one are inversely correlated, and a value near zero indicates little relationship either way. As “affair” appears more often, “produced” appears more often. However, where the word “pool” appears, “touch” is absent.
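To show where a number like that can come from, here is a small Python sketch of a Pearson-style correlation between two terms' counts across slices of a text. I am assuming a Pearson-style coefficient, and the per-slice counts below are made up for illustration; they are not the real figures for The Turn of the Screw.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length series of counts."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 long")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)

# Hypothetical counts per slice of text (invented for this example).
affair = [0, 1, 2, 3, 4]
produced = [1, 2, 3, 4, 5]
pool = [0, 3, 0, 2, 0]
touch = [2, 0, 3, 0, 1]

print(round(pearson(affair, produced), 2))  # rises together: close to 1
print(round(pearson(pool, touch), 2))       # one is present where the other is not: negative
```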

Overall, Voyant Tools is a useful corpus tool. Unlike AntConc, the software doesn’t need to be downloaded onto a computer, but like AntConc, it is a free service. While I find that Voyant is most useful for stylistic concerns like commonly used language, or for topic generation, there are many other uses. I would argue that Voyant should be taught as part of the English program because it gives students tools for when they’re stuck on what to research, and allows them to see interesting correlations in rhetorical pieces.
