BYU COCA
The BYU COCA is one of the most comprehensive and easiest-to-use
corpus tools available today. It requires a login but is otherwise free.
Because it runs online, there is nothing to download, which makes it
accessible from any computer with an internet connection.
Personally, this is one of my favorite corpus tools. I use
it mostly to analyze newspapers in the US for race and gender portrayals.
Because newspapers follow the AP style guide, it is easy to search for terms
used consistently to refer to different groups of people.
The AP style guide requires that race be included only when
that information is pertinent to the story. For example, when the
first person of a specific race or nationality is elected to a political
office, the race is included. The terms for race are standardized, so every
time a story references a person of Latinx descent, the term “Hispanic” is
used. That makes these subjects easy to search for, because the researcher
doesn’t also have to search for synonyms to get an accurate picture of how
the media portrays them.
Corpus size and narrowing down
The BYU COCA is a massive corpus that contains samples of spoken and
written language, including newspapers and magazines. Fortunately, there are
tools to narrow a search to specific uses. Clicking “section” under
the search bar gives you a choice of which sections to include in the
search. I often narrow my search to newspapers to research media portrayals of
different groups.
The corpus can also be narrowed down by date range. This can
be useful for researching specific events or comparing use over time. For
example, there was controversy over the handling of Hurricane Katrina that gave
rise to racial tensions. To research media portrayal by race, one would limit
the corpus to August 2005 through perhaps September 2005 to restrict the results
to the aftermath of the hurricane. In addition, the corpus would be limited to the
newspaper section. Then, using the AP style guide’s terms for
different races, one might look at the collocates for each racial term.
Previous research has shown that African Americans were more often associated
with looting, while Caucasians were associated with scavenging or salvaging. This
points to a negative portrayal that was biased against African Americans.
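For texts collected outside the BYU COCA, the same narrowing by section and date range can be sketched in a few lines of Python. This is only an illustration: the record fields (`section`, `published`, `title`) and the sample entries are hypothetical placeholders, not the corpus's actual data or interface.

```python
from datetime import date

# Hypothetical placeholder records; a real study would load its own texts.
records = [
    {"title": "article A", "section": "newspaper", "published": date(2005, 8, 30)},
    {"title": "article B", "section": "magazine", "published": date(2005, 9, 5)},
    {"title": "article C", "section": "newspaper", "published": date(2006, 1, 12)},
    {"title": "article D", "section": "newspaper", "published": date(2005, 9, 20)},
]

def narrow(records, section, start, end):
    """Keep records from one section whose date falls within [start, end]."""
    return [
        r for r in records
        if r["section"] == section and start <= r["published"] <= end
    ]

# Newspaper texts from August through September 2005 only.
subset = narrow(records, "newspaper", date(2005, 8, 1), date(2005, 9, 30))
print([r["title"] for r in subset])
```

Only the newspaper records dated within the two-month window survive the filter; the magazine piece and the 2006 article are dropped.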
Frequency
The first tool in the BYU COCA toolbar is the list. It
returns the search term’s frequency across the selected sections. I
limited my results to newspapers and searched for the term wom* so that the BYU
COCA would return both woman and women, then searched again
for m*n so that it would return both man and men.
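One thing to keep in mind with wildcard searches like wom* and m*n is that a `*` pattern can match more word forms than the two intended. The sketch below uses Python's standard `fnmatch` module on a small made-up word list to show the idea; it is not the BYU COCA's own search engine, and the word list is an assumption chosen for illustration.

```python
from fnmatch import fnmatchcase

# Toy vocabulary (an assumption); a real corpus contains far more word forms.
words = ["woman", "women", "womb", "man", "men", "main", "moon", "mean"]

def match_wildcard(pattern, vocabulary):
    """Return the words in `vocabulary` that match a `*`-style pattern."""
    return [w for w in vocabulary if fnmatchcase(w, pattern)]

wom_matches = match_wildcard("wom*", words)
m_n_matches = match_wildcard("m*n", words)

print(wom_matches)
print(m_n_matches)
```

In this toy list, wom* also picks up "womb" and m*n also picks up "main", "moon", and "mean", so it is worth scanning the returned forms before adding up frequencies.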
The results show that in the singular form, men are referred
to much more often than women, but in the plural form, women are referred to
much more often.
Chart
The chart shows, as a graph, how often a word is used in each
section. This can produce some surprising results. After searching for the
gendered terms in the previous example, I opted to search for “man” with the chart tool.
The term was most commonly used in fiction, so I clicked on the fiction section
and found that, within fiction, it is most commonly used in movies.
For “woman”, fiction is also where the word has the highest
frequency, but in this case “woman” is used more in journal fiction than any
other type.
The chart tool can be useful for determining which words are
more common in which contexts, and it can be used in critical discourse
analysis (CDA) research to compare marginalized groups with non-marginalized groups.
Context
By clicking on a specific section of the frequency results or clicking
on the toolbar, the researcher is brought to the context of each word. This is a
table with every instance retrieved for the search term, along with the rest
of the sentence in which the search term is found.
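A context table of this kind can be approximated for any plain text. The following is a minimal sketch, assuming simple whitespace tokenization and a made-up sample sentence pair; the function name `context_lines` and the `window` parameter are hypothetical and do not come from the BYU COCA.

```python
def context_lines(text, term, window=2):
    """Return (left context, match, right context) for each hit of `term`."""
    tokens = text.lower().split()
    rows = []
    for i, tok in enumerate(tokens):
        # Ignore punctuation attached to the token when comparing.
        if tok.strip(".,;:!?\"'") == term:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows

# Made-up sample text for illustration only.
sample = "The woman opened the door. Later a woman called the office."
rows = context_lines(sample, "woman", window=2)

for left, hit, right in rows:
    print(f"{left:>12} | {hit} | {right}")
```

Each row keeps the words on either side of the hit, which is what makes it possible to check how a term is actually being used before interpreting its frequency.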
This is good for verifying results and adding depth to
research. A word may be found commonly in a specific text, but it may only be
used in a specific manner. That may change the interpretation of the results.
For example, the context for the word woman being found in
journals returns mostly results written by women. While an initial
interpretation without that context might suggest that journals contain more
female characters, the context makes this argument more complicated.
Collocates
Because the BYU COCA has a part-of-speech (POS) tagger, searching for
collocates through this software is often more rewarding than in other software.
This tool allows a researcher to search for a subject term and only the verbs
that follow, or only the adjectives that precede.
This is useful for searching through newspapers for bias
because it can generate data showing that one group is portrayed negatively
X% of the time while another group is portrayed negatively only Y% of the time.
This allows the researcher to support, with empirical data, the claim that one
group is being portrayed negatively in comparison to other groups.
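The percentage comparison described above can be sketched as follows. This is a simplified illustration, not the BYU COCA's method: it counts the words that directly follow a search term without any POS filtering, the token list is invented toy data with neutral group labels, and the set of "negative" words is a hand-made assumption that a real study would need to justify.

```python
from collections import Counter

def following_collocates(tokens, node, span=1):
    """Count the words that appear within `span` positions after `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[i + 1:i + 1 + span])
    return counts

def share_in_set(counts, labelled):
    """Percentage of collocate occurrences that fall in the `labelled` set."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for word, n in counts.items() if word in labelled)
    return 100.0 * hits / total

# Invented toy data; "groupa" and "groupb" stand in for real search terms.
tokens = (
    "groupa looted stores groupa helped neighbors "
    "groupb salvaged supplies groupb helped neighbors "
    "groupa looted shops"
).split()

negative = {"looted"}  # hand-labelled assumption

share_a = share_in_set(following_collocates(tokens, "groupa"), negative)
share_b = share_in_set(following_collocates(tokens, "groupb"), negative)
print(f"groupa: {share_a:.1f}% negative, groupb: {share_b:.1f}% negative")
```

The resulting percentages are only as trustworthy as the labelling of which collocates count as negative, so that list should be documented alongside the results.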
Compare
The comparison tool allows for what I’ve already described,
but in an easier manner. By searching two terms at the same time, both result
lists can be generated side by side. For this, I compared man and woman and limited
the results to the three words preceding each term.
I believe HODA is a formatting mistake, as its context view
doesn’t make sense. The results paint a somewhat disturbing picture:
“raping” is high on the list for women, while there are fewer negative
collocates for men.
KWIC
This tool returns the most commonly used phrases surrounding
the term in order of frequency. I don’t personally use this one much, but it
can be used to see what phrases are associated with certain subjects.
The BYU COCA is an absolute godsend for linguists. While it
does require a login, signing up is free, and because it’s available online, no extra
software needs to be downloaded. I do find myself frustrated with the website
at times because having multiple searches open in the same window causes the
software to log me out randomly. It can also be slow when analyzing large
amounts of data. However, the amount of functionality this software has,
coupled with how user-friendly it is for researchers, makes it an essential tool for
corpus linguists.


