Posts

BYU COCA

The BYU COCA is one of the most comprehensive and easiest-to-use corpus tools in the world today. It requires a login, but is otherwise free. Because it’s online, it doesn’t require a download, making it accessible from any computer with an internet connection. Personally, this is one of my favorite corpus tools. I use it mostly to analyze US newspapers for race and gender portrayals. Because newspapers follow the AP style guide, it is easy to search for terms used consistently to refer to different groups of people. The AP style guide requires that race be included only when that information is pertinent to the story. For example, when the first person of a specific race or nationality is elected to a political office, the race is included. The terms for race are standardized so that every time a story references a person of Latinx descent, the term “Hispanic” is used. That makes it easy to search for these subjects because the res...

Google's N-Gram viewer

This software is freely available through Google, but its scope is very limited. It draws on Google Books, which has digitized books dating back centuries that can be searched for specific terms. Because of this, it is possible to search every book on Google Books for a term and see when and how it was used over time. This is a useful tool for linguists, but also for English scholars in a broader sense, because it shows the use of language over time. A linguist might use it to study language change, but a researcher in literature might use it to compare how a term is used in one text with how common the term was at that time, or with how it is used in other works of the same period. Searching and N-gram graphs: My first search on the N-gram viewer was for the word “teenagers” because I knew from my undergrad capstone that the concept is a fairly new one. As expected, the N-gram graph shows that the word does n...
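
To give a rough sense of what an N-gram graph plots, here is a minimal sketch that computes a word's share of all tokens for each year. It is not Google's implementation: the function names (`tokenize`, `relative_frequency_by_year`) are hypothetical, and the two-sentence `toy_corpus` is invented purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency_by_year(texts_by_year, term):
    """Return {year: share of that year's tokens that equal `term`}."""
    term = term.lower()
    result = {}
    for year, text in sorted(texts_by_year.items()):
        tokens = tokenize(text)
        count = Counter(tokens)[term]
        # Guard against an empty text so we never divide by zero.
        result[year] = count / len(tokens) if tokens else 0.0
    return result


# Invented toy data, NOT drawn from Google Books.
toy_corpus = {
    1900: "the young people of the town gathered at the hall",
    1950: "teenagers crowded the hall and the teenagers danced",
}

trend = relative_frequency_by_year(toy_corpus, "teenagers")
for year, share in trend.items():
    print(year, f"{share:.2%}")
```

Dividing by the total number of tokens per year, rather than reporting raw counts, keeps years with more published text from looking artificially high.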

Voyant Tools

I stumbled upon Voyant Tools in a class on Digital Rhetoric. As an example of technology assisting analysis, we were tasked with uploading an entire book from Project Gutenberg and having the software analyze it. The word clouds generated from one of my favorite texts surprised me, and I started using the software whenever I’m stuck on what to analyze in a text. For fun, I just uploaded the entirety of Henry James’s The Turn of the Screw into Voyant. I haven’t read this text, but it’s on my to-do list for the moment I have spare time for reading. The interface for Voyant appears below with this text being analyzed. The tools I’m most familiar with are the word cloud, Bubblelines, and correlations, which is basically a different kind of collocation analysis. Word cloud: The word cloud is exactly like the one that Facebook generates for users. It shows the most commonly used words in the largest type, with less commonly used words shown progressively smaller. This vis...
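
The counting behind a word cloud can be sketched in a few lines. This is only an illustration of the general idea, not Voyant's actual code: `top_words` and the tiny `STOPWORDS` set are hypothetical, and the sample sentence is invented rather than taken from any book.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for this example only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "was", "i"}


def top_words(text, n=5):
    """Return the n most frequent non-stopword tokens with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Invented sample text.
sample = "The cat saw the dog. The dog saw the bird, and the bird saw nothing."

for word, count in top_words(sample, 3):
    # In a word cloud, a higher count would mean a larger font size.
    print(word, count)
```

Filtering out stopwords matters here: without it, words like "the" would dominate the cloud and hide the terms that are actually interesting to analyze.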

AntConc

AntConc: An introduction. My first work with AntConc was in a class on world English varieties. As part of a major project, I was required to download one of the ICE corpora for a variety of English. Being a sucker for punishment, I opted for Canadian English because there isn’t much difference outside of accent and inflection. This gave me a good initiation into how the software works because I had to really search to find differences. This was around the time I was getting interested in linguistics, so I started playing around with the software for fun. Being freeware, AntConc is great for classes because it provides an overview of general corpus tools without adding costs for students. However, it is also less user-friendly and harder to use without more experienced direction. Formatting corpus samples can be difficult, because AntConc can only read .txt documents. Converting to .txt erases all formatting and causes some errors. When uploading books, I found that aste...
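
For readers who have not seen a concordancer before, the sketch below shows the keyword-in-context (KWIC) idea that tools like AntConc are built around: every hit for a search word is listed with a few words on either side. This is a simplified stand-in rather than AntConc's own logic. The `kwic` function is hypothetical, and the sample sentence is invented, not taken from an ICE corpus.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, hit, right context) for each match of keyword."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Invented plain-text sample, as if read from a .txt file.
sample = "I parked by the centre. The centre was closed, eh."

lines = kwic(sample, "centre", width=2)
for left, hit, right in lines:
    # Right-align the left context so the hits line up in a column.
    print(f"{left:>12} | {hit} | {right}")
```

Lining the hits up in a column makes it easier to scan the neighboring words, which is where small differences between varieties of English tend to show up.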

Getting started

Corpus linguistics has a long and rich history, along with a great number of misconceptions. The term itself first appears in the 1980s, but early analog methods of corpus linguistics have been around much longer. In essence, corpus linguistics is a means of studying a sample of words using a computer. For example, the BYU COCA compiles a massive amount of text from media outlets such as Scientific American or The New York Times. From that sample, we can extract every instance of a word being used in the media compiled and draw conclusions about that word's representation (or lack thereof). The beauty of corpus linguistics, as I mentioned in my previous post, is that it provides empirical data that is more objective than other means of study in linguistics. This allows for stronger arguments and more effective proofs. For example, it can be argued that minorities are portrayed negatively in media. It is far more effective to be able to say that out of the 800 times that black people ...

Corpus Linguistics introduction

Going into English was a decision I made a very long time ago, and I never looked back until I entered grad school. In a lot of ways, literature is the reason I'm so successful now. Growing up where I did, there were three pastimes: get high, get drunk, or get pregnant. I built a fourth option: play video games and read books. When I ran out of books to read, I read encyclopedias and studied dictionaries. This saved my early education. In an area that was a mixture of meth-lab trailer parks and high-value ranch land, teachers would peg a student for success or failure by the first week. Because I always had my nose in a book and taught myself a few years of schooling at home, I guess I was assumed to be from the ranch land. Not that being white didn't help immensely. Growing up, I became aware of the inequities of the world quickly. My friends who were just as smart as me were pegged for failure for no real reason. I grew up wanting to fix that, and becoming a teacher ...