Wednesday, June 20, 2012

DIY Corpus Linguistics--Using AntConc

AntConc is free, powerful, and easy-to-use software. Twenty-two years ago, when I did my MA in the UK, I paid 75 pounds (maybe 200 dollars in today's money) for DOS software from Longman that could do only a few of the many things AntConc does. What a blessing it is that Laurence Anthony and Waseda University are willing to give this marvelous software away.


Let's see how we can use AntConc to analyze C. Collodi's Adventures of Pinocchio, a public domain (free, uncopyrighted) novel.


This is AntConc's startup screen.

1. Click on Open File (or Ctrl-F)

2. Choose Pinocchio

#1 Make sure you've loaded the correct file; #2 Click on the Word List tab; #3 Click on Sort by Frequency; #4 Click on Start to make a list of all the words in this story

The story has more than 40,000 words in total (tokens), but only 3,790 of them are different words (types). The most common word is "the", which is used 1,941 times. "Pinocchio" is used 454 times (of course: this story is all about Pinocchio).
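For readers who want to see what a word list involves, here is a minimal Python sketch of the same idea: count every word, count the different words, and sort by frequency. The tokenizing rule (lowercase letters, with an optional apostrophe part such as "don't") is my own assumption and not AntConc's, so running it on the real novel would probably give slightly different numbers. The `sample` text is made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words (assumed rule, not AntConc's)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_list(text):
    """Return (total words, different words, list of (word, frequency) pairs)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    # most_common() sorts from the most frequent word to the least frequent
    return len(tokens), len(counts), counts.most_common()

# Made-up sample text; replace it with the contents of your own file
sample = "The carpenter made the puppet. The puppet ran away."
total, different, ranked = word_list(sample)

print("Total words:", total)
print("Different words:", different)
for word, freq in ranked[:3]:
    print(freq, word)
```

To try it on a real text, read the file into a string first (for example with `open(path, encoding="utf-8").read()`, where `path` is wherever you saved the story) and pass that string to `word_list`.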

Scroll down the list to find words that are used only 9 times. "Carpenter" is one of these words (Pinocchio is a wooden boy who was made by a carpenter). Next, click on the Concordance Plot tab to see where the word "carpenter" is used.

There are 9 hits, almost all of them at the beginning of the story. This is where Geppetto the carpenter makes Pinocchio out of a piece of wood. There is one more hit about two-thirds of the way through the story.
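A concordance plot is, at heart, a list of positions: how far through the text each hit occurs. The rough Python sketch below shows one way to compute those positions and draw them as a one-line text "barcode". The function names, the tokenizing rule, and the sample sentence are all my own assumptions for illustration; this is not how AntConc itself is implemented.

```python
import re

def hit_positions(text, word):
    """Return each hit's position as a percentage of the way through the text."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [round(100 * i / len(tokens))
            for i, tok in enumerate(tokens) if tok == word]

def plot_bar(positions, width=50):
    """Draw a one-line plot: '|' marks a hit, '-' marks no hit."""
    bar = ["-"] * width
    for p in positions:
        # Scale the percentage to a column, keeping 100% inside the bar
        bar[min(width - 1, p * width // 100)] = "|"
    return "".join(bar)

# Made-up sample text; replace it with the contents of your own file
sample = ("The carpenter cut the wood. "
          "The puppet ran away from the old carpenter today.")
positions = hit_positions(sample, "carpenter")

print(len(positions), "hits at", positions, "percent")
print(plot_bar(positions, width=10))
```

Hits that cluster near the left edge of the bar come early in the text, and a lone mark further right is a late mention, which is the pattern described above for "carpenter".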

Click on the line to see the last sentence in this story with the word carpenter.

Here is that sentence.

Click on the Collocates tab, type the word "fairy" in the search box, and click on Start to see which words collocate with "fairy" (that is, which words are used together with it). "Good" is used 11 times: 10 times on the left ("Freq (L)") and once on the right ("Freq (R)"). The word "little" collocates with "fairy" 7 times, always on the left: the writer says only "little fairy", never "fairy little".
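The left and right counts can be sketched in a few lines of Python: for every occurrence of the search word, count the words just before it and just after it. In this sketch the window size (`span`, one word on each side by default), the function name, and the sample text are my own assumptions, so the numbers would not necessarily match AntConc's, which depend on its own window settings.

```python
import re
from collections import Counter

def collocates(text, node, span=1):
    """Count words within `span` words to the left and right of each `node` hit."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left.update(tokens[max(0, i - span):i])    # words before the hit
        right.update(tokens[i + 1:i + 1 + span])   # words after the hit
    return left, right

# Made-up sample text; replace it with the contents of your own file
sample = ("The good fairy smiled. The little fairy laughed. "
          "Fairy good, said the good fairy.")
left, right = collocates(sample, "fairy")

for word in ("good", "little"):
    print(word, "Freq (L):", left[word], "Freq (R):", right[word])
```

A word with hits only in the left column, like "little" here, always comes before the search word.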