Thursday, June 21, 2012

Corpus Linguistics -- Why It's Useful (雙語字典不可靠: Bilingual Dictionaries Are Unreliable)

Free is good, right? Many lazy students like to use free online dictionaries to learn English. This is a good way to harm yourself by memorizing mistakes.

Here is a page from a well-known online English-Chinese dictionary:

Is this so-called "dictionary" reliable? Notice: It's not really a dictionary. It's only a glossary. Please think: If it's free, who will check for mistakes? If you find a mistake, can you complain? Can you get your money back?

Let's check COCA for sentences with lover:

Did you notice how lover collocates with attack, beat up, kill, murder, shoot? Is that what people do to 情侶 (sweethearts)? Please remember this English proverb: "You get what you pay for" (一分錢一分貨)! There are many excellent English-English learner's dictionaries to help you learn English. Spend a little money and time to learn how to use them. Please don't waste your time on unreliable bilingual dictionaries (不可靠的雙語字典).

Wednesday, June 20, 2012

Corpus Linguistics With COCA: 'Eat Dinner' vs 'Have Dinner'

Each time a word appears in a concordance, that appearance is called a token (= an example). Which is more common, "eat dinner" or "have dinner"? Use COCA to find the answer:

Make sure these 3 settings are correct
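You can try the same kind of token counting on any plain text of your own. Here is a minimal Python sketch, not the way COCA itself works: the function name count_phrase and the sample sentences are made up for illustration. It only counts exact forms, so "ate dinner" or "having dinner" would not be counted.

```python
import re

def count_phrase(text, phrase):
    """Count the tokens of a phrase (a sequence of words) in text, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    # Slide a window of n words across the text and compare it with the phrase.
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)

# Invented sample text, only for demonstration.
sample = ("We eat dinner at six. They have dinner at seven. "
          "Let's have dinner together. Did you have dinner yet?")

for phrase in ("eat dinner", "have dinner"):
    print(phrase, count_phrase(sample, phrase))
```

A tiny sample like this proves nothing about real English, of course. That is why we need a very large corpus like COCA.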



DIY Corpus Linguistics--Using AntConc

AntConc is free, very powerful and easy-to-use software. 22 years ago, when I did my MA in the UK, I paid 75 pounds (maybe 200 dollars in today's money) for DOS software from Longman that could only do a few of the many things that AntConc does. What a blessing it is that Laurence Anthony and Waseda University are willing to give away this marvelous software for free. 


Let's see how we can use AntConc to analyze C. Collodi's Adventures of Pinocchio, a public domain (free, uncopyrighted) novel.


This is AntConc's startup screen

1. Click on Open File (or Ctrl-F)

2. Choose Pinocchio

#1 Make sure you've loaded the correct file; #2 Click on the Word List tab; #3 Click on Sort by Frequency; #4 Click on Start to make a list of all the words in this story

There are 40,000+ words in this story, but only 3,790 different ones. The most common word is the, which is used 1,941 times. Pinocchio is used 454 times (of course: this story is all about Pinocchio).
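If you are curious about what a word list does behind the scenes, here is a rough Python sketch. It is only an illustration, not AntConc's actual code: the function name word_list and the sample sentences are invented, and AntConc may split words differently, so the numbers would not match exactly on a real novel. Linguists call the total number of words "tokens" and the number of different words "types."

```python
import re
from collections import Counter

def word_list(text):
    """Return a Counter that maps each different word to how often it is used."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Invented sample text, only for demonstration.
sample = "The puppet ran. The puppet fell. The fairy saw the puppet."

freq = word_list(sample)
tokens = sum(freq.values())  # all the words in the text
types = len(freq)            # only the different words

print("tokens:", tokens, "types:", types)
for word, count in freq.most_common(3):  # sorted by frequency, highest first
    print(word, count)
```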

Go down the list to find words which are only used 9 times. Carpenter is one of these words (Pinocchio is a wooden boy who was made by a carpenter). Next, click on the Concordance Plot tab to see where the word carpenter is used.

There are 9 hits, almost all of them at the beginning of the story. This is when Geppetto the carpenter made Pinocchio out of a piece of wood. There is one more hit at the 2/3 point.
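A concordance plot is really just a list of positions: where in the text does each hit fall? The Python sketch below draws a simple text version of such a plot. The function names (hit_indexes, plot_line) and the 10-word sample are made up for illustration, and the real AntConc plot is drawn differently.

```python
import re

def hit_indexes(text, word):
    """Return the word positions of every hit, plus the total number of words."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [i for i, w in enumerate(words) if w == word.lower()]
    return hits, len(words)

def plot_line(hits, total, width=40):
    """Draw the text as a line of dashes, with a bar wherever a hit falls."""
    line = ["-"] * width
    for i in hits:
        line[i * width // total] = "|"  # scale the word position to the line width
    return "".join(line)

# Invented 10-word sample: hits at the very beginning and the very end.
sample = " ".join(["carpenter", "made", "a", "puppet"] + ["word"] * 5 + ["carpenter"])

hits, total = hit_indexes(sample, "carpenter")
print(plot_line(hits, total, width=10))
```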

Click on the line to see the last sentence in this story with the word carpenter.

Here is that sentence.

Click on the Collocations tab, type in the word fairy in the search box, and click on Start to see which words collocate with fairy. Good is used 11 times, 10 times on the left "Freq (L)" and 1 time on the right "Freq (R)." The word little collocates with (= is used together with) fairy 7 times, always on the left (the writer only says "little fairy." He never says "fairy little").
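The left and right counts can be worked out with a few lines of Python. This is only a sketch under my own assumptions: the function name collocates, the window parameter (how many words to look at on each side) and the sample sentences are invented, and AntConc's own settings may be different.

```python
import re
from collections import Counter

def collocates(text, node, window=1):
    """Count the words found within `window` words to the left and right of node."""
    words = re.findall(r"[a-z']+", text.lower())
    node = node.lower()
    left, right = Counter(), Counter()
    for i, w in enumerate(words):
        if w != node:
            continue
        left.update(words[max(0, i - window):i])     # like "Freq (L)"
        right.update(words[i + 1:i + 1 + window])    # like "Freq (R)"
    return left, right

# Invented sample text, only for demonstration.
sample = "The good fairy smiled. The little fairy sang. A good fairy is good."

left, right = collocates(sample, "fairy")
print("left:", left.most_common())
print("right:", right.most_common())
```

Notice that this simple sketch ignores sentence boundaries, so a word at the start of the next sentence can be counted as a right-hand collocate.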





Semantic relations--Gradable Opposites

Dead and alive are complementary opposites. A living thing is either dead or alive. It can't be both (except maybe viruses 濾過性病毒: Are they dead or alive?). Except as a joke, we can't say that an animal is very dead or slightly dead.  

Gradable opposites are different. Wet and dry are a pair of gradable adjectives. A thing can be soaking wet, very wet, or slightly wet; it can also be bone dry, parched, extremely dry or drier.

In American English, delicious is usually not gradable, but tasty is gradable. We don't usually say "very delicious": this is Chinese English. If you do a COCA search for "very delicious," you will find very few examples (probably written by non-native speakers), but "very tasty" is quite common. That's why we don't say "Is it delicious?" (this sounds rather strange to English speakers' ears), but it's OK to say "Is it tasty?" Very tasty, extremely tasty, not so tasty, and tasteless are also OK.

Semantic relations--Complementary Opposites

The word antonym is made of two parts:

ant- (anti-) means opposite
-onym means name
so antonyms are words which have opposite meanings.

If we think about antonyms, however, we see that we have a problem: What does "opposite" mean? Some words seem to fit together: if you have one, you must have the other. This is called complementarity. The Yin Yang symbol on the South Korean flag is a beautiful example of complementarity. The cat picture below looks similar (So cute!). Do you see how they seem to fit together?

Flag of South Korea (Wikimedia)

Semantic Relations--Synonyms

Synonyms are words which have the same meaning. Of course, this is not completely true. There is almost always some kind of difference between two words. In the diagram at the bottom of this post, speak, say, and tell are synonyms of each other.


Semantic Relations--Hyponyms

Semantics deals with word meanings. There are many ways in which words can be semantically related. One of these is hyponymy.

Hypo- in Greek means "under"
and -onym means "name"
so a hyponym is an "under name."

is-a shows a semantic relationship.

"X" is-a "Z" and "Y" is-a "Z"

"X" and "Y" are hyponyms of "Z."
= The meaning of X and the meaning of Y are both included in the meaning of Z.
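An is-a relationship can also be written down as a small lookup table. Here is a minimal Python sketch, just to make the idea concrete. The names IS_A, hyponyms_of and is_a are made up for illustration, and the word pairs are the "parent" words discussed in this post.

```python
# Each word on the left is-a kind of the word on the right.
IS_A = {
    "mother": "parent",
    "father": "parent",
    "stepmother": "parent",
    "stepfather": "parent",
}

def hyponyms_of(broader_word):
    """List every word X for which 'X is-a broader_word' is true."""
    return sorted(word for word, parent in IS_A.items() if parent == broader_word)

def is_a(word, broader_word):
    """True if word is recorded as a hyponym of broader_word."""
    return IS_A.get(word) == broader_word

print(hyponyms_of("parent"))
print(is_a("mother", "parent"), is_a("parent", "mother"))
```

The relationship only goes one way: a mother is a parent, but a parent is not always a mother.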

Here are some examples:

{ABC...XYZ} and {abc...xyz} are all letters, so {ABC...XYZ} and {abc...xyz} are hyponyms of "letter." Capital letters and lower-case letters, vowel letters and consonant letters are all letters, and none of them is more important than any other, so they are all drawn with the same shapes and arranged in a circle:

Hyponyms--Letters

Mothers and fathers are parents.  So are grandmothers and grandfathers, stepmothers and stepfathers. However, the words "mother" and "father" are closer to the typical, everyday meaning of "parent," so I didn't use the same shapes for all of these words.

In the picture below, octagons (= 8-sided figures) are closest in shape to circles, so these shapes represent "mother" and "father." Triangles (= 3-sided figures) are least like circles, so they represent the words for stepparents, which are much farther away from the typical, everyday meaning of "parent."

Sunday, June 17, 2012

Corpus Linguistics Introduction: COCA and The Babel English-Chinese Parallel Concordancer

Corpus linguistics uses computer software (concordancers) to look at very large samples of real language. These samples are called corpora (singular = corpus). A corpus is a collection of texts. Some corpora only contain one genre: spoken English, newspaper English, scientific English. Other corpora try to use samples from many different types of language use. Some corpora are bilingual: two languages side by side.

balanced corpus of American English











A balanced corpus contains texts from many different genres. A good example of a balanced corpus is COCA, the Corpus of Contemporary American English.



Parts of the Brain (Typographic Art)

Here is a very interesting way to present the parts of the brain.
Compare this picture with the blank drawing of the brain lobes.

The Brain Typography (labguest, CC-BY-SA)

Stroke: What Should You Do?

[Updated April 17, 2015]
Remember this: ANYONE can have a stroke, even young people. MOST important: rush to
get treated with special stroke medicine (unfortunately, not every hospital can do this correctly).

Strokes happen when blood vessels in the brain are blocked or break. If oxygen stops coming to a part of the brain, that part will die. This can cause paralysis (part of the body can't move), aphasia or other problems. Many of these problems can be greatly reduced if a stroke patient gets the correct treatment very soon (within the first few hours). This is what FAST is about.

What does FAST stand for? Use this 3-minute video to help you remember:

http://www.youtube.com/watch?v=YHzz2cXBlGk
The words to Stroke Heroes Act FAST appear below

Parts of the Brain (What Does What)

What are the most important parts of the brain? Which parts of the brain are used for speaking, understanding and remembering? Can you label the lobes yourself? Use the song and these pictures to help you learn:

http://vimeo.com/26067401

Here are the lyrics (words). The most important parts are in red.

This is a song about parts of the brain 
I'm singing it to memorize the names 

The ideas here may be simplistic 
but matching meaning and rhyme is a tough logistic

The Cerebral Cortex has four main lobes 
With names from the nearby skull bones

Frontal does the thinking
Occipital deals with vision 
Parietal senses objects and 
Temporal listens

Inside these lobes there's specialties like 
Broca's Area, which produces speech. 

Wernicke's Area handles language comprehension
and the Motor Cortex is for moving with intention.

The Sensory Cortex handles perception
of touch, pain, temperature and proprioception.

There's two outer brain parts that are distinct
They may seem separate, but everything's linked

The Cerebellum does balance & coordination
and has our memorized-movement archive

The Brainstem sets heartbeat & respiration
and other things that we need to survive

The brain's inner parts are unique
Cut the Corpus Callosum to take a peek

The Thalamus handles signal routing 
and the Amygdala's emotions can have you shouting.

The Hippocampus does our long-term memory saving
and the Hypothalamus makes our sex and food cravings.

The Anterior Cingulate Cortex learns from mistakes
and in controlling movement, the Basal Ganglia is the brakes.

The brain parts list is much longer, indeed 
But for my class assignment this is all I need

Author's comments:


There's so much here packed into 3 minutes, that I recommend repeated viewings/listenings for anyone wanting to use this song as a memory/learning aid. The song is actually pretty catchy if you listen enough, although there's no exact repetition like most pop songs.


Questions to think about:

What do we call the two halves of the brain? Which parts of the brain are most important for language? The brain is plastic. What does that mean? What happened to Sarah Scott? Who helped Sarah Scott to recover? What happened to Jill Bolte Taylor? Why was Dr. Taylor so excited about it?


Wednesday, June 6, 2012

Replacing Curly Quotes With Straight Quotes in Microsoft Word

If you have mistakenly typed your plain text in Microsoft Word, you will find that it is full of curly quotes and ridiculous spaces. Fear not! You can easily convert your text to plain ASCII text by saving it as TXT. Here's how:
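If you would rather fix the text outside Word, the same cleanup can be sketched in a few lines of Python. This is only an illustration, not part of the Word method: the table CURLY_TO_STRAIGHT and the function straighten are names I made up, and the table covers only the curly quotes, curly apostrophes and the non-breaking space, so other unusual characters would need to be added.

```python
# Map each "smart" character to its plain ASCII equivalent.
CURLY_TO_STRAIGHT = {
    "\u2018": "'",   # left single curly quote
    "\u2019": "'",   # right single curly quote (also the curly apostrophe)
    "\u201c": '"',   # left double curly quote
    "\u201d": '"',   # right double curly quote
    "\u00a0": " ",   # non-breaking space
}

def straighten(text):
    """Replace curly quotes and non-breaking spaces with plain ASCII characters."""
    return text.translate(str.maketrans(CURLY_TO_STRAIGHT))

# Invented sample sentence, only for demonstration.
print(straighten("\u201cDon\u2019t ever leave me,\u201d she said."))
```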




Romance Corpus submission reminders

Your contribution to the romance fiction mini-corpus and your report are due on June 7th.

You should make a file folder. The name of the file folder should be:
1) your student number,
2) your name in Chinese, and
3) the title of your story.
Example: "9821471011廖冠勳 Don't Ever Leave Me"

Inside the folder you should put FOUR files (A~D). The name of each file should include:
1) your student number,
2) your name in Chinese, and
3) the title of your story (OR "Linguistics Report--" and the title of your story).
Example: "9821471011廖冠勳 Linguistics Report--Don't Ever Leave Me"

A) & B) Your report (linguistic analysis & story analysis) in two formats, DOC and PDF (convert your Microsoft Word document to PDF format using OpenOffice/LibreOffice or one of these online conversion services: www.pdfonline.com/convert-pdf/, www.doc2pdf.net/ or www.freepdfconvert.com/). DO NOT submit DOCX files!

C) Your corrected and reformatted story PDF, with description in italics. Make sure the text is centered and that you have used the correct font size, probably Arial 8 or Arial 9.

D) Your cleaned-up and corrected ASCII text in TXT. There should be no page or picture numbers and no Xs, and the text should be properly capitalized and punctuated. ASCII text means you did NOT use Microsoft Word to type your text, so there are NO CURLY QUOTES/apostrophes and no strange spaces.
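Before you submit file D, you can check whether your text is really plain ASCII. The Python sketch below is one possible way to do it, not an official checker: the function name non_ascii_report and the sample text are made up for illustration. It lists the line, column and character of everything that is not ASCII, such as a curly quote.

```python
def non_ascii_report(text):
    """Return (line number, column, character) for every non-ASCII character."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # ASCII characters are numbered 0 to 127
                problems.append((line_no, col, ch))
    return problems

# Invented sample: line 2 still contains curly quotes.
sample = "Don't go.\nShe said \u201cno.\u201d"

for line_no, col, ch in non_ascii_report(sample):
    print(f"line {line_no}, column {col}: {ch!r}")
```

To check a real file, read it first, for example with open("story.txt", encoding="utf-8").read(), where story.txt stands for your own file name. An empty report means the text is clean.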



Here are some mistakes to avoid:

Multiple problems, including lack of spaces after punctuation

Mistakes corrected

The font is too large, so the pictures are completely covered. This student also forgot about syntax: the sentences are not correctly segmented. This makes them hard to read.

Notice how line 4 starts with "but stifle my emotions." This splitting follows English syntax rules. This is also the way we would cut the sentence up when speaking. These five lines are description, so they are in italics because they need to look different from dialog.