Thursday, June 21, 2012

Corpus Linguistics -- Why It's Useful (雙語字典不可靠)

Free is good, right? Many lazy students like to use free online dictionaries to learn English. This is a good way to harm yourself by memorizing mistakes.

Here is a page from a well-known online English-Chinese dictionary:

Is this so-called "dictionary" reliable? Notice: It's not really a dictionary. It's only a glossary. Please think: If it's free, who will check for mistakes? If you find a mistake, can you complain? Can you get your money back?

Let's check COCA for sentences with lover:

Did you notice how lover collocates with attack, beat up, kill, murder, shoot? Is that what people do to 情侶? Please remember this English proverb: "You get what you pay for" (一分錢一分貨)! There are many excellent English-English learner's dictionaries to help you learn English. Spend a little money and time to learn how to use them. Please don't waste your time with unreliable bilingual dictionaries (不可靠的雙語字典).

Wednesday, June 20, 2012

Corpus Linguistics With COCA: 'Eat Dinner' vs 'Have Dinner'

Each time a word appears in a concordance, that appearance is called a token (= one example). Which is more common, "eat dinner" or "have dinner"? Use COCA to find the answer:
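COCA does the counting for you, but the idea of counting tokens can be sketched in a few lines of Python. This is only an illustration: the sample text is invented (it is NOT COCA data), and the function name count_tokens is my own.

```python
import re

def count_tokens(text, phrase):
    """Count how many times a phrase occurs; each occurrence is one token."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Invented sample sentences, for illustration only (not real corpus data)
sample = ("We have dinner at six. Did you eat dinner yet? "
          "Let's have dinner together. They have dinner late.")

eat_count = count_tokens(sample, "eat dinner")
have_count = count_tokens(sample, "have dinner")
print("eat dinner:", eat_count)
print("have dinner:", have_count)
```

A real comparison needs a large corpus such as COCA; four sentences prove nothing about English.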

Make sure these 3 settings are correct



DIY Corpus Linguistics--Using AntConc

AntConc is free, very powerful and easy-to-use software. 22 years ago, when I did my MA in the UK, I paid 75 pounds (maybe 200 dollars in today's money) for DOS software from Longman that could only do a few of the many things that AntConc does. What a blessing it is that Laurence Anthony and Waseda University are willing to give away this marvelous software for free. 


Let's see how we can use AntConc to analyze C. Collodi's Adventures of Pinocchio, a public domain (free, uncopyrighted) novel.


This is AntConc's startup screen

1. Click on Open File (or Ctrl-F)

2. Choose Pinocchio

#1 Make sure you've loaded the correct file; #2 Click on the Word List tab; #3 Click on Sort by Frequency; #4 Click on Start to make a list of all the words in this story

There are 40,000+ words (tokens) in this story, but only 3,790 of them are different words (types). The most common word is the, which is used 1,941 times. Pinocchio is used 454 times (of course: this story is all about Pinocchio).
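If you are curious about what a word list does behind the scenes, here is a minimal sketch in Python. It is not AntConc's own code. The sample sentence is invented, the function name word_list is my own, and the rule for what counts as a word (letters and apostrophes only) is an assumption, so real counts may differ from AntConc's.

```python
import re
from collections import Counter

def word_list(text):
    """Count how often each different word occurs (case is ignored)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Invented sample sentence, for illustration only
sample = "The puppet ran and the fox ran after the puppet."

freq = word_list(sample)
total_words = sum(freq.values())   # every word counted each time it appears
different_words = len(freq)        # each different word counted once

print("total words:", total_words)
print("different words:", different_words)
for word, count in freq.most_common(3):
    print(word, count)
```

The total is always at least as large as the number of different words, because common words like the are repeated many times.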

Go down the list to find words which are only used 9 times. Carpenter is one of these words (Pinocchio is a wooden boy who was made by a carpenter). Next, click on the Concordance Plot tab to see where the word carpenter is used.

There are 9 hits, almost all of them at the beginning of the story. This is when Geppetto the carpenter made Pinocchio out of a piece of wood. There is one more hit at the 2/3 point.
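A concordance plot is simply a record of where each hit falls in the text. The sketch below shows the idea in Python. It is an assumption-based illustration, not AntConc's method: the sample text is invented and hit_positions is a hypothetical helper.

```python
import re

def hit_positions(text, word):
    """Return the positions (0-based) of every hit, plus the text length in words."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [i for i, w in enumerate(words) if w == word.lower()]
    return hits, len(words)

# Invented sample text, for illustration only
sample = ("the carpenter found wood the carpenter carved it then he slept "
          "and much later the carpenter cried")

hits, total = hit_positions(sample, "carpenter")
for i in hits:
    print(f"hit at word {i + 1} of {total}")
```

Hits that cluster near the start of the list, with one late hit, would give the same kind of picture as the plot for carpenter.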

Click on the line to see the last sentence in this story with the word carpenter.

Here is that sentence.

Click on the Collocations tab, type in the word fairy in the search box, and click on Start to see which words collocate with fairy. Good is used 11 times, 10 times on the left "Freq (L)" and 1 time on the right "Freq (R)." The word little collocates with fairy (= is used together with it) 7 times, always on the left (the writer only says "little fairy"; he never says "fairy little").
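The left and right counts can be sketched in Python as well. This sketch makes two assumptions that may not match AntConc's settings: it looks only at the ONE word directly before and after the search word, and it ignores sentence boundaries. The sample text is invented and collocates is a hypothetical function name.

```python
import re
from collections import Counter

def collocates(text, node):
    """Count the words found directly left and directly right of the node word."""
    words = re.findall(r"[a-z']+", text.lower())
    left, right = Counter(), Counter()
    for i, w in enumerate(words):
        if w != node:
            continue
        if i > 0:
            left[words[i - 1]] += 1       # neighbour on the left
        if i + 1 < len(words):
            right[words[i + 1]] += 1      # neighbour on the right
    return left, right

# Invented sample text, for illustration only
sample = ("The good fairy smiled. The little fairy laughed. "
          "A good fairy is kind. Fairy good deeds matter.")

left, right = collocates(sample, "fairy")
print("good:   L =", left["good"], " R =", right["good"])
print("little: L =", left["little"], " R =", right["little"])
```

Keeping separate counters for the left and the right is what lets us see that little only ever appears before fairy.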





Semantic relations--Gradable Opposites

Dead and alive are complementary opposites. A living thing is either dead or alive. It can't be both (except maybe viruses 濾過性病毒: Are they dead or alive?). Except as a joke, we can't say that an animal is very dead or slightly dead.  

Gradable opposites are different. Wet and dry are a pair of gradable adjectives. A thing can be soaking wet, very wet, or slightly wet; it can also be bone dry, parched, extremely dry or drier.

In American English, delicious is usually not gradable, but tasty is gradable. We don't usually say "very delicious" (Chinese English: if you do a COCA search for "very delicious," you will find that there are very few examples [probably foreigner English], but "very tasty" is quite common). That's why we don't say "Is it delicious?" (this sounds rather strange to English speakers' ears), but it's OK to say "Is it tasty?" Very tasty, extremely tasty, not so tasty, and tasteless are also OK.

Semantic relations--Complementary Opposites

The word antonym is made of two parts:

ant- (anti-) means opposite
-onym means name
so antonyms are words which have opposite meanings.

If we think about antonyms, however, we see that we have a problem: What does "opposite" mean? Some words seem to fit together: if you have one, you must have the other. This is called complementarity. The Yin Yang symbol on the South Korean flag is a beautiful example of complementarity. The cat picture below looks similar (So cute!). Do you see how they seem to fit together?

Flag of South Korea (Wikimedia)

Semantic Relations--Synonyms

Synonyms are words which have the same meaning. Of course, this is not completely true. There is almost always some kind of difference between two words. In the diagram at the bottom of this post, speak, say, and tell are synonyms of each other.


Semantic Relations--Hyponyms

Semantics deals with word meanings. There are many ways in which words can be semantically related. One of these is hyponymy.

Hypo- in Greek means "under"
and -onym means "name"
so a hyponym is an "under name."

is-a shows a semantic relationship.

"X" is-a "Z" and "Y" is-a "Z"

"X" and "Y" are hyponyms of "Z."
= The meaning of X and the meaning of Y are each included in the meaning of Z.
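The is-a relationship is easy to model on a computer. Here is a minimal sketch in Python, using words that appear in this post. The table IS_A and the function hyponyms_of are my own illustrative names, and a real lexical database is far more detailed than this.

```python
# Each entry reads: "X is-a Z" (X is a hyponym of Z)
IS_A = {
    "mother": "parent",
    "father": "parent",
    "stepmother": "parent",
    "stepfather": "parent",
    "capital letter": "letter",
    "lower-case letter": "letter",
}

def hyponyms_of(z):
    """Return every word X for which 'X is-a Z' is recorded in the table."""
    return sorted(x for x, broader in IS_A.items() if broader == z)

print(hyponyms_of("parent"))
print(hyponyms_of("letter"))
```

Notice that the table only says whether a word is a hyponym. It cannot show that "mother" is a more typical kind of parent than "stepmother."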

Here are some examples:

{ABC...XYZ} and {abc...xyz} are all letters, so {ABC...XYZ} and {abc...xyz} are hyponyms of "letter." Capital letters and lower-case letters, vowel letters and consonant letters are all letters, and none of them is more important than any other letters, so they are all drawn with the same shapes and arranged in a circle:

Hyponyms--Letters

Mothers and fathers are parents.  So are grandmothers and grandfathers, stepmothers and stepfathers. However, the words "mother" and "father" are closer to the typical, everyday meaning of "parent," so I didn't use the same shapes for all of these words.

In the picture below, octagons (= 8-sided figures) are closest in shape to circles, so these shapes represent "mother" and "father." Triangles (= 3-sided figures) are the least like circles, so they represent the words for stepparents, which are much farther away from the typical, everyday meaning of "parent."

Sunday, June 17, 2012

Corpus Linguistics Introduction: COCA and The Babel English-Chinese Parallel Concordancer

Corpus linguistics uses computer software (concordancers) to look at very large samples of real language. These samples are called corpora (singular = corpus). A corpus is a collection of texts. Some corpora only contain one genre: spoken English, newspaper English, scientific English. Other corpora try to use samples from many different types of language use. Some corpora are bilingual: two languages side by side.

balanced corpus of American English











A balanced corpus contains texts from many different genres. A good example of a balanced corpus is COCA, the Corpus of Contemporary American English.



Parts of the Brain (Typographic Art)

Here is a very interesting way to present the parts of the brain.
Compare this picture with the blank drawing of the brain lobes.

The Brain Typography (labguest, CC-BY-SA)

Stroke: What Should You Do?

[Updated April 17, 2015]
Remember this: ANYONE can have a stroke, even young people. MOST important: rush to get treated with special stroke medicine (unfortunately, not every hospital can do this correctly).

Strokes happen when a blood vessel in the brain is blocked or breaks. If oxygen stops coming to a part of the brain, that part will die. This can cause paralysis (part of the body can't move), aphasia or other problems. Many of these problems can be greatly reduced if a stroke patient gets the correct treatment very soon (within the first few hours). This is what FAST is about.

What does FAST stand for? Use this 3-minute video to help you remember:

http://www.youtube.com/watch?v=YHzz2cXBlGk
The words to Stroke Heroes Act FAST appear below

Parts of the Brain (What Does What)

What are the most important parts of the brain? Which parts of the brain are used for speaking, understanding and remembering? Can you label the lobes yourself? Use the song and these pictures to help you learn:

http://vimeo.com/26067401

Here are the lyrics (words). The most important parts are in red.

This is a song about parts of the brain 
I'm singing it to memorize the names 

The ideas here may be simplistic 
but matching meaning and rhyme is a tough logistic

The Cerebral Cortex has four main lobes 
With names from the nearby skull bones

Frontal does the thinking
Occipital deals with vision 
Parietal senses objects and 
Temporal listens

Inside these lobes there's specialties like 
Broca's Area, which produces speech. 

Wernicke's Area handles language comprehension
and the Motor Cortex is for moving with intention.

The Sensory Cortex handles perception
of touch, pain, temperature and proprioception.

There's two outer brain parts that are distinct
They may seem separate, but everything's linked

The Cerebellum does balance & coordination
and has our memorized-movement archive

The Brainstem sets heartbeat & respiration
and other things that we need to survive

The brain's inner parts are unique
Cut the Corpus Callosum to take a peek

The Thalamus handles signal routing 
and the Amygdala's emotions can have you shouting.

The Hippocampus does our long-term memory saving
and the Hypothalamus makes our sex and food cravings.

The Anterior Cingulate Cortex learns from mistakes
and in controlling movement, the Basal Ganglia is the brakes.

The brain parts list is much longer, indeed 
But for my class assignment this is all I need

Author's comments:


There's so much here packed into 3 minutes, that I recommend repeated viewings/listenings for anyone wanting to use this song as a memory/learning aid. The song is actually pretty catchy if you listen enough, although there's no exact repetition like most pop songs.


Questions to think about:

What do we call the two halves of the brain? Which parts of the brain are most important for language? The brain is plastic. What does that mean? What happened to Sarah Scott? Who helped Sarah Scott to recover? What happened to Jill Bolte Taylor? Why was Dr. Taylor so excited about it?


Wednesday, June 6, 2012

Replacing Curly Quotes With Straight Quotes in Microsoft Word

If you have mistakenly typed your plaintext with Microsoft Word, you will find that it is full of curly quotes and ridiculous spaces. Fear not! You can easily convert your text to plain ASCII text by saving it as TXT. Here's how:
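If you prefer a script to the Word menus, the same clean-up can be sketched in Python. This is only a sketch under assumptions: I am guessing that the "ridiculous spaces" are non-breaking spaces, the replacement table covers only the most common curly characters, and straighten is my own function name.

```python
# Curly characters (and the non-breaking space) mapped to plain ASCII
REPLACEMENTS = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / curly apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u00a0": " ",   # non-breaking space (assumed source of the odd spaces)
}

def straighten(text):
    """Replace curly quotes and non-breaking spaces with ASCII characters."""
    return text.translate(str.maketrans(REPLACEMENTS))

cleaned = straighten("\u201cDon\u2019t\u00a0go,\u201d she said.")
print(cleaned)
```

Other Word characters (long dashes, the three-dot ellipsis character) are not handled here, so check your text again after running it.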




Romance Corpus submission reminders

Your contribution to the romance fiction mini-corpus and your report are due on June 7th.

You should make a file folder. The name of the file folder should be:
1) your student number,
2) your name in Chinese, and
3) the title of your story.
Example: "9821471011廖冠勳 Don't Ever Leave Me"

Inside the folder you should put FOUR files (A~D). The name of each file should include:
1) your student number,
2) your name in Chinese, and
3) the title of your story (OR "Linguistics Report--" and the title of your story).
Example: "9821471011廖冠勳 Linguistics Report--Don't Ever Leave Me"
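The naming pattern can be written out as a tiny Python sketch, in case it helps you check your own names. The function names are hypothetical, and file extensions (DOC, PDF, TXT) are left out because they depend on the file.

```python
def folder_name(student_no, chinese_name, title):
    """Student number + Chinese name, then a space, then the story title."""
    return f"{student_no}{chinese_name} {title}"

def report_name(student_no, chinese_name, title):
    """Same pattern, with 'Linguistics Report--' in front of the title."""
    return f"{student_no}{chinese_name} Linguistics Report--{title}"

print(folder_name("9821471011", "廖冠勳", "Don't Ever Leave Me"))
print(report_name("9821471011", "廖冠勳", "Don't Ever Leave Me"))
```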

A) & B) Your report (linguistic analysis & story analysis) in two formats, DOC and PDF (convert your Microsoft Word document to PDF format using Open Office/Libre Office or one of these online conversion services: www.pdfonline.com/convert-pdf/, www.doc2pdf.net/ or www.freepdfconvert.com/). DO NOT submit DOCX files!

C) Your corrected and reformatted story PDF, with description in italics. Make sure the text is centered and that you have used the correct font size, probably Arial 8 or Arial 9.

D) Your cleaned-up and corrected ASCII text in TXT. There should be no page or picture numbers, no Xs and the text should be properly capitalized and punctuated. ASCII text means you did NOT use Microsoft Word to type your text, so there are NO CURLY QUOTES/apostrophes and NO strange spaces.
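Before you submit file D, you can check that your text really is pure ASCII. Here is a minimal Python sketch. The function name non_ascii_report is my own, and the sample text is invented. In practice you would read your own TXT file into the variable first.

```python
def non_ascii_report(text):
    """List (line number, column, character) for every non-ASCII character."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:          # ASCII characters are numbered 0 to 127
                problems.append((line_no, col, ch))
    return problems

# Invented sample: line 2 contains a curly apostrophe
sample = "Don't go\nDon\u2019t go"
report = non_ascii_report(sample)
for line_no, col, ch in report:
    print(f"line {line_no}, column {col}: {ch!r}")
```

An empty report means the text contains only ASCII characters. It does not check capitalization or punctuation, so you still have to proofread.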



Here are some mistakes to avoid:

Multiple problems, including lack of spaces after punctuation

Mistakes corrected

The font is too large, so the pictures are completely covered. This student also forgot about syntax: the sentences are not correctly segmented. This makes them hard to read.

Notice how line 4 starts with "but stifle my emotions." This splitting follows English syntax rules. This is also the way we would cut the sentence up when speaking. These five lines are description, so they are in italics because they need to look different from dialog.







Wednesday, May 16, 2012

Romance Fiction Mini-Corpus, Step 3

After you receive your approved text back from your instructor, you should use PDF-XChange Viewer to start preparing a PDF file with searchable text.

0) Start with your original PDF file containing pure graphics (pictures, but no selectable text).

1) Use the Text Box Tool to create a giant text box in a text-free area of page 1. Make sure you have chosen the correct font size for the first speech bubble and center your text. Use these settings as default choices.

2) Paste the text of Page 1 into the giant text box.

3) Create several more text boxes, one for each speech bubble or block of descriptive text. Don't worry about the exact size or position. You will tweak the boxes when you finish.

4) Go back to the giant text box. Select the text of the first speech bubble. Press Ctrl-C to copy.

5) Press Ctrl-V to paste the text into the first speech bubble. 

6) Repeat Steps 4 & 5: Copy-Paste the text of each speech bubble or block of descriptive text

7) Finally, adjust the size and position of each speech bubble. When a line is too long, be sure to group related words into appropriate syntax groups (Noun Phrases, Verb Phrases, Prepositional Phrases etc.: NP, VP, PP etc.).

0) Start with your original PDF file


Using PDF-XChange Viewer (Customization)

Before you begin to enter text into the PDF file, you need to customize the PDF-XChange Viewer interface.

1) First, remove unnecessary toolbars. This will give you more space.

2) Then, add only the commands that you really need. This will make your work more efficient.


First, remove unnecessary toolbars























Romance Fiction Mini-Corpus, Steps 1~3 (Love My Dogs: Sample Text)

Step 1: Each student is responsible for typing up one short text from a collection of Golden Age Comics. The texts should be typed using a free text processor such as NoteTab (NOT Microsoft Word).


Step 2: Your text should be proofread for errors in punctuation, capitalization and other miscellaneous errors and uploaded page by page to the Romance Fiction Mini-Corpus web pages based on the first letter of your story title (e.g. "Love My Dogs" should be uploaded to the I-L section). Your teacher will do some more processing and return the raw text to you.


Step 3: The raw text will be turned into a PDF file. You should use the Text Box Tool in the free version of PDF-XChange Viewer to produce a PDF file with crisp, clean searchable text segmented for improved readability: when lines are too long, use the syntactic structure rules you have learned to break them up into noun phrases, verb phrases, prepositional phrases etc. (NP, VP, PP etc.).

Here is page 1 of the original purely graphical version of "Love My Dogs" (a three-page love story), followed by the pure text version and the PDF version.
"Love My Dogs" (page 1, graphic version)



Monday, April 30, 2012

Grice's Maxims (May 3, 2012)

On May 3rd, we will be taking a closer look at Grice's Maxims. We will also be looking for examples of implicature: Language Files 7.2.2 ~ 7.3.6.

Be sure to keep a pure text copy of your Romance Fiction Corpus text: that's why I recommended using NoteTab Light and IntelliComplete Server (both are free software) for typing up your assignment. Keep your text copy on a USB key or in the cloud (email).

Wednesday, April 25, 2012

Romance Fiction Mini-Corpus

In the final weeks of our course, we will be looking at one small part of computational linguistics, corpus linguistics (CL). In corpus linguistics, we work with corpora (plural of corpus), so each of you will help to compile a mini-corpus of romantic fiction texts. Find the appropriate page in our blog and enter the text of your story. We will be using this genre-specific corpus to learn how CL can help us study English.

Monday, April 16, 2012

Easier Midterm Coverage

To make your job easier, the midterm will only cover Morphology and Syntax. You will be tested on Semantics when we have finished covering Pragmatics.

Linguistics Midterm: Syntax Section

This section will help you draw or interpret tree structure diagrams.
Be sure to do the exercises in Topics 2 and 4.

Syntax
Topics  Subtopic List
1       http://www.ucl.ac.uk/internet-grammar/phrases/phrases.htm
2       The Basic Structure of a Phrase
3       More Phrase Types; Noun Phrase (NP); Verb Phrase (VP)
4       Adjective Phrase (AP); Adverb Phrase (AdvP); Prepositional Phrase (PP)
5       Phrases within Phrases

Google Translate: Simplified to Traditional

If you have trouble reading simplified Chinese characters, please remember that Google Translate can work from English to Chinese but also from Simplified to Traditional:







Linguistics Midterm: Morphology Section

We will be using the following IGE pages to review basic concepts of grammar (the same topics we have covered in the Morphology and Syntax chapters of Language Files, 10th edition). This should not be considered "new" material. You should already have studied almost all of this during your first year of college (or even in junior high school). The pages that appear below give you a brief review.

Saturday, April 14, 2012

Internet Grammar of English, Chinese version

The Internet Grammar of English (IGE) is available in a Chinese version written by the Chinese University of Hong Kong.


Wednesday, April 11, 2012

Internet Grammar of English, The -- Introduction

When you study difficult topics, reading a different explanation can sometimes be very helpful. The Internet Grammar of English (IGE) provides this sort of help. IGE gives you a slightly different explanation of phrase structure rules in English. It also gives you a few exercises to help you practice.

Start with the introduction. Pay special attention to the bottom half of this page:

(1) How the grammar is organized (= the topics and their sequence)

and

(2) The conventions they use (special ways of writing words and what they mean)


Thursday, March 29, 2012

Butterfly Crossword Puzzle

Crossword puzzles are very popular in English-speaking countries. They are a great way to build up mental flexibility. One of the keys to doing a crossword puzzle is to

1) FIRST do all of the easy words

2) THEN go back and try the "difficult" words. The easy words give you useful letters. This will help you find some of those difficult words.

An interactive version of today's puzzle is posted here (http://crossword.info/CuteTeacherGR/Butterfly_Crossword). Try it out!

If you prefer, there is a static version right below:


Across
1 It's on your shoulders (4)
4 Spaces between things (4)
7 Meat from a pig's leg (3)
9 It covers [1 Across] (3)
10 Mother (2)
12 (men & women) go on ~ (= go out together) (5)
14 A short form of LOL (2)
15 A short form of "Employee Related Expense" (3)
17 We use it for cooking: frying ~ (3)
18 We use it to play baseball: the ~ hits the ball (3)
19 Short form of trinitrotoluene (used to make bombs) (3)
21 Help for people who are hurt is "first ~" (3)
22 American money (7)
23 Five and five (3)
24 Uses water to clean a dirty floor (3)
26 Like a table to sleep on (3)
27 And other things (3)
29 To say "yes" with your head. Sleepy people do this, too. (3)
31 Used to cut trees (another spelling of "axe") (2)
32 This person makes walls. Also a family name: Mr. ~ (5)
34 "Hello" = "How do you ~?" (2)
35 ~s like to catch mice (3)
36 Short form of "Benjamin" (3)
38 Don't open that ~! (4)
39 Make an animal stop being afraid of you (4)

Thursday, March 1, 2012

Syntax Quiz (Language Files 5.1) Part A

Syntax Quiz (Language Files 5.1)

1 Word Order (3 sentences are OK. Which one is grammatically wrong?)
A My dog eats steak
B Eats steak my dog
C Steak, my dog eats
D Steak eats my dog

2 Lexical Categories (我很喜歡狗: 3 translations are wrong. Which one is correct? Can you say why?)
A I like dogs
B I very like dogs
C I like dog
D I very like dog

3 Agreement (3 sentences are wrong. Which one is correct? Can you say why?)
A 一匹牛
B 一張椅子
C 一把書
D 一條蛇

4 Q: What is a sentence? A: A sentence is a ~ of words
Fill in ONE correct word

Answers appear below the break


Syntax Quiz (Language Files 5.1) Part B

5~10 Constituency and hierarchical structure
There are 50 states (五十州) in the US. Each state has one governor (州長). Each state has many cities. Each city has one mayor (市長). The president of the US is more important than the governor of Alaska. The governor of Alaska is more important than the mayors of the cities in Alaska. The US government has a hierarchical structure, from top to bottom (president > governor > mayor).


Sentences work in roughly the same way. Sentences are more “important” than constituents, and constituents are more “important” than words.
"My dog loves me" = [S [NP [DET My] [N dog]] [VP [V loves] [NP [N me]]]]

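The labelled brackets for "My dog loves me" can be read by a computer, too. Here is a minimal Python sketch that turns the bracket notation into nested lists and then reads the words back out. The function names parse and leaves are my own, and the sketch assumes the brackets are correctly balanced.

```python
import re

def parse(brackets):
    """Turn labelled bracket notation into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\[|\]|[^\[\]\s]+", brackets)
    stack = [[]]
    for tok in tokens:
        if tok == "[":
            stack.append([])          # start a new group
        elif tok == "]":
            node = stack.pop()        # finish the group ...
            stack[-1].append(node)    # ... and put it inside the larger group
        else:
            stack[-1].append(tok)     # a label (S, NP, VP ...) or a word
    return stack[0][0]

def leaves(node):
    """Collect the words at the bottom of the structure, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:            # node[0] is the label, so skip it
        words.extend(leaves(child))
    return words

tree = parse("[S [NP [DET My] [N dog]] [VP [V loves] [NP [N me]]]]")
print(tree)
print(leaves(tree))
```

Lists inside lists are a natural fit here: small groups sit inside larger groups, just as mayors come under governors.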


Fill in the correct words in the following paragraph. Choose from this list:
city constituents country groups hierarchical larger sentence state syntax words

Just like the US government, a sentence also has a 5 [~] structure. Sentences are made of 6 [~]. The words in a sentence form 7 [~]. These groups are called 8 [~]. Small groups are part of 9 [~] groups. All of the groups of words in a 10 [~] form a hierarchical structure.

Answers appear below the break:


Tuesday, February 28, 2012

Reminder: Test Schedule

Please don't forget this semester's new rules for quizzes & tests. There will be one or two quizzes/tests per class. If you arrive late or leave early, you will probably miss a test, and there will be NO makeups. Quiz/Test grades will be totaled up at the end of the semester and divided by one less than the total number of tests: test grade = (total of your scores)/(N - 1), where N is the total number of tests. If you miss one test, you're still OK. If you miss more than one, your average starts to go down.

Wednesday, January 4, 2012

Xhosa: Miriam Makeba Song

You don't have to sing the song yourself (even your teacher has trouble!), but you should be familiar with the click sounds in this song because we studied them in class.

In this version she explains that she used to be unhappy when white people asked her: "How do you make that noise?" "It's not a noise. It's my language!" she would answer. Now she doesn't mind such comments.

Here is a clear recording of the song:

http://www.youtube.com/watch?v=2tSJ7L_IRBs

Here she explains the song on stage (but the recording is not so clear):



http://www.youtube.com/watch?v=3m_TEq2E4cs

Miriam Makeba speaks Xhosa.

Below, a handsome young man from South Africa teaches us how to speak his language. What do we call those special sounds? Remember? We practiced them in class.

Lesson 1
http://www.youtube.com/watch?v=JZ6oe2U7AOA


Lesson 2 (Here he tells you what the sounds are)
http://www.youtube.com/watch?v=31zzMb3U0iY