Archive for the 'language' Category

Noam Chomsky is an interesting fellow. Most who have heard of him are surprised to hear that he is one of the most influential linguists of the 20th century. (Linguists, on the other hand, are often surprised to hear that he is widely known as a political activist.) But his contributions to the field of linguistics are wide and varied. Even more curious is the fact that some of these contributions have helped shape the field of computer science.

Chomsky, in the 1950s, came up with a theory called transformational grammar, maintaining that humans have some sort of innate way of telling whether a given sentence is grammatical or ungrammatical, and that foundationally, this sense is common across all humans no matter what the language. As part of this idea, Chomsky also discussed formal grammars, which are mathematically precise ways of describing a “language”. Although Chomsky believed that formal grammars were unable to fully express the intricacies of human languages, his formal grammar theory has become the foundation of modern computer languages.

The central idea is that of a generative grammar: a set of rules for transforming strings. Chomsky was the first to formalize this idea back in the 50s. Under the formal definition, a grammar has exactly four pieces: a set of “nonterminal symbols”, a set of “terminal symbols”, a list of “production rules”, and a single “starting symbol” from the set of nonterminals. Let’s try a concrete example.

The set of nonterminals: <S>, <N>, <V>. The set of terminals: Bob, Jill, saw, kissed. The starting symbol: <S>. As for the production rules, they must be of the format: X → Y. My example grammar has only five production rules:

1. <S> → <N> <V> <N>
2. <V> → saw
3. <V> → kissed
4. <N> → Bob
5. <N> → Jill

What a production rule means is that if you have something on the left-hand side of a rule anywhere in your string, you can, if you so desire, change it to the corresponding bit on the right. For example, if I have the string “<N> <N> <N>” I can, using rule #4, change my string to become “<N> Bob <N>”. (I also could have changed any of the other two symbols using rule #4, or I could have used rule #5 instead.)
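If it helps to see that mechanically, here is a minimal Python sketch of applying one rule at one spot in a string. The names `RULES` and `apply_rule`, and the choice to store a string as a list of symbols, are my own assumptions for illustration; nothing about the formal definition requires them.

```python
# Rules numbered as in the list above: (left-hand side, right-hand side).
RULES = {
    1: ("<S>", ["<N>", "<V>", "<N>"]),
    2: ("<V>", ["saw"]),
    3: ("<V>", ["kissed"]),
    4: ("<N>", ["Bob"]),
    5: ("<N>", ["Jill"]),
}


def apply_rule(symbols, rule_number, position):
    """Rewrite the symbol at `position` with the given rule; returns a new list."""
    lhs, rhs = RULES[rule_number]
    if symbols[position] != lhs:
        raise ValueError(f"rule #{rule_number} does not apply to {symbols[position]!r}")
    return symbols[:position] + rhs + symbols[position + 1:]


start = ["<N>", "<N>", "<N>"]
print(" ".join(apply_rule(start, 4, 1)))  # <N> Bob <N>
```

Passing a different position (0 or 2) or rule #5 gives the other choices mentioned above.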

You can look at a formal grammar like this as something of a game: given the starting symbol and using production rules, what strings of terminals are possible? Here’s one possible string, with its full derivation:

1. <S> (starting symbol)
2. <N> <V> <N> (rule #1)
3. <N> <V> Bob (rule #4)
4. <N> saw Bob (rule #2)
5. Jill saw Bob (rule #5)
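The derivation above can also be replayed step by step. This is only a sketch: the `(rule number, position)` encoding of each step and the helper names `rewrite` and `derive` are assumptions of mine, not part of any standard notation.

```python
# Rules numbered as in the post: (left-hand side, right-hand side).
RULES = {
    1: ("<S>", ["<N>", "<V>", "<N>"]),
    2: ("<V>", ["saw"]),
    3: ("<V>", ["kissed"]),
    4: ("<N>", ["Bob"]),
    5: ("<N>", ["Jill"]),
}

# Each step is (rule number, index of the symbol being rewritten).
STEPS = [(1, 0), (4, 2), (2, 1), (5, 0)]


def rewrite(symbols, rule_number, position):
    """Apply one rule at one position, refusing if the rule does not match."""
    lhs, rhs = RULES[rule_number]
    if symbols[position] != lhs:
        raise ValueError(f"rule #{rule_number} does not apply at position {position}")
    return symbols[:position] + rhs + symbols[position + 1:]


def derive(steps, start="<S>"):
    """Return every intermediate string, beginning with the starting symbol."""
    current = [start]
    history = [" ".join(current)]
    for rule_number, position in steps:
        current = rewrite(current, rule_number, position)
        history.append(" ".join(current))
    return history


for line in derive(STEPS):
    print(line)
```

The last line printed is "Jill saw Bob", matching step 5 of the derivation.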

Now my string contains all terminal symbols, so my use of production rules must come to an end. (Production rules may have anything on the right-hand side, but the left-hand side must only contain nonterminals. That is where the name “terminals” comes from: once your string contains only terminals, it is finished.)

Another variety of the game consists of starting with a given string of terminals, and attempting to figure out the proper derivation for it. For example, can you give the derivation for “Bob kissed Jill”? How about “Bob saw”? Or “Jill kissed Jill”? (Hint: one of those strings has no valid derivation.)

Strings with a valid derivation (there’s some way to get to them from the starting symbol) are said to be “in the language described by the grammar”. Strings without a valid derivation (no matter what rules you apply, you can never get to them from the starting symbol) are said to “not exist in the language described by the grammar”. A “formal language,” then, is the set of all strings that a generative grammar can produce.
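For a grammar this small, membership can be checked by brute force: expand the starting symbol every possible way and see whether the string turns up. A rough sketch follows; the names `PRODUCTIONS`, `language`, and `in_language` are made up for illustration, and the approach only terminates because my example grammar has no rule that leads back to itself.

```python
from collections import deque

# The example grammar, grouped by left-hand side.
PRODUCTIONS = {
    "<S>": [["<N>", "<V>", "<N>"]],
    "<V>": [["saw"], ["kissed"]],
    "<N>": [["Bob"], ["Jill"]],
}


def language(start="<S>"):
    """Collect every all-terminal string reachable from the starting symbol."""
    found = set()
    queue = deque([(start,)])
    while queue:
        symbols = queue.popleft()
        # Index of the leftmost nonterminal, or None if only terminals remain.
        index = next((i for i, s in enumerate(symbols) if s in PRODUCTIONS), None)
        if index is None:
            found.add(" ".join(symbols))
            continue
        for rhs in PRODUCTIONS[symbols[index]]:
            queue.append(symbols[:index] + tuple(rhs) + symbols[index + 1:])
    return found


def in_language(sentence):
    """True if the sentence has a valid derivation in the example grammar."""
    return sentence in language()


print(in_language("Jill saw Bob"))  # True
print(in_language("saw Bob Jill"))  # False
```

Always expanding the leftmost nonterminal is enough here, since the order in which symbols are rewritten does not change which final strings are reachable.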

How many different strings are in the language produced by my example grammar? First one to post them all gets a cookie!

I discovered a delightfully prescriptivist web site the other day, called barelybad.com. In characteristically prescriptivist fashion, the site rails against such incorrect usages as “downhill from here,” or phrases such as, “noticed a suspicious parcel,” and terrible atrocities like, “green salad is part of a healthy meal” or “a nominal fee”.

The author continually says such hilariously stereotypically prescriptivist things as,

Yes, this is another one of those distinctions in which I don’t care what the dictionary says, because I’m right.

and

Losing a special, useful vocabulary word to misuse is never desirable.

Take a peek. Even the most prescriptivist among you might find something to chuckle at.

The name “roscivs” I’ve used since high school, where one of my Latin classmates gave it to me. (The “v” is pronounced like a “u”, just as it is in Latin.) I’d always assumed that he’d just made up the name, but in my recent googlings I’ve discovered an ancient Roman denarius with an inscription bearing the name of an “L. Roscivs Fabatvs”:

In case you’re wondering where “indessed” (the other half of my blog’s name) comes from, its story is not nearly as interesting. It came from GPW, a program which generates fake words that look as if they should be real by analyzing how often certain combinations of letters occur in real English words (or in whatever other language’s dictionary you analyze). The stated purpose of the program is to generate passwords that are easy to remember, but I typically use it for generating usernames, domain names, and the like.

Thus was born: indessed roscivs.

In the same poetic style as found
in the haiku, sonnet, and other forms
which limit the poet in some way,
this form needs seven lines, and, besides,
each line must contain exactly seven words.
Not only that, but every word must
have fewer or equal to seven letters.

(The kwansaba was invented during the 1995 EBR Writers Club workshop in St. Louis.[1])
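The three rules are simple enough to check mechanically. Here is a small sketch; the function name `is_kwansaba` is my own, and stripping punctuation from the edges of each word before counting letters is an assumption about how the form treats commas and periods.

```python
import string


def is_kwansaba(poem, size=7):
    """Check for `size` lines, `size` words per line, and at most `size` letters per word."""
    lines = [line for line in poem.splitlines() if line.strip()]
    if len(lines) != size:
        return False
    for line in lines:
        # Drop punctuation at the edges of each word before counting letters.
        words = [w.strip(string.punctuation) for w in line.split()]
        words = [w for w in words if w]
        if len(words) != size:
            return False
        if any(len(w) > size for w in words):
            return False
    return True


POEM = """\
In the same poetic style as found
in the haiku, sonnet, and other forms
which limit the poet in some way,
this form needs seven lines, and, besides,
each line must contain exactly seven words.
Not only that, but every word must
have fewer or equal to seven letters.
"""

print(is_kwansaba(POEM))  # True
```

Hyphens and apostrophes inside a word count as letters here, which is a simplification.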

I have a closet full of unpacked boxes.

Speaking of the Language Log, I discovered a very interesting post there recently. It had to do with why the word “unthaw” means the same thing as “thaw”, another interesting curiosity of the English language.

When you read the above sentence about my closet, what did you picture? Did the boxes contain things, or were they empty? Does the following sentence change matters?

I have a closet with boxes still unpacked from the move.

Technically speaking, “unpacked” means “not packed”. So, the above sentence literally means, “I have a closet with not packed (i.e. empty) boxes left from the move.” But that’s not how most people read it. Why is this? It seems quite odd. I can’t do the same with other words, e.g. “I have a closet full of still undressed dolls.” This sentence can’t mean anything but that the dolls have no clothes on. But other phrases are murkier: “How many bottles of wine are still uncorked?” or perhaps, “I’ve opened nearly all of my birthday presents, but I still have one left unwrapped.”

There are many more examples at the Language Log. But the curious thing is—why does it work in some cases and not work in others? Does “unpacked” always mean “not packed” to some people, and sound odd in the above constructions? Or vice-versa, does “unpacked” always mean, strangely enough, “not yet unpacked” to some people? (Thus the sentence, “We’ve finished packing nine of the boxes, but the tenth is still unpacked.” would sound odd to them.)

Furthermore, why does it work with some words like “pack/unpack”, but not with other pairs? You can’t say, “I’ve switched off four of the five computers, but the last one is still unplugged,” to mean anything but not-plugged. You can’t say, “Nearly everyone had taken off their hat; only one head was left uncovered,” meaning that the head still had a hat on it.

What do you see when reading these sentences? Which meaning pops first into your mind?

I occasionally read a blog called The Language Log, which is for language nerds what Slashdot is for computer nerds. A few weeks ago I came across a post there (interestingly enough via a Slashdot comment) which had as its final paragraph:

It seems clear to me that nearly all strings of English words you can construct are ungrammatical. Try writing down any random sequence of words (a fully grammatical one if you want to bias things against my claim), either with repetitions or without, it doesn’t matter. With a very few peculiar exceptions, for any string of words you will find that almost every one of the orders in which those words can be arranged will be ungrammatical—exponentially many more are ungrammatical than are grammatical.

The readers were, of course, very curious about this seemingly throw-away comment about the “very peculiar exceptions”. What strings of words could you possibly rearrange arbitrarily and get grammatical sentences? One obviously this not.

Thankfully, the next entry revealed one of these peculiar exceptions: the buffalo sentence.

“Buffalo” is one of those peculiar words that has multiple different meanings. For example, it can refer to the city of Buffalo, New York (home of the Sabres, incidentally, currently the highest-ranked team in the NHL). It can also refer to the animal, the American Bison, either in the singular or the plural. And, finally, the word can be used as a verb meaning “confuse, deceive, or intimidate”.

So, for some examples:

  • Have you ever seen the Buffalo NHL team?
  • Did you see the buffalo on their jerseys?
  • Did you see Martin Biron get buffaloed by that clever deke?

And now we can start combining them together:

  • I don’t think there are any Buffalo buffalo. They’re mostly found further west.
  • If one bison got tricked by another bison, do you think you could say that the first buffalo buffaloed the other buffalo?
  • If the first bison happened to be from New York, would you say that the Buffalo buffalo buffaloed the other buffalo?
  • If this occurrence was routine, would you say that Buffalo buffalo buffalo buffalo on a regular basis?

It turns out that the sentence with four instances of the word “buffalo” is not the only grammatical one; in fact, every sentence consisting solely of the word “buffalo” is grammatically correct, no matter how many words it contains. Wikipedia has a few charts and diagrams explaining how this can be the case and provides a few examples. What’s more, you don’t even have to rely on the New York meaning of the word in order to produce infinitely many sentences this way. The previously-mentioned Language Log post even supplies paraphrases for the skeptical. Ah, the joy of homonyms!

Speaking of the Chinese New Year, I was doing a little reading on Wikipedia about the subject. In the article, I came across this passage:

According to legend, in ancient China, the nián (年), a man-eating beast from the mountains, could silently infiltrate houses to prey on humans. The people later learned that the nian was sensitive to loud noises and the color red, so they scared it away with explosions, fireworks and the liberal use of the color red. These customs led to the first New Year celebrations.

I was quite astounded to realize that I recognized the Chinese character for the beast: 年. It’s the exact same character that the Japanese use for “year” in the phrase ichi-nen-sei (meaning “first year student”), one of the few phrases I’ve learned the kanji for. And “nen” sounds suspiciously like “nián”, the name of the beast.

Sure enough, Wikipedia confirms that “the Chinese word for year is based on the arrival of this beast [Nián]”. Presumably the Japanese then borrowed the word for year from the Chinese, thus preserving this Lion-beast’s name in such mundane phrases as “first year student”.

Tonight was the first class of Japanese II! We reviewed places and learned a few new place-words, and learned the colors. (We also made origami hearts for Valentine’s day. Awww!)

Many people ask me why I wanted to learn Japanese. Well, first of all I love languages—I think they’re fun to learn, and they stretch my brain in all sorts of interesting ways. I love all the intricate nuances of grammar, too—the grammar of a language is like a twisty maze, similar to mazes I’ve done before, but different in new and strange ways, like a puzzle that I have to solve. I want to learn Russian in particular because of its complex grammar—it’s one of the few living languages with a complex case system. (Classical Greek and Latin are dead; written Arabic has some remnants of a case system but nothing complicated.)

The real reason I picked Japanese as the next language to learn, though, is because I strongly feel that, after a year or so of studying a language, pretty much the only way you can significantly improve is to actually travel to where the language is spoken and use it on a day-to-day basis. Well, I studied Arabic last—and unfortunately, the situation in the world is not such that it’s a good idea for an American to travel to most Arabic-speaking countries. (Furthermore, spoken Arabic differs greatly from country to country, and I learned the Egyptian variety.) Even if I got a Canadian passport (which I plan on getting), I still don’t think it would be that great of an idea. Uncle Sam wanted to get DW to go to the Middle East and speak Arabic for the government to the tune of $300k. It wasn’t worth it.

So I tried to think of what language is both interesting, and is spoken in a country that’s safe to travel to. Japanese was a perfect fit. And now, 私は日本語の学生です!

Your Engrish Lesson of the Day:

In case you’re having trouble reading the image, it says:

HDD Assembles Elucidation

1. Make an effort to press in the direction that arrowhead point the plastics lock button up. such as Picture A
2. Heading up the upper cover to turn over to rise. such as picture B
3. Cover up and down to separate then and completely. such as picture C

Is good with machine plank according to the right method conjunction the hard disk , lock the tight and HDD, cover the upper cover, can immediately trust the usage.

Somehow I managed to put the thing together regardless.