Humans like classifying things. There’s no doubt about it. We often end up putting things into categories without even realizing it. Whether it’s music, colors, movies, food, or pretty much anything else we interact with on a daily basis, we’re consciously or subconsciously attempting to slice and dice the world around us into discrete, individual groups.
There’s a very good reason for this. I didn’t really understand it until I took a class on artificial intelligence, which is, at its core, the task of trying to get a computer to do the same sort of thing. Many people have a very romantic view of artificial intelligence, but what most people in the field are actually doing is simply taking a lot of data and trying to get a computer to classify it correctly. The usual name for this is “machine learning”, which is amusing, since at first glance the ideas of “learning” and “classification” seem to be rather unrelated.
But the thing that I discovered in this class is that, if you classify everything perfectly (that is to say, every different thing is in its own separate bucket), then there’s no way to make any sort of inference or prediction about things you haven’t seen before.
For example, let’s say I know that white stones, when thrown at a window, will break said window, and I know that black stones will also break windows—but I’ve never seen a green stone before. How do I know if a green stone will also break a window? I have to make some sort of classification or generalization based on what I’ve seen before. White stones and black stones are both part of the “small, hard thing” category, and things in that classification break windows. If I see a green stone, and can correctly classify it as a “small, hard thing” based on its attributes, I have successfully learned something additional about green stones that wasn’t present in the data I was originally given. Green stones break windows!
Unfortunately, whenever you make these sorts of predictions or inferences, you may get things wrong some of the time. If the only things you’ve seen break windows are green stones, green metal, and green-painted bricks, then you might mistakenly think that a green feather will break windows too. Similarly, if every Frenchman you’ve met has been rude and snotty, then you might mistakenly think that all Frenchmen are conceited and haughty. This is an unfortunate side effect of classification—but the alternative (to treat every person you meet as a completely separate individual, in their own group) is to never learn anything about people you’ve never met based on people you have met who are similar to them.
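As a rough illustration of the stone examples above, here is a minimal Python sketch of prediction by bucketing. Everything in it is invented for illustration: the attribute names (`color`, `size`, `hardness`), the function `predict_breaks_window`, and the sample data. It is a toy lookup, not how real machine learning systems work, but it shows how the choice of bucket decides both what you can infer and what you get wrong.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Thing = Dict[str, str]            # hypothetical attributes, e.g. {"color": "green", ...}
Observation = Tuple[Thing, bool]  # (thing, did it break the window?)


def predict_breaks_window(
    seen: List[Observation], new_thing: Thing, group_by: Sequence[str]
) -> Optional[bool]:
    """Guess the outcome for new_thing from seen things in the same bucket.

    The bucket is defined by the attributes listed in group_by. Returns None
    when no seen thing shares the bucket, or when the bucket disagrees.
    """
    bucket = tuple(new_thing[attr] for attr in group_by)
    outcomes = {
        broke
        for thing, broke in seen
        if tuple(thing[attr] for attr in group_by) == bucket
    }
    return outcomes.pop() if len(outcomes) == 1 else None


stones_seen = [
    ({"color": "white", "size": "small", "hardness": "hard"}, True),
    ({"color": "black", "size": "small", "hardness": "hard"}, True),
]
green_stone = {"color": "green", "size": "small", "hardness": "hard"}

# Bucketing by "small, hard thing" lets us generalize to the unseen stone.
print(predict_breaks_window(stones_seen, green_stone, ["size", "hardness"]))  # True
# A bucket so fine that every thing is alone in it tells us nothing new.
print(predict_breaks_window(stones_seen, green_stone, ["color", "size", "hardness"]))  # None

green_things_seen = [
    ({"color": "green", "size": "small", "hardness": "hard"}, True),  # stone
    ({"color": "green", "size": "small", "hardness": "hard"}, True),  # metal
    ({"color": "green", "size": "large", "hardness": "hard"}, True),  # brick
]
green_feather = {"color": "green", "size": "small", "hardness": "soft"}

# Bucketing by color alone repeats the green-feather mistake.
print(predict_breaks_window(green_things_seen, green_feather, ["color"]))  # True
# Bucketing by hardness avoids the mistake, but now we have no guess at all.
print(predict_breaks_window(green_things_seen, green_feather, ["hardness"]))  # None
```

The interesting part is the `group_by` argument: nothing in the data says which attributes matter, so the same observations can support a useful generalization or a wrong one.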
There’s another major difficulty with our human tendency towards classification. The real world isn’t discrete; it’s continuous. Some songs are obviously “rap” and others are obviously “pop”, but some are in between. Some hues are obviously blue and others are obviously green, but some are more ambiguous. Is this film a comedy or is it a drama? Are tomatoes fruits or vegetables?
Recently I heard a discussion of whether “Mexican” ought to be considered a separate language from “Spanish”, or whether it was simply a dialect. Unfortunately, there are no linguistic criteria for distinguishing between a language and a dialect. There is a saying, “A language is a dialect with an army and navy”. Why can’t linguists draw some sort of line, saying, “Everything past this point is a separate language”?
The fact of the matter is that the language-dialect spectrum is a continuum. It’s much the same as with biological species. Most intermediate forms have died out, so to the classification-happy human brain, there appear to be distinct categories into which we can place “separate languages”, just as we can with “separate species”. But then there are exceptions like Larus gulls and Ensatina salamanders for species, or Arabic for languages.
So is Mexican a different language from Spanish? Is American a different language from English? Is either on its way towards diverging into a separate language?
It simply depends on how you define your classifications, which are arbitrary boundaries imposed upon a continuum.