Friday, September 4, 2009

C-sets and L-sets. Draft

C-sets, or Cantorian sets, are the usual sets of set theory. They were first (for practical purposes) described by Cantor in his work with infinity, which gave them a slightly shady reputation, enhanced by the discovery of an array of paradoxes in the earliest attempts to axiomatize the theory. Eventually, a number of axiomatizations that avoided the known paradoxes in various ways were devised. All the various axiomatizations are incomplete, of course, since arithmetic is and can be defined within them. They also differ in various ways at their outer edges (is R, the cardinal of the set of reals, the next largest cardinal after Aleph-null, the cardinal of the set of natural numbers?). But they agree in the basic area we are concerned with, namely, simple sets, their subsets, and simple operations on these. Briefly, a set is different from its members; in particular, {a} (the set whose only member is a) is different from a (the thing in it). Further, from a given set, {abc}, say, we can get other sets using only the members of the original set (its subsets): in this case {a}, {b}, {c}, {ab}, {ac}, {bc}, and, for completeness, the original set {abc} and the empty set {}, which has no members. The order in which the members are listed is not significant, but the fact that these members are in different sets will be: {{a}{b}} is different from {ab}, of course, and {{ab}{ac}} is different from {abc}, and so on. A set, once constructed, can now behave as an element in further sets, as just shown, and so all of the subsets can be gathered together into a single set (called the power set because its size for a set with n members is 2 to the power n).
In this way, quite large sets can be built from relatively small beginnings (the natural numbers, in one approach, are built up from the empty set -- about as small as you can get -- by taking its power set, {{}}, joining the two into a new set, {{}{{}}}, then combining this set with its members to form a set with three elements, and so on forever).
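
The C-set behavior described above can be tried out in a few lines of Python. This is only a rough sketch: it assumes Python's built-in frozenset is a fair stand-in for a C-set, and the names power_set and successor are made up for the illustration, not taken from any axiomatization.

```python
from itertools import chain, combinations

def power_set(s):
    """Return the set of all subsets of s, each subset as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return frozenset(frozenset(c) for c in subsets)

def successor(n):
    """Combine a set with the set that contains it: n united with {n}."""
    return n | frozenset([n])

abc = frozenset("abc")
# 2 to the power n subsets, counting {} and {abc} itself.
assert len(power_set(abc)) == 2 ** len(abc)

# A set differs from its member, and grouping matters.
assert frozenset(["a"]) != "a"
assert frozenset([frozenset("a"), frozenset("b")]) != frozenset("ab")

# Natural numbers built up from the empty set.
zero = frozenset()
one = power_set(zero)    # {{}}
two = successor(one)     # {{}, {{}}}
three = successor(two)   # three elements
assert [len(n) for n in (zero, one, two, three)] == [0, 1, 2, 3]
```

Each "number" here is a set whose members are all the earlier numbers, which is why its size matches the number it stands for.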

L-sets were developed first by Lesniewski and independently by Leonard and Goodman (and Quine). They are mathematically more untidy than C-sets (there is no empty set, so a set with n members has only 2^n - 1 subsets) and also less useful mathematically (it is hard to get bigger sets). But they prove to be more useful in representing many situations in language (and thus expanding logic a bit). For L-sets, {a} is the same as a; further, {{a}{ab}} collapses to {ab}, {{ab}{c}} = {{a}{bc}} = {{a}{b}{c}} = {abc}, and so on. That is, a set is always a set of its ultimate components, even if we talk about intermediate subsets.
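
The collapsing behavior of L-sets can be sketched the same way. The code below is an illustration under assumptions, not anyone's official formalization: lset and sub_lsets are hypothetical names, nested Python lists stand in for the braces, plain strings stand in for the ultimate components, and an individual a is represented as the one-member bunch so that {a} and a come out the same.

```python
from itertools import combinations

def _atoms(part):
    """Collect the ultimate components of a possibly nested grouping."""
    if isinstance(part, (set, frozenset, list, tuple)):
        out = set()
        for p in part:
            out |= _atoms(p)
        return out
    return {part}

def lset(*parts):
    """Build an L-set: nesting is transparent, and there is no empty L-set."""
    atoms = _atoms(parts)
    if not atoms:
        raise ValueError("there is no empty L-set")
    return frozenset(atoms)

def sub_lsets(s):
    """List every non-empty sub-bunch of the L-set s."""
    items = sorted(s)
    return [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
    ]

# {a} is the same as a.
assert lset("a") == lset(["a"]) == lset([["a"]])
# {{a}{ab}} collapses to {ab}.
assert lset(["a"], ["a", "b"]) == lset("a", "b")
# {{ab}{c}} = {{a}{bc}} = {{a}{b}{c}} = {abc}.
assert lset(["a", "b"], ["c"]) == lset(["a"], ["b", "c"]) == lset("a", "b", "c")
# Only 2^n - 1 subsets, since the empty one is missing.
assert len(sub_lsets(lset("a", "b", "c"))) == 2 ** 3 - 1
```

Because every grouping flattens to its components, there is no way here to build a bigger set out of a smaller one, which is the "hard to get bigger sets" point in miniature.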

In set theories, of course, sets are individuals (the values for variables, the referents of constants, and so on) and so they have properties and enter relations. But, in the theory, these attributes tend to be either dummies or set-theoretic ones. Little is said about sets and ordinary properties and relations, "carries a piano," say, or "wins a race." And, if we have any intuitions about sets, they don't seem to be the sorts of things that carry anything or even enter races, even though their members might be.

And yet, something formally very like sets does these things in everyday language: "The boys carried the piano" need not mean that each carried it by himself; it might mean that they carried it together, acting formally as a set. (Each of the boys participated in carrying the piano, though each one's exact role is unspecified.) And, of course, a team (which looks like a set in a general way -- a number of things conceived of as together) can win a race even if only one member runs (the others participate by being on the same team, I suppose -- that is a characteristic of teams among sets). And it turns out in many cases to simplify things quite a bit to take plural references as references to sets, rather than somehow to each of the several individuals we take as making up the set. We can disambiguate if we need to, but often it does not matter (as the race case suggests). So long as the piano gets carried, we don't care how the boys do it. If we really want to specify that each carried it alone part of the way, then we can say so: "each of the boys carried the piano."

While we could do this sort of thing with either C-sets or L-sets, L-sets seem the more natural. Bunches of things such as we have in mind don't seem to grow into bigger bunches by subdividing and recombining. Further, if we are mistaken about the referent being plural, the same pattern applies, since {a}=a: the singleton set reduces to its member. Even the lack of an empty set, so damaging to the mathematical uses, proves useful here. A reference to something having a property which nothing in fact has automatically renders a sentence false for L-sets. For C-sets, it makes a reference to the null set, and the null set has many properties, mostly irrelevant to whatever we were talking about, or relevant but holding only through logical tricks -- neither a desirable situation.
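
One way to see the point about empty reference is a small Python sketch. It rests on assumptions of my own: both functions are hypothetical, both read the predicate as applying to each referent separately (only one of the readings discussed above), and vacuous truth is used as one example of a property that holds of the null set only through a logical trick.

```python
def c_set_claim(referents, predicate):
    """C-set reading: an empty reference still picks out the null set,
    and a claim about all its members holds vacuously."""
    return all(predicate(x) for x in referents)

def l_set_claim(referents, predicate):
    """L-set reading: with no referents there is no bunch at all,
    so the sentence is simply false."""
    return bool(referents) and all(predicate(x) for x in referents)

def is_white(thing):
    return thing.get("color") == "white"

unicorns_in_the_yard = []                # nothing has the property
swans_in_the_yard = [{"color": "white"}]

assert c_set_claim(unicorns_in_the_yard, is_white) is True    # true by a trick
assert l_set_claim(unicorns_in_the_yard, is_white) is False   # plainly false
assert c_set_claim(swans_in_the_yard, is_white) is True
assert l_set_claim(swans_in_the_yard, is_white) is True
```

When something is actually referred to, the two readings agree; they part company only in the empty case.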

It should be noted that some people raise objections to using L-sets to treat languages. The objection runs that doing so compels the language to implicitly recognize the existence of such things as L-sets, since they are necessarily in the range of quantifiers in the language. While I am not sure why this is a problem, especially in a language like English, which regularly interchanges L-set words like "bunch" and "group" with simple plurals, I give way to others' quest for ontological purity and say that we ought not to think of L-sets as something different from (or over and above) the members considered together. This is, of course, very easy to do, given the transparency of L-sets. It requires some changes in the way we describe the logic of the situation, but only minor ones. In particular, we need to allow that quantifiers may take a number of instances simultaneously and together, and that we have a way of pulling out particular instances from these. That being done, we can deal with plurals as really being about several things rather than one thing of which the several are members, which does seem more natural. The point is that the logics of the two approaches are exactly the same, and even the two metalanguages, while not the same, are directly translatable the one into the other and so totally congruent. As an old-timer raised on set theory, I stick to the locution I am most comfortable with, but I try always to include the other reading as well.






Zipf's Wall. Draft

Zipf's Law (strictly, his law of abbreviation; the better-known Zipf's Law relates a word's frequency to its rank) is a more precise formulation of the obvious (once you think about it) proposition that common words tend to be short and rarely used words long. The full version does the math and gives correlations between frequency and length (relative to the norm for the language). This is all, of course, descriptive of how natural languages in fact work, not a prescription of how they should work. But the underlying logic of language as a human instrument gives it some projective power.
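
The tendency itself is easy to check roughly on any word list. The sketch below is only that, a sketch: mean_length_by_frequency is a made-up helper, the cutoff top_n is arbitrary, and the sample sentence is a toy invented for the illustration rather than real corpus data.

```python
from collections import Counter

def mean_length(words):
    """Average length of the given words, or 0.0 for an empty list."""
    return sum(len(w) for w in words) / len(words) if words else 0.0

def mean_length_by_frequency(words, top_n):
    """Compare the mean length of the top_n most frequent distinct words
    with the mean length of all the remaining distinct words."""
    ranked = [w for w, _ in Counter(words).most_common()]
    return mean_length(ranked[:top_n]), mean_length(ranked[top_n:])

sample = "the cat saw the dog and the cat saw the extraordinary rhinoceros".split()
common, rare = mean_length_by_frequency(sample, top_n=3)
assert common < rare   # the frequent words are the shorter ones in this toy sample
```

A language creator could run the same comparison over sample texts in the new language to see whether frequent expressions are coming out overlong.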

In creating a language, then, one wants to keep this pattern in mind. In particular, one does not want to set the language up in such a way that common topics of conversation will inevitably involve words that are overlong for their commonality. This is not a problem for a language like Esperanto, which can borrow freely from the languages around it (with a few -- often ignored -- restrictions), nor even for Lojban, which makes enforced restrictions on borrowings but ones that cost only a syllable or two.

It is a problem, however, for languages with a fixed base of concepts and no means to add new items. Depending upon what the base is and the structure of the language, the problem of too-long words can arise sooner or later. In this context, "word" should probably be "phrase," since many languages do not have a means to construct new words with fixed meanings (combinations of the basic concepts) but must do the work by stringing words together. In any case, there will come a time when repeating a certain referring expression comes to be felt to be too onerous and the cry goes up for a replacement. Aside from simply giving in to the plea and adding a new concept to the base pile, here are a few strategies to meet this issue.

1. Simplify the definition. If the concept dog is represented by something like "furry beast that we have around the house for protection and to play with and take hunting," we can surely trim this back to something like "beast that ...." with only one or two things in the gap. This definition is, of course, purely accidental, i.e., gives neither necessary nor sufficient conditions for being a dog, but the strict definition is going to be either the Linnean binomial or the biological description behind it, and that is likely too long also -- aside from likely not fitting into the language's patterns. So definitions are likely to be contextual and in that context finding an appropriate phrase is simplified. We need, perhaps, only to distinguish dogs from cats and so any short thing that does that will do.

2. Choose your base wisely. This is sorta ex post facto, but presumably you can go back and revise before too many people get too committed to the original. NSM (Natural Semantic Metalanguage) offers a short list of concepts which are said to occur in all languages and with which all others can be defined. The definition process is complex, however, and does not lend itself to simple expression construction. Still, any starting point should be sure to cover those concepts. Swadesh's list of concepts you can be pretty sure to find expressed readily in any language is about four times the size of the NSM list. It is meant primarily as an entryway into a new language: you can be sure there are words for these, and once you get them they will enable you to ask about other things as well -- but there is no claim that everything else can be defined in terms of these. Basic English starts with a list about four times the length of Swadesh's but does claim to be able to say anything using only those words. But, as some examples will show, the problem of Zipf's wall is simply ignored (and so people don't use BE much). BE does also claim that its words are among the most commonly used in English and so provides a further guide: even though the list is too long for direct use (probably), if you have it covered with appropriately short words you are well on your way.

3. Invent a slang. This is a bit of a cheat, but one that natural languages use all the time. The version that is likely to occur to a language creator is apocopation, dropping out stuff. If "Geheime Staatspolizei" is something you say a lot, lop it down to "Gestapo." Everybody knows what you mean and you are not really introducing a new word, just saying the old one faster. Americans go for acronyms (initial letters of the underlying words), but our former enemies seem to prefer slightly larger chunks of the original (see above and "Ogpu" and "Sudoku") -- which is often more informative (a number of CYAs have rather conflicting agendas and "confusing" two of them can make for really bad jokes). The fullest form of this sort is the word-forming rules in Loglan and Lojban, although this is not quite what is intended here, since the results are new words and even have definitions which could not be derived readily from the sources.

Of course, there are other forms that slang can take. One is frozen metaphor, where a word that does not mean what you want but can be connected with it in some poetic (or not so) context comes to stand for what you want (there is a nice rhetoricians' name for this but I can't remember it now; I'll ask my resident English major). Assuming that the picked up word is incongruous enough in the context where it is used for what you want, this will work -- perhaps with a little training (both sheep and cotton have been compared to clouds, so one might use "cloud" for either or both of them -- you don't pick sheep and you don't herd cotton and you don't do either to clouds (ah, but you do make cloth from both sheep and cotton, though still not clouds)).

Or the reference might be indirect, through an accidental or a causal intermediary. Rhyming slang is a good example of this: since "bees and honey" stands for "money," so does "bees" alone. Along the causal line we have all the American terms for money, listing the essentials it buys: "dough," "bread," "bacon," and so on.

4. Expand the meaning of some basic terms. This might be viewed as a case of the last sort, but it comes into play at a different level, as an official part of the language. Again, natural languages provide models, as when "times" went from several points or periods in time to multiplication.

Actual created fixed-morpheme languages use some combination of all of these dodges, as they must if they are to achieve their goal of saying all that needs to be said, but with limited resources. Critics from the outside tend to take these moves as a proof that the program of such languages cannot be accomplished, rather than noting the ingenuity of language (and language creators, of course) to deal with situations as they come to the fore.