Phonetic symbols in Calibri and Cambria

December 29, 2011

Here are, side by side, a zero and an empty set ($\emptyset$) in the Computer Modern font (the default in TeX and LaTeX).

Thus the Computer Modern \emptyset is not quite a slashed zero. I don't like it; instead I use \varnothing from the amssymb package, which gives me the slashed circle. On the other hand, using the slashed circle for a linguistics null symbol looks inappropriate. In any case, shouldn't linguists have their own symbol?

December 29, 2011

One thing that I like about the SIL fonts (Charis SIL & Doulos SIL) and Gentium that I haven’t seen much support for elsewhere are the slightly larger and heavier variants of the apostrophic modifier letters, U+02BB – U+02BD and U+02EE. Most fonts simply duplicate the shapes of the quotation marks, but the modifier letters are properly alphabetic letters rather than punctuation.

Best I can remember, these are raised and turned commas. In a number of fonts today, the apostrophe is smaller than a comma; in these cases anyway, making them larger (as in a raised or turned comma) just makes sense.

As a typesetter, one problem I always encounter is whether to use the proper Unicode characters, U+02BB & U+02BD, or U+2018 & U+2019 (single open & close quotes). The reason is ebooks are getting to where they are almost always made with university press books, and the spacing modifiers are absent in most fonts. iPad & Kindle won't display them; and in spite of the EPUB3 specification, Apple's big concern seems to be destroying Flash. No worry about small things like having the right characters, or allowing embedding fonts.

The other reason is interior designers never check to see if proper characters are in the fonts they choose, so if the EULA forbids it, you can't even make up the correct character for the print edition. Or you take it from another font, which takes time, money, and hand-fitting every occurrence. Not likely.

These gripes are pedestrian compared to what is being discussed, but it is an issue once you get beyond the manuscript.

December 31, 2011

[Michel Boyer:] On the other hand, using the slashed circle for a linguistics null symbol looks inappropriate. In any case, shouldn't linguists have their own symbol?

Semantically the linguistic null symbol is the same as the mathematical empty set. Both are used to mean ‘nothing’ or something similar. But typographically they are very much distinct. From a Unicode perspective this means that the two are to be identified as the same character, and hence the problem is left to font design. It’s similar to the language-specific issues regarding letters like italic U+0442 ‘Cyrillic Small Letter Te’ in Serbian versus other Cyrillic, or whether a dollar sign should have one vertical bar or two. In these sorts of cases the meaning of the symbol is the same, it’s just the shape that differs. But where OpenType provides contextual indicators of languages, there is no such indication for scientific disciplines. I’m not suggesting that such things are necessary, just that the parallel between the two situations breaks down there.

Linguists do generally prefer TeX’s default $\emptyset$, and mathematicians generally seem to like $\varnothing$. (La)TeX makes this distinction explicitly available, but Unicode does not. I actually side with Unicode on this, though obviously I bemoan the lack of differentiation outside of the TeXosphere.

[charles_e:] As a typesetter, one problem I always encounter is whether to use the proper Unicode characters, U+02BB & U+02BD, or U+2018 & U+2019 (single open & close quotes).

The difference between the modifier letters (U+02BB – U+02BD and U+02EE) versus the open and close quotes (U+2018 & U+2019) is not really in their appearance. Instead, the modifier letters in Unicode belong to the Lm (Modifier_Letter) category and hence they are meant to be processed like alphabetic characters (A–Z, etc.). That means that they are part of the language’s alphabet rather than being accessory symbols like punctuation. In contrast, U+2018 and U+2019 are in the Pi and Pf categories respectively (Initial_Punctuation and Final_Punctuation). They are punctuation characters just like the ampersand, hyphen, question mark, octothorpe, and so forth.
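
The category split described above can be checked directly against the Unicode Character Database. This is a minimal sketch, assuming Python's standard unicodedata module; the helper name describe is made up for illustration.

```python
import unicodedata

# Code points named in the discussion above.
CODE_POINTS = [0x02BB, 0x02BC, 0x02BD, 0x02EE, 0x2018, 0x2019]


def describe(cp: int) -> tuple:
    """Return (U+XXXX label, general category, character name) for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.category(ch), unicodedata.name(ch))


if __name__ == "__main__":
    # Prints one row per code point: label, category, name.
    for cp in CODE_POINTS:
        print(*describe(cp), sep="  ")
```

The four modifier letters should report Lm, while U+2018 and U+2019 report Pi and Pf. Software that branches on the general category therefore treats the two groups differently even when the glyphs look identical.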

This distinction between modifier letter and punctuation is crucial but largely invisible. The modifier letters should be treated just like any other letter in the language’s alphabet. So if a modifier letter is part of a digraph then it shouldn’t be divided for hyphenation, for the same reason that you wouldn’t divide ch as c-h in English. It’d be equally wrong to divide Tlingit’s tsʼ as ts-ʼ. Although the idea of hyphenating a quotation mark seems strange, there are other contexts where the difference is important. In American-style punctuation practice it’s possible to reorder the punctuation symbols at the end of a quotation, so that commas and periods should be shifted to the left of a quotation mark: ‘... foo’, he said should become ‘... foo,’ he said. For modifier letters this rearrangement is absolutely prohibited: ... tʼoochʼ, yéi yaawaḵaa should never ever become ... tʼooch,ʼ yéi yaawaḵaa because that latter form is nonsense in Tlingit. Modifier letters are simply not punctuation, and should never be confused with punctuation. They are instead inherent parts of a letter or word, just like all the other alphabetic characters. (Your mileage may vary regarding apostrophes in English and French used to mark elision of letters, or English’s possessive suffix -’s.)
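
One place where the distinction stops being invisible is word splitting. The sketch below assumes Python's re module, whose \w class follows the Unicode letter categories, as a stand-in for whatever tokenizer a search index or hyphenator might use; the helper name words is hypothetical.

```python
import re

# The same Tlingit fragment encoded two ways.
MODIFIER = "t\u02bcooch\u02bc, y\u00e9i"      # U+02BC MODIFIER LETTER APOSTROPHE
PUNCTUATION = "t\u2019ooch\u2019, y\u00e9i"   # U+2019 RIGHT SINGLE QUOTATION MARK


def words(text: str) -> list:
    """Split text into runs of word characters, as a naive indexer might."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    print(words(MODIFIER))      # modifier letters stay inside the word
    print(words(PUNCTUATION))   # quotation marks break the word apart
```

With the modifier letter, the first word survives as a single token; with the quotation mark it is cut into fragments, which is why a search for the word fails on the punctuation encoding.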

Now, most linguists don’t make this distinction in Unicode because they don’t know about it. Or rather, they don’t know that they know – they have an implicit understanding of the distinction but don’t realize that Unicode actually puts this distinction into practice. Nearly every linguist I’ve ever met is ignorant of Unicode’s fine structure; they are only concerned with “is there a character in the ‘symbols’ dialog box that looks like what I want?”. So it’s the typesetter’s job to figure out from the submitted typescript that there should be a distinction between one kind of apostrophe symbol used for punctuation and another kind of apostrophe symbol used alphabetically. Often this is obvious, but sometimes you have to ask. The simplest question to ask the author is whether a particular apostrophe is part of a letter or word, or whether it’s just punctuation like in English. You may get back an essay on ejectives or glottal stops or something, or you may get a nastygram saying “it’s there for a reason, don’t change it, dammit”, but in either case you’ll get feedback, which is better than mangling the results and being chewed out later.

December 31, 2011

[jcrippen] Semantically the linguistic null symbol is the same as the mathematical empty set. Both are used to mean ‘nothing’ or something similar.

The empty set is not nothing. It is the set {}, the set that contains no element. In particular, there is a unique function from the empty set to the empty set: that is the function whose graph is the empty set. That gives a combinatorial "proof" that zero to the power zero is equal to 1 (after a little set theory).
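
The counting argument can be illustrated by brute force. This is a sketch, not a proof: it assumes the usual fact that a function from an m-element set to an n-element set is one choice of image per domain element, so there are n^m of them. The helper name count_functions is made up for illustration.

```python
from itertools import product


def count_functions(domain: list, codomain: list) -> int:
    """Count functions domain -> codomain by enumerating every choice of images."""
    # Each function corresponds to one tuple of images, one per domain element.
    # With an empty domain there is exactly one such tuple: the empty tuple.
    return sum(1 for _ in product(codomain, repeat=len(domain)))


if __name__ == "__main__":
    print(count_functions([], []))             # empty set to empty set
    print(count_functions([1, 2], []))         # nonempty set to empty set
    print(count_functions([1, 2, 3], [0, 1]))  # 3-element set to 2-element set
```

The first call finds the single empty function, which is the combinatorial reading of zero to the power zero; the second finds none, matching zero to any positive power.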

December 31, 2011

Well yes, to be precise it is the set of nothing. And since linguistics is founded partly on set theory, the linguistic zero usually means the same thing. So if you have a set of morphemes {-p, -t, -k, and -q} that occur in a paradigm, you can also analyze the set as including an empty set. That’s because by definition the empty set exists in any set, including the empty set itself. This empty set can then be considered to be an empty morpheme, -∅. Or you may instead want to define the empty morpheme as an element distinct from the empty set, depending on your theory of morphology. In that case the other morphemes are made of elements taken from the set of all possible sounds in the language, and the null morpheme is the empty set that is included in that set of all sounds. That leaves a distinction between a zero morpheme and a lack of a morpheme, which is important in some morphological theories.

So linguistic nulls are the nothings of a category, just like mathematical nulls; indeed, they are mathematical nulls because linguistic analysis is just another application of mathematics. But all of this is pedantic from a typographic and character set standpoint. The basic issue for typographers is that they’re represented by the same character, though not necessarily with the same presentation form (font variant, etc.).

December 31, 2011

The empty set is a subset of {-p, -t, -k, -q}, it is not an element of that set. The empty set ∅ contains no element. The set {∅} contains one element and in the Von Neumann notation for integers, it represents the integer 1. Similarly, the set {∅, {∅}} represents 2 etc. With the word "contains" you keep a dangerous ambiguity: is it an element? is it a subset? You cannot add the empty set to a set without changing it unless it already contains the empty set as an element; ∅ is not an element of ∅, and it is not an element of {-p, -t, -k, -q} either.
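
The subset/element distinction can be made concrete with Python's frozenset, which is used here only as a convenient model of finite sets. The names EMPTY, MORPHEMES and von_neumann are made up for illustration.

```python
EMPTY = frozenset()
MORPHEMES = frozenset({"-p", "-t", "-k", "-q"})


def von_neumann(n: int) -> frozenset:
    """Build the von Neumann set for n: 0 is the empty set, n + 1 is n ∪ {n}."""
    current = frozenset()
    for _ in range(n):
        current = current | frozenset({current})
    return current


if __name__ == "__main__":
    print(EMPTY <= MORPHEMES)        # subset test
    print(EMPTY in MORPHEMES)        # element test
    print(len(MORPHEMES | {EMPTY}))  # adding the empty set as an element
    print(von_neumann(2) == frozenset({EMPTY, frozenset({EMPTY})}))
```

The subset test succeeds and the element test fails, and adding the empty set as an element grows the morpheme set from four members to five, so the two readings of "contains" really do come apart.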

What I know of linguistics (way before Government and binding) uses rewrite systems (still in use in computer science). In such systems, a variable may be rewritten as a sequence of variables and letters in some alphabet. When the right-hand side contains nothing, we use the empty string, denoted ε or λ or Λ but never ∅, even if the right-hand side looks like "nothing". For instance the grammar X -> ε, X ->Xa generates the set of strings {ε, a, aa, aaa, aaaa, ... }. Would you ever use ∅ for ε? If not, why is it more justified with morphemes?
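
For readers who want to see the grammar run, here is a small sketch that enumerates its derivations up to a length bound. The function name derive and the bound max_length are assumptions added for illustration; ε is modelled as Python's empty string.

```python
def derive(max_length: int) -> list:
    """List the strings generated by X -> ε, X -> Xa, up to max_length letters."""
    results = []
    suffix = ""  # terminals accumulated to the right of X so far
    for _ in range(max_length + 1):
        results.append(suffix)  # apply X -> ε: X vanishes, the suffix remains
        suffix = "a" + suffix   # apply X -> Xa instead: one more 'a' follows X
    return results


if __name__ == "__main__":
    print(derive(4))
```

The first string produced is the empty string, a value of length zero that is a member of the generated language. That is a different object from an empty set of strings, which would have no members at all.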

January 1, 2012

[jcrippen:] The difference between the modifier letters (U+02BB – U+02BD and U+02EE) versus the open and close quotes (U+2018 & U+2019) is not really in their appearance. Instead, the modifier letters in Unicode belong to the Lm (Modifier_Letter) category and hence they are meant to be processed like alphabetic characters (A–Z, etc.). That means that they are part of the language’s alphabet rather than being accessory symbols like punctuation. In contrast, U+2018 and U+2019 are in the Pi and Pf categories respectively (Initial_Punctuation and Final_Punctuation). They are punctuation characters just like the ampersand, hyphen, question mark, octothorpe, and so forth.

You miss my point. As long as the task is limited to getting ink on paper, there is not much of a problem -- ink does not preserve character encodings. Or, if your purpose is limited to academics circulating texts privately, the problems are manageable.

When you move to publishing material, the problems are rather larger, and decisions more complicated. Now the correct Unicode encoding is important. Here are the problems: (1) Few type designers are willing to include character sets with limited use. (2) Few font publishers are willing to allow their fonts to be modified to make up the needed characters. (3) Few people who select typefaces for published material are willing to use fonts that do include the proper Unicode character. (3) has two forks: (a) the designers who select fonts for ink on paper (rather less of a problem), and (b) the ebook reader device manufacturers, who limit the typefaces their devices will display.

So, someone preparing these files -- the typesetter -- is faced with the decision about which character to use. Skipping the problems of ink on paper, every book I've worked on in the past two years has required an eventual ebook. The choice one faces is to use punctuation characters, which are wrong and limit searching a file, or to use the correct character, which will be searchable but will not display.

January 1, 2012

[charles_e:] The choice one faces is to use punctuation characters, which are wrong and limit searching a file, or using the correct character, which will be searchable, but will not display.

Yes, this is the exact problem that I have faced, and which I have kludged around by having TeX display one character but embed a different one in the text stream of a PDF. The hyperref package includes a command \texorpdfstring to do this. It’s a nasty kludge though, and it’d be better if fonts included characters with appropriate variant forms. I’m not a font designer and not willing to abuse EULAs on existing fonts, so I haven’t fixed this problem for myself. It’d be nice if either EULAs were a bit more flexible for these not-entirely-unusual circumstances, and/or if font designers could ask around a bit more about potential uses of their works. It’s not possible to please everybody, but it is possible to try.

January 1, 2012

As the state of Israel can attest, a policy of ambiguity can be useful...
But it doesn't win you trusted friends, and I think a foundry needs that.

hhp

January 1, 2012

James,

Consider that it isn't the EULA, but the permission to modify that is the issue. For example, Adobe gives the end user permission to modify fonts for one's own use. It is in Adobe's FAQ. It counts as one of the permissible copies. But the Adobe EULA at least used to forbid modifications. It would seem the specifics of the FAQ override a portion of the EULA, as would written permission from the font publisher.

My belief is the EULA is/was an attempt to stop piracy, and modification for one's own use is only occasionally seen as an extra revenue stream. Anyway, if you ask for and receive permission, modification is allowable. The large font publishers have, for a while now, refused to give permission. Back in the mid-1900s, they sometimes did. Well, that's two or more owners ago for Linotype, and at least a couple of owners ago for Monotype. FontFont used to occasionally grant permission when they first moved into the States. But forget them as of 2012.

Adobe's policy solves the print problem, and probably the PDF problem, but not the ebook problem, until font embedding (EPUB3) is implemented by Apple and Amazon. Fat chance, today. And the font publishers may want to rule separately on web fonts. At this point, who can say? But some won't and that's all it takes.

BTW, there are other small foundries/font publishers who will give permission.

January 1, 2012

I sort of agree, hrant. Soon, layout programs will perhaps include a font editing program inside. They already have glyph scaling (non-proportional). All we need is weight changing, and perhaps character addition. I find it absurd that you can "modify" a font with one kind of software, but not another. I know, it is a piracy issue. BTW, there is a small discussion of this in my chapter in Rich's forthcoming book. Are you sure you won't buy a copy? Still going to use the library? ;-)

(I'll allow the discussion is too short to be worth the cost. There are other points discussed...)

January 1, 2012

The books I check out from UCLA nobody else ever wants for
some reason. So I just keep renewing them online. I've had some
of them over ten years. Sometimes I hit their 99 renewal limit, at
which point I have to call in, act dumb and get it reset to zero.

hhp

January 2, 2012

Concerning the slashed zero shape for the empty set, here is what Unicode Technical Report #25, Unicode Support for Mathematics, says about it:

A widespread alternate symbol for the empty set is a slashed digit zero. This can be encoded as U+0030 DIGIT ZERO followed by U+0338 COMBINING LONG SOLIDUS OVERLAY.
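
The sequence quoted above can be inspected with a few lines of Python. This is a sketch assuming the standard unicodedata module; the names SLASHED_ZERO, EMPTY_SET, code_points and same_after_nfc are made up for illustration.

```python
import unicodedata

SLASHED_ZERO = "0\u0338"  # U+0030 followed by U+0338, as in the quoted passage
EMPTY_SET = "\u2205"      # the dedicated EMPTY SET character


def code_points(text: str) -> list:
    """Return the U+XXXX label of each code point in text."""
    return [f"U+{ord(ch):04X}" for ch in text]


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(code_points(SLASHED_ZERO), unicodedata.name(SLASHED_ZERO[1]))
    print(code_points(EMPTY_SET), unicodedata.name(EMPTY_SET))
    print(same_after_nfc(SLASHED_ZERO, EMPTY_SET))
```

Normalization does not fold the two-character sequence into U+2205, so text using the slashed-zero spelling will not match a search for the empty set character; the choice between them has to be made consistently in the source.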

Michel

August 25, 2012

Thanks, adding it to the "list".

Cheers, Si

It seems that it hasn't been added to the latest versions of Calibri and Cambria.

If you select some text with the character, you'll get it in MS Mincho font.

July 16, 2013

Has anything been done on this front?

July 22, 2013

Three years and nothing has been fixed?
