Rock ’n’ Roll: correct apostrophe usage

March 7, 2013

Hrant: Just so I'm clear: Is an apostrophe in text always encoded as a RIGHT SINGLE QUOTE MARK?

No. Very often it is encoded as the generic U+0027. I'm guessing that's probably the most common encoding simply because that's what the keyboard makes convenient. In order to encode U+2019, whether as apostrophe or quote mark, one either needs to input it directly (ALT+0146 on Windows, custom keyboard, etc.) or rely on 'smart quote' algorithms, which work a lot of the time and provide work for proofreaders the rest of the time.

What are the chances somebody would write a Word and/or InDesign plug-in that goes through a text and changes the "intended apostrophes" to U+02BC?

I think you could expect such a plug-in to be about as accurate as smart quote algorithms. In other words, it would get it right almost all the time, but would get it wrong in some ambiguous circumstances. As Joe rightly points out, it is likely to get it wrong more often in British usage.
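To make the limits concrete, here is a minimal Python sketch of the easy part of such a conversion. It is not an actual Word or InDesign plug-in; the function name and the rule are assumptions for illustration. It only remaps a U+0027 or U+2019 that sits between two letters, and leaves every word-initial or word-final mark alone, since those are exactly the cases that can be either an apostrophe or a quote mark.

```python
import re

MODIFIER_APOSTROPHE = "\u02bc"  # U+02BC MODIFIER LETTER APOSTROPHE

# Assumption: a U+0027 or U+2019 with a letter on both sides is an
# apostrophe, never a quotation mark. [^\W\d_] matches a single letter.
_WORD_INTERNAL = re.compile(r"(?<=[^\W\d_])['\u2019](?=[^\W\d_])")


def map_internal_apostrophes(text: str) -> str:
    """Replace word-internal apostrophes with U+02BC; leave the rest alone."""
    return _WORD_INTERNAL.sub(MODIFIER_APOSTROPHE, text)


print(map_internal_apostrophes("don't"))          # internal mark is remapped
print(map_internal_apostrophes("the dogs' bowls"))  # trailing mark is untouched
print(map_internal_apostrophes("rock 'n' roll"))    # both marks are untouched
```

The untouched cases are where the real work is: a trailing mark after "dogs" could be a plural possessive or a closing single quote, and the latter is far more likely in British-style text that uses single quotes as the primary quotation marks.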

March 7, 2013

Just noticed this earlier comment:

Hrant: BTW auto-quotes also mess up the Hawai‘ian ‘okina diacritic.

The proper character for this really should be U+02BB MODIFIER LETTER TURNED COMMA, but I'm sure it occurs encoded in many instances as either U+0027 or U+2018, and doubtless also U+2019, whether as the result of smart quote algorithms or ignorance.

I wouldn't classify it as a diacritic: it represents a glottal stop, which means it is a full consonant and considered a letter in the alphabet. And to be fair to the ignorant, in other orthographies the ’ is much more commonly found as a glottal stop than the ‘ shape.

March 7, 2013

that's not how our writing system works.

In terms of an informal -if pervasive- tradition, I would have to agree. But where does it say "they must look the same"? It's just a lazy fallback (one that can cause confusion) and I don't think making the apostrophe and single right quote look different is any kind of "reform" - it's just a result of believing that's Good Design.

When I made Cristaal's right quote(s) point up, that was not Wrong, but neither was I following some formal system. And others can see it (or hear about it) and possibly follow suit, creating a new tradition. For a while -thanks largely to ATF- mirrored quotes (where the left quotes point down) were quite common (and interestingly the old MS Core Fonts did that too) but that was not some act of sedition.

Thanks for the correction/elaboration on the ‘okina. One thing I value BTW is that it's supposed to point up (because that makes it less confusable with the apostrophe).

hhp

March 7, 2013

For a while -thanks largely to ATF- mirrored quotes (where the left quotes point down) were quite common (and interestingly the old MS Core Fonts did that too) but that was not some act of sedition.

Unless you're a German whose punctuation system is messed up by such designs.

I don't think it's a 'lazy fallback' that the apostrophe and right quote look the same. It's the outcome of an historical decision that this little mark ’ has more than one usage in text. Yes, it might sometimes result in confusion, but it's not at all uncommon for writing systems to contain such confusables. Heck, we're talking about capturing natural language here: ambiguity, confusion, multiple meanings -- these are the very hallmarks of human communication.

March 7, 2013

I remember that German situation being mentioned recently. I'm curious, would my upward (and inward) pointing quotes also not work out? Also, would German-localized versions of the quote glyphs solve the problem, or are language tags not well-supported?

Ambiguity is natural, but so are warts. Let's treat them.

hhp

March 7, 2013

Letters:
ʹ — U+02B9 Modifier Letter Prime
ʻ — U+02BB Modifier Letter Turned Comma
ʼ — U+02BC Modifier Letter Apostrophe
ˊ — U+02CA Modifier Letter Acute Accent
ˋ — U+02CB Modifier Letter Grave Accent

Punctuation:
' — U+0027 Apostrophe
‘ — U+2018 Left Single Quotation Mark
’ — U+2019 Right Single Quotation Mark
′ — U+2032 Prime
‵ — U+2035 Reversed Prime

Symbols:
` — U+0060 Grave Accent
´ — U+00B4 Acute Accent

The Unicode standard defines these three categories according to their expected behaviours in various writing systems and other forms of written communication (math, etc.). Letters are alphabetic elements, like the use of ʻ U+02BB Modifier Letter Turned Comma in Hawaiian to represent the glottal stop /ʔ/. Punctuation is an element that is paralinguistic, used for indicating textual phenomena that are not necessarily part of the spoken language. (So e.g. commas have an associated intonation contour in English, but commas don’t always occur where this intonation is found in speech and vice versa. The previous sentence is an example.) Symbols are something else entirely, and are kind of hard to define I guess.
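For anyone who wants to check the three groupings above rather than take my word for it, Python's standard unicodedata module reports the General_Category of each character. This is just a quick sketch; the describe helper is my own name, and the values depend on the Unicode database bundled with your Python. The letters should come back as Lm (modifier letter), the punctuation as Po, Pi or Pf (other, initial quote, final quote), and the symbols as Sk (modifier symbol).

```python
import unicodedata

# The twelve characters from the lists above, in the same order.
CHARS = (
    "\u02b9\u02bb\u02bc\u02ca\u02cb"   # letters
    "\u0027\u2018\u2019\u2032\u2035"   # punctuation
    "\u0060\u00b4"                     # symbols
)


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, General_Category, character name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


for ch in CHARS:
    print(*describe(ch))
```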

There’s a pretty good argument to be made for the use of ʼ U+02BC Modifier Letter Apostrophe in English where we use apostrophes for contraction and possession: ‹ donʼt › and ‹ dogʼs ›. It is in essence an orthographic element that distinguishes different lexical items, and that’s what we usually think of as a “letter” even though the apostrophe doesn’t have an independent sound of its own. But it’s hard enough to get people to use ’ U+2019 Right Single Quotation Mark instead of ' U+0027 Apostrophe, so that asking people to differentiate the quotation ’ and letter-apostrophe ʼ is just tilting at windmills.

As for rock ’n’ roll, I think it’s best with the two apostrophes pointing the same way. Logically they are both apostrophes and not quotation marks, and English doesn’t have an apostrophe that points in the other direction. (Actually I don’t think any LGC orthography does, but I could be wrong). Writing ‹ rock ‘n’ roll › makes me think at first that the ‹ n › is being scare-quoted.

March 7, 2013

Nice details and analysis. I think U+02BC is sounding pretty solid indeed.

Just one thing:

But it’s hard enough to get people to use ’ U+2019 Right Single Quotation Mark instead of ' U+0027 Apostrophe, so that asking people to differentiate the quotation ’ and letter-apostrophe ʼ is just tilting at windmills.

It's not "people" that need to worry - they can't type anything more than a "dumb" quote/apostrophe anyway; we need the software to automatically map to U+02BC (as best it can) as needed.

hhp

March 8, 2013

Hrant: Also, would German-localized versions of the quote glyphs solve the problem, or are language tags not well-supported?

They are unevenly supported. Also, punctuation substitutions are unreliable in OpenType because there is a tendency in some software not to roll punctuation into glyph runs with adjacent text. Remember, OpenType Layout proceeds from script to language system to glyph, but the decision about what constitutes a character in a given script is made by the software, not by the font. Since a lot of punctuation is script-neutral, it can only pick up a script identity by algorithmic analysis of adjacent or surrounding text content. But that doesn't happen everywhere, while some software might simply presume common punctuation characters = Latin, which might help your German quote situation, but is a pain in the neck when trying to e.g. kern tall Thai vowels to preceding quote marks or parentheses!

With regard to the German quote issue, the desirable form of the 'left quote' U+2018, i.e. the German closing quote, is a 180 degree rotated and raised form of the opening baseline quote with which it corresponds.

March 8, 2013

Great insights - thanks.

BTW, every passing day, I like guillemets more. :-)

hhp

March 8, 2013

John (Q):

Back in the old days of ASCII, U+0060 was a grave accent only to the same extent as U+0022 was an umlaut and U+0027 was an acute accent. That is, one possible unconventional coding was to overstrike those characters, and have their shape altered on sophisticated systems, or their meaning recognized by humans for output from unsophisticated ones, to attain accents.

That this exotic coding is now claimed as the primary meaning of the character in the Unicode standard... is, I suppose, possible, but if so it does not give me great confidence in the committee responsible.

You are confusing things by referring to U+0060, U+0022, etc. and then talking about ASCII. The prefix U+ indicates a Unicode codepoint, i.e. a character in the Unicode Standard, not some other standard. So it makes no sense to talk about e.g. U+0060 'back in the old days of ASCII'. The Unicode 'C0 Controls and Basic Latin' block provides a one-to-one mapping of Unicode characters to ASCII characters, which is not the same thing as being the ASCII standard. As you say, the ASCII standard deliberately enabled the interpretation of some codes as representing multiple characters. A principal -- and principled -- goal of Unicode's larger character set was to avoid such confusions, so I think the UTC was eminently sensible and responsible in assigning only one meaning to the Unicode character U+0060, allowing the other interpretations of the corresponding ASCII code to have their own unique Unicode assignments. I also think they made the right choice in selecting the spacing grave as the identity of this character given that the same block includes a corresponding spacing acute character, and the single quote character is handled as deliberately direction agnostic in almost all software -- not as a 'right single quote' and as a vertical glyph in almost all fonts, and there would be in any case no corresponding 'left double quote' if U+0060 were interpreted as a 'left single quote'.

I still occasionally get emails from people that are punctuated like `this'. It is so obviously a mistake, I have to wonder what combination of software and font they might be using that doesn't display it as such, or if they are blind.

March 8, 2013

I have to wonder what combination of software and font they might be using that doesn't display it as such, or if they are blind.

That is the way to get the right thing in LaTeX. Also ``word'' gives the right double quotes.

If you want to get that behaviour in XeLaTeX, you need to specify Mapping=tex-text when setting the font, for instance:
\setromanfont[Mapping=tex-text]{Chaparral Pro}

March 8, 2013

John, that chaps my hide too.

hhp

March 8, 2013

Smart quote software could be made smarter, to include a dictionary of "exceptions" such as the first apostrophe in rock 'n' roll. After all, look at the way the new Blackberry reads people's minds and finishes their sentences for them.
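A rough Python sketch of what I mean, under loud assumptions: the exception list is a made-up, English-only sample, the opening-quote rule is the usual naive one (a straight quote at the start of the text or after whitespace or an opening bracket opens), and none of this reflects how any shipping smart quote feature actually works.

```python
import re

LEFT, RIGHT = "\u2018", "\u2019"

# Hypothetical exception list: forms whose leading mark is an apostrophe
# even though it follows a space. A real list would be much longer.
EXCEPTIONS = ["'n'", "'tis", "'twas", "'em", "'cause"]

_EXCEPTION_RE = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(e) for e in EXCEPTIONS) + r")(?!\w)",
    re.IGNORECASE,
)

# A straight quote is treated as opening when nothing precedes it, or when
# the preceding character is whitespace or an opening bracket.
_OPENING_RE = re.compile(r"(?<![^\s(\[{])'")


def smart_single_quotes(text: str) -> str:
    """Curl U+0027, consulting the exception list before the naive rule."""
    # 1. Exceptions first: every straight quote inside them is an apostrophe.
    text = _EXCEPTION_RE.sub(lambda m: m.group(0).replace("'", RIGHT), text)
    # 2. Naive rule for what is left: opening position gets a left quote.
    text = _OPENING_RE.sub(LEFT, text)
    # 3. Everything else (closing quotes, contractions) gets U+2019.
    return text.replace("'", RIGHT)


print(smart_single_quotes("rock 'n' roll"))
print(smart_single_quotes("'quoted' text, and don't forget"))
```

Without step 1, the naive rule would turn the first mark in rock 'n' roll into a left quote, which is the very error this thread started with.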

March 11, 2013

I think the reason that’s not caught on – outside of Microsoft Word, perhaps – is because you have to have a different set of exceptions for each language. If all you care about is English and French then it’s not too hard, but once you start including even other big languages like Spanish, German, Dutch, and Italian you’ve got a huge pile of databases to build and maintain.

March 11, 2013

@John Hudson:
On considering the matter more, I can see that the Unicode Consortium decision probably was quite reasonable.

I didn't want to start referring to Unicode ' as U+0027 and ASCII ' as X'27' as that would confuse people.

Since ' was used as the only quote much more than ` and ' were used as paired quotes - ` was, and is, so little used that I kind of wish that codepoint were used for, say, the degree symbol - I can somewhat see the logic of using ` for a grave accent.

But that just felt wrong, simply because accents were far too exotic to be part of the primary 7-bit ASCII set. I also felt that ^ should never have been changed from the up-arrow, so useful as an exponentiation operator.

Thus, when ISO 8859-1 came along, with all those accented letters, but without desperately needed symbols such as ≤, ≥, and ≠, I could only wonder what they were thinking. (On the other hand, placing × and ÷ on codepoints obviously more suitable for Œ and œ simply further compounded the insanity in the opposite direction. After all, * and / were perfectly good for multiplication and division, unless one was writing grade school arithmetic textbooks.)

Of course, the whole world doesn't speak English. So a character set like ISO 8859-1 was indeed a good idea. But it should have been the -2 set, as it were, in my opinion. There already were alternate versions of 7-bit ASCII for the major European languages; what would make sense from my perspective would have been to allow any of those to have a supplementary set of high-bit characters bolted on, since, in general, people only use one language at a time. (Thus, characters likely to be replaced - @, [, \, ], ^, _, `, {, |, }, ~ - would end up being copied/moved to the high-bit side.)

And people who speak different languages clearly aren't able to communicate with each other, and so having the same character coding for areas where different languages are spoken... could wait until we went to 16 bits with Unicode (although, again, a set like 8859-1 for the specialized and exotic purpose of international communication certainly would be of some use).

Of course, strange to relate, in Continental Europe, people don't share the view of people living in, say, North America or Australia that anyone who speaks a different language either lives thousands of miles away or is a poor immigrant who is going to be learning your language instead of the other way around - because of the disparity in the economic value of the effort required.
