Þorsten Posted February 12, 2015 (edited)
Current versions of common operating systems allow writers of Romanian texts to enter the correct representations of Ș and Ț. So how much longer should font developers impose automatic replacement of Ş and Ţ¹ when the document language is detected to be Romanian (or Moldovan)?
Since it can be non-intuitive (if not outright impossible) to specify the language for parts of documents, such automatic replacement might make it difficult or impossible to correctly render foreign-language terms within Romanian texts. E.g., the following sentences can be found in the Romanian-language edition of Wikipedia:
Eskişehir este un oraș din Turcia.
Şanlıurfa este un oraș mare din sud-estul Turciei cu peste 500.000 locuitori.
(Eskişehir is a city in Turkey. Şanlıurfa is a city …)
With auto-replacement, Ş/ş in the Turkish city names might be rendered incorrectly in Romanian texts.
One could also foresee problems for authors of Gagauz-language texts. Gagauz is a Turkic language which uses both Ş and the rare Ţ¹ (T-cedilla). As it is a minority language in Moldova, Gagauz authors might conceivably use computers and software configured for Romanian.
_____________________
1. An added wrinkle: Since T-cedilla is so rare, many fonts incorrectly represent it as T-comma. The font used to render posts in this forum is no exception.
Þorsten Posted February 13, 2015
Since buggy live renderings might be confusing, here is a picture of what these letters should look like:
Tatiana Marza Posted February 15, 2015 (edited)
Very interesting topic. My native language is Romanian and I hadn't realized how confusing these letters can be in a text with more than one language...
I was wondering, from the design point of view, how the cedillas could be distinguished. Since every font has different design features, is there a design rule that applies to these two groups of cedillas? For example, for Romanian the cedilla should have this form, and for Turkish that form. Does something like this exist?
Þorsten Posted February 15, 2015
From an encoding perspective, Ş (“S-cedilla”, used in various Turkic languages, Unicode value U+015E) and Ș (“S-comma”, used only in Romanian¹, Unicode value U+0218) are completely separate letters. The same applies to “T-cedilla” (used in Gagauz, Unicode value U+0162; the forum font won’t properly show the glyph) and Ț (“T-comma”, used only in Romanian¹, Unicode value U+021A), and to all their lower-case companions.
This wasn’t always the case, though, and herein lies the problem. In early versions of Unicode, as well as before Unicode, S-comma and T-comma were (improperly) not considered letters distinct from S-cedilla and T-cedilla. This was obviously an oversight and was corrected in later versions of Unicode. Treating Ş and Ș as merely stylistic variations of the same letter not only caused all sorts of practical problems, it was just plain wrong as a matter of principle, I think. Ø and Ö, for example, aren’t treated as merely stylistic variations of the same letter.
But today, there should be no conflict. Any font should render characters in Unicode slots U+015E and U+0162 with cedilla-shaped diacritics, and characters in Unicode slots U+0218 and U+021A with comma-shaped diacritics. (Again, I am omitting the slots for lower-case characters for brevity.)
___________
1. and Moldovan, if you consider it a distinct language
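For anyone who wants to check the code points named in the post above, here is a minimal Python sketch using only the standard `unicodedata` module. The dictionary labels and the helper name `describe` are made up for this illustration; the code points are the ones given in the post, and the lower-case companions are obtained with `str.lower()`.

```python
import unicodedata

# The four upper-case letters discussed in the post, keyed by informal label.
# (Labels are illustrative; the code points are the ones quoted in the thread.)
LETTERS = {
    "S-cedilla": "\u015E",
    "S-comma": "\u0218",
    "T-cedilla": "\u0162",
    "T-comma": "\u021A",
}


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for label, ch in LETTERS.items():
        # Canonical decomposition: base letter plus one combining mark.
        parts = [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]
        print(f"{label}: {describe(ch)}")
        print(f"  lower case: {describe(ch.lower())}")
        print(f"  decomposes to: {' '.join(parts)}")
```

The character names distinguish "WITH CEDILLA" from "WITH COMMA BELOW", and the decompositions use different combining marks (U+0327 versus U+0326), which is one way to confirm that the two pairs are encoded as separate letters.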
Wrzlprmft Posted February 15, 2015 (edited)
Here is a detailed history of this situation for anybody interested.
Þorsten Posted February 15, 2015
Thanks for the link (which I knew and should have included myself). The excerpt most relevant to my questions appears to be this:
Quote 2008. Some OpenType fonts from Adobe and all C-series Vista fonts implement the optional OpenType feature GSUB/latn/ROM/locl. This feature forces S-cedilla to be rendered using the same glyph as S with comma below. When this second (but optional) remapping takes place, Romanian Unicode text is rendered with comma-below glyphs regardless of code point variants. Good.
The author takes a clear stand in favor of automatic substitution, but that was in 2008. Does he still favor it some six years later? I think I’ll ask …
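To make the concern concrete, here is a rough Python sketch of the effect such a remapping has on the Wikipedia sentence from the first post. It is only a character-level analogy: a real `locl` feature swaps glyphs inside the font and leaves the stored text untouched. The function name `displayed_as`, the inclusion of T-cedilla in the mapping, and the use of the tag `TRK` for comparison are assumptions made for this illustration; `ROM` and `MOL` are the tags mentioned in this thread.

```python
# Character-level stand-in for a glyph substitution (assumption: the font
# maps both S-cedilla and T-cedilla, upper and lower case, to comma forms).
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # S-cedilla -> S-comma
    "\u015F": "\u0219",  # s-cedilla -> s-comma
    "\u0162": "\u021A",  # T-cedilla -> T-comma
    "\u0163": "\u021B",  # t-cedilla -> t-comma
})

# Language tags under which the hypothetical font applies the remapping.
REMAPPED_LANGUAGES = {"ROM", "MOL"}


def displayed_as(text, language):
    """Return the letters a reader would see for text tagged with language."""
    if language in REMAPPED_LANGUAGES:
        return text.translate(CEDILLA_TO_COMMA)
    return text


if __name__ == "__main__":
    sentence = "Eski\u015Fehir este un ora\u0219 din Turcia."
    # Tagged as Romanian, the Turkish city name loses its cedilla.
    print(displayed_as(sentence, "ROM"))
    # Tagged as Turkish, the sentence is shown as stored.
    print(displayed_as(sentence, "TRK"))
```

Because the whole sentence carries a single language tag, the Romanian ș and the Turkish ş cannot both be shown correctly unless the name is tagged separately, which is the difficulty raised at the start of the thread.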
Wrzlprmft Posted February 15, 2015
Quote Does he still favor it some six years later? I think I’ll ask …
Well, there is probably nobody better to answer your question.
Tatiana Marza Posted February 16, 2015
Quote ___________ 1. and Moldovan, if you consider it a distinct language
Let's not open another very, very problematic subject! While I tend to believe that written Romanian and written Moldovan are about the same language, it is completely different when you speak them. People from the Republic of Moldova can understand Romanians, while the latter have difficulty understanding us, because the accent is different and we have created our own vocabulary, a mix of Romanian and Russian, which is very funny if you look at it from a more positive point of view.
I can't wait for the answer from the author mentioned above...
Þorsten Posted February 16, 2015
*whispering* Okay, let’s not discuss it. *in normal voice* I only mentioned it because most fonts which do implement Ş→Ș replacement appear to use both language tags: ROM and MOL.
I don’t even know, by the way, if there are real-world users out there who specify MOL (as opposed to ROM) as a document and/or text language anywhere. Do operating systems and software even support this? The operating system I’m using right now, KDE/Linux, generally has excellent support for even the most obscure languages (and language varieties), but it does not appear to offer support for anything other than Romanian proper. The same seems to apply to OpenOffice.org and LibreOffice. (The currency-specific localization settings include the Moldovan leu, but that’s really a separate issue.)
Tatiana Marza Posted February 19, 2015
Quote ... but it does not appear to offer support for anything other than Romanian proper.
Exactly! Because in written form, Romanian and Moldovan are identical, and it would be useless to include Moldovan as a separate language. Ok, we Moldovans may use some out-of-date words and then Romanians laugh at us ... But as for the ROM and MOL tags, I suppose both are used for the sake of political correctness...
Þorsten Posted February 19, 2015
Thanks for confirming this!
On 18 February 2015 at 2:18 PM, Tatiana Marza said: Ok, we Moldovans may use some out-of-date words and then Romanians laugh at us
Well, we East Germans sometimes encounter this with West Germans, too.