Uli Posted August 9, 2012 Posted August 9, 2012 This topic was imported from the Typophile platform. It seems that the Adobe Devanagari font has not yet been discussed at Typophile. I had a closer technical look at AdobeDevanagari-Regular.otf, version 1.105 (2011), and here are my findings: 1. The set of Latin diacritics required for transliterating Indic (Hindi, Sanskrit, etc.) texts is incomplete. The diacritic for "sh" (both lowercase and uppercase), very frequently used in Indic words, e.g. in Shiva etc., is missing. 2. Many frequently used ligatures are missing, even ligatures which have a frequency of much more than 0.01 %, e.g. "ddhv" (frequency 0.215 %). For details see http://www.sanskritweb.net/itrans/adobe-ligatures.pdf For comparison see http://www.sanskritweb.net/itrans/itmanual2003.pdf (page 29 et seq.) 3. The Adobe Devanagari font does not work with older versions of Windows and Word. For example, Adobe Devanagari does not work with the old Microsoft Word, version 10, in conjunction with Windows XP. For comparison, Mangal and all the other Devanagari Unicode fonts known to me work with older versions of Word and Windows, provided the Uniscribe system library for foreign language support was installed with Windows.
John Hudson Posted August 9, 2012 Posted August 9, 2012 The first thing that should be noted is that Adobe Devanagari was designed specifically for modern Hindi use, and not for Sanskrit; it may be of limited use even for other modern languages such as Marathi and Nepali. The design brief was specifically to target use of Hindi in a modern business environment (the font was originally made to bundle with Acrobat), and not scholarly use. 1. That could be an oversight. Thanks for bringing it to our attention. 2. See note above re. target language support. The ligature set was based on a mixed approach: referencing the set employed in Linotype Devanagari (on which Fiona also worked) and also frequency analysis of modern Hindi text. 3. Correct. The font uses only the newer 'dev2' script tag and layout behaviour, not the older, deprecated shaping. This is what Adobe spec'd. Hence the font will not work in pre-Vista versions of Windows or other environments that only support 'deva'.
Uli Posted August 9, 2012 Author Posted August 9, 2012 A) > Adobe Devanagari was designed specifically for modern Hindi use, and not for Sanskrit That's okay, but a circumspect designer could, by adding only a few ligatures, make the font suitable for Marathi and Sanskrit as well. For example, for Classical Sanskrit, only 11 additional ligatures would be required to make the Adobe Devanagari font suitable for Classical Sanskrit (as opposed to Vedic Sanskrit). See the file adobe-ligatures.pdf (the missing ligatures are marked with "!!!"). B) > That could be an oversight The oversight was due to the fact that most Indic diacritics are in the Unicode range 1E0C through 1E96, with the exception of 015A Sacute 015B sacute, which are used in Polish texts but happen to be used also as Indic diacritics. C) > The design brief was specifically to target use of Hindi If this is the case, then design errors must have been made. (Caveat: I am not a Hindi expert, so a scholar should examine the following.) (a) On the one hand, it seems to me (see caveat above) that the Adobe Devanagari font INCLUDES ligatures which are (to my knowledge) NOT used in Hindi (and also NOT used in Sanskrit etc.), for example "kspr": see http://www.sanskritweb.net/temporary/kspr-kspl.jpg Did the Adobe designers invent the ligature "kspr" just for fun? (b) On the other hand, it seems to me (see caveat above) that the Adobe Devanagari font LACKS ligatures which are (to my knowledge) USED in Hindi, for example "dg" in the Hindi word "khadga" (= "sword" in English): see http://www.sanskritweb.net/temporary/khadga.jpg In the Oxford Hindi-English dictionary by McGregor published in 1993, the word "khadga" was typeset with virama, i.e. without ligature, due to the fact that professional Devanagari fonts, such as Siddhanta.ttf and Sanskrit2003.ttf, were not available at that time. But today, in the Unicode era, it is possible to design professional Devanagari fonts suited for Hindi.
I wonder whether a Hindi scholar ever checked the Adobe Devanagari font for ligatures (a) required for Hindi and (b) not used in Hindi. Instead of including ligatures required for Hindi, the Adobe Devanagari font includes a plethora of fancy ikara variants which would only make sense in a font containing the required ligatures. See http://www.sanskritweb.net/temporary/ikara.jpg
Przemysław Posted August 10, 2012 Posted August 10, 2012 The font uses only the newer 'dev2' script tag and layout behaviour, not the older, deprecated shaping. Why then does it have the "script deva;" declaration for nukt, akhn etc. etc.? 015A Sacute 015B sacute What? The font has both.
Uli Posted August 10, 2012 Author Posted August 10, 2012 Mr. Przemysław: >015A Sacute, 015B sacute. What? The font has both. But they do not show up on my old Windows XP machine. With Windows XP, the Adobe Devanagari font has quirks. Other diacritics show up on my machine, but for S/s acute, Wordpad of Windows XP defaults to Verdana (see PDF file). Mr. Boyer: > I don't see your ligature. I have this book too, but which ligature do you mean?
Michel Boyer Posted August 10, 2012 Posted August 10, 2012 which ligature do you mean? The one in "sword", "khadga" as you wrote it. Here is a grab of your link: By the way, I doubt the font did not have it, because this one was available:
Uli Posted August 10, 2012 Author Posted August 10, 2012 Mr. Boyer: dga and nga are two different conjuncts (watch out for the "dot" to the right of the Devanagari glyph). The Devanagari letters d and n differ only by the "dot" to the right. nga is available in Adobe Devanagari. In the McGregor textbook cited by you, see page xxiii.
Michel Boyer Posted August 10, 2012 Posted August 10, 2012 Herrn Ulrich Stiehl, I know very well that those two are different. I was pointing out that one is listed among the commonest conjuncts (nga), and thus was available in the font, while the one you point out (dga) is not listed. I see no reason why a font used in the seventies that had nga would not also have dga, yet dga was not listed by McGregor. I conclude that it was intentional on McGregor's part (and not due to a missing glyph).
Uli Posted August 10, 2012 Author Posted August 10, 2012 Mr. Boyer: > commonest conjuncts As far as Sanskrit is concerned, dg has a frequency of 0.167 %, which means that it is very common in Sanskrit. As far as Hindi is concerned, you will have to supply your own frequency research results, because I am no Hindi expert. If we discuss the Adobe Devanagari font on the basis of the "commonest conjuncts", I think we should forget the Adobe font, because who wants to use a font which includes only the commonest conjuncts? "Mangal" was such a makeshift font including only the "commonest conjuncts".
Michel Boyer Posted August 10, 2012 Posted August 10, 2012 I am no expert in Hindi either, and I don't know where I could find a good corpus of Hindi texts for reliable statistics. The best I could do is to download the aspell Hindi dictionary from ftp://ftp.gnu.org/gnu/aspell/dict/0index.html, untar it with tar -jxpvf aspell6-hi-0.02-0.tar.bz2 and, in the directory aspell6-hi-0.02-0, execute preunzip hi.cwl to get the UTF-8 encoded dictionary hi.wl. It contains 83514 entries, which compares favorably to my en_US dictionary (62115 entries) or my French dictionary (61305 entries). Here is a trace of execution (on OS X 10.6):
512 % grep ड्ग hi.wl
खड्ग
खड्गकोश
खड्गधारी
खड्गधेनु
खड्गमुष्टि
खड्गी
षड्ग
षड्गुण
513 %
Those are all the entries containing the ड्ग combination. I guess there are contexts where those entries can be used more frequently than others, but I doubt a representative corpus would give the combination ड्ग a very high frequency.
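The dictionary search above can also be sketched in Python, which may be handy when many conjuncts have to be checked in one run. This is only a minimal sketch: the function name words_containing and the small in-memory sample are assumptions made for illustration, and a real run would read the lines of hi.wl instead.

```python
# Minimal sketch of the grep step: list the dictionary entries that
# contain a given conjunct, written as its underlying character sequence.

def words_containing(words, cluster):
    """Return the words that contain the given character sequence."""
    return [w for w in words if cluster in w]

# DDA (U+0921) + VIRAMA (U+094D) + GA (U+0917), i.e. the "dg" conjunct
DG = "\u0921\u094D\u0917"

# Illustrative sample only; in practice, read one word per line from hi.wl.
sample = [
    "\u0916\u0921\u094D\u0917",  # khadga, contains the conjunct
    "\u0905\u0919\u094D\u0917",  # a word with NGA + VIRAMA + GA instead
    "\u0924\u0932\u0935\u093E\u0930",  # talvar, no conjunct at all
]

hits = words_containing(sample, DG)
print(len(hits), "of", len(sample), "sample words contain the conjunct")
```

Plain substring matching, like grep, also counts a longer conjunct that merely begins with the same sequence, so the counts are an upper bound for the exact ligature.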
Uli Posted August 10, 2012 Author Posted August 10, 2012 Thanks for performing the search for "dg". Now search for ligature "kspr", and you will see that it does not occur at all.
Uli Posted August 10, 2012 Author Posted August 10, 2012 > If you mean क्स्प्र then you are right. That is what I mean. The Adobe Devanagari font contains conjuncts which cannot occur in Hindi for linguistic reasons. In Sanskrit, theoretically, "kspr" could occur by combining a word ending in "k" with a word beginning with "spr", just like combining in English "ink" with "spray" to "inkspray" which would/could contain the "kspr-conjunct" (in-kspr-ay). But these are "wordplays" which do not have a linguistic basis. Therefore, I think, several conjuncts contained in the Adobe Devanagari font were invented just for fun.
John Hudson Posted August 10, 2012 Posted August 10, 2012 I've confirmed that the ś diacritic is indeed in the Adobe Devanagari fonts, as Przemysław reports. I've no idea why it isn't displaying in your WinXP environment, Uli. Have you checked to see whether it displays correctly with other PostScript OT fonts? Uli, can you share some information about the corpus on which you based your frequency analysis? This looks like very useful information, and I'd like to get an idea of the scope of the analysed text.
Michel Boyer Posted August 10, 2012 Posted August 10, 2012 Michel, I'm working on another Hindi font project at the moment and wonder if you might be able to assist me with some conjunct frequency analysis? That sounds interesting. I'd be glad to help.
John Hudson Posted August 10, 2012 Posted August 10, 2012 Some of the conjuncts in the Adobe Devanagari font were inherited from the Linotype Devanagari font, which was used a lot in newspapers and whose glyph set was in part developed in response to requests from newspaper editors in India. It is entirely possible that some conjuncts were intended for commonly transliterated words from other languages. I do think a more systematic, analytical approach should be taken to defining the conjunct set for Hindi fonts, as Uli has done for Sanskrit. In the case of the Adobe Devanagari, we compiled a set from a variety of sources. We had intended to cover Marathi too, but the development schedule wasn't very long because it needed to be bundled with Acrobat, so we had to prioritise for Adobe's principal targets. We did do testing of extended Hindi texts -- including an entire novel -- and found the font to perform well. Uli, it is worth bearing in mind that modern Hindi typography has broken quite significantly with the Sanskrit tradition. I understand your comment re. fonts that support only the more common conjunct ligatures, but that seems to me very much the perspective of a Sanskritist: Hindi readers have been used to extensive use of half forms and, for some letters, even explicit halants ever since the hot metal typesetting days. Indeed, when Fiona first reintroduced some of the conjunct ligature forms in the digital LT Devanagari (for the Linotron 202) there was resistance from some Indian customers to what were perceived as 'Sanskrit' forms. A lot of the custom encoding solutions for Hindi fonts, e.g. the Modular Infotech system, remain based around extensive half form use. None of this is to say that we couldn't, with more time and research, have come up with a better Hindi conjunct set for Adobe Devanagari, but that the expectation that any given conjunct, however rare, should be presented in its ligature form is not one that Hindi readers share with you. 
Thank you, by the way, for the very helpful review documents, which I'll certainly examine carefully and I hope will help us to do better in future.
John Hudson Posted August 10, 2012 Posted August 10, 2012 Thanks, Michel. I'll contact you with more details.
Uli Posted August 11, 2012 Author Posted August 11, 2012 1) Mr. Boyer: I discovered that Edwin Greaves, in his 1921 Hindi Grammar (see http://archive.org/details/hindigrammar00greauoft), reckons the ligature "dg" as a "principal compound" (see http://www.sanskritweb.net/temporary/dga.jpg). This book by Greaves also has an interesting introduction describing "High Hindi" as a language "for those who delight to cram their pages with high-sounding Sanskrit words" (see page 4 of the scanned book). But this was 90 years ago. 2) Mr. Hudson: I did not say that I have "the expectation that any given conjunct, however rare, should be presented in its ligature form". On the contrary, I said that "only 11 additional ligatures would be required to make the Adobe Devanagari font suitable for Classical Sanskrit". What I criticize is that the Adobe Devanagari font contains many extremely infrequent and perhaps entirely unattestable ligatures, whereas this font lacks very frequent ligatures, at least as far as Sanskrit is concerned. 3) Mr. Boyer and Mr. Hudson: I made a complete compound analysis of the Adobe Devanagari font, which will be of interest to both of you: http://www.sanskritweb.net/itrans/adobe-ligatures-analysis.pdf The frequency counts are based on Sanskrit texts developed for Sanskrit fonts, but these statistics may also be of help for Hindi-only fonts. The most infrequent or rarest, though attestable, Sanskrit compounds have a frequency of 0.001 %. This means that in 100000 (one hundred thousand) lines of Sanskrit text, such a compound occurs only ONCE on average. Now, if Mr. Boyer applies his Gnu Hindi dictionary word count utility to the compounds listed on pages 27 through 37 of the above file (adobe-ligatures-analysis.pdf) containing the rarest or most infrequent Sanskrit ligatures (0.001%), I predict that Mr.
Boyer is sure to discover that innumerable ligatures (compounds), which are indeed contained in the Adobe Devanagari font cannot be attested (i.e. located or found) in his Hindi dictionary. If this be true (let's see), this would mean that the Adobe Devanagari font contains innumerable unattested compounds (ligatures) with a frequency of 0.000 % (nil, nothing).
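The arithmetic and the proposed test in the post above can be sketched as follows. This is a hedged illustration only: the function names are assumptions, the frequency is read as "occurrences per line" exactly as the post describes it, and the tiny word list stands in for a real dictionary.

```python
from fractions import Fraction

def expected_occurrences(freq_percent, n_lines):
    """Expected count of a compound, given its frequency in percent
    (as a string such as "0.001") and a number of text lines."""
    return Fraction(freq_percent) / 100 * n_lines

def unattested(compounds, words):
    """Return the compounds that occur in none of the given words."""
    return [c for c in compounds if not any(c in w for w in words)]

# 0.001 % of 100000 lines is one expected occurrence
print(expected_occurrences("0.001", 100000))

DG = "\u0921\u094D\u0917"  # DDA + VIRAMA + GA
# KA + VIRAMA + SA + VIRAMA + PA + VIRAMA + RA, the "kspr" conjunct
KSPR = "\u0915\u094D\u0938\u094D\u092A\u094D\u0930"

# Illustrative stand-in for a dictionary: the single word khadga
words = ["\u0916\u0921\u094D\u0917"]
print(len(unattested([DG, KSPR], words)), "compound(s) unattested in the sample")
```

Exact fractions are used so that a percentage such as 0.001 does not pick up floating-point rounding. A compound reported by unattested is only absent from the word list supplied; that does not by itself show the compound is impossible in the language.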
John Hudson Posted August 11, 2012 Posted August 11, 2012 Uli, as noted, Sanskrit was not a target language for the Adobe Devanagari font. It is entirely possible that Adobe will want to extend its glyph set coverage for other languages, but I suspect that modern languages such as Marathi, Konkani and Nepali would be their priorities. I'm a little surprised that you reckon only eleven more conjunct ligatures would be needed for classical Sanskrit; I would think more. Fiona confirms that some of the conjuncts inherited from the Linotype set were for transliteration of foreign words; we included them in Adobe Devanagari because the Linotype fonts are a recognised standard among some Indian customers, and to deviate from the set would be to invite criticism and require explanation. Instead, we get criticism from a German Sanskritist and have to explain to him. :) The core sets on which the Adobe Devanagari set is based were the Linotype Devanagari (for compatibility with a perceived 'standard') and that provided by Rupert Snell in his Hindi grammar, which is the most up-to-date source for the modern language. Recently, we had occasion to add some extra variants for one of Adobe's customers who wanted different forms for certain conjuncts. Greaves' comment about 'High Hindi' is worth bearing in mind. You cite खड्ग as an example of the 'dg' conjunct in Hindi, but the common modern Hindi word for sword is तलवार. Clearly Michel is not going to find 'innumerable' conjuncts in Adobe Devanagari that do not occur in Hindi words: he is going to find a precise number, and these will all be conjuncts that also appeared in the Linotype set. Far from being 'invented just for fun', they were included, at the request of Linotype's Indian customers, in a set whose size was limited by technical constraints, in order to cleanly (without explicit halant) render transliterations of common foreign words in Hindi newspapers. 
In the context of those newspapers, it is entirely likely that some foreign words would be more common than those Hindi words that contain low frequency conjuncts. [Transliteration of foreign words, especially proper names, was a major factor for newspaper typesetting in the Subcontinent. Full-word ligatures needed to be added to Urdu fonts whenever the Soviet Union had a change of leaders.]
quadibloc Posted August 11, 2012 Posted August 11, 2012 And thus, I take it, the "kspr" ligature saw heavy use in 1995, perhaps? (This was back in the days when there were two world chess champions, a PCA one as well as a FIDE one... of course, there was also the infamous Fischer-Spassky rematch.)
Uli Posted August 11, 2012 Author Posted August 11, 2012 From the scholarly point of view, I think that it is ridiculous to invent unique ligatures for proper names. But from the marketing point of view, I understand that Linotype wants to cash in on selling unique ligatures, when for instance McDonald’s comes along and says: "We shell out a lot of dough, if you invent for us the unique Devanagari ligature "mcd" and also the unique Devanagari ligature "lds" so that our trademark "McDonald's" has two unique ligatures ("Mcd-ona-lds") for advertisements in Indian newspapers."
John Hudson Posted August 11, 2012 Posted August 11, 2012 Uli: From the scholarly point of view, I think that it is ridiculous to invent unique ligatures for proper names. From a scholarly point of view, I agree, but these conjunct ligatures were made at the request of newspaper publishers, who are perennially concerned with column width and word length, and for whom being able to reduce the width of commonly occurring words through ligation was a practical benefit (especially if you bear in mind that newsprint was rationed in India for long periods of the 20th Century). Linotype were not 'cashing in on selling unique ligatures': you are entirely misunderstanding the nature of the commercial relationship between newspapers and typesetting machine manufacturers at that time. Newspapers invested in machines, and the makers of those machines provided fonts according to the specifications of the newspaper publishers. It is not as if Linotype invented these conjunct ligatures and offered them for sale to the newspapers; rather, the newspaper publishers and editors who had purchased the Linotype typesetting machinery requested the addition of these ligatures to their fonts. I think we're at a point in the development of Indic fonts now where we should reasonably consider whether these legacy transliteration ligatures, which were developed at a particular point in time for a particular technology and particular customer base, should be ignored. As I reported earlier, we included them in Adobe Devanagari because of concerns that users in India familiar with the Linotype sets might consider the Adobe set deficient if it did not include them. This sort of thing happens in situations in which the only reference that many people have for judging the quality of a new font is comparison with previous fonts. We encounter this regularly. 
I think, Uli, that you and Michel are on entirely the right track with frequency analysis: this is a much better basis on which to plan glyph sets, and in line with what we're doing these days (bear in mind that although Adobe Devanagari was only recently released we actually made it more than four years ago).
John Hudson Posted August 12, 2012 Posted August 12, 2012 This is very useful, Michel. I think the problems at the bottom of page 10 and on page 11 must be encoding issues in the dictionary: whenever you see Uniscribe inserting a dotted circle into the leftmost column, that indicates a character sequence considered invalid, e.g. a vowel followed by a virama. These entries should be ignored. I wonder if you might be able to run the same test on the Hunspell Hindi dictionary? And perhaps a combined test of the two dictionaries (having removed duplicate word entries)? I'd very much appreciate having your analysis in a spreadsheet format: when I try to copy and paste from the PDF, some Devanagari letters show up as unknown characters.
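The clean-up and the combined test requested above could be sketched like this. It is an assumption-laden sketch, not Uniscribe's actual validation: it flags only the one invalid pattern named in the post (a vowel followed by a virama), and the function names are hypothetical.

```python
import re

# An independent vowel (U+0904-U+0914) or a dependent vowel sign
# (U+093E-U+094C) immediately followed by VIRAMA (U+094D).
VOWEL_THEN_VIRAMA = re.compile("[\u0904-\u0914\u093E-\u094C]\u094D")

def is_suspect(word):
    """True if the word contains a vowel directly followed by a virama."""
    return VOWEL_THEN_VIRAMA.search(word) is not None

def merge_wordlists(*wordlists):
    """Combine several word lists, removing duplicates and suspect entries."""
    merged = set()
    for words in wordlists:
        merged.update(w for w in words if not is_suspect(w))
    return sorted(merged)

# Illustrative entries: a valid word, and KA + vowel sign I + VIRAMA
valid = "\u0916\u0921\u094D\u0917"
broken = "\u0915\u093F\u094D"
combined = merge_wordlists([valid, broken], [valid])
print(len(combined), "entry kept after merging")
```

Entries rejected this way are probably best logged rather than silently dropped, since they may point to systematic encoding problems in one of the dictionaries.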
Michel Boyer Posted August 12, 2012 Posted August 12, 2012 The LaTeX source is on my blog for that purpose, UTF-8 encoded (I used XeLaTeX). https://typography.guru/xmodules/typophile/files/compounds_20120812b.txt It should be as good as a CSV file if you replace the character & by a comma or a semicolon and remove the \\ at the end of lines. If you want a particular format, send me a message with the specs. I'll have a look at the Hunspell dictionary. Michel
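The conversion described above (replace & by a separator, drop the trailing \\) might be scripted roughly as follows. The function name and the sample rows are hypothetical, since the exact layout of the linked file is not reproduced here; the sketch also assumes that no cell contains an escaped \& character.

```python
import csv
import io

def latex_rows_to_csv(text):
    """Convert LaTeX table rows ("a & b \\\\") into CSV text.

    Lines that do not end with a double backslash (for example \\hline
    or preamble lines) are skipped.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith("\\\\"):
            continue  # not a table row
        cells = [cell.strip() for cell in line[:-2].split("&")]
        writer.writerow(cells)
    return out.getvalue()

# Hypothetical rows, only to show the shape of the input
sample = "conjunct & count \\\\\n\\hline\n\u0921\u094D\u0917 & 8 \\\\"
print(latex_rows_to_csv(sample), end="")
```

Going through the csv module rather than a bare string replacement means that a cell which itself contains a comma is quoted correctly in the output.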