Product(s)
XMetaL Author Enterprise 6.0
(should also be supported in XMetaL Author Essential 6.0 and XMAX 6.0 when those products are released)
Oops
Instead of providing default settings in the xmetal60.ini file, we decided to leave it up to clients to decide on these values (that's actually a good thing). However, we apparently also neglected to document this.
Beware: this is going to be a long post ...
Request: I've spent a few hours writing this up and I think it covers most things at this point, but feedback is very welcome. 2009/12/15: I've made some changes directly to the original post after getting feedback from Richard Ishida (it should be easier to read all in one place rather than jumping back and forth between comments).
Background / Legacy Code
The values that XMetaL Author recognizes for spell checking default to legacy values that the product uses internally. These values were invented before xml:lang existed (actually before XML existed, because the code originally came from another product). In most cases they do not match any of the RFC values most people would wish to use with xml:lang. Some of the more common ones happen to match (like EN), but this is just by chance and quite a few others do not.
Standard xml:lang Language Codes
The W3C XML Recommendation defines basic rules for xml:lang (how it must be declared in your DTD or Schema). Related to this are the standards ISO-639-1, ISO-639-2, RFC4646, RFC4647, and RFC5646 (the last one makes RFC4646 obsolete). Also related is BCP47, which is the reference preferred by the W3C. BCP47 is a concatenation of several RFCs and, though long, puts everything in one place.
Basically, ISO-639-1 consists of two-letter language codes (that many people may recognize) and ISO-639-2 uses three-letter codes. The RFCs describe how the full code should be constructed. Codes may include language, region, script, 'variants' and other things, and the RFCs include rules on letter casing and separator characters like "-". If you only read one document, please read BCP47.
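To make the casing rules a little more concrete, here is a small Python sketch (not part of XMetaL, and the function name `format_language_tag` is my own invention) that applies the casing conventions recommended in RFC5646 section 2.1.1: subtags are lowercase, except that two-letter subtags (typically regions) are uppercase and four-letter subtags (typically scripts) are title case when they are not the first subtag and do not follow a single-character subtag such as "x". It only adjusts casing; it does not validate that the tag is well formed or registered.

```python
def format_language_tag(tag):
    """Apply the casing conventions recommended in RFC 5646, section 2.1.1.

    This only changes letter case. It does not check that the subtags
    are valid or registered.
    """
    formatted = []
    after_singleton = False  # True once a one-character subtag (e.g. "x") was seen
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and not after_singleton and len(subtag) == 2:
            formatted.append(subtag.upper())       # e.g. region: "us" -> "US"
        elif index > 0 and not after_singleton and len(subtag) == 4:
            formatted.append(subtag.capitalize())  # e.g. script: "latn" -> "Latn"
        else:
            formatted.append(subtag.lower())       # language and everything else
        if len(subtag) == 1:
            after_singleton = True
    return "-".join(formatted)


if __name__ == "__main__":
    for raw in ("EN-us", "zh-hant-tw", "AZ-LATN-X-LATN"):
        print(raw, "->", format_language_tag(raw))
```

These conventions are recommendations only. Language tags are defined to be case-insensitive, but other tools in your toolchain may be stricter, so it can help to settle on one form.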
We've tried to design our spell checking support for xml:lang to be as flexible as possible. This means you may opt to specify any "standard" value or you may use other values (perhaps from an industry or other standard you may wish to follow), and you may specify multiple values in the INI file for a particular spell checking language (keeping in mind that the value for the xml:lang attribute in the XML source itself can only have one value and will therefore either match one INI setting or none).
This means you should decide which values you will use based on all of your requirements, from external tools, XSLT transforms, specifications, etc, first. Then configure XMetaL Author's spell checker to understand the values you are working with. This is the approach I would recommend: let your requirements drive the values you use, but whenever possible stick to the most current W3C and associated standards.
Table of INI Variables Supported by the Spell Checker for xml:lang
Following is the complete list of currently supported spell checking languages. It includes the INI variable name (prefixed with "WT") that controls the values you wish to have recognized for the xml:lang attribute, the English name for the language, and the corresponding ISO-639-1 and ISO-639-2 value(s) that I think would most commonly be used for that language by most people working with xml:lang.
The values listed here for ISO-639-1 and ISO-639-2 are suggestions only, though they were taken directly from those specs. Be sure you consult with other people in your organization before deciding on exact values, as other tools and processes may have specific requirements.
INI Variable Name | English Name for the Language | ISO-639-1 | ISO-639-2 |
WT_AFRIKAANS | Afrikaans | af | afr |
WT_CATALAN | Catalan | ca | cat |
WT_CZECH | Czech | cs | ces, cze (Note: Both codes are considered synonyms.) |
WT_DANISH | Danish | da | dan |
WT_DUTCH | Dutch | nl | dut, nld (Note: Both codes are considered synonyms.) |
WT_ENGLISH | English | en | eng |
WT_FRENCH | French | fr | fra, fre (Note: Both codes are considered synonyms.) |
WT_GALICIAN | Galician | gl | glg |
WT_GERMAN | German | de | ger, deu (Note: Both codes are considered synonyms.) |
WT_GREEK | Greek | el | gre, ell (Note 1: Both codes are considered synonyms. Note 2: Ancient Greek (before the year 1454) is "grc" and is not supported by the spell checker.) |
WT_ISLANDIC | Islandic (Icelandic) | is | ice, isl (Note: Both codes are considered synonyms.) |
WT_ITALIAN | Italian | it | ita |
WT_NORWEGIAN | Norwegian | no | nor |
WT_PORTUGUESE | Portuguese | pt | por |
WT_RUSSIAN | Russian | ru | rus |
WT_SLOVAK | Slovak | sk | slo, slk (Note: Both codes are considered synonyms.) |
WT_SESOTHO | Sesotho (Sotho, South Sotho) | st | sot |
WT_SPANISH | Spanish | es | spa |
WT_SWEDISH | Swedish | sv | swe |
WT_SETSWANA | Setswana (Tswana) | tn | tsn |
WT_TURKISH | Turkish | tr | tur |
WT_XHOSA | Xhosa | xh | xho |
WT_ZULU | Zulu | zu | zul |
WT_ENGLISH_AUSTRALIAN | Australian English | en-au | eng-AU |
WT_ENGLISH_CANADIAN | Canadian English | en-ca | eng-CA |
WT_ENGLISH_BRITISH | British English | en-gb | eng-GB |
WT_ENGLISH_US | United States English | en-us | eng-US |
WT_FRENCH_CANADIAN | Canadian French | fr-ca | fra-CA, fre-CA |
WT_GERMAN_SWISS | Swiss German | de-ch | deu-CH, ger-CH |
WT_PORTUGUESE_BRASIL | Brazilian Portuguese | pt-br | por-BR |
WT_SPANISH_AMERICAN | American Spanish | es-us | spa-US |
WT_NO_LINGUISTIC_CONTENT | Do Not Spell Check (treat content as a non-spellcheckable language) | | zxx |
INI Settings Examples
The values listed below for ISO-639-1 (two-letter codes) and ISO-639-2 (three-letter codes) are suggestions only, though they were taken directly from those specs. Be sure you consult with other people in your organization before deciding on exact values, as other tools and processes may have specific requirements.
If the xml:lang code (the value portion of the INI variable) does not include the particular value you need, just replace the existing one, or append your additional value to the end after adding a semicolon.
In the dialects section, two-letter country codes are appended to the language code to make up "dialects", which are specific regional variations of a language. Again, these values are here as examples only and it is up to you to decide what is correct for your organization's purposes.
#SPELL CHECKER LANGUAGES FOR xml:lang ATTRIBUTE VALUES
WT_AFRIKAANS=af;afr
WT_CATALAN=ca;cat
WT_CZECH=cs;ces;cze
WT_DANISH=da;dan
WT_DUTCH=nl;dut;nld
WT_ENGLISH=en;eng
WT_FRENCH=fr;fra;fre
WT_GALICIAN=gl;glg
WT_GERMAN=de;deu;ger
WT_GREEK=el;ell;gre
WT_ISLANDIC=is;ice;isl
WT_ITALIAN=it;ita
WT_NORWEGIAN=no;nor
WT_PORTUGUESE=pt;por
WT_RUSSIAN=ru;rus
WT_SLOVAK=sk;slk;slo
WT_SESOTHO=st;sot
WT_SPANISH=es;spa
WT_SWEDISH=sv;swe
WT_SETSWANA=tn;tsn
WT_TURKISH=tr;tur
WT_XHOSA=xh;xho
WT_ZULU=zu;zul
WT_NO_LINGUISTIC_CONTENT=zxx
#SPELL CHECKER DIALECTS FOR xml:lang ATTRIBUTE VALUES
WT_ENGLISH_AUSTRALIAN=en-AU;eng-AU
WT_ENGLISH_CANADIAN=en-CA;eng-CA
WT_ENGLISH_BRITISH=en-GB;eng-GB
WT_ENGLISH_US=en-US;eng-US
WT_FRENCH_CANADIAN=fr-CA;fra-CA;fre-CA
WT_GERMAN_SWISS=de-CH;deu-CH;ger-CH
WT_PORTUGUESE_BR=pt-BR;por-BR
WT_SPANISH_AMERICAN=es-US;spa-US
Note(1): If your xml:lang value's language code is not listed in the INI file then the fallback functionality of the spell checker is to use the default language as selected in the spell checker's Options dialog (set from within the main spell checker dialog, launched via F7).
Note(2): Letter casing (uppercase vs lowercase) is ignored with regard to xml:lang (ie: "EN-US", "en-us" and "en-US" are considered equivalent).
Note(3): zxx has been recommended to represent text that should not be interpreted as a standard human language. When it is used as set above, XMetaL will skip over any element with xml:lang set to this value and not spell check it at all. This is useful for sections of programming code or perhaps other uses. As with all the other values here, you may configure WT_NO_LINGUISTIC_CONTENT to whatever you like if "zxx" does not meet your needs (provided the value meets the xml:lang attribute value rules in the W3C XML Recommendation).
Note(4): Regardless of any settings in the INI file, when an xml:lang attribute value is set to an empty string, such as xml:lang="", that element will be skipped and not spell checked. From the spell checker's point of view this behavior is essentially equivalent to Note(3) above (though the meaning is distinct: "no language" as opposed to "non-human language"). However, XMetaL Author purposely makes it difficult for users to set an attribute value to an empty string using the Attribute Inspector, so to do this you must either have implemented special code in your XMetaL Author customization that allows it, or you must set the value using PlainText view.
Note(5): The values you use in the INI file should be unique to each setting: if you specify the same value in more than one INI variable, unexpected behavior will occur. Please don't ask what the behavior might be, just avoid doing this.
Note(6): Do not specify the same INI variable multiple times. This should not be an issue as far as XMetaL is concerned, but you may not see the results you expect in this case. Again, please don't ask what the behavior might be, just avoid doing this.
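If you want to check your own settings against Note(5) and Note(6) before deploying them, something along these lines could help. This is only a rough Python sketch and not anything that ships with XMetaL: the function names `parse_wt_settings` and `find_shared_codes` are hypothetical, and it assumes the settings look like the listing above (one `WT_NAME=value;value` pair per line, with `#` starting a comment line). Values are lowercased before comparison because casing is ignored, per Note(2).

```python
from collections import defaultdict


def parse_wt_settings(ini_text):
    """Collect WT_* settings from INI text.

    Returns (settings, repeated), where settings maps each variable name
    to its list of lowercased xml:lang values and repeated lists variable
    names that appear on more than one line (see Note(6)).
    """
    settings = {}
    repeated = []
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comment lines, and anything that is not name=value
        name, _, value = line.partition("=")
        name = name.strip()
        if not name.startswith("WT_"):
            continue  # not a spell checker language setting
        if name in settings:
            repeated.append(name)
        codes = [code.strip().lower() for code in value.split(";") if code.strip()]
        settings.setdefault(name, []).extend(codes)
    return settings, repeated


def find_shared_codes(settings):
    """Return values claimed by more than one WT_* variable (see Note(5))."""
    owners = defaultdict(set)
    for name, codes in settings.items():
        for code in codes:
            owners[code].add(name)
    return {code: sorted(names) for code, names in owners.items() if len(names) > 1}


# A deliberately broken sample: "EN" is claimed twice (casing is ignored)
# and WT_ENGLISH_US is specified on two separate lines.
SAMPLE = """\
#SPELL CHECKER LANGUAGES FOR xml:lang ATTRIBUTE VALUES
WT_ENGLISH=en;eng
WT_ENGLISH_US=en-US;eng-US
WT_ENGLISH_BRITISH=en-GB;eng-GB;EN
WT_ENGLISH_US=en-us
"""

if __name__ == "__main__":
    settings, repeated = parse_wt_settings(SAMPLE)
    print("Variables specified more than once:", repeated)
    print("Values used by more than one variable:", find_shared_codes(settings))
```

Point it at the relevant portion of your own INI file instead of `SAMPLE`; an empty result from both checks means neither of the two situations above was found.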
The Shipped xmetal60.ini File
The following setting is included with the xmetal60.ini file:
WT_ENGLISH_BRITISH=EN-UK;EN-GB
This can be safely removed if desired. It should be removed if you will be specifying your own WT_ENGLISH_BRITISH settings elsewhere in the INI file, to be sure there are no conflicts. Note, however, that the default internal (legacy) code of "EN-UK" will be recognized if this variable is not present and set to another value.
How the Auto-Switching Works
Whether you use the spell checking dialog (F7) or the 6.0 release's new "check spelling while typing" option (see Tools > Options), aka "red squiggles", XMetaL Author switches the spell checker to the language specified in the xml:lang attribute when entering an element containing PCDATA (text).
If that element in turn has a child element with a different xml:lang value, the spell checker changes to that child element's value. When no xml:lang value is set for an element, it inherits the value of the parent element or nearest ancestor (standard xml:lang rules).
If such an element has no ancestors with an xml:lang value set then the default value for spell checking (as set in the spell checker's Options dialog) is used.
So, assuming you have all the settings above in your INI file and your default language is set to "English-US" in the spell checker's Options dialog, when entering a given element with one of the following xml:lang values the spell checker should do the following:
- xml:lang is not set --> XMetaL begins walking up the document tree checking for parent elements with an xml:lang value set (and uses the nearest). If it fails to find any then the value as set in the spell checker Options dialog (in this case "English-US") is used.
- xml:lang="en" --> All English spellings (US, UK, CA and AU) are considered correct (both "colour" and "color" are considered correct).
- xml:lang="en-US" --> English-US is used (ie: "color" is correct, "colour" is incorrect)
- xml:lang="en-CA" --> English-CA is used (ie: "colour" is correct, "color" is incorrect)
- xml:lang="" --> no spell checking is performed (element is skipped)
- xml:lang="zxx" --> no spell checking is performed (element is skipped)
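The lookup described in this section can be sketched in a few lines of Python. To be clear, this is my own illustration of the rules above and not XMetaL's actual implementation. The names `build_lookup` and `effective_setting` are made up, the default is represented by the INI variable name "WT_ENGLISH_US" as a stand-in for the "English-US" choice in the Options dialog, values are matched exactly (ignoring case), and I am assuming that an inherited xml:lang="" or "zxx" causes descendants to be skipped in the same way as the element that carries it.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_lookup(settings):
    """Map each lowercased xml:lang value to the INI variable that lists it."""
    return {code.lower(): name for name, codes in settings.items() for code in codes}


def effective_setting(elem, parents, lookup, default):
    """Return the INI variable name used for elem, or None if it is skipped."""
    node = elem
    while node is not None:
        value = node.get(XML_LANG)
        if value is not None:
            if value == "":
                return None  # xml:lang="" means no spell checking
            name = lookup.get(value.lower(), default)  # unlisted values use the default
            return None if name == "WT_NO_LINGUISTIC_CONTENT" else name
        node = parents.get(node)  # no xml:lang here, so try the parent
    return default  # no ancestor has xml:lang set


SETTINGS = {
    "WT_ENGLISH_US": ["en-US", "eng-US"],
    "WT_ENGLISH_CANADIAN": ["en-CA", "eng-CA"],
    "WT_NO_LINGUISTIC_CONTENT": ["zxx"],
}

DOC = """<doc xml:lang="en-US">
  <p>color <code xml:lang="zxx">clr = 0</code></p>
  <p xml:lang="en-CA">colour <q>centre</q></p>
  <p xml:lang="">n/a</p>
</doc>"""

root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child -> parent map first.
parents = {child: parent for parent in root.iter() for child in parent}
lookup = build_lookup(SETTINGS)
results = [
    (elem.tag, effective_setting(elem, parents, lookup, "WT_ENGLISH_US"))
    for elem in root.iter()
]

if __name__ == "__main__":
    for tag, setting in results:
        print(tag, "->", setting if setting else "skipped")
```

In the sample document the first `p` and the `q` element carry no xml:lang of their own, so they pick up the value of their nearest ancestor, while the `code` element and the last `p` are skipped.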