Frequently Asked Questions
Q: What is this?
A: This is a comparison of the letters (or, more technically, graphemes) used in the 100 most spoken languages that use the Latin alphabet.
Q: Ok, but, um... why did you make it?
A: I'm really not sure! I'm interested in languages, and I noticed that Icelandic uses some of the letters that used to appear in Old English (think Beowulf). I knew the Swedes and the Danes had the a with the circle over it (å), and I wondered what other strange letters and accents other languages had. I thought, what must it be like for kids in these countries to learn the alphabet? How different is it from mine, and how different is it across languages?
Q: What criteria did you use for inclusion?
A: This is a complicated one. I set out to include any language that:
- Uses the Latin alphabet as its primary writing system. This eliminates some pretty commonly spoken languages, like Mandarin, Russian, and almost all of the languages of the Indian subcontinent. The Greeks were not pleased, but this had to be an apples-to-apples comparison. Several of the languages that made the list also use, or have used, another script. To appear in this list, the Latin script has to be the official script, be unofficially dominant, or be trending in that direction.
- Has a significant number of speakers. I went with total number of speakers, not just native speakers.
Q: What was your source for all of this information?
A: Omniglot, which is a tremendous free website, was the primary source. I frequently corroborated its information with Wikipedia, but the speaker counts and most of the alphabet information come from Omniglot. Technically, the Omniglot pages I used (and link to) focus on a language's orthography, which is the whole set of rules for writing a language, not just its alphabet. Sometimes the alphabet wasn't included, so in those cases I tried to find other sources. If I couldn't, I defaulted to what was on the Omniglot orthography page.
Ethnologue is also a great resource, but my pockets aren't deep enough for the membership, and the old print edition I bought doesn't have any orthography information.
Q: Ok, fine, but what about all of those weird multiple-letter things at the far right? Surely they can't be part of the alphabet.
A: (Two-letter combinations are called digraphs, three-letter ones are called trigraphs, and any sequence of letters that behaves as a unit is called a multigraph.) Sometimes, though, there isn't a lot of information distinguishing the alphabet from the orthography. This is especially true of languages that are not yet widely written. For example, many of the Bantu languages have multigraphs with q and x representing click sounds. So for those (and others like them), I have erred on the side of inclusion, choosing to show all consonant sounds, including those written as multigraphs. Vowel multigraphs, such as those representing diphthongs (sounds that glide from one vowel into another) or long vowels, have generally not been included, except in a few cases where something like ‘aa’ legitimately appeared to be part of the alphabet.
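The difference between single letters and multigraphs matters as soon as you try to split words into letters by machine. As a rough sketch (not how this table was built), the Python below splits a word into units using greedy longest-match against a set of multigraphs. The set `HUNGARIAN_MULTIGRAPHS` is a hypothetical, partial inventory chosen for illustration: Hungarian writes dzs as a trigraph alongside digraphs such as cs, gy and sz, which makes it a handy test of "longest match first."

```python
import unicodedata

# Hypothetical, partial inventory of Hungarian multigraphs (not the full alphabet).
HUNGARIAN_MULTIGRAPHS = {"dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"}


def split_graphemes(word: str, multigraphs: set[str]) -> list[str]:
    """Split a word into letters, treating listed multigraphs as single units.

    Greedy longest-match: at each position, try the longest multigraph first,
    then shorter ones, and fall back to a single character.
    """
    # NFC so that an accented vowel typed as base + combining mark counts as one character.
    word = unicodedata.normalize("NFC", word.lower())
    longest = max((len(m) for m in multigraphs), default=1)
    units = []
    i = 0
    while i < len(word):
        # Try chunk sizes from the longest multigraph down to 2.
        for size in range(min(longest, len(word) - i), 1, -1):
            chunk = word[i:i + size]
            if chunk in multigraphs:
                units.append(chunk)
                i += size
                break
        else:
            # No multigraph starts here (or too few characters remain): take one letter.
            units.append(word[i])
            i += 1
    return units


print(split_graphemes("dzsungel", HUNGARIAN_MULTIGRAPHS))
print(split_graphemes("gyógyszer", HUNGARIAN_MULTIGRAPHS))
```

Greedy matching is only an approximation. In compound words, two letters that happen to sit side by side across a word boundary can be wrongly merged into a multigraph, so a real tool would need a dictionary or morphological information.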
Q: Got it. So what other rules did you have about what letters and multigraphs you included and how you organized them?
A: Basically, I tried to organize everything by the base letter (or grapheme), starting with single-letter elements and then starting over for multigraphs. Also:
- In general, my rule for multigraphs was that they had to be made of at least two separate characters, so ligatures made from two letters, like æ, œ and the Dutch ij, are not treated as multigraphs. Instead, each ligature is listed in the section of letters most closely associated with it. The same logic has been applied to the German letter ß, which, though still part of the alphabet, is often replaced with ss.
- I tried to include letters with diacritics (accents and other marks) only in cases where the letter with the diacritic is an actual entry in an alphabet. Sometimes the accented letter simply represents a different phoneme (a distinct sound), as with the French accents, but I wanted to pepper them in anyway to give a sense of what you might see in written French. On the other hand, some languages (like Vietnamese) are tonal, and some of those use diacritics that indicate the tone a particular vowel should be spoken with. I've not included these, because the number of possible letter combinations would make this already super-wide table even worse.
- Letters not associated with any logical base grapheme (like θ, ɣ and ʌ) are listed just before the multigraphs.
- The character ’, often used to indicate a glottal stop, has not been included as a special letter. It does form part of several multigraphs, however.
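Grouping accented letters under their base letter can be partly automated with Unicode normalization. Here is a minimal Python sketch, assuming the letters are stored as Unicode strings. It is an illustration only, not a description of how this table was made. NFD decomposition splits a precomposed letter like å into a plus a combining ring, so dropping the combining marks (Unicode category "Mn") recovers the base. The helper names `base_letter` and `group_by_base` are hypothetical.

```python
import unicodedata
from collections import defaultdict


def base_letter(letter: str) -> str:
    """Return the base of a letter with diacritics, e.g. 'å' -> 'a'.

    NFD splits a precomposed letter into base + combining marks; the marks
    (category 'Mn') are dropped. Letters with no decomposition come back unchanged.
    """
    decomposed = unicodedata.normalize("NFD", letter)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def group_by_base(letters: list[str]) -> dict[str, list[str]]:
    """Group letters by their base letter, preserving input order within each group."""
    groups: dict[str, list[str]] = defaultdict(list)
    for letter in letters:
        groups[base_letter(letter)].append(letter)
    return dict(groups)


print(group_by_base(["a", "å", "ä", "c", "ç", "æ", "ø"]))
```

Letters that Unicode does not decompose, such as æ, ø and ß, come back as their own groups. That is exactly the set the rules above place by hand. The same stripping would also remove Vietnamese tone marks, so it cannot tell an alphabet entry from a purely phonetic or tonal accent on its own.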
Q: What are all these weird Niger-Congo languages?
A: They're some of the most widely spoken languages in sub-Saharan Africa, and they're fascinating! As part of this project, it was fun to get out of what I now realize is a very Indo-European-centric view. I also stumbled upon the !Xóõ language, which, with its clicks, is often cited as having the largest number of distinct sounds of any language. Hearing it spoken is wild; you should check it out.
Q: Who are you, and what qualifications do you have to do this?
A: I'm Spencer Blackman, and I have absolutely no linguistic qualifications at all; I'm just sort of interested. If you're a language expert and I've gotten something wrong, let me know and I'll look into it. Either way, thanks for looking!
Q: I'm a tech nerd and I want to know more about the weird little plugins you used to get this huge ugly table to have filters and frozen columns and stuff.
A: Sure. The main table is powered by TableFilter, and the locked header and columns are courtesy of TableHeadFixer. Thanks to those guys for putting great free tools out there.