Getting Started & Resource Guide

What's in this Wiki

What is Tâigí?

Tâigí (Tâioânese, or just Tâi) is one of a number of local languages spoken in Taiwan, referred to in English variously as Taiwanese Hokkien, Holo Taiwanese, Taiwanese Min Nan (Ban Lam), Taiwanese Southern Min, or simply Taiwanese or Taigi.

While Tâi and other Hokkien variants share a common ancestor that originated in modern-day Fujian Province, China, Tâi has developed somewhat separately over the past 400 years due to the geographical nature of Tâioân as an island, as well as the influence of various non-Chinese cultures. Tâioân has been occupied at various times by Ming, Dutch, Spanish, Qing, Japanese, American*, and Chinese forces, and Tâi has absorbed bits and pieces from all of the above.

Perhaps most importantly, Tâi is considered by its speakers to be a local, Formosan language, which sets it apart from Mandarin as a foreign language, brought to Tâioân largely by the colonial ROC government.

Tâi was the lingua franca in Tâioân throughout most of the past 400 years, and it was only through severe oppression by the Chinese Kuomintang, including beatings and fines for speaking Tâi, that Mandarin has more or less replaced Tâi as the lingua franca in recent years.

Despite the oppression, Tâi continued to be spoken in homes and private gatherings, and remains the native language of a majority of Tâioân's population today. Since the end of the Martial Law period, there have been ongoing revitalization efforts to ensure not only that Tâi is not forgotten, but that it can be taught in schools and used as a matter of course in business, government and throughout daily life.

* Specifically, administrative military authority of Tâioân was granted to Chiang Kai-shek by the US after WWII; US forces have never occupied the islands directly.

Getting Started

Due to the history described above, Tâi has not yet attained widespread standardization in either its spoken or written form. This has led to a large number of divergent dialects, vocabulary patterns, writing standards, and so on. For a new learner, the variety of choices can seem overwhelming.

Below are brief explanations of these alternatives, followed by recommendations.

Spoken Tâi

There are many alternative dialects, pronunciations, and vocabulary words used in different areas of Tâioân.

These dialects are often categorized in a simplified fashion, such as:

  • North (泉州, Choân-chiu)
  • South (漳州, Chiang-chiu)
  • Coastal (海口, Hái-kháu)

but such categorizations are misleading. While these dialects did originate in different areas of Hokkien, they have mixed and mingled in Tâioân for centuries, leading to many more variants, sub-variants, and combinations of all of these.

For a learner, the best approach is to pick a specific "target dialect" that will be most useful to you in daily life. That may be the one spoken by your family and relatives, or one that is widely understood by most native speakers.

Written Tâi

As of today, there is no single agreed-upon standard script or orthographic system for writing Tâi. Listed here are some of the more popular existing orthographies; the list is not comprehensive. It also includes scripts which are (or were) mainly used as phonetic aids, such as Japanese Kana, Bopomo, and most of the romanizations apart from POJ. Examples of full texts written in such scripts are extant, but rather uncommon. In terms of the "working script" of published texts such as books, periodicals, newspapers, and so on, POJ and Hànjī constitute the overwhelming majority.

Pe̍hōejī (POJ) “Plain Letters”

By far the largest body of readily available written Tâi uses Pe̍hōejī (POJ), immediately recognizable by its use of the letters ch and the superscript ⁿ. POJ was more or less a standard for around 100 years, and its usage has spread far beyond its initial use in the Church (for translating the Bible into Tâi). Many Tâi language activists support keeping POJ as the standard, and although the ROC Ministry of Culture has recognized POJ as part of its documentary heritage program, the ROC Ministry of Education to date has not acknowledged its historical significance or accepted it as an alternative "officially recognized" writing system.

POJ is also frequently called Lô-má-jī ("romanization") or Tâilô ("Tâi romanization").

Hànjī

"Square characters" (sì-kak jī) of the kind found in Japanese, Mandarin, and other East-Asian writing are often ambiguously referred to as Chinese characters. Tâi Hànjī are no more Chinese than Japanese Kanji. Hànjī were used throughout Tâioânese history for writing Tâi, but as the language itself was not used in "official" capacity by government, the Tâi Hànjī system was never fully standardized (although it was well on the way to becoming so before the ROC era).

In recent years, the ROC Ministry of Education has decided that Tâi Hànjī are unbefitting, mainly due to the education system being fully in Mandarin. As part of the process of sinicizing Tâi, the ROCMOE has released a small number of "recommended Taiwan Min Nan [Chinese] characters" (PDF) to use for writing Tâi. However, on this website and among many native speakers, including many of those posting on this site, you will find that the "native" Tâi Hànjī are preferred.

The Tâi Jī Chhân documents a large selection of such native Hànjī.

See Dictionaries for more information.

Other romanizations

The ROC Ministry of Education developed its own standard based on POJ, officially called the "Taiwanese Southern Min Romanization Phonetic Spelling System" (台灣閩南語羅馬字拼音方案 (Wikipedia)), often referred to as KIP (the abbreviation for MOE in Tâi), Bânlô, or Tâilô. (Since many people also refer to POJ as Tâilô, and Bânlô is somewhat politically charged, we simply refer to it as MOE or KIP.) There are not many differences between KIP and POJ, but the differences are significant enough to make KIP much less comfortable than POJ for reading and typing full texts written in romanization. KIP is often used only as a pinyin-style system for phonetic annotation accompanying Chinese characters, which are the writing system preferred by the ROC Ministry of Education.

There are many other less common romanization systems, many invented by individuals and never receiving widespread use apart from personal writings (or in recent times, blogs, Facebook, etc). It's not very useful to invest much time in these as they are generally quite obscure.

Japanese & Korean

During the Japanese colonial period, a system for writing Tâi phonics with Japanese Kana was developed. While this system was mostly used for teaching materials aimed at Japanese learners of Tâi, and while it can still be found in some places today, it is now far less common than POJ or Hànjī.

A modified form of the Korean Hangul alphabet was also created for writing Tâi, although to my knowledge it never attained any real-world usage. It's quite a nice system though, and very easy to read if you already know Hangul.

Bopomo

Various versions of Bopomo, sometimes called Taiwanese Phonetic Symbols (TPS) or Hong-im Hû-hō (Hongim), have also been proposed. These systems modify Mandarin's Zhuyin phonetic system by adding symbols for sounds that occur in Taiwanese but not in Mandarin. They can be found in some dictionaries and children's textbooks.

What should I learn?

If you aren't sure what variety of spoken and written Tâi you want to learn, my advice is to choose something that is widely used in the real world.

For speaking, I suggest the "Southern" (Ko~Pîn area) dialect. This is likely the most widely understood among native speakers of different dialects, and has the greatest number of native speakers in Tâioân today. Learning this dialect will ensure that you are understood no matter who you talk to, and you should not have much trouble understanding different accents, as there are some general patterns you will learn through your studies.

For learners, for native speakers who are not yet literate, and for general purpose use, I strongly recommend starting with Pe̍h-ōe-jī (POJ). It has the widest availability of materials covering the longest period of time throughout Tâioân. It was also used in other areas around Southeast Asia that speak variants of Hokkien, as well as in Tâioân for writing Hakfa, which may be convenient for future studies. Furthermore, the differences between POJ and KIP are easy to understand, and if you can use one you can essentially use both (but I do recommend becoming "comfortable" in POJ).

For those who prefer Hànjī (Chinese characters), I recommend the 台字田 (Tâi Jī Chhân). While Hànjī have gained in popularity in recent years relative to the Tâi written in the early-to-mid 20th century, they remain a complex and politicized topic. You will likely face criticism or skepticism for your choice of characters no matter which ones you use, but the Tâi Jī Chhân provides a number of tangible benefits:

  • It takes a principled approach to selecting only historically documented Hànjī used for writing Tâi specifically, with thorough explanations
  • It notes all common variants and alternates (also with explanations), so that users can make an informed decision between multiple options
  • It takes a similar approach to Hànjī as Japanese, categorizing them as: phonetic loans, semantic loans, Hànbûn, etc. This approach is very well suited to Tâi.
  • The characters themselves are generally simple, easily recognizable characters. This makes input relatively easy using any Chinese character input method, as well as reducing Unicode & font related problems, especially as compared to e.g., the MOE's recommended characters which are often not available in input methods or fonts.
  • The character set gives a very distinct styling to a body of text which uses it, such that it is instantly recognizable as Tâi (rather than Japanese, Simplified or Traditional Mandarin, Cantonese, or any other language that uses Chinese characters). Many people who write with the MOE characters leave some words in Romanization for this purpose, as the MOE's recommendations are less easily distinguishable from Mandarin.

Self-Study Resources

Tâi Phonics & Spelling Quickstart Guides

Maryknoll Textbook Series

For English speakers, the best textbook to get started is the old series from Maryknoll, a Catholic organization that has a fairly comprehensive set of resources for teaching local languages in Tâioân, including both Tâi and Hak. These books are available online for free on NTCU's Memory of the Written Taiwanese website. Go to the search function (2nd button down on the left) and search for "Taiwanese Book". I have also put these images together as PDFs:

Some notes:

  • This book is a scan which includes handwritten notes in Chinese - try to ignore those.
  • Much of the vocabulary is Catholic or religious in nature - ignore it if it's not relevant for you. The rest of the content is extremely practical and valuable.
  • This series teaches the "Tâitiong" (台中) pronunciation, so keep in mind that you may want to modify some pronunciations slightly as you learn more and settle on a dialect to learn.
  • Hard copies are available for purchase in Tâioân at the Maryknoll language centers in Tâipak (台北) or Tâitiong (台中).

Other Textbooks & Coursebooks

Dictionaries

  • (Paperback) 張裕宏 - TJ台語白話小詞典 - The best all-around Tâi dictionary, complete with word etymologies, alternative pronunciations, example sentences, synonyms, and altogether tons of information in a compact and easy-to-carry paperback book. (Definitions in Mandarin)
  • ChhoeTaigi - The most complete and reliable online Tâi dictionary available today. A "multi-dictionary" that includes both Maryknoll and Embree Tâi-English dictionaries, as well as around a dozen other Tâi-Mandarin dictionaries. Search function works in: POJ or KIP with tones as numbers or diacritics, Chinese characters / Mandarin, and English. Make sure you tick the "相關--ê" box or you will get very limited results.
  • 台字田 (Tâi Jī Chhân) - Not a dictionary, as such, but a well-researched and documented collection of Hànjī historically used in written Tâi, based on a vast array of sources. This is the recommended character set for those wishing to write in native Tâi Hànjī that have not been significantly influenced by the norms of written Mandarin.
  • 臺灣閩南語常用詞辭典 (MOE Dictionary) - The Ministry of Education's online dictionary. Don't forget to check the numerous "附錄" for tons of interesting items, like lists of Japanese loanwords and regional pronunciation tables. Also included in ChhoeTaigi. (in Mandarin)
  • 萌典 - The "unofficial" version of the MOE dictionary. The same exact dictionary as above, but with a prettier interface and links between Tâi, Mandarin, Hak, English, and occasionally German or French. Also has a mobile app (search for "萌典").
  • iTaigi - A crowdsourced Mandarin-Tâi dictionary. Note that the quality standards are generally low for iTaigi, so you should not rely on it for accurate information. The contents are also included in ChhoeTaigi.
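
The two tone notations mentioned for ChhoeTaigi's search, diacritics and numbers, carry the same information, so a romanized word can be converted mechanically from one to the other. Below is a minimal Python sketch of the diacritic-to-number direction. It is not ChhoeTaigi's actual implementation; the names `TONE_MARKS`, `syllable_to_numeric`, and `word_to_numeric` are made up for illustration, and it assumes the usual POJ tone marks (tones 2, 3, 5, 7, 8), with an unmarked syllable read as tone 4 when it ends in p, t, k, or h and as tone 1 otherwise.

```python
import unicodedata

# Combining marks assumed to carry POJ tones in this sketch.
# Tones 1 and 4 are unmarked; other tones are not handled.
TONE_MARKS = {
    "\u0301": 2,  # acute accent
    "\u0300": 3,  # grave accent
    "\u0302": 5,  # circumflex
    "\u0304": 7,  # macron
    "\u030d": 8,  # vertical line above
}


def syllable_to_numeric(syllable: str) -> str:
    """Move the tone of one syllable from a diacritic to a trailing digit."""
    if not syllable:
        return syllable
    tone = None
    kept = []
    # NFD splits letters like "ō" into a base letter plus a combining mark.
    for ch in unicodedata.normalize("NFD", syllable):
        if tone is None and ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # keeps non-tone marks such as the dot of o͘
    base = unicodedata.normalize("NFC", "".join(kept))
    if tone is None:
        # Unmarked syllable: tone 4 with a stop final, otherwise tone 1.
        stem = base.rstrip("\u207f").lower()  # ignore a trailing superscript n
        tone = 4 if stem.endswith(("p", "t", "k", "h")) else 1
    return f"{base}{tone}"


def word_to_numeric(word: str) -> str:
    """Convert every hyphen-separated syllable of a word."""
    return "-".join(syllable_to_numeric(s) for s in word.split("-"))


print(word_to_numeric("Pe\u030dh-\u014de-j\u012b"))  # Peh8-oe7-ji7
```

Going the other way (numbers to diacritics) also needs a rule for which vowel receives the mark, which is why only the simpler direction is sketched here.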

Online/Social

Tâi Literature

These resources are especially useful for intermediate or advanced learners who need real-world material such as books, periodicals, newspapers, etc.

Bookstores

  • 台灣兮店, a bookstore in Taipei near NTU with a massive catalog of Tâi (and other local language) books, also available for online order. [Unrelated note: here you can see an example of "variant" Hànjī in action. The word "Ê" is often written as 个 or 兮, in Japanese it possibly evolved into ヶ, and on the store's sign you will see its name written as 「入下」 arranged vertically.]

Digital & Computing

Typing (Input Methods / IMEs)

Fonts

Chia̍h pá ·bē! includes web fonts for all popular Tâi lettering, including romanizations, Tâi characters, and ROC characters. Hopefully, you won't have any issues seeing Tâi written on this site, but as most places do not provide proper font support for Tâi, other websites or applications are hit-or-miss.

Font Support

Tailingua has a simple test page to check for properly installed fonts. (As well as plenty of other resources that you should feel free to browse!)

Most fonts are missing a few of the special symbols needed to display Lô-má-jī properly. In particular:

  • 8th tone - a vertical line centered above the vowel, as in the first "e̍" of Pe̍h-ōe-jī. Test characters: A̍ E̍ I̍ O̍ U̍ a̍ e̍ i̍ o̍ u̍
  • o͘ - an o with a dot on the upper right corner. In KIP, this is written as "oo" and does not have any font support issues. Test characters (duplicated due to 2 Unicode variants): O͘ Ó͘ Ò͘ Ô͘ Ō͘ O̍͘ o͘ ó͘ ò͘ ô͘ ō͘ o̍͘ / Ó͘ Ò͘ Ô͘ Ō͘ O̍͘ ó͘ ò͘ ô͘ ō͘ o̍͘
  • The superscript ⁿ of POJ is generally supported in modern fonts.
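
These symbols are hard on fonts because they are built from Unicode combining marks rather than single precomposed letters. The Python sketch below uses only the standard `unicodedata` module to list the code points behind each symbol. It also shows one likely reading of the "2 Unicode variants" noted above, namely that a toned o͘ can be stored either fully decomposed or with a precomposed accented letter; that reading is an assumption, and the helper names `describe` and `to_nfc` are invented for this example.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def to_nfc(text: str) -> str:
    """Normalize text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


print(describe("e\u030d"))  # e with the 8th-tone mark
print(describe("o\u0358"))  # o with the dot above right
print(describe("\u207f"))   # the superscript n

# Two ways of storing "o with dot, 2nd tone":
decomposed = "o\u0301\u0358"    # o + acute + dot above right
precomposed = "\u00f3\u0358"    # precomposed o-acute + dot above right

print(decomposed == precomposed)                  # False: different code points
print(to_nfc(decomposed) == to_nfc(precomposed))  # True: same text after NFC
```

Normalizing text before comparing or searching avoids treating the two encodings as different words, although it does nothing for a font that lacks the glyphs.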

Downloads

  • Justfont - A Taiwanese font producer that provides the high quality, attractive open source font Hún-îⁿ (粉圓) which fully supports both the MOE's recommended Chinese character set and Pe̍h-ōe-jī.
  • Tauhu is another freely available font which includes the MOE's recommended Chinese characters. It is based on the widely used open source Adobe Source Han Sans, also repackaged by Google as Noto Sans CJK, the default system font in many modern applications including Android phones. (Chia̍h pá ·bē! uses a further customized version of Tauhu, which includes additional Tâi Hànjī.)
  • aiong/POJFonts - A collection of open source fonts modified to fully support Pe̍h-ōe-jī, as well as scripting tools for an easy way to modify nearly any font to support Pe̍h-ōe-jī.
  • Hanazono Fonts - A massive Chinese character font that, while not the most beautiful font out there, supports pretty much every Hànjī found in Unicode.

This is a useful guide!

So I've been using many of the resources pointed out here. I am coming to realize that ChhoeTaigi and 台字田 are different: they are modern. ChhoeTaigi even supports RegEx-based search!

But for whatever reason, Google does not index these two sites the way it does iTaigi and twblg.dict.edu.tw.

This is very unfortunate, because many people will search for something like this: 常見 台語, and they won't find ChhoeTaigi and 台字田 among results.

If you know these people, we need to help them get their entire word/phrase lists indexed by Google.

@Hêbí, is there a way to get ChhoeTaigi's content picked up by Google's crawler?

台字田 also supports basic fuzzy search - at least the asterisk. It also accepts Tionghoa Hanji as input to look up the corresponding Taiwan Hanji.

We have now hired someone to redevelop the website, and it will be changed to do SSR/SEO. When the time comes it will generate information for Google to pick up. It won't happen that quickly, though; I have a lot of work here with nobody to do it. There aren't enough resources or time to get it done that fast. I hope everyone will support us.

Is it possible to create a simple sitemap_index.xml on your site which expands to index all words and phrases? Your website is already indexed by Google. It just doesn't know about the millions of pages you can dynamically generate. If you pre-create several levels of sitemap xml files, then Google will eventually build up enough data to answer any search queries. It may mean that your server will be hit by Google with lots of dynamic requests.

I am no SEO expert. I only dabble in website building because I wanted to publicize chopstick grips. WordPress happens to already build these sitemaps for you. Some of them are even semi-dynamic (e.g. the category pages). But none of them are truly dynamic the way I envision you will need (e.g. you'll have to fake a page for every word and every phrase). This is the marcosticks sitemap: https://marcosticks.org/sitemap_index.xml

I think a sitemap is the cheapest way for you to get your website out to the entire Taiwanese population. Once someone ends up on your site from Google, they will find it so much more useful than all other ones (except for the voice file feature which I know you will eventually add as well). I think you will find a lot more donors when a lot more people use it and talk about it.

Again, maybe I don't know what I am talking about. And you probably already know about all of this. It just seems to me that most Taiwanese folks trying to find out about specific Taigi words are going to search for: 常見 台語, etc. So you will need to make Google think you have a specific page for 常見, with specific SEO key summaries, so that it can show salient parts of that (I know, dynamic) page to users.

I see that Aiong has already plugged ChhoeTaigi at www.reddit.com/r/ohtaigi a while back. You may be able to find SEO-savvy Redditors to help there.

A sitemap is for commonly used URLs and directories.
If the content of every single entry is needed, then SEO-related processing has to be done.
Right now most of the pages are dynamically generated, so they have to be changed to use SSR.

"Tōng-thài sán-seng" (dynamically generated) is exactly what I meant by "dynamic pages" (including SSR, server-side rendering).

This page talks about Meteor apps and SEO. They seem to discuss mostly the speed issues that Google's crawlers experience. Most use https://prerender.io/ to present dynamic content to humans, but pre-rendered (and possibly pre-cached) HTML to crawler bots.

Like I said, I am not a web engineer, and barely know about SEO. But it seems to me that you will still need a sitemap that points to pre-rendered catalogs of all words and phrases in your dictionaries, if only in the form of URLs with query params filled out with all possible words. How else will the crawler know to fetch those dynamic pages? Because the crawler has timeout and other limits, you'll probably need to break up your sitemaps into multi-tier hierarchies, to let the crawler index them piece by piece over time. Or at least that's how I would do it, if I were to build them without using existing SEO tools.
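
To make the multi-tier idea concrete, here is a rough Python sketch that splits a word list into child sitemaps and ties them together with one index file. Everything specific in it is an assumption for illustration: the `example.org` domain, the `/search?q=` URL pattern, the file names, and the function name `build_sitemaps` are placeholders, not ChhoeTaigi's real URLs or code. The XML element names and the cap of 50,000 URLs per file follow the sitemaps.org protocol.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit in the sitemaps.org protocol
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>'


def build_sitemaps(words, base="https://example.org", max_urls=MAX_URLS):
    """Return {filename: xml_text} for the child sitemaps plus one index."""
    files = {}
    for start in range(0, len(words), max_urls):
        chunk = words[start:start + max_urls]
        entries = []
        for word in chunk:
            # Hypothetical query URL: one "page" per dictionary word.
            loc = escape(f"{base}/search?q={quote(word)}")
            entries.append(f"<url><loc>{loc}</loc></url>")
        name = f"sitemap-{start // max_urls + 1}.xml"
        files[name] = (
            f'{XML_HEADER}<urlset xmlns="{NS}">{"".join(entries)}</urlset>'
        )

    # The index lists every child sitemap so a crawler can fetch them in turn.
    index_entries = "".join(
        f"<sitemap><loc>{base}/{name}</loc></sitemap>" for name in files
    )
    files["sitemap_index.xml"] = (
        f'{XML_HEADER}<sitemapindex xmlns="{NS}">{index_entries}</sitemapindex>'
    )
    return files


# Tiny demo with an artificially small chunk size.
demo = build_sitemaps(["a", "b", "c"], max_urls=2)
print(sorted(demo))  # ['sitemap-1.xml', 'sitemap-2.xml', 'sitemap_index.xml']
```

Whether a crawler actually indexes the listed pages still depends on them returning real HTML, which is where the SSR or pre-rendering work comes in.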

Cheers. I am going to stop talking about a field I am not familiar with. Your people will already know much better. I simply don't want to see such a great site go unindexed : )

Thank you, we are already working on it.

Under Other romanizations,

The ROC Ministry of Education developed it's own standard based on POJ

the it's should be its.

Under Pe̍hōejī (POJ) “Plain Letters”, there are two “it's” s which should also be its. The abovementioned mistake has not been corrected either.

Thanks, all fixed.
