The Pinyin System - How to Pronounce Mandarin Chinese

Mandarin Chinese, known in Chinese as Putonghua ('poo-tong-hwah', literally 'common speech'), is the official language of the People's Republic of China and the first language of more than 50% of the population. With more than 900 million native speakers, it is the most widely spoken first language in the world, with almost three times as many speakers as the next most common, Spanish. There are other types of Chinese: Wu, Hui, Gan, Xiang, Min, Hakka, Yue (Cantonese) and Ping are spoken in the southeast of the country, while Jin is spoken in a small area in the northwest. These are generally referred to as dialects, although they are sufficiently different that speakers of different dialects wouldn't necessarily be able to understand each other. In the very north of the country and in most of the west, languages from outside the Chinese family are spoken, such as Mongolian, Tibetan, Uyghur, Kazakh and Kyrgyz.

In the rest of this Entry, the language will be referred to just as 'Mandarin' rather than 'Mandarin Chinese'. When we say 'Chinese', we mean the whole Chinese family of languages.

The consistency of Mandarin over more than half the country, compared with the large number of dialects of Chinese in the southeast, is thought to be a result of the terrain. The flatness of northern China allows for easy travel and migration, and people brought the language with them. Southeastern China is very hilly, so people didn't move around as much and separate dialects arose.


Chinese is written using a complex system of about 4,000 Chinese characters, but there are also a number of ways of writing the language in the Western alphabet. The most common one is known as 'pinyin'. It is the method favoured by the governments of Mainland China, Taiwan and Singapore. Pinyin uses all the letters of the English alphabet except 'v', but some of the letters don't correspond to the same sounds as in English, so there are some surprises. Pinyin also uses four accents to indicate the four tones of Mandarin, which are an important part of the language.

Pinyin was introduced by the government in the late 1950s with the intention of eventually replacing the complex Chinese character system, which is a huge barrier to literacy. This attempt appears to have had limited success. While all schoolchildren learn pinyin, nobody seems to use it. Street signs are written in both pinyin and Chinese characters but menus in restaurants, signs over shop doors and informative notices all still use only Chinese characters.

It's still worth learning pinyin, if only to learn how to pronounce the place names. This Entry provides a few useful phrases as examples. There's a bigger collection of phrases, with a rudimentary description of the pronunciation, in the Entry Handy Mandarin Chinese Phrases.

Simplified Guide

Chinese has some subtleties of pronunciation which won't be easy for English speakers to even hear, never mind to pronounce, so this Entry will first present a simplified guide which ignores these features, and then will explain them for those who are interested. Don't worry, you'll get by with the simplified version.

Mandarin is quite easy to pronounce for English speakers, much easier than pronouncing French, Danish or most of the Eastern European languages, because virtually all the sounds of Mandarin are also sounds in English. Admittedly, the tone system takes a bit of getting used to, but it is not actually difficult.


Each consonant is represented by a single letter, except for the combinations ch, sh, zh and ng, which are each two letters but represent a single consonant. Mandarin does not string consonants together: each syllable starts with at most one consonant, and the only consonants that appear at the end of a syllable are n, ng and sometimes r. There's no equivalent of the English consonant clusters found in words such as 'strap' (three consonants in a row at the start) or 'twelfths' (four consonants in a row at the end: l-f-th-s).

For the purposes of this simple guide,

  • ch and q can be treated as the same sound;
  • sh and x can be treated as the same sound;
  • j and zh can be treated as the same sound.

b, ch, d, f, j, k, l, m, n, p, s, sh, t, w - all pronounced as in English.
c - as ts in pets. This applies even when the c occurs at the start of a word.
g - as in the word garden, a hard sound. Never a 'j' sound like in gem.
h - a strong 'h' sound.
ng - as ng in singer in standard English pronunciation, not the ng with a hard g after it that occurs in finger.
q - as ch in English (chips, China etc.)
r - a strong r, more so than an Irish, American or West Country English accent. Roll your tongue back and put the underside of it close to the roof of your mouth. The tongue buzzes slightly to make what is almost a 'z' sound.
x - as sh in ships, shiner etc.
y - as y in yellow. Never treated as a vowel like in fly, rhythm.
z - a dz sound like in the word adze or lids.
zh - as j in English.
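
For readers who like to see such things written out, the simplified consonant guide can be sketched as a small lookup in Python. This is only an illustration, not an official scheme: the names RESPELL, split_initial and english_hint are made up for this example, and it assumes the syllable is typed in lower-case pinyin without tone accents.

```python
# Rough English respellings from the simplified guide above.
# Consonants not listed here are read as in English.
RESPELL = {"c": "ts", "q": "ch", "x": "sh", "z": "dz", "zh": "j"}

TWO_LETTER_INITIALS = ("zh", "ch", "sh")
VOWELS = "aeiouü"


def split_initial(syllable):
    """Split a toneless pinyin syllable into (initial consonant, rest)."""
    s = syllable.lower()
    if s[:2] in TWO_LETTER_INITIALS:
        return s[:2], s[2:]
    if s and s[0] not in VOWELS:
        return s[:1], s[1:]
    return "", s  # the syllable starts with a vowel


def english_hint(syllable):
    """Return (rough English sound of the initial, rest of the syllable)."""
    initial, rest = split_initial(syllable)
    return RESPELL.get(initial, initial), rest


print(english_hint("qing"))    # ('ch', 'ing')
print(english_hint("zhuang"))  # ('j', 'uang')
print(english_hint("cai"))     # ('ts', 'ai')
```

The two-letter combinations are checked first so that zh is not mistaken for a z followed by an h.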


Mandarin has only six or seven vowel sounds (compared with the 11 or 12 in English), so the vowels are straightforward.

a - broad ah as in Italian, French, German or Harvard American.
e - in most cases it is an uh sound like the -er in driver as pronounced by an English person.
ei - the combination 'ei' treats the e as an eh so that the combination is eh + ee, as 'ay' in day.
er - in the 'er' combination, it is an ah sound to produce 'ar' (imagine a Pirate 'arr' rather than the Queen saying 'are').
i after z, c, s, zh, ch, sh, r - uh sound.
i anywhere else - ee sound as in lean, keen, machine.
o - as in pot, hot.
u - oo as in moon.
ü - as u in French tu or the German ü. Put your mouth into the position for saying u and say an ee. If you can't manage this, say the 'u' of the English word tune.

Vowel combinations

While Mandarin doesn't allow combinations of consonants, it does combine vowels. To pronounce a sequence of vowels, just say the sounds of the individual vowels in the order they are written:

  • ai = ah + ee (like English igh in high)
  • ao = ah + oh (like English ow in cow)
  • ei = eh + ee (like English ay in day)
  • iu = ee + oo (like English yoo in yoohoo, tune)
  • ia = ee + ah (like English ya in yam)
  • iao = ee + ah + oh (like English yow in yowling)
  • ie = ee + eh (like English ye in yes)
  • ou = oh + oo (like English oh)
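
The 'say each vowel in turn' rule is simple enough to sketch in a few lines of Python. This is only an illustration: VOWEL_SOUNDS and spell_out are names made up for this example, and the values for e and o ('eh' and 'oh') are the ones used in the combinations listed above, which differ from how those letters sound on their own.

```python
# Sound of each vowel letter when it appears inside a combination,
# taken from the list above (an assumption of this sketch).
VOWEL_SOUNDS = {"a": "ah", "e": "eh", "i": "ee", "o": "oh", "u": "oo"}


def spell_out(combination):
    """Spell out a vowel combination as its individual sounds, in order."""
    return " + ".join(VOWEL_SOUNDS[vowel] for vowel in combination)


print(spell_out("ai"))   # ah + ee
print(spell_out("iao"))  # ee + ah + oh
print(spell_out("ou"))   # oh + oo
```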


All Chinese languages are tonal: the pitch of the voice and the way it changes affect the meaning of each syllable. English uses tone to indicate things like questioning or surprise: 'You didn't know that?' 'You did know that!' In Chinese, tone works at a more basic level: the tone of each word is a built-in part of the meaning of the word.

For example, the word ma can mean five different things depending on which tone is used. This is illustrated in the following table.

Mandarin has five tones, or in many descriptions it has four tones plus a neutral tone. There's a technical reason for saying that the neutral tone is not a fifth tone, but for our purposes we can treat it like one.

Tone | Name | Description | Written | Example
1 | high tone | staying at a continuous high pitch | indicated by flat accent: ā | mā = mother
2 | rising tone | from medium to high pitch | indicated by rising accent: á | má = hemp
3 | low or dipping tone | see below | indicated by small v above letter: ǎ | mǎ = horse
4 | falling tone | from high to low | indicated by falling accent: à | mà = scold
- | neutral | continuous pitch at low or medium level, with no stress | indicated by no accent on the vowel: a | ma = question marker
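
Because the tone is carried entirely by the accent, it can be read off a written syllable mechanically. Here is a minimal Python sketch using the standard unicodedata module. The names TONE_MARKS, tone_of and strip_tone are made up for this example, and treating a 'small u' accent (the breve) as a third-tone mark alongside the 'small v' (the caron) is an assumption made so that either spelling is accepted.

```python
import unicodedata

# Combining accents mapped to tone numbers. Accepting the breve as a
# third-tone mark is an assumption of this sketch.
TONE_MARKS = {
    "\u0304": 1,  # macron (flat accent): high tone
    "\u0301": 2,  # acute (rising accent): rising tone
    "\u030c": 3,  # caron (small v): low or dipping tone
    "\u0306": 3,  # breve (small u): also read as the third tone
    "\u0300": 4,  # grave (falling accent): falling tone
}


def tone_of(syllable):
    """Return the tone number of one pinyin syllable, or 0 for neutral."""
    # NFD splits an accented letter into the base letter plus its accent.
    for char in unicodedata.normalize("NFD", syllable):
        if char in TONE_MARKS:
            return TONE_MARKS[char]
    return 0


def strip_tone(syllable):
    """Remove the tone accent but keep other marks, such as the dots on ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(char for char in decomposed if char not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


for word in ["mā", "má", "mǎ", "mà", "ma"]:
    print(strip_tone(word), tone_of(word))
```

The neutral tone is reported as 0 here purely for convenience; as noted above, whether it counts as a fifth tone is a matter of description.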

The Third Tone

The most complicated of these is the third tone, known variously as the low or dipping tone. This is because it varies depending on what other syllables are around it:

  • If the syllable is on its own as a single-syllable sentence, then a dipping tone is used. Start at a medium-low pitch, drop down to low, then rise up to medium high. The dip in the middle gives the tone its name.

  • If the syllable comes before one which is also third tone, then the first syllable uses a rising tone.

  • If the syllable comes before a syllable with any other tone, or if it is the last syllable in a sentence, then a low tone is used - this starts medium and falls to low. It sounds somewhat similar to the normal fourth tone (falling), but starts much lower.
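
The three rules above can be sketched as a short Python function. This is only an illustration under simplifying assumptions: pronounced_tones is a name made up for this example, tones are given as numbers (0 for neutral, 1 to 4 for the others), and the rules are applied to each syllable by looking only at the written tone of the syllable that follows it. Real speech in longer runs of third tones can be more subtle.

```python
# Labels for the tones that do not change with their surroundings.
FIXED_TONES = {0: "neutral", 1: "high", 2: "rising", 4: "falling"}


def pronounced_tones(tones):
    """Given the written tones of one sentence, say how each is pronounced."""
    result = []
    for position, tone in enumerate(tones):
        if tone != 3:
            result.append(FIXED_TONES[tone])
        elif len(tones) == 1:
            # A third tone on its own as a one-syllable sentence.
            result.append("dipping")
        elif position + 1 < len(tones) and tones[position + 1] == 3:
            # A third tone directly before another third tone.
            result.append("rising")
        else:
            # Before any other tone, or at the end of the sentence.
            result.append("low")
    return result


print(pronounced_tones([3]))     # ['dipping']
print(pronounced_tones([3, 3]))  # ['rising', 'low']
print(pronounced_tones([3, 1]))  # ['low', 'high']
```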

The third tone is often accompanied by the gravelly voice known as 'vocal fry'. You'll hear this in Southern Californian speech: a sort of croaking sound which accompanies low-pitched speech. The Kardashian family provides good examples of this. Listening for this croaking is one way to tell the third tone from the 'falling' tone.

Some examples of the third tone:

Wǒ - means 'I' or 'me' and is normally pronounced with a low tone.

Nǐ hǎo - means 'hello'. Each syllable is a third tone, so the first is pronounced rising, and the second with a low tone.

Some Useful Phrases

Pinyin | Rough pronunciation | Meaning
Nǐ hǎo | nee how | hello (literally 'you good')
Nǐ hǎo ma | nee how ma | how are you?
Wǒ hěn hǎo, nǐ ne? | wo hun how nee nuh | I'm fine, and you?
Wǒ hěn hǎo | wo hun how | I'm fine
Wǒ shì yīngguó rén | wo shuh ying-gwo run | I am English
Wǒ shì měiguó rén | wo shuh may-gwo run | I am American (literally 'I'm a Beautiful Country person')
Nǐ huì shuō yīngyǔ ma | nee hoo-ee shoo-oh ying-yoo ma | do you speak English?
Xiè xiè | shee-uh shee-uh | thank you
Qǐng wèn | ching wun | excuse me (to attract attention)
Hóng jiǔ | hong jee-oo | red wine
Pí jiǔ | pee jee-oo | beer
Bié guǎn wǒ | bee-uh gwan wo | go away, leave me alone
Wǒ de qìdiànchuán zhuāngmǎn le shànyú | wo duh chee-dee-an-choo-an joo-ang-man luh shan-yoo | my hovercraft is full of eels

Advanced Guide

You'll get by with the above pronunciation guide, but if you want to improve your pronunciation and attempt to sound even vaguely like a native, you'll have to master some subtleties.

  • The letter b does not in fact represent the same sound as in English. Instead, it is a p sound without aspiration. When we pronounce p, we put a small 'h' sound after it. This is called an aspirated p. Chinese people also use this sound and it is written in pinyin as p.

    But the Chinese can also make a p sound without an 'h' after it. This is called an unaspirated p and is written in pinyin as b. It is very difficult for English speakers to hear the difference between these two sounds. Because our b sound, which is a voiced p with the vocal cords providing sound, is also unaspirated, the Chinese unaspirated p sounds a bit like a b to us, and our b sounds a bit like an unaspirated p to the Chinese. This is why pinyin uses a b for it. As said above, you'll get away with treating it as a b, but the correct pronunciation is an unaspirated, unvoiced p.

    Listen to the letter P in the English words pit and spit. The p in pit is aspirated, while the p in spit is not. The p in pit would be represented by the pinyin letter p, while the p in spit would be represented by the pinyin letter b.

  • The same distinction applies between t and d. Pinyin t is voiceless and aspirated, as in English. The letter d in English represents a voiced, unaspirated t, while in pinyin it represents an unvoiced, unaspirated t.

  • The letters j, x, q, zh, sh and ch are different from English and all different from each other:

    Pinyin zh, sh and ch are similar to the English j, sh and ch but with the tip of the tongue placed further back - instead of being just behind the top teeth as in English, it is right back at the flat part of the roof of your mouth.

    Pinyin j, x and q, on the other hand, are similar to the English j, sh and ch, but with the tip of the tongue just behind the lower teeth. The q sound, for example, sounds almost as much like 'ts' as it does like 'ch'.

  • When the letter n occurs at the end of a syllable, it is not pronounced. Instead it modifies the vowel to a nasal sound, like in French.

