How do Search Engines Interpret Non-Latin Characters?
Aug
In this article, I'll try to explain how search engines like Google interpret non-Latin characters. As I'm familiar with China, I'll focus only on Simplified Chinese.
However, I know there are many readers of this blog from other Asian countries like South Korea, Japan, India, Malaysia, Thailand, etc. Feel free to share your knowledge about how search engines interpret queries in your language.
Let's take an example: I'm searching for "clothes" on google.cn (actually google.com.hk, since Google China has moved to Hong Kong).
If I use simplified Chinese characters, I perform the query "衣服", and Google displays the following search results page:
Note that Google highlights the search term "衣服" in red on the search results page, as Baidu does. On Google's other sites, the search term is highlighted in bold.
By the way, Google and Baidu are able to read non-Latin URLs, including those with Chinese characters. Even domain names will very soon include Chinese characters, as ICANN approved this last June.
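To make the point about non-Latin URLs more concrete, here is a minimal Python sketch of how Chinese characters are usually represented in a URL: the path or query is percent-encoded as UTF-8, and a domain name is converted to an ASCII "xn--" form (Punycode). This only illustrates the encoding itself, not how Google or Baidu process such URLs internally, and the domain "衣服.中国" is a made-up example.

```python
from urllib.parse import quote, unquote

keyword = "衣服"

# Path and query components travel as percent-encoded UTF-8 bytes.
encoded = quote(keyword)
print(encoded)           # %E8%A1%A3%E6%9C%8D
print(unquote(encoded))  # 衣服

# Internationalized domain names are converted label by label to an
# ASCII-compatible "xn--" form. "衣服.中国" is a hypothetical domain
# used only for illustration.
domain = "衣服.中国"
ascii_domain = domain.encode("idna")
print(ascii_domain)
print(ascii_domain.decode("idna"))
```

A browser shows the Chinese form to the user, while the encoded form is what is actually sent over the network.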
What if I now use Pinyin, the romanization system for Mandarin? I write "yifu". Check out the search results page:
As you can see, the search results are totally different from those for the query in Simplified Chinese. In this case, for example, Google gives more weight to sites whose domain names include the keyword "yifu".
What is interesting to notice here is that not only does the term "yifu" appear in red, but "衣服" is also displayed in the same color in the search results, as if I were searching for the term "衣服". And Google suggests that I search for "衣服"...
If I use the exact Pinyin term "yīfu" (with the tone mark), the search results page is as follows:
This query uses the correct Pinyin spelling, but it's actually the one that gives the least relevant results...
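The only difference between "yīfu" and "yifu" is the tone mark on the "i". As a rough sketch, here is how the two spellings could be folded into one using Unicode normalization in Python. I'm not claiming that Google does this (the results above suggest it treats the two queries differently), and the function name strip_tone_marks is my own.

```python
import unicodedata

def strip_tone_marks(pinyin: str) -> str:
    """Remove combining marks (such as Pinyin tone marks) from a string."""
    # NFD splits "ī" into a plain "i" followed by a combining macron.
    decomposed = unicodedata.normalize("NFD", pinyin)
    # Drop every combining character, keeping only the base letters.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return unicodedata.normalize("NFC", without_marks)

print(strip_tone_marks("yīfu"))  # yifu
```

Note that this simple approach also removes the diaeresis from "ü", so "nǚ" becomes "nu", which may or may not be what you want.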
How do search engines deal with your language? Please leave your comments!
