Buddhist texts and resources for the cultivation path


三藏 Sanzang: CJK Machine Translation

Sanzang is a compact, simple, cross-platform machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), including ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable: any user can develop their own translation rules, which are stored in a plain text file and applied at runtime.
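
To make the idea of rules kept in a text file and applied at runtime more concrete, here is a minimal sketch in Ruby. It is not Sanzang's actual code or table format: the one-rule-per-line "source|translation" layout, the sample entries, and the names RULES_TEXT, parse_rules, and apply_rules are all assumptions made for illustration. The sketch does plain longest-match substitution, which is far simpler than the translation listings the real program generates.

```ruby
# encoding: utf-8

# Hypothetical rule table: one rule per line, "source|translation".
# This layout is an assumption for illustration, not Sanzang's real format.
RULES_TEXT = <<RULES
如來|tathāgata
世尊|bhagavān
如|thus
RULES

# Parse the rule text into [source, translation] pairs,
# ordered so that longer source terms come first.
def parse_rules(text)
  rules = text.each_line.map(&:strip).reject(&:empty?).map do |line|
    line.split("|", 2)
  end
  rules.sort_by { |source, _| -source.length }
end

# Replace every known source term with its translation.
# Alternation is tried in order, so the longest term wins
# when several rules could match at the same position.
def apply_rules(rules, text)
  return text if rules.empty?
  table = Hash[rules]
  pattern = Regexp.union(rules.map(&:first))
  text.gsub(pattern) { |match| " #{table[match]} " }.squeeze(" ").strip
end

rules = parse_rules(RULES_TEXT)
puts apply_rules(rules, "如來世尊")  # prints "tathāgata bhagavān"
```

Sorting the rules by length before building the pattern is the key design choice here: without it, the short rule for 如 could match first and break up the longer term 如來. Characters with no matching rule are passed through unchanged.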

Sanzang can run on Unix, Linux, BSD, Mac OS X, and Windows platforms. Ruby 1.9 or later is required. Please see the Sanzang manual for details on installation and compatibility. Sanzang is open-source and free software, licensed under the GNU GPL.

Sanzang on the Web

This is a simplified and limited Web interface for Sanzang that is useful for demo purposes or for short snippets of text (up to around one or two fascicles). With it, you can try out some basic Sanzang functionality without downloading or installing any software. For anything beyond a simple demonstration, the full Sanzang program is more suitable.

The Sanzang Program

This is the main Sanzang program, the translation engine. To generate translation listings using Sanzang, you only need this program along with a set of translation rules. The Sanzang program is distributed in RubyGem format and is hosted on RubyGems.org. You can also find an archive of current and older versions on this website. Finally, all Sanzang programming and development is hosted on GitHub. Full documentation is also available.

Sanzang Translation Rules

The basic translation engine for Sanzang has been built, but our set of translation rules for the Taishō Tripiṭaka is still incomplete. This is a long-term project. Our aim is to have a fairly reliable translation table in the future, which will ease reading and translation of the Taishō Tripiṭaka. Currently our working set of translation rules is hosted on GitHub for development. The translation table is called zh-en_tripitaka.

Project Status

The Sanzang translation engine is ready for use. Running on a mid-range PC with a translation table of approximately 6000 rules, the sanzang batch command can generate translation listing files for the entire CBETA standard corpus (Taishō volumes 1-55 and 85) in less than 30 minutes. The next major phase is the development of a larger and more reliable translation table.
