三藏 Sanzang: CJK Machine Translation
Sanzang is a compact and simple cross-platform machine translation system. It is especially useful for translating from CJK languages (Chinese, Japanese, and Korean), and it is well suited to working with ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable. Users can develop their own translation rules, which are simply stored in a text file and applied at runtime.
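To make the idea of rules stored in a text file and applied at runtime more concrete, here is a minimal sketch in Ruby. It is not Sanzang's implementation: the rule format (one "source|target" pair per line), the sample rules, and the names parse_rules and apply_rules are all assumptions made for illustration. See the Sanzang Manual for the real table format.

```ruby
# Hypothetical rule table: one rule per line, "source|target".
# This format is assumed for illustration; it is not Sanzang's table format.
RULES_TEXT = <<~RULES
  三藏|tripiṭaka
  法師|dharma-master
  三|three
RULES

# Parse rule lines into [source, target] pairs, longest source first,
# so that multi-character terms win over their single-character prefixes.
def parse_rules(text)
  text.each_line
      .map(&:strip)
      .select { |line| line.include?("|") }
      .map { |line| line.split("|", 2) }
      .sort_by { |source, _target| -source.length }
end

# Replace every rule source found in the text with its target.
# Characters that match no rule pass through unchanged.
def apply_rules(text, rules)
  lookup = rules.to_h
  pattern = Regexp.union(rules.map(&:first))
  text.gsub(pattern) { |match| "#{lookup[match]} " }.strip
end

rules = parse_rules(RULES_TEXT)
puts apply_rules("三藏法師", rules) # prints "tripiṭaka dharma-master"
```

Sorting the rules longest-first means that 三藏 is rendered as a single term rather than as 三 followed by an untranslated 藏. Whether Sanzang resolves overlapping rules the same way is not stated here, so treat this ordering as a design choice of the sketch.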
Sanzang runs on Unix, Linux, BSD, Mac OS X, and Windows platforms. Ruby 1.9 or later is required. See the Sanzang Manual for details on installation and compatibility. Sanzang is free software and open source, and it is licensed under the GNU GPLv3.
Sanzang on the Web is a simplified and limited Web interface for Sanzang that is useful for demo purposes or for short snippets of text (up to around one or two fascicles). With it, you can try out some basic Sanzang functionality without needing to download or install any software. For more advanced uses and greater flexibility, the Sanzang software below is more suitable.
Full documentation is available, including a manual that covers Sanzang concepts, installation, translation rules, command-line usage, advanced features, and more. API documentation is also available if you want to use the Sanzang internals as a programming library.
The Sanzang translation engine is the main Sanzang program. To generate translation listings using Sanzang, you only need this program and a set of translation rules. The Sanzang program is distributed in RubyGem format, and it is hosted on RubyGems.org. You can also find an archive of current and older versions on this website. Finally, all Sanzang programming and development is hosted on GitHub.
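As a rough illustration of what "a program plus a set of rules" produces, the following Ruby sketch builds a simple listing that pairs each source line with its rule-based rendering. The numbered layout, the sample rules, and the helper names render and listing are hypothetical; the actual listing format generated by Sanzang is described in the manual.

```ruby
# Hypothetical in-memory rules; a real table would be loaded from a file.
RULES = {
  "如是" => "thus",
  "我聞" => "I heard"
}.freeze

# Render one source line by replacing each known term, longest term first.
def render(line, rules)
  terms = rules.keys.sort_by { |term| -term.length }
  line.gsub(Regexp.union(terms)) { |match| "#{rules[match]} " }.strip
end

# Build a listing: each non-blank source line is numbered and followed
# by its rendering, with a blank line between entries (assumed layout).
def listing(source, rules)
  lines = source.each_line.map(&:strip).reject(&:empty?)
  entries = lines.each_with_index.flat_map { |line, i|
    ["[#{i + 1}.1] #{line}", "[#{i + 1}.2] #{render(line, rules)}", ""]
  }
  entries.join("\n")
end

puts listing("如是我聞\n", RULES)
```

Keeping the source line next to its rendering lets a reader check every substitution against the original text, which matters when the rule table is still incomplete.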
The basic translation engine for Sanzang has been built, but our set of translation rules for the Taishō Tripiṭaka is still incomplete. This is a long-term project. Our aim is to have a fairly reliable translation table in the future, which will ease reading and translation of the Taishō Tripiṭaka. Currently our working set of translation rules is hosted on GitHub for development. The translation table is called zh-en_tripitaka.
2013-10-25: Sanzang on the Web now has rich formatting for all output, rather than a plain text listing. This makes the translation output much more readable.
2013-08-26: The beginnings of a translation table are now available on GitHub for development and tracking. The current set of translation rules is called zh-en_tripitaka. A simple Web interface for Sanzang is also now available called Sanzang on the Web.
2013-05-10: Sanzang 1.0 has been released, the first in the 1.x series. Users who have installed earlier versions of Sanzang are encouraged to uninstall them before installing new versions.
The Sanzang translation engine is ready for use. Running on a mid-range PC with a translation table of approximately 6000 rules, sanzang batch can generate translation listing files for the entire CBETA standard corpus (Taishō volumes 1-55, and 85) in less than 30 minutes. The next major phase is the development of a larger and more reliable translation table.