Chinese Character Extraction


This project aims to extract each individual Chinese character from a manuscript. It is part of a broader goal to digitize and preserve the handwriting of ancient manuscripts. Unlike printed text, handwriting varies widely, and characters often overlap in ways that cannot be cleanly segmented with simple straight lines. The problem is modeled as a Hidden Markov Model, and a dynamic programming approach, the Viterbi algorithm, is used to find the most likely paths. All algorithms are implemented in JavaScript and can be run directly in a browser. A demo is available below.
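To illustrate the idea, here is a minimal sketch of a Viterbi-style search for a cut path between two overlapping characters. It is not the project's actual code: the function name `findCutPath`, the binary `ink` input format, and the `inkCost` and `moveCost` parameters are all assumptions made for this example. Each row of the image is treated as a time step and each column as a hidden state. Crossing an ink pixel is penalized (playing the role of a negative log emission probability), and so is stepping sideways (a negative log transition probability), so the cheapest path corresponds to the most likely one.

```javascript
// Hypothetical sketch: find the cheapest top-to-bottom cut through a binary image.
// ink[r][c] is 1 for an ink pixel and 0 for background (assumed input format).
function findCutPath(ink, inkCost = 10, moveCost = 1) {
  const rows = ink.length;
  const cols = ink[0].length;

  // cost[r][c]: lowest total cost of any path that ends at pixel (r, c).
  // back[r][c]: column of the previous pixel on that cheapest path.
  const cost = [ink[0].map((v) => v * inkCost)];
  const back = [new Array(cols).fill(-1)];

  for (let r = 1; r < rows; r++) {
    cost.push(new Array(cols).fill(Infinity));
    back.push(new Array(cols).fill(-1));
    for (let c = 0; c < cols; c++) {
      // The path may stay in the same column or shift by one column per row.
      for (let d = -1; d <= 1; d++) {
        const p = c + d;
        if (p < 0 || p >= cols) continue;
        const candidate =
          cost[r - 1][p] + Math.abs(d) * moveCost + ink[r][c] * inkCost;
        if (candidate < cost[r][c]) {
          cost[r][c] = candidate;
          back[r][c] = p;
        }
      }
    }
  }

  // Pick the cheapest end point in the last row, then follow back pointers up.
  let best = 0;
  for (let c = 1; c < cols; c++) {
    if (cost[rows - 1][c] < cost[rows - 1][best]) best = c;
  }
  const path = new Array(rows);
  for (let r = rows - 1, c = best; r >= 0; r--) {
    path[r] = c;
    c = back[r][c];
  }
  return { path, cost: cost[rows - 1][best] };
}

// Two strokes whose gap shifts from column 2 to column 3,
// so no single straight vertical line separates them.
const demo = findCutPath([
  [1, 1, 0, 1, 1],
  [1, 1, 0, 1, 1],
  [1, 1, 1, 0, 1],
  [1, 1, 1, 0, 1],
]);
console.log(demo.path); // [ 2, 2, 3, 3 ]
```

The returned `path` gives one column index per row, so the cut bends to follow the gap instead of slicing through a stroke. A real manuscript would likely need grayscale costs and a wider range of allowed moves, which this sketch leaves out.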

Project Presentation

