Chinese Character Extraction

Last updated on Mar 25, 2020

Summary

This project aims to extract every individual Chinese character from a manuscript, as part of a broader effort to digitize and preserve the handwriting of ancient manuscripts. Unlike printed documents, handwriting varies widely; in particular, characters often overlap one another and cannot be cleanly segmented with simple straight lines. The problem is modeled as a Hidden Markov Model, and the Viterbi algorithm, a dynamic programming method, is used to find the most likely paths. All algorithms are implemented in JavaScript and run directly in the browser. A demo is available below.
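
To give a rough feel for the idea, here is a minimal sketch of Viterbi-style dynamic programming for finding a non-straight cut through a small grid. It is not the project's actual model: the function name viterbiPath, the cost grid, and the shiftPenalty parameter are all assumptions made for illustration. Costs stand in for negative log-probabilities, so the cheapest path corresponds to the most likely one. Each step is one slice of the image, each state is a position within that slice, and the path may stay put or shift by one position between steps.

```javascript
// Hypothetical sketch, not the project's code: minimum-cost path through a
// grid using Viterbi-style dynamic programming.
// cost[t][s] is the cost (e.g. ink darkness) of passing through position s
// at step t. Shifting to a neighbouring position between steps adds
// shiftPenalty (an assumed stand-in for a transition cost).
function viterbiPath(cost, shiftPenalty = 1) {
  const steps = cost.length;
  const states = cost[0].length;
  // best[t][s]: lowest total cost of any path ending at state s at step t
  const best = [cost[0].slice()];
  // back[t][s]: state at step t - 1 on that best path
  const back = [new Array(states).fill(-1)];

  for (let t = 1; t < steps; t++) {
    best.push(new Array(states).fill(Infinity));
    back.push(new Array(states).fill(-1));
    for (let s = 0; s < states; s++) {
      // Only allow staying put or shifting by one position.
      for (let d = -1; d <= 1; d++) {
        const prev = s + d;
        if (prev < 0 || prev >= states) continue;
        const move = d === 0 ? 0 : shiftPenalty;
        const total = best[t - 1][prev] + move + cost[t][s];
        if (total < best[t][s]) {
          best[t][s] = total;
          back[t][s] = prev;
        }
      }
    }
  }

  // Pick the cheapest final state, then follow the back-pointers.
  let end = 0;
  for (let s = 1; s < states; s++) {
    if (best[steps - 1][s] < best[steps - 1][end]) end = s;
  }
  const path = new Array(steps);
  path[steps - 1] = end;
  for (let t = steps - 1; t > 0; t--) {
    path[t - 1] = back[t][path[t]];
  }
  return { path, cost: best[steps - 1][end] };
}

// Toy grid: 1 = ink, 0 = blank. No straight line avoids the ink,
// but a path that bends around it does.
const ink = [
  [1, 0, 1, 1],
  [1, 1, 0, 1],
  [1, 1, 0, 1],
  [1, 0, 1, 1],
];
const result = viterbiPath(ink, 0.1);
console.log(result.path); // [1, 2, 2, 1]
```

In this toy grid every straight cut crosses at least two ink cells, while the bent path crosses none and pays only the two small shift penalties. A real manuscript image would need actual pixel intensities and a transition model fitted to the handwriting, which this sketch does not attempt.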

Project Presentation

Demo

It should take a few seconds at most.

Academia-Sinica