Synthetic Sample Extension in Implementation of Tangut Character Databases



Abstract

The Tangut script was a logographic writing system used for the extinct Tangut language of the Western Xia Dynasty, which spanned 1038 to 1227. Optical character recognition, machine learning, and computer vision techniques can greatly assist in deciphering the characters of ancient scripts. However, all of these techniques depend on a character database that supplies learning samples and test standards. While building Tangut character databases from ancient Tangut texts, it was found that imbalanced class distribution significantly degrades the performance of learning algorithms. This paper proposes a synthetic sample generation method to improve the learning and recognition of Tangut characters. Recognition accuracy when learning from the original data set was compared with accuracy when learning from the synthetically extended data set, and the proposed method showed clear superiority. The organization of the Tangut character databases is also described.
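The abstract does not say how the synthetic samples are produced. As a purely hypothetical illustration of the general idea of countering class imbalance with synthetic samples (not the authors' method), the sketch below oversamples each under-represented class with randomly shifted copies of its existing samples until every class is as large as the largest one. The binary-bitmap representation, the names `shift` and `balance_classes`, and the choice of small translations as the distortion are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: a glyph is a binary bitmap, 1 = ink pixel.
Bitmap = List[List[int]]


def shift(img: Bitmap, dx: int, dy: int) -> Bitmap:
    """Translate a bitmap by (dx, dy) pixels, filling uncovered pixels with 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def balance_classes(samples: List[Tuple[str, Bitmap]], max_shift: int = 1,
                    seed: int = 0) -> List[Tuple[str, Bitmap]]:
    """Assumed scheme: pad every class up to the size of the largest class
    with randomly shifted copies of that class's original samples."""
    if not samples:
        return []
    rng = random.Random(seed)  # fixed seed for reproducible output
    by_class: Dict[str, List[Bitmap]] = defaultdict(list)
    for label, img in samples:
        by_class[label].append(img)
    target = max(len(imgs) for imgs in by_class.values())
    out = list(samples)
    for label, imgs in by_class.items():
        # Only the original samples of a class are used as sources.
        for _ in range(target - len(imgs)):
            base = rng.choice(imgs)
            dx = rng.randint(-max_shift, max_shift)
            dy = rng.randint(-max_shift, max_shift)
            out.append((label, shift(base, dx, dy)))
    return out
```

For example, with three samples of one character and a single sample of another, `balance_classes` returns three samples of each. Translation is used here only because it is simple and preserves stroke shape; a real system would likely use richer distortions.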

About the authors

Yifei Meng

School of Electronic and Information Engineering, Beijing Jiaotong University; School of Physics and Electronic-Electrical Engineering, Ningxia University

Author for correspondence.
Email: river_dance@163.com
China, Beijing; Yinchuan

Xue Yuan

School of Electronic and Information Engineering, Beijing Jiaotong University

Email: river_dance@163.com
China, Beijing

Xueye Wei

School of Electronic and Information Engineering, Beijing Jiaotong University

Email: river_dance@163.com
China, Beijing

Wenhui Yang

School of Physics and Electronic-Electrical Engineering, Ningxia University

Email: river_dance@163.com
China, Yinchuan

Yan Chen

School of Physics and Electronic-Electrical Engineering, Ningxia University

Email: river_dance@163.com
China, Yinchuan


Copyright (c) 2018 Allerton Press, Inc.