
Paper abstract

Arbitrary shape scene text recognition based on deep learning

Authors: XU FuYong (College of Computer Science, Sichuan University, Chengdu 610065, China); YU Liang (College of Computer Science, Sichuan University, Chengdu 610065, China); SHENG ZhongSong (College of Computer Science, Sichuan University, Chengdu 610065, China)

Received: 2019-04-30          Year, volume(issue), pages: 2020, 57(2): 255-263

Journal Name: Journal of Sichuan University (Natural Science Edition)

Keywords: Deep learning; Scene text recognition; Neural network; Attention mechanism

Funding: National Natural Science Foundation of China (61872256)

Abstract

One of the challenges in scene text recognition is handling text with distorted or irregular layouts. Perspective text and curved text in particular are common in natural scenes and difficult to recognize. In this paper, we propose an attention-enhanced network with a flexible rectification function for arbitrary-shape scene text recognition. The network consists of two parts: a text rectification network based on a convolutional neural network and an attention-enhanced recognition network. The rectification network adaptively rectifies the text in the input image to reduce the difficulty of recognition. The recognition network is an attention-enhanced sequence-to-sequence model that predicts the character sequence directly from the rectified image. The whole model can be trained end to end, and training requires only images and their corresponding ground-truth text labels. Extensive experiments on a variety of public datasets, including SVT, ICDAR 2003, and CUTE80, show that the proposed network performs excellently.
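To make the idea of attention-based decoding more concrete, the following is a minimal sketch of a single attention step over a sequence of image features. It is a generic dot-product attention illustration, not the authors' network: the abstract does not specify the scoring function, the feature extractor, or the rectification details, so the function names (`softmax`, `attend`), the toy feature values, and the decoder query are all assumptions made for illustration. The rectification stage is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """One attention step (dot-product scoring, an assumption).

    Scores each feature column against the decoder query, normalizes
    the scores with softmax, and returns (weights, context), where
    context is the weighted sum of the feature columns.
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, context


# Toy stand-in for features extracted from a rectified image:
# three columns, each a 2-dimensional vector (values are made up).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 2.0]  # hypothetical decoder state

weights, context = attend(features, query)
print(weights, context)
```

In a full recognizer of this general kind, the context vector would typically be fed to a recurrent decoder that emits one character per step, with the query updated after each character.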

