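Recently I suddenly got very interested in CAPTCHA recognition. I found an OCR module called tesseract online, tried it out, and the results were pretty good. I downloaded the latest version of tesseract, and it seems that even without much training the recognition rate can reach over 90%, provided you write your own program to convert the image to black and white and remove the noise first.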
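The black-and-white conversion and denoising step might look roughly like the sketch below. This is not the program I actually used, just a minimal illustration: it assumes the image has already been loaded as a list of rows of grayscale values (0 to 255), and the function names, the threshold of 128, and the "isolated pixel" rule are all assumptions chosen for the example.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel to 0 (black) or 255 (white).

    The threshold value is an assumption; real CAPTCHAs usually need tuning.
    """
    return [[0 if pixel < threshold else 255 for pixel in row] for row in gray]


def remove_isolated_pixels(binary, min_neighbors=1):
    """Turn a black pixel white if fewer than min_neighbors of its
    8 surrounding pixels are black (a very simple speckle filter)."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]  # copy so neighbor counts use the original
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 0:
                continue  # only black pixels can be noise here
            black_neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and binary[ny][nx] == 0:
                        black_neighbors += 1
            if black_neighbors < min_neighbors:
                cleaned[y][x] = 255
    return cleaned


# Tiny hypothetical image: one stray dark pixel and a two-pixel dark stroke.
gray = [
    [200, 200, 200, 200, 200],
    [200,  30, 200, 200, 200],
    [200, 200, 200,  40,  50],
    [200, 200, 200, 200, 200],
]
cleaned = remove_isolated_pixels(binarize(gray))
```

In this toy image the lone dark pixel is wiped out while the two-pixel stroke survives. The cleaned image would then be saved in a format tesseract can read (for example an uncompressed TIFF) before running OCR on it.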
Background
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. The source code will read a binary, grey or color image and output text. A tiff reader is built in that will read uncompressed TIFF images, or libtiff can be added to read compressed images.
