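I originally wanted to title this "the past and present lives of CAPTCHAs", however I only started looking at CAPTCHA recognition today,
so let's call it the newborn stage.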
pip install pytesseract
pip install Pillow
Then download and install Tesseract, Google's OCR engine that does the actual recognition, from http://code.google.com/p/tesseract-ocr
Then comes the key step. I was stuck in this pit for an hour, and there are damn few answers about it in Chinese.
In the C:\Python34\Lib\site-packages\pytesseract directory there is a file called pytesseract.py.
Change tesseract_cmd = 'tesseract'
to tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
Windows is such a pit.
After that it's easy.
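If you'd rather not edit a file inside site-packages (the change is lost when the package is reinstalled), pytesseract also lets you set the same variable from your own script via pytesseract.pytesseract.tesseract_cmd. Below is a rough sketch of that idea. The helper names find_tesseract and configure_pytesseract are made up for illustration, and the default path is just the install location used above, so adjust it to wherever Tesseract actually lives on your machine.

```python
import os
import shutil

# Assumption: the install location mentioned above; change it if yours differs.
DEFAULT_WINDOWS_PATH = "C:/Program Files (x86)/Tesseract-OCR/tesseract.exe"


def find_tesseract(candidates=(DEFAULT_WINDOWS_PATH,)):
    """Return a path to the tesseract executable, or None if none is found.

    Hypothetical helper: checks PATH first, then the candidate paths in order.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the executable without touching pytesseract.py."""
    import pytesseract  # imported here so find_tesseract works without it

    path = find_tesseract()
    if path is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Call configure_pytesseract() once at the top of the script, before the first image_to_string call.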
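import io
import sys
from urllib.request import urlopen

from PIL import Image
import pytesseract

url = sys.argv[1]  # URL of the CAPTCHA image, passed on the command line
data = urlopen(url).read()  # download the raw image bytes
img = Image.open(io.BytesIO(data))  # wrap in BytesIO so Pillow gets a seekable file
text = pytesseract.image_to_string(img)  # run Tesseract OCR on the image
print(text)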
That's just a rough version.
The recognition rate is still fairly high.