Installing and using tesseract on macOS (English, Korean)

by judy@ 2023. 7. 25.

These two commands are enough to get English OCR working. Korean does not work yet.

$ brew install tesseract
$ pip install pytesseract
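
pytesseract only calls the tesseract command under the hood, so it can help to confirm that the binary Homebrew installed is actually on the PATH before going further. A minimal sketch using only the standard library; the helper names find_tesseract and tesseract_version are made up for this example.

```python
import shutil
import subprocess


def find_tesseract(binary="tesseract"):
    """Return the full path of the binary, or None if it is not on PATH."""
    return shutil.which(binary)


def tesseract_version(path):
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Fall back to stderr in case the version is printed there instead
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0] if lines else ""


path = find_tesseract()
if path is None:
    print("tesseract not found; check that `brew install tesseract` succeeded")
else:
    print(path, "->", tesseract_version(path))
```

If this prints "not found" even though brew reported success, the Homebrew bin directory is probably missing from the PATH of the shell that runs Python.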

English OCR

import pytesseract
from PIL import Image

# Load the image file from disk
img = Image.open('test.png')
# Run OCR with the default language (English)
img_text = pytesseract.image_to_string(img)
print(img_text)

Setting up Korean

$ brew list tesseract
/opt/homebrew/Cellar/tesseract/5.3.2/bin/ambiguous_words
/opt/homebrew/Cellar/tesseract/5.3.2/bin/classifier_tester
/opt/homebrew/Cellar/tesseract/5.3.2/bin/cntraining
/opt/homebrew/Cellar/tesseract/5.3.2/bin/combine_lang_model
/opt/homebrew/Cellar/tesseract/5.3.2/bin/combine_tessdata
/opt/homebrew/Cellar/tesseract/5.3.2/bin/dawg2wordlist
/opt/homebrew/Cellar/tesseract/5.3.2/bin/lstmeval
/opt/homebrew/Cellar/tesseract/5.3.2/bin/lstmtraining
/opt/homebrew/Cellar/tesseract/5.3.2/bin/merge_unicharsets
/opt/homebrew/Cellar/tesseract/5.3.2/bin/mftraining
/opt/homebrew/Cellar/tesseract/5.3.2/bin/set_unicharset_properties
/opt/homebrew/Cellar/tesseract/5.3.2/bin/shapeclustering
/opt/homebrew/Cellar/tesseract/5.3.2/bin/tesseract
/opt/homebrew/Cellar/tesseract/5.3.2/bin/text2image
/opt/homebrew/Cellar/tesseract/5.3.2/bin/unicharset_extractor
/opt/homebrew/Cellar/tesseract/5.3.2/bin/wordlist2dawg
/opt/homebrew/Cellar/tesseract/5.3.2/include/tesseract/ (12 files)
/opt/homebrew/Cellar/tesseract/5.3.2/lib/libtesseract.5.dylib
/opt/homebrew/Cellar/tesseract/5.3.2/lib/pkgconfig/tesseract.pc
/opt/homebrew/Cellar/tesseract/5.3.2/lib/ (2 other files)
/opt/homebrew/Cellar/tesseract/5.3.2/share/tessdata/ (35 files)

Download the kor data

Find Korean on https://tesseract-ocr.github.io/tessdoc/Data-Files.html and click it to download kor.traineddata.

Move the data into tessdata

$ cp kor.traineddata /opt/homebrew/Cellar/tesseract/5.3.2/share/tessdata/
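
To check that the file landed where Tesseract looks, one option is to list the *.traineddata files in the tessdata directory. The sketch below assumes the 5.3.2 path from the brew list output above, which will differ for other versions; installed_languages is a hypothetical helper name. Running tesseract --list-langs in the terminal should report a similar list.

```python
from pathlib import Path

# Assumption: path taken from the `brew list tesseract` output above
TESSDATA_DIR = Path("/opt/homebrew/Cellar/tesseract/5.3.2/share/tessdata")


def installed_languages(tessdata_dir):
    """Return sorted language codes, one per *.traineddata file in the directory."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    return sorted(p.stem for p in directory.glob("*.traineddata"))


langs = installed_languages(TESSDATA_DIR)
print(langs)
print("kor installed:", "kor" in langs)
```

An empty list most likely means the version number in the path does not match the installed one.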

 

Verify Korean OCR

import pytesseract
from PIL import Image

# Load the image file from disk
img = Image.open('test.png')
# lang='kor' selects the kor.traineddata file copied above
img_text = pytesseract.image_to_string(img, lang='kor')
print(img_text)
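
Images often mix Korean and English text, and Tesseract accepts several languages joined with + in the language argument. A sketch assuming both the eng and kor data are installed and that test.png exists; build_lang and ocr_file are names invented here, and the imports sit inside the function only so that build_lang can be used without pytesseract installed.

```python
import os


def build_lang(*codes):
    """Join language codes into the 'kor+eng' form Tesseract expects."""
    return "+".join(codes)


def ocr_file(path, *codes):
    """Run OCR on an image file with one or more languages."""
    # Imported here so build_lang works even without pytesseract installed
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=build_lang(*codes))


# Only run OCR when the sample image is actually present
if os.path.exists("test.png"):
    print(ocr_file("test.png", "kor", "eng"))
```

Comparing the output of 'kor' alone against 'kor+eng' on the same image is a quick way to see which setting suits a given document.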