树櫴希德
发表于 2024-5-9 22:47:36
edata 发表于 2023-6-10 16:32
大概率是ocr没有调用成功,没有获取到有效的字符串,导致建立多行文字的时候,使用了空字符串,你可以使 ...
长老,请问是是CAD折线文字重新输出图片然后WIN10 OCR文字到记事本然后又读取进来?
树櫴希德
发表于 2024-5-10 09:40:48
是不是识别PDF转CAD后的文字,将断线文字复制识别到CAD?
树櫴希德
发表于 2024-5-10 19:43:13
edata 发表于 2023-6-10 11:32
可以试试这个,https://github.com/zh-h/Windows.Media.Ocr.Cli
C:\>Windows.Media.Ocr.Cli.exe -o c:/1res.txt x2.png
ERROR: 字符串的长度不能为零。
参数名: oldValue
ERROR: 无法找不到组件。 (异常来自 HRESULT:0x88982F50)
能识别但不能导出,OS 名称: Microsoft Windows 10 专业版
OS 版本: 10.0.19045 暂缺 Build 19045
OS 制造商: Microsoft Corporation
OS 配置: 独立工作站
OS 构建类型: Multiprocessor Free
树櫴希德
发表于 2024-5-12 16:35:07
;(setq qz (vl-filename-mktemp "ocr" (getvar "SAVEFILEPATH")) ) ;前缀(vl-filename-mktemp "ocr" (getenv "temp"))
临时文件放临时保存文件夹较为合适,WIN10某些C盘不能读入写出文件,很烦,(getvar "SAVEFILEPATH") 保存到D盘
OCR不认文件路径"\\",只是认识 正斜杠 于是又要替换 ljx-string-translate
OCR放在CAD安装目录,反正又不大,,不用写路径直接如:(command "shell" Windows.Media.Ocr.Cli.exed:/temp/ocr004.png > d:/temp/ocr004.txt )
Windows.Media.Ocr.Cli.exe要把断线文字线条加宽,不然不认识,空却,还经常认错字,像我一样近视,不要抱太大期望
;;;;函数名称:ljx-string-translate
;;;;调用格式:(ljx-string-translate newtx oldtx str)
;;;;参数说明:newtx ----- 要替换成的新字符
;;;; oldtx-要替换的旧字符
;;;; str在str字符中替换
;;;;返回值:(ljx-string-translate "xx" "a" "123xxjkhxx") 返回""123ajkha""、
;;;; (ljx-string-translate "xx" "" "123jkh")返回"1213jkh"
;;;; newtx为""时相当于删除字符功能,非常好用
;;;;函数简介:字符串替换函数,要替换的源字符与新字符长度可以不同,替换非常灵活好用
;;;;函数来源:原创
;;;;函数作者:jixiangluo
;;;;适用版本:不限
;;;;最后更新时2019-08-03
;|(defun ljx-string-translate (newtx oldtx str / str1 i txz txy)
(setq str1 str)
(while (vl-string-search oldtx str1 )
(cond
((= oldtx str1)
(setq str newtx)
)
((= (vl-string-search oldtx str1) 0)
(setq i (+ (strlen oldtx) 1)
txy""
)
(repeat (- (strlen str1) (strlen oldtx))
(setq txy (strcat txy (substr str1 i 1))
i (1+ i)
)
);;;;repeat
(setq str (strcat newtx txy))
)
((= (vl-string-search oldtx str1) (- (strlen str1) (strlen oldtx) 1))
(setq i 1
txz ""
)
(repeat (- (strlen str1) (strlen oldtx))
(setq txz (strcat txz (substr str1 i 1))
i (1+ i)
)
)
(setq str (strcat txz newtx))
)
(T
(setq i 1
txz ""
)
(repeat (vl-string-search oldtx str1)
(setq txz (strcat txz (substr str1 i 1))
i (1+ i)
)
);;;;repeat
(setq i (+ i (strlen oldtx))
txy ""
)
(repeat (- (strlen str1) i -1)
(setq txy (strcat txy (substr str1 i 1))
i (1+ i)
)
);;;;repeat
(setq str (strcat txz newtx txy))
)
);;;;cond
(setq str1 str)
);;;;while
str1
);;;;defun|;
(defun ljx-string-translate (newtx oldtx str / str1 strL oldL txt tx1 tx2 n1)
(setq str1 str
strL (strlen str1)
oldL (strlen oldtx)
txt ""
)
(while (setq n1 (vl-string-search oldtx str1 ))
(setq tx1 (substr str1 1 n1)
tx2 (substr str1 (+ n1 1 oldL) (- strL (strlen tx1) oldL))
txt (strcat txt tx1 newtx)
)
(setq str1 tx2
strL (strlen str1)
)
)
(setq txt (strcat txt str1))
);;;;defun
(defun lst->ss (lst / SS X)
;图元转选择集
(if (and lst (setq lst (vl-remove nil lst))(setq
lst (VL-REMOVE-IF-NOT (FUNCTION (LAMBDA (A) (ENTGET A))) lst)
))
(progn
(setq ss (ssadd))
(last
(mapcar(function(lambda (x) (ssadd x ss)))lst)
)
)
)
SS
)
(defun $ocr-dwg$ (ss)
(setq qz (vl-filename-mktemp "ocr" (getvar "SAVEFILEPATH")) ) ;前缀(vl-filename-mktemp "ocr" (getenv "temp"))
(setq png-p (strcat qz ".png"));图片文件名
(setq txt-p (strcat qz ".txt"));txt文件名
;(setq ss(ssget ) ) ;提示用户框选pdf图元SS在主函数中"_P"
(VL-CMDF "PNGOUT" png-p ss "") ;输出图片
(SETQ EXE-P (findfile "Windows.Media.Ocr.Cli.exe"))
(vl-cmdf "shell" (strcat"Windows.Media.Ocr.Cli.exe"" " (ljx-string-translate "/" "\\" png-p) " " " > " (ljx-string-translate "/" "\\" txt-p) ))
(setq i 0)
(setq go t)
(while (and go (< i 200000000));防止ocr软件保存txt时间不够,这里给定循环时间
(if (setq f (vl-catch-all-apply 'open (list txt-p "r")))
;开启txt文件准备读取字串
(while
(setq str (vl-catch-all-apply 'read-line (list f)))
;逐行读取
(if str
(setq go nil)
) ;一旦读取到了字串就让while结束循环
(setq strs (cons str strs)) ;字串记录
)
)
(vl-catch-all-apply 'close (list f)) ;关闭文件
(setq i (1+ i))
)
(setq strs (vl-remove '"" strs)) ;删除空行
(setq strs (reverse strs));颠倒过来
(and ss(setq ents(vl-remove-if(function listp)(mapcar (function cadr) (ssnamex SS)))));选择集转图元
(setq jbs(mapcar(function(lambda(a)(cdr(assoc 5(entget a)))))ents));获取句柄
(if ss(SETQ BOX (getbox SS NIL)))
(vl-catch-all-apply 'vl-file-delete (list png-p)) ;删除图片文件
(vl-catch-all-apply 'vl-file-delete (list txt-p)) ;删除txt文件
(setq ss nil)
(LIST (CONS "坐标" BOX) (cons "字串" strs)(cons "图元句柄" jbs));返回
)
(defun getbox (xzj QJBLM / i xdir
ydir zdir origin wcsOrg UcsFlag
matLst matrix revMat sel ent
obj minPt maxPt minLs maxLs
maxX maxY minX minY
Make-Rectange-getbox dxf $GetMatrix$
)
;;; ;诡异的填充
;(ENTMAKE (QUOTE ((0 . HATCH) (100 . AcDbEntity) (100 . AcDbHatch) (10 0.0 0.0 0.0) (210 0.0 0.0 1.0) (2 . SOLID) (70 . 1) (71 . 0) (91 . 1) (92 . 1) (93 . 1) (72 . 2) (10 3.67321e-006 -1.0 0.0) (40 . 1.0) (50 . 1.5708) (51 . 1.5708) (73 . 1) (97 . 0) (75 . 0) (76 . 1) (98 . 1) (10 97.2 305.1 0.0))))会导致外包围错误
;;先判断UCS是否与WCS相同。如是则取得UCS的X方向,
;;Y方向,Z方向,UCS原点及WCS的原点相对UCS的坐标点
;;然后得到UCS变换矩阵和到WCS的逆变换矩阵
;QJBLM全局使用的变量名(字符串格式),这个将作为全局变量名供很多地方调用
(defun Make-Rectange-getbox (PTT1 PTT2 QJBLM)
;;; ;构造矩形
;;; (IF (/= $EF-tzqy(vl-princ-to-string(list(list (car PTT1) (cadr PTT1))(list (car PTT2) (cadr PTT1))(list (car PTT2) (cadr PTT2))(list (car PTT1) (cadr PTT2)))))(ALERT "变了哦"))
(set (read QJBLM)
(vl-princ-to-string
(list (list (car PTT1) (cadr PTT1))
(list (car PTT2) (cadr PTT1))
(list (car PTT2) (cadr PTT2))
(list (car PTT1) (cadr PTT2))
)
) ;将坐标点赋值给QJBLM
)
;;;(TTEDIT (entlast) "zx-tzqy" (cons 1000 (vl-princ-to-string(list(list (car PTT1) (cadr PTT1))(list (car PTT2) (cadr PTT1))(list (car PTT2) (cadr PTT2))(list (car PTT1) (cadr PTT2))))) "del")
)
(defun $GetMatrix$ (lst org Revflag / mat i j)
(setq mat (vlax-make-safearray vlax-vbDouble '(0 . 3) '(0 . 3)))
;初始化一个4X4的矩阵
(setq i 0)
(repeat 3
(vlax-safearray-put-element mat i 3 (nth i org)) ;平移变换
(setq j 0)
(repeat 3
(if RevFlag
(vlax-safearray-put-element mat i j (nth j (nth i lst)))
;角度逆变换
(vlax-safearray-put-element mat i j (nth i (nth j lst)))
;角度的变换
)
(setq j (1+ j))
)
(setq i (1+ i))
)
(vlax-safearray-put-element mat 3 3 1)
mat ;返回矩阵
)
(defun G:CrossProductor (vec1 vec2 / a b c d e f)
;;; ;两矢量的叉积
(setq a (car vec1))
(setq b (cadr vec1))
(setq c (caddr vec1))
(setq d (car vec2))
(setq e (cadr vec2))
(setq f (caddr vec2))
(list
(- (* b f) (* c e))
(- (* c d) (* a f))
(- (* a e) (* b d))
)
)
(if (and
xzj
(or (and (= (TYPE XZJ) 'VLA-OBJECT) (SETQ XZJ (LST->SS (LIST (vlax-vla-object->ename XZJ)))))
(and (= (TYPE XZJ) 'ENAME) (SETQ XZJ (LST->SS (LIST XZJ))))
;如果传递进来的是一个图元,就自动转换一下
(and (= (TYPE XZJ) 'list)
(= (TYPE (car XZJ)) 'ENAME)
(SETQ XZJ (LST->SS xzj))
) ;如果传递进来是一个图元列表ss,就自动转换选择集
t
)
(= (type xzj) 'pickset)
(or (and qjblm (= (type qjblm) 'str) (> (strlen qjblm) 1))
(not qjblm)
)
)
(progn
(setq UcsFlag (getvar "WORLDUCS"))
(if (= UcsFlag 0) ;UCS是否与WCS相同
(setq UcsFlag T ;设置标志位为true
xdir (getvar "UCSXDIR") ;X方向矢量
ydir (getvar "UCSYDIR") ;Y方向矢量
zdir (G:CrossProductor xdir ydir)
;X和Y的方向矢量的叉积
origin(getvar "UCSORG") ;原点
WcsOrg(trans '(0 0 0) 0 1) ;WCS的原点相对UCS的坐标
matLst(list xdir ydir zdir) ;旋转的变换矩阵表
matrix($GetMatrix$ matLst origin nil)
;从WCS到UCS的变换矩阵
revMat($GetMatrix$ matLst WcsOrg T) ;从UCS到WCS的变换矩阵
)
(setq UcsFlag nil) ;否则不予变换
)
;;在UCS下先变换物体到WCS下,取得每个物体的包围框,
;;求出包围框集合的最小XY,最大XY,并用矩形框画出来
;;然后把物体变换回到UCS,并把矩形也变换回去
(if (setq sel xzj)
;;选择物体
(progn
(setq i 0)
(setq minPt (vlax-make-safearray vlax-vbdouble '(0 . 2)))
;左下角点
(setq maxPt (vlax-make-safearray vlax-vbdouble '(0 . 2)))
;右上角点
(repeat (sslength sel)
(if (and (setq dxf nil
dxf (entget (setq ent (ssname sel i)))
) ;图元
(or (AND (WCMATCH (CDR (ASSOC 0 DXF)) "*TEXT")
(WCMATCH (CDR (ASSOC 1 DXF)) "[, ,,]")
;空格或者是空值“”
) ;如果一个空格文字的话
(and (= (cdr (assoc 0 dxf)) "HATCH")
(= (cdr (assoc 93 dxf)) 1)
) ;反正93等于1的时候不能创建填充,但是遇到诡异的填充,93等于1就可以创建,既然93在诡异的情况下可以创建,那就用93=1来排除它,诡异的填充在这里:;(ENTMAKE (QUOTE ((0 . HATCH) (100 . AcDbEntity) (100 . AcDbHatch) (10 0.0 0.0 0.0) (210 0.0 0.0 1.0) (2 . SOLID) (70 . 1) (71 . 0) (91 . 1) (92 . 1) (93 . 1) (72 . 2) (10 3.67321e-006 -1.0 0.0) (40 . 1.0) (50 . 1.5708) (51 . 1.5708) (73 . 1) (97 . 0) (75 . 0) (76 . 1) (98 . 1) (10 97.2 305.1 0.0))))
)
)
() ;以上条件成立以后,啥也不敢干呀
(PROGN
(setq obj (vlax-ename->vla-object ent)) ;obj对象
(and UcsFlag (vla-TransformBy obj revMat))
;反变换到WCS
;;; (vla-GetBoundingBox obj 'minpt 'maxpt)
(and (not (vl-catch-all-error-p
(vl-catch-all-apply
'vla-GetBoundingBox
(list obj 'minPt 'maxPt)
)
)
) ;得到包围框
(setq minPt (vlax-safearray->list minPt))
(setq maxPt (vlax-safearray->list maxPt))
(setq minLs (cons minPt minLs)) ;得到左下角点表
(setq maxLs (cons maxPt maxLs)) ;得到右上角点表
)
(and UcsFlag (vla-TransformBy obj matrix))
;变换回到UCS
)
)
(setq i (1+ i))
)
(and minLs
(setq minX (apply 'min (mapcar 'car minLs)))
;最小点集的最小X
(setq minY (apply 'min (mapcar 'cadr minLs)))
;最小点集的最小Y
)
(and maxLs
(setq maxX (apply 'max (mapcar 'car maxLs)))
;最大点集的最小X
(setq maxY (apply 'max (mapcar 'cadr maxLs)))
;最打点集的最小Y
)
(and QJBLM
(Make-Rectange-getbox
(list minX minY 0)
(list maxX maxY 0)
QJBLM
)
) ;构造边框
;;; (and
;;; UcsFlag ;如果UCS的话
;;; (vla-TransformBy
;;; (vlax-ename->vla-object (entlast))
;;; matrix ;变换边框到UCS
;;; )
;;; )
)
)
)
)
(IF (AND minX minY maxX maxY)
(list (list minX minY 0) (list maxX maxY 0))
)
)
(defun c:ocr123 (/ 1/2p 1/2p-n bl box box-new ent ents html jbs l obj result s sbjg ss str strs)
(setq sbjg ($ocr-dwg$ (ssget)))
(SETQ BOX(CDR(ASSOC "坐标" sbjg)))
(setq strs (cdr (assoc "字串" sbjg)))
(if strs
(progn
(setq jbs(cdr(assoc "图元句柄" sbjg)))
(setq ents(mapcar 'handent jbs))
(setq ss (ssadd))
(mapcar(function(lambda (x) (ssadd x ss)))ents)
(if ss(vl-cmdf "change" ss "" "P" "C" 1 ""))
(setq ss nil)
(setq s nil)
(while (setq str (car strs))
(ifs
(setq s (strcat s "\n" str))
(setq s str)
)
(setq strs (cdr strs))
)
(setq str s)
(SETQ L(ABS(APPLY '-(MAPCAR 'CAR BOX))))
(SETQ L(* L 0.8))
(setq OBJ (vla-addmtext(vla-get-modelSpace(vla-get-ActiveDocument (vlax-get-acad-object)))(vlax-3d-point (list 0 0 0))(vlax-make-variant L 5)str))
(SETQ ENT(VLAX-VLA-OBJECT->ENAME OBJ))
(SETQ BOX-NEW(GETBOX ENT NIL))
(SETQ BL(/(ABS(distance(car BOX)(cadr box)))(ABS(distance(car BOX-NEW)(cadr BOX-NEW)))))
(setq 1/2p(mapcar(function(lambda(x y)(* (+ x y)0.5)))(car BOX)(cadr BOX)))
(setq 1/2p-n(mapcar(function(lambda(x y)(* (+ x y)0.5)))(car BOX-NEW)(cadr BOX-NEW)))
(vla-ScaleEntity OBJ(vlax-3D-point 1/2p-n)bl)
(vla-move obj (vlax-3d-point 1/2p-n) (vlax-3d-point 1/2p))
(setqhtml (vlax-create-object "htmlfile")
result (vlax-invoke
(vlax-get (vlax-get html 'ParentWindow)
'ClipBoardData
)
'setData
"Text"
str
)
)
(vlax-release-object html)
)
)
obj
)
自贡黄明儒
发表于 2024-5-12 16:53:48
树櫴希德 发表于 2024-5-12 16:35
;(setq qz (vl-filename-mktemp "ocr" (getvar "SAVEFILEPATH")) ) ;前缀(vl-filename-mktemp "ocr" (gete ...
这可是新东西,先给您点赞,等有兴趣了再来玩。
树櫴希德
发表于 2024-5-12 18:09:59
自贡黄明儒 发表于 2024-5-12 16:53
这可是新东西,先给您点赞,等有兴趣了再来玩。
谢谢黄教授
qazxswk
发表于 2024-10-19 10:01:42
我运行后,选择PDF转过来的被打散的文字,卡死。
shz9
发表于 2024-10-19 20:17:06
学习了,有机会研究一下看看