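The information in the .txt file looks roughly like this
(the attachment is too large, so I have only uploaded the details for one drug; the complete file is on Baidu Netdisk, link below). Everything I want to extract is the part shown in bold red: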
#
# :
20:12.04.12
# :
# :
is 100% .
# :
/ may bleed risk. be .
The , , the risk of when with the , . for .
# :
For the of -
# :
# :
#
After processing, I would like out.txt to look like this:
, -
... .... ... ... ...
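Below is my program, which does not produce any result. I hope an expert can post a working program (there is no need to fix mine); anything that gives me the result I want is fine. Many thanks!!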
# 2>nul&@Gawk -f %0 .txt&Exit
BEGIN{printf("ENTRY ATC code \n")>>"$Data.txt";A[2]=I[2]=D[2]=P[2]="~"}
END{printf("\nNumber of drugs with an ATC code: %d\nNumber of drugs with a Drug group: %d\nNumber of drugs with : %d\nNumber of drugs with : %d\n",_A,_I,_D,_P)>>"$Data.txt"}
$1~"http:///"{
A[2]!="~"?_A++:0;I[2]!="~"?_I++:0;D[2]!="~"?_D++:0
P[2]!="~"?_P++:0
printf("%-16s %-15s %-16s %-31s %s\n",E,A[2],I[2],D[2],P[2])>>"$Data.txt"
A[2]=I[2]=D[2]=P[2]="~"
}
$1~"ENTRY"{split($0,B,"");gsub(" ",E[2])}
$0~"ATC code"{split($0,A,": ");gsub(" ",",",A[2])}
$0~":"{split($0,I,": ");gsub(" ",",",I[2])}
$0~":"&&$0!~"of"{split($0,D,": ");gsub(" ",",",D[2])}
$0~""{split($0,P,": ");gsub(" ",",",P[2])}
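
For comparison, here is a minimal sketch of the kind of extraction described above, written in Python rather than awk. It rests on several assumptions that the post does not confirm: that each record is delimited by a bare `#` line, that each field starts with a `# Name:` header line, and that the field's value is the non-empty lines that follow. The field names `Field_A` and `Field_B` are hypothetical placeholders for whichever fields are actually wanted, and the tab-separated output layout is likewise only a guess at the desired out.txt.

```python
import sys


def parse_records(lines):
    """Yield one dict per record, mapping field name -> joined value text.

    Assumed layout (not confirmed by the original post):
      a bare "#" line marks a record boundary,
      "# Name:" opens a field, and the following non-empty
      lines belong to that field until the next "#" line.
    """
    record, field = None, None
    for raw in lines:
        line = raw.strip()
        if line.startswith("#"):
            header = line[1:].strip()
            if header.endswith(":"):
                # A field header: start collecting lines for this field.
                if record is None:
                    record = {}
                field = header[:-1].strip()
                record.setdefault(field, [])
            else:
                # Any other "#" line is treated as a record boundary.
                if record:
                    yield {k: " ".join(v) for k, v in record.items()}
                record, field = None, None
        elif line and record is not None and field is not None:
            record[field].append(line)
    if record:
        # Flush a final record that has no closing "#" line.
        yield {k: " ".join(v) for k, v in record.items()}


def extract(lines, wanted, missing="~"):
    """Return one row per record with the wanted fields, in order.

    Fields that are absent or empty are replaced by `missing`,
    mirroring the "~" placeholder used in the awk script.
    """
    return [[rec.get(name) or missing for name in wanted]
            for rec in parse_records(lines)]


def main(in_path, out_path, wanted):
    with open(in_path, encoding="utf-8") as src:
        rows = extract(src, wanted)
    with open(out_path, "w", encoding="utf-8") as dst:
        for row in rows:
            dst.write("\t".join(row) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical field names; replace with the real ones.
    main(sys.argv[1], sys.argv[2], ["Field_A", "Field_B"])
```

Saved under a hypothetical name such as `extract_fields.py`, it could be run as `python extract_fields.py input.txt out.txt`. Collecting each record into a dict first, and only then picking the wanted fields, avoids the awk script's reliance on the order in which fields appear.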