Hi everyone, I am using PROC TEXTMINE to extract the names of people and organizations. However, not all of the names are being extracted from the document. Is there a way to tune the procedure so that all names are extracted?
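proc textmine data=mycas.extract;
   doc_id id;          /* document identifier */
   var text;           /* text variable to parse */
   parse
      termwgt   = none
      cellwgt   = none
      reducef   = 4
      entities  = std  /* turn on standard entity detection */
      outparent = mycas.outparent
      outterms  = mycas.outterms
      outchild  = mycas.outchild
      outconfig = mycas.outconfig
   ;
   /* keep only the person and organization entities */
   select "nlpPerson" "nlpOrganization" / group="entities" keep;
run;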
Hello,
I am not sure you can improve entity detection within PROC TEXTMINE itself.
You could add regular expressions to search for specific patterns.
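A minimal sketch of the regular-expression idea, using the DATA step PRX routines (PRXPARSE and CALL PRXNEXT). The pattern, which treats any run of two or more capitalized words as a candidate name, is an illustrative assumption, as are the table names work.docs and work.candidates and the sample sentences; point the SET statement at your own table (for example mycas.extract) instead. Treat the output as candidates to review or merge with the PROC TEXTMINE entities, not as a replacement for them.

```sas
/* Hypothetical sample documents; replace with your own table */
data work.docs;
   infile datalines truncover;
   input id 1-2 text $ 4-80;
   datalines;
1  Maria Lopez met John Smith at Acme Corporation in Boston.
2  The report was signed by Dr Anna Berg of Nordic Steel.
;
run;

/* Output one row per run of two or more capitalized words */
data work.candidates;
   set work.docs;
   length candidate $100;
   retain re;
   /* Assumed pattern: a capitalized word followed by one or more
      capitalized words; compiled once, on the first iteration */
   if _n_ = 1 then re = prxparse('/\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b/');
   s_start = 1;
   s_stop = length(text);
   /* CALL PRXNEXT sets pos to 0 when no further match is found */
   call prxnext(re, s_start, s_stop, text, pos, len);
   do while (pos > 0);
      candidate = substr(text, pos, len);
      output;
      call prxnext(re, s_start, s_stop, text, pos, len);
   end;
   keep id candidate;
run;

proc print data=work.candidates noobs;
run;
```

On the two sample rows this yields five candidates: Maria Lopez, John Smith, Acme Corporation, Dr Anna Berg and Nordic Steel. The title "Dr" is swept into a match and the single-word "Boston" is missed, which is why such a pattern needs review and tuning for your data.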
You could also try PROC TGPARSE, but it is older than PROC TEXTMINE, so I do not expect it to be better at finding entities.
data cars;
   input text $1-70;
   datalines;
The Volkswagen Polo is the World Car of the Year.
Volkswagen won the award last year.
Mazda sold the Mazda2 in bright green.
The Ford Fiesta is sold in lime green.
The Mazda2 was World Car of the Year in 2008.
;
run;

proc tgparse data=cars
   /* turn the entity finder on */
   entities=yes stemming=yes
   tagging=yes key=Key4 out=Out4;
   var text;
run;
/* end of program */
Cheers,
Koen
Hello,
In addition to my previous response (see above), have a look at the board Analytics > SAS Text and Content Analytics.
It might give you some ideas (on regular expressions, among other things).
Next time you have an NLP or PROC TEXTMINE question, post it there!
Koen
Hi, when I ran the PROC TGPARSE code, I got the following error.
OK.
PROC TGPARSE is a procedure used by SAS Text Miner in SAS 9.4Mx.
These Enterprise Miner and Text Miner procedures normally still work in SAS Viya 3.x (if you have SAS Visual Data Mining and Machine Learning (VDMML) and SAS Visual Text Analytics licensed), but apparently this one does not.
That leaves regular expressions as an add-on to PROC TEXTMINE.
Have you tried the visual interface, though? You may discover some extra bells and whistles there.
Good luck,
Koen