kaziumair
Quartz | Level 8

Hi everyone, I am using PROC TEXTMINE to extract the names of people and organizations. However, not all names are being extracted from the documents. Is there a way to tune the procedure so that all names are extracted?

proc textmine data=mycas.extract;
doc_id id;
var text;
parse
   termwgt    = none
   cellwgt    = none
   reducef    = 4
   entities   = std
   outparent  = mycas.outparent
   outterms   = mycas.outterms
   outchild   = mycas.outchild
   outconfig  = mycas.outconfig
   ;
select "nlpPerson" "nlpOrganization"/group="entities" keep;
run;
sbxkoenk
SAS Super FREQ

Hello,

 

I am not sure whether you can improve your search for entities with PROC TEXTMINE.
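One PARSE setting that may be worth checking first. This is an assumption, not something confirmed in this thread. As I understand the option, REDUCEF=4 in the original step tells PROC TEXTMINE to drop terms that do not appear in at least four documents. A person or organization mentioned in only one or two documents could therefore be filtered out even if the entity finder recognized it. The minimal sketch below reruns the same step with REDUCEF=1 and assumes the same CAS table (mycas.extract) and output tables as the question:

```sas
/* Hedged sketch: identical to the original step except REDUCEF=1,   */
/* so low-frequency terms (including rarely mentioned names) are not */
/* filtered out. Assumes an active CAS session and libref MYCAS.     */
proc textmine data=mycas.extract;
   doc_id id;
   var text;
   parse
      termwgt   = none
      cellwgt   = none
      reducef   = 1          /* was 4 in the original step */
      entities  = std
      outparent = mycas.outparent
      outterms  = mycas.outterms
      outchild  = mycas.outchild
      outconfig = mycas.outconfig
      ;
   /* keep only person and organization entities, as before */
   select "nlpPerson" "nlpOrganization" / group="entities" keep;
run;
```

Compare the entity terms in mycas.outterms from both runs to see whether the missing names were low-frequency terms. If they were not, the entity model itself is the limit, and regular expressions are the fallback.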

You could try adding regular expressions to search for specific patterns.
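Here is a minimal sketch of that idea. It uses a plain DATA step on a WORK table rather than CAS, and the table work.sample, its two example rows, and the pattern are all illustrative assumptions. PRXPARSE compiles a Perl regular expression that matches two or more consecutive capitalized words, and CALL PRXNEXT walks through every match in each document. The pattern is a crude heuristic. It also catches sentence-initial phrases such as "The Ford Fiesta", and it misses single-word or lowercase names. Treat the output as a list of candidates to review, not a finished entity list.

```sas
/* Hypothetical sample data: one document per row, '|' as delimiter */
data work.sample;
   length id 8 text $200;
   infile datalines dlm='|';
   input id text $;
   datalines;
1|John Smith joined Acme Corporation in March.
2|The report was reviewed by Mary Jones.
;
run;

/* Extract every run of two or more capitalized words as a candidate name */
data work.name_candidates(keep=id candidate);
   set work.sample;
   length candidate $100;
   retain re;
   if _n_ = 1 then re = prxparse('/\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b/');
   start = 1;
   stop = length(text);
   call prxnext(re, start, stop, text, position, len);
   do while (position > 0);
      candidate = substr(text, position, len);
      output;
      call prxnext(re, start, stop, text, position, len);
   end;
run;

proc print data=work.name_candidates noobs;
run;
```

With the two sample rows, this should return John Smith, Acme Corporation and Mary Jones ("March" is a single capitalized word, so it is not matched).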

You can also try PROC TGPARSE, although it is older than PROC TEXTMINE, so I do not expect it to be better at finding entities.

data cars;
input text $1-70;
datalines;
    The Volkswagen Polo is the World Car of the Year.
    Volkswagen won the award last year.
    Mazda sold the Mazda2 in bright green.
    The Ford Fiesta is sold in lime green.
    The Mazda2 was World Car of the Year in 2008.
;
run;
proc TGPARSE data=cars
    /* turn the entity finder on */
    entities=yes stemming=yes
    tagging=yes key=Key4 out=Out4;
    var text;
run;
/* end of program */

 

Cheers,

Koen

sbxkoenk
SAS Super FREQ

Hello,

In addition to my previous response, have a look at the board:

Analytics > SAS Text and Content Analytics.

It might give you some ideas (on regular expressions, among other things).

And the next time you have an NLP or text-mining question, post it there!

Koen

kaziumair
Quartz | Level 8

Hi, when I ran the TGPARSE code, I got the following error (screenshot attached: tgparse_error.PNG).

sbxkoenk
SAS Super FREQ

OK.

PROC TGPARSE is a procedure used by SAS Text Miner in SAS 9.4Mx.

Normally, these Enterprise Miner and Text Miner procedures still work in SAS Viya 3.x (if you have VDMML and Visual Text Analytics licensed), but apparently this one does not.

In that case, you are left with regular expressions as an add-on to PROC TEXTMINE.

Have you tried the visual interface, though? You may discover some extra bells and whistles there.

Good luck,

Koen
