sassy7
Obsidian | Level 7

Hi Experts,

This is the first time I am working on a project like this, and I would like some advice.
The scope of the project is to extract street addresses from free-text documents, whenever the text contains them.

I was thinking of using SAS Visual Text Analytics,
because according to the documentation it should have the functionality for this task.

Note that the text documents to be analyzed are not US ones; they are in a foreign language that SAS Visual Text Analytics supports.

Do you have experience in this area, and could you suggest a feasible approach?

sbxkoenk
SAS Super FREQ

Hello @sassy7 ,

 

Since you shouldn't do this with plain SAS programming (e.g. with the DATA step), I have moved your post from the "SAS Programming" board to the "SAS Data Science" board.

You do indeed need SAS Text Miner (SAS 9.4) or SAS Visual Text Analytics (VTA, in SAS Viya).

 

Look here :

 

Explore NLP (predefined categories) and LITI rules (Concepts node)!

 

Koen

sassy7
Obsidian | Level 7

Hello @koen,

Many thanks for the info provided!

Although street addresses are covered by a predefined concept (nlpPlace), using custom concepts would require writing LITI/regex syntax. Could you give me some tips?

 

Also, in the SAS Community blogs I have read several posts citing SAS DataFlux / SAS Data Quality as the best tool for address parsing and standardization (where "parsing" is not Text Analytics parsing).

I am wondering which tool is the most suitable, or whether both are required: for example, Visual Text Analytics for concept extraction and SAS DataFlux / SAS Data Quality for address standardization.

Do you have experience with this, and could you suggest a feasible approach?

SASKiwi
PROC Star

What's the scope of your use case? How many addresses are you dealing with, spread across which countries? How accurate do these addresses have to be? Do you need to accurately locate and geocode them or not? DataFlux / Data Quality has a steep learning curve and takes significant effort, so it is really only worthwhile if you have a lot of addresses that MUST be accurately verified, located, or geocoded. It is not worthwhile for low address volumes or where accurate verification and location are not required. A DIY approach with Base SAS tools is more appropriate for those cases. If you already have experience with text mining and SAS DQ, the answers might be different.

sassy7
Obsidian | Level 7

Hello @koen,

So the choice of which application to use (e.g. SAS Visual Text Analytics and/or SAS DataFlux / SAS Data Quality and/or Base SAS) will depend on the number of concepts extracted from the texts.
One more thing: do you know whether SAS Visual Text Analytics provides a macro similar to %TMFILTER to filter text documents?

sbxkoenk
SAS Super FREQ

Hello @sassy7 ,

 

  • I do not know SAS Data Quality (DataFlux) very well, so I cannot comment on that.
  • In the DATA step, you can try to accomplish your task with all kinds of (parsing) functions and regular expressions.
    Both SAS regular expressions (the RX functions) and Perl regular expressions (the PRX functions) let you locate patterns in text strings.
  • You can definitely use SAS Text Miner (latest version 15.3) or SAS Visual Text Analytics (for which you need SAS Viya with VTA licensed, of course).
    The %tmfilter macro can be used to read the documents into a SAS data set. %tmfilter can also act as a web crawler. %tmfilter is available only in SAS Text Miner and NOT in VTA (though VTA offers similar functionality).
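
To give a feel for the regular-expression route, here is a minimal sketch in Python rather than SAS. It assumes addresses of the form "street type, street name, house number"; the Italian-style street types and the sample sentence are hypothetical placeholders, not taken from your data. The pattern uses ordinary Perl-style syntax, so it should carry over to the PRX functions with little change, but treat that as an assumption to verify.

```python
import re

# Hypothetical street-type keywords; replace with the ones used in your language.
STREET_TYPES = ["Via", "Viale", "Piazza", "Corso"]

# Street type, then one to three capitalized name words, then a house number
# with an optional letter suffix (e.g. "12" or "12b"). Only ASCII capitals are
# recognized here, which is a simplification.
ADDRESS_RE = re.compile(
    r"\b(?:" + "|".join(STREET_TYPES) + r")"
    r"(?:\s+[A-Z][\w'.]*){1,3}"
    r",?\s+\d{1,4}[a-zA-Z]?\b"
)

def find_addresses(text):
    """Return every substring of text that matches the address pattern."""
    return [m.group(0) for m in ADDRESS_RE.finditer(text)]

sample = (
    "Please deliver to Via Giuseppe Verdi 12b before Friday. "
    "The office in Piazza Duomo, 3 is closed."
)
print(find_addresses(sample))
# ['Via Giuseppe Verdi 12b', 'Piazza Duomo, 3']
```

A pattern like this will miss addresses that do not follow the assumed layout, which is one reason the text analytics tools are worth considering for larger or messier collections.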

Documentation for the %tmfilter macro:
https://go.documentation.sas.com/doc/en/tmref/15.3/n1f1hnf1pk8w3in1i2h4v94rty2m.htm

 

SAS® Visual Text Analytics
https://support.sas.com/en/software/visual-text-analytics-support.html

 

Cheers,
Koen

sassy7
Obsidian | Level 7

Thanks a lot @koen!

sassy7
Obsidian | Level 7

Also... do you know where I can find documentation about the SAS Text Miner procedures, like the ones available in SAS Visual Text Analytics (proc boolrule, proc textmine and proc tmscore)? I found documentation about the HP procedures of SAS Text Miner but not about the "standard" procedures used by SAS Text Miner. It would be much easier to use the procedures than the UI environment. Or are the procedures the same in the two environments?

 

sbxkoenk
SAS Super FREQ

Hello,

 

This is the official doc :

 

For the standard SAS Text Miner procedures (NOT the HP procedures), you need to contact SAS Technical Support in your region / country. They can provide you with the documentation so that you can run the procedures without using the UI. I am not allowed to share that documentation.

 

Added note:
The reason the documentation for the Text Miner procedures is withheld is that there is intelligence in the UI. The UI makes sure you can't set certain conflicting options, for example. If you use the procedures directly you don't have that protection, and Technical Support then gets all sorts of avoidable questions (questions that do not arise if you use the UI as provided and as designed).
But OK, if you explicitly ask for that documentation and Technical Support can assess that you "know what you're doing", you will get the procedures documentation.

 

Thanks,

Koen

sassy7
Obsidian | Level 7

Hello @koen 

Will do! Thanks a lot for your help!

Meilan
SAS Employee

Hello,

 

As you are aware, the pre-defined concept "nlpPlace" can extract street addresses. If this does not meet your requirements, you can also customize concepts by writing LITI rules.

For instance, you can start by defining concepts that constitute address components, such as StreetName, StreetType, City, State, Country, and so on. Afterwards, you can combine these concepts following language conventions. For example:
CONCEPT: StreetName, City, State, Country.
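
The idea of building an address out of component concepts can be illustrated outside LITI as well. The following is a rough Python sketch, not LITI syntax: each component gets its own small pattern, and a template combines them in an assumed order (street type, street name, number, postal code, city). All component patterns, the ordering, and the sample sentence are hypothetical and would need adapting to the conventions of your language.

```python
import re

# Hypothetical component patterns, one per "concept"; adapt to your language.
COMPONENTS = {
    "StreetType": r"(?:Via|Viale|Piazza|Corso)",
    "StreetName": r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*",
    "Number": r"\d{1,4}",
    "PostalCode": r"\d{5}",
    "City": r"[A-Z][a-z]+",
}

# Assumed ordering convention for a full address.
ADDRESS_TEMPLATE = r"{StreetType}\s{StreetName}\s{Number},\s{PostalCode}\s{City}"

def build_pattern(template, components):
    """Wrap each component in a named group and substitute it into the template."""
    named = {name: f"(?P<{name}>{rx})" for name, rx in components.items()}
    return re.compile(template.format(**named))

def parse_address(text):
    """Return the components of the first address found in text, or None."""
    match = build_pattern(ADDRESS_TEMPLATE, COMPONENTS).search(text)
    return match.groupdict() if match else None

print(parse_address("Send it to Via Giuseppe Verdi 12, 20121 Milano by Monday."))
# {'StreetType': 'Via', 'StreetName': 'Giuseppe Verdi', 'Number': '12',
#  'PostalCode': '20121', 'City': 'Milano'}
```

Keeping the components separate means a different address layout only needs a new template, not new component definitions.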

 

For guidance on writing LITI rules, please refer to the page: https://go.documentation.sas.com/doc/en/ctxtcdc/v_015/ctxtug/p1kf71w7npr9ecn1gysvovfs42x2.htm 

 

Hope that helps.
