Hi Experts,
This is my first time working on this kind of project, and I would like some advice.
The goal of the project is to extract street addresses from free-text documents whenever the text contains them.
I was thinking of using SAS Visual Text Analytics because, according to the documentation, it should have the functionality for this task.
Note that the documents to be analyzed are not US documents; they are written in a foreign language that SAS Visual Text Analytics supports.
Do you have any experience with this, and can you suggest a feasible approach?
Hello @sassy7 ,
Since you shouldn't do this with plain SAS programming (e.g. with the DATA step), I have moved your post from the "SAS Programming" board to the "SAS Data Science" board.
You do indeed need SAS Text Miner (SAS 9.4) or SAS Visual Text Analytics (VTA in SAS Viya).
Look here :
Explore NLP (predefined categories) and LITI rules (Concepts node)!
Koen
Hello @koen,
Many thanks for the info provided!
Although street addresses are covered by a predefined concept (nlpPlace), using custom concepts would require writing LITI/regex syntax. Could you give me some tips?
Also, in the SAS Community blogs I have read several posts citing SAS Dataflux/SAS Data Quality as the best tool for address parsing/standardization (where parsing does not mean Text Analytics parsing).
I am wondering which tool is the most suitable, or whether both are required, e.g. Visual Text Analytics for concept extraction and SAS Dataflux/SAS Data Quality for address standardization.
Do you have experience in these areas, and can you suggest a feasible approach?
What's the scope of your use case? How many addresses are you dealing with, and across which countries? How accurate do these addresses have to be? Do you need to accurately locate and geocode them or not? Dataflux / Data Quality has a significant learning curve and takes considerable effort, so it is really only worthwhile if you have a lot of addresses that MUST be accurately verified / located / geocoded. It is not worthwhile for low address volumes or where accurate verification and location are not required. A DIY approach with Base SAS tools is more appropriate for those cases. If you already have experience with Text Mining and SAS DQ, then the answers might be different.
Hello @koen,
So the choice of which application to use (e.g. SAS Visual Text Analytics and/or SAS Dataflux/SAS Data Quality and/or Base SAS) will depend on the number of concepts extracted from the texts.
One more thing: do you know whether SAS Visual Text Analytics provides a macro similar to %TMFILTER for filtering text documents?
Hello @sassy7 ,
Documentation for the %TMFILTER macro:
https://go.documentation.sas.com/doc/en/tmref/15.3/n1f1hnf1pk8w3in1i2h4v94rty2m.htm
SAS® Visual Text Analytics
https://support.sas.com/en/software/visual-text-analytics-support.html
Cheers,
Koen
Thanks a lot @koen!
Also... do you know where I can find documentation about the SAS Text Miner procedures, like the ones available in SAS Visual Text Analytics (proc boolrule, proc textmine and proc tmscore)? I found documentation for the HP procedures of SAS Text Miner but not for the "standard" procedures used by SAS Text Miner. It would be much easier to use the procedures than the UI environment. Or are the procedures the same in the two environments?
Hello,
This is the official doc :
For the standard SAS Text Miner procedures (NOT the HP procedures), you need to contact SAS Technical Support in your region / country. They will provide you with the doc so that you can run the procedures without using the UI. I am not allowed to share that doc.
Added note :
The reason the documentation for the Text Miner procedures is withheld is that there is intelligence built into the UI. The UI makes sure you can't set certain conflicting options, for example. If you use the procedures directly, you don't have that protection, and Technical Support then gets all sorts of avoidable questions (questions that do not arise if you use the UI as provided and as designed).
But OK, if you explicitly ask for that documentation and Technical Support can assess that you "know what you're doing", you will get the procedures documentation.
Thanks,
Koen
Hello @koen
Will do! Thanks a lot for your help!
Hello,
As you are aware, the pre-defined concept "nlpPlace" can extract street addresses. If this does not meet your requirements, you can also customize concepts by writing LITI rules.
For instance, you can start by defining concepts that constitute address components, such as StreetName, StreetType, City, State, Country, and so on. Afterwards, you can combine these concepts following language conventions. For example:
CONCEPT: StreetName, City, State, Country.
For guidance on writing LITI rules, please refer to the page: https://go.documentation.sas.com/doc/en/ctxtcdc/v_015/ctxtug/p1kf71w7npr9ecn1gysvovfs42x2.htm
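To make the "define the components, then combine them" idea above more concrete, here is a rough sketch in plain Python regular expressions. It is not LITI syntax and not how SAS Visual Text Analytics works internally; it only illustrates the composition step. The component patterns, the Italian-style address layout (street type, street name, house number, postal code, city) and the function name `extract_addresses` are all assumptions made for this illustration, so you would need to adapt them to the conventions of your own language and country.

```python
import re

# Hypothetical component patterns (assumptions for illustration only).
STREET_TYPE = r"(?:Via|Viale|Piazza|Corso)"
STREET_NAME = r"[A-Z][\w'.]*(?:\s+[A-Z][\w'.]*)*"   # one or more capitalized words
HOUSE_NUMBER = r"\d+[A-Za-z]?"                       # e.g. 12 or 12B
POSTAL_CODE = r"\d{5}"                               # assumed 5-digit postal code
CITY = r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*"

# Combine the components in the order the assumed convention uses:
# <street type> <street name> <house number>, <postal code> <city>
ADDRESS = re.compile(
    rf"(?P<street_type>{STREET_TYPE})\s+"
    rf"(?P<street_name>{STREET_NAME})\s*,?\s*"
    rf"(?P<house_number>{HOUSE_NUMBER})\s*,\s*"
    rf"(?P<postal_code>{POSTAL_CODE})\s+"
    rf"(?P<city>{CITY})"
)


def extract_addresses(text):
    """Return one dict of address components per match found in text."""
    return [match.groupdict() for match in ADDRESS.finditer(text)]


if __name__ == "__main__":
    sample = "Il cliente abita in Via Giuseppe Verdi 12, 20121 Milano dal 2019."
    for address in extract_addresses(sample):
        print(address)
```

In LITI you would express the same structure with one concept per component and a combining rule that references them, which is usually easier to maintain than one large pattern because each component can be tested and extended on its own.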
Hope that helps.