Hello everybody,
I need to do some analysis with Text Miner in SAS Enterprise Miner. I have a text variable with a length of 500 characters and 30,000 distinct values. I want to standardize this variable by removing +/- signs and conjunctions such as “or” and “with”. I also want to reduce the messy variants to a single clean value. For example, suppose I have the values below:
Papers *?
PAPER
paper
pAPER
Paper
pape
PAPE
Pape
Paper with
Paper or book
I want all of the above values to appear simply as Paper. How can I do this with Text Miner? Can somebody help me resolve this, please?
Thanks
Hi,
This sounds more like a data cleansing or fuzzy matching task. Take a look at something like this paper for SAS functions and programs that can help you standardize the input: https://www.lexjansen.com/sesug/2018/SESUG2018_Paper-143_Final_PDF.pdf
Text Miner is based on how terms tend to co-occur within documents. It learns across the whole collection from these patterns of co-occurrence. In your example, where you mostly have a single term per document, there is no co-occurrence to learn from, so Text Miner is not the best tool for this kind of task.
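To make the cleansing and fuzzy matching idea concrete, here is a minimal sketch in Python rather than SAS, just to show the logic. The stop-word list, the canonical vocabulary, the 0.8 similarity cutoff, and the rule of keeping only the first remaining word are all assumptions made for this illustration, not something taken from the linked paper or from Text Miner. The same steps (strip symbols, drop conjunctions, match to a known spelling) could be written in a SAS DATA step.

```python
import re
import difflib

# Hypothetical lists for illustration; replace with ones that fit your data.
STOP_WORDS = {"or", "with", "and"}
CANONICAL = ["paper", "book"]

def standardize(raw, canonical=CANONICAL, cutoff=0.8):
    """Return a cleaned, canonical form of a free-text value."""
    # Lowercase, then replace everything except a-z and whitespace with a space.
    # Assumption: values are plain ASCII; widen the pattern for other alphabets.
    cleaned = re.sub(r"[^a-z\s]", " ", raw.lower())
    # Drop conjunctions and similar filler words.
    tokens = [t for t in cleaned.split() if t not in STOP_WORDS]
    if not tokens:
        return ""
    # Assumption: the first remaining word is the one to keep.
    first = tokens[0]
    # Fuzzy-match against the known spellings (handles "pape", "papers").
    match = difflib.get_close_matches(first, canonical, n=1, cutoff=cutoff)
    return (match[0] if match else first).capitalize()

values = ["Papers *?", "PAPER", "paper", "pAPER", "Paper",
          "pape", "PAPE", "Pape", "Paper with", "Paper or book"]
results = [standardize(v) for v in values]
```

With these assumed lists, every value in the example maps to Paper. With 30,000 distinct values you would still need to build the canonical list yourself, for instance by reviewing the most frequent cleaned tokens first.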