ertr
Quartz | Level 8

Hello everybody,

I need to do some analysis with Text Miner in SAS Enterprise Miner. I have a text variable with a length of 500 characters and 30,000 distinct values. I want to standardize this variable by removing +/- signs and conjunctions or prepositions such as “or” and “with”. I also want to reduce messy variants to a single clean value. For example, suppose I have the following values:


Papers *?

PAPER

paper

pAPER

Paper

pape

PAPE

Pape

Paper with

Paper or book

I want all of the above values to become just Paper. How can I do this with Text Miner? Can somebody help me resolve this, please?

Thanks

1 REPLY
RussAlbright
SAS Employee

Hi,

This sounds a little more like a data cleansing and fuzzy matching task. Take a look at something like this paper for SAS functions and programs that can help you standardize the input: https://www.lexjansen.com/sesug/2018/SESUG2018_Paper-143_Final_PDF.pdf
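As a rough illustration of the cleansing approach (not taken from the linked paper), the steps could look like the minimal Python sketch below. It assumes a hypothetical canonical vocabulary (`CANONICAL`) and a hypothetical stop-word list (`STOP_WORDS`). It lowercases each value, turns signs and punctuation into spaces, drops words such as "or" and "with", and then maps the first remaining word to the closest canonical term with `difflib.get_close_matches` from the Python standard library. In SAS, comparable steps could be built from functions such as COMPRESS, LOWCASE, PROPCASE and COMPGED.

```python
import re
from difflib import get_close_matches

# Hypothetical canonical vocabulary and stop-word list; replace with your own.
CANONICAL = ["Paper", "Book"]
STOP_WORDS = {"or", "with", "and"}

# Map lowercase forms back to their preferred spelling.
_LOOKUP = {term.lower(): term for term in CANONICAL}


def standardize(raw, cutoff=0.75):
    """Return the canonical term closest to the first meaningful word of raw,
    or None if nothing in CANONICAL is similar enough."""
    # Lowercase and replace every run of non a-z characters (signs, punctuation,
    # digits) with a single space.
    cleaned = re.sub(r"[^a-z]+", " ", raw.lower())
    # Drop conjunctions/prepositions such as "or" and "with".
    tokens = [t for t in cleaned.split() if t not in STOP_WORDS]
    if not tokens:
        return None
    # Fuzzy-match the first remaining word against the canonical vocabulary.
    matches = get_close_matches(tokens[0], list(_LOOKUP), n=1, cutoff=cutoff)
    return _LOOKUP[matches[0]] if matches else None


values = ["Papers *?", "PAPER", "paper", "pAPER", "Paper",
          "pape", "PAPE", "Pape", "Paper with", "Paper or book"]
for v in values:
    print(f"{v!r:18} -> {standardize(v)}")
```

The `cutoff` is a tuning knob: set it too low and unrelated words collapse together, too high and misspellings like "pape" are missed. With 30,000 distinct values, the values that come back as `None` are the ones you would review by hand.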

Text Miner is based on how terms tend to co-occur within documents. It learns across the whole collection from these co-occurrence patterns. In your example, where most documents contain only a single term, there is no co-occurrence, so Text Miner is not the best tool for this kind of task.
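To see why single-term records give a co-occurrence-based method little to work with, this small Python sketch counts how often pairs of distinct terms appear in the same document. It only illustrates the general idea and is not how Text Miner is implemented internally; the multi-term documents are hypothetical examples.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count unordered pairs of distinct terms that appear in the same document."""
    pairs = Counter()
    for doc in documents:
        # A sorted set counts each pair once per document, in a fixed order.
        terms = sorted(set(doc.lower().split()))
        pairs.update(combinations(terms, 2))
    return pairs


single_term_docs = ["paper", "PAPER", "pape", "Paper"]
# Hypothetical longer documents, for contrast.
multi_term_docs = ["paper ink printer", "paper printer", "book ink"]

print(cooccurrence_counts(single_term_docs))  # prints Counter(): no pairs to learn from
print(cooccurrence_counts(multi_term_docs))
```

Every single-term document contributes zero pairs, whereas the multi-term documents produce pairs such as ("paper", "printer") that a co-occurrence model can exploit.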

 

