CurtisMackWSIPP
Lapis Lazuli | Level 10

I have been writing traditional SAS code to classify building permit records according to whether they are new residential construction or demolitions. A couple of categorical fields help, but most of the information comes from a long, manually entered text description. Until now I have been writing code that searches that string for key words and phrases and makes decisions based on them.

I am thinking there must be a better approach to this problem using Enterprise Miner or some other tool: something where I manually classify a sample of records until a machine learning algorithm is trained well enough to make the decisions for me. I process about 50 files a year with 10,000 to 100,000 records each.

Does anybody have any suggestions on how I should approach this? Maybe a paper?

Thanks! 

3 REPLIES
Reeza
Super User

I think you'll need SAS Text Miner to help with processing the text. Another option, if you know the type of words you're looking for, is to do a word analysis: take the key words you've already identified, use them as inputs to a basic logistic regression model, and treat that as your baseline. Any model after that should give you better results than the most basic one. This can very much be an ML problem, though.
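A minimal sketch of that keyword baseline, written in Python rather than SAS just to show the idea: each description becomes 0/1 indicators for a handful of key phrases, and a plain logistic regression is fit on those indicators by batch gradient descent. The phrase list, the toy records, and the helper names (`keyword_features`, `fit_logistic`, `predict_proba`) are hypothetical placeholders, not part of any SAS product; in practice you would plug in the key words your existing code already searches for and a set of manually labeled records.

```python
import math

# Hypothetical key phrases; substitute the ones your existing SAS code searches for.
KEYWORDS = ["new single family", "new sfr", "demo", "demolish", "remodel", "addition"]


def keyword_features(description):
    """Return a constant 1.0 (intercept) followed by a 0/1 indicator per keyword."""
    text = description.lower()
    return [1.0] + [1.0 if kw in text else 0.0 for kw in KEYWORDS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on the log-loss."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            # Prediction error for this record: p(y=1) minus the true label.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def predict_proba(w, description):
    """Probability that a description is new residential construction."""
    x = keyword_features(description)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


# Made-up labeled records: 1 = new residential construction, 0 = anything else.
train = [
    ("NEW SINGLE FAMILY RESIDENCE W/ ATTACHED GARAGE", 1),
    ("New SFR 2 story, 3 bed", 1),
    ("new single family dwelling", 1),
    ("DEMOLISH EXISTING GARAGE", 0),
    ("Kitchen remodel", 0),
    ("Addition to rear of house", 0),
]
X = [keyword_features(d) for d, _ in train]
y = [label for _, label in train]
weights = fit_logistic(X, y)
```

On this toy data, `predict_proba(weights, "NEW SINGLE FAMILY HOME")` comes out above 0.5 and `predict_proba(weights, "Demolish old shed")` below it. Any fancier text-mining model is then judged by whether it beats this baseline on the same labeled records.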

fierceanalytics
Obsidian | Level 7

Hello,

You can visit https://www.lexjansen.com/ to search for papers related to this topic. Document classification can be an end in itself, mainly for data management; another popular use is prediction. SAS High-Performance Analytics (HPA) also has a text mining procedure you can consider. Its main benefit is that it manages the intermediate datasets better than Text Miner does. If you are coming to SAS from an open-source background, Viya is a better place to start.

Jia 

AnnKuo
SAS Employee

If you are licensed for SAS Text Miner, check out the Text Rule Builder node, which creates Boolean rules from small subsets of terms to predict a categorical target variable. The node generates an ordered set of rules that together are useful for describing and predicting the target. The SAS Text Miner 15.2 documentation includes an example that shows how to predict a categorical target variable using this node.
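To make "an ordered set of Boolean rules" concrete, here is a small Python sketch of how such a rule set is *applied* to a permit description: each rule requires some terms and forbids others, rules are checked in order, and the first match assigns the category. This is not the Text Rule Builder's algorithm, which learns the rules from labeled data; the rules, terms, category names, and helpers (`Rule`, `tokenize`, `classify`) below are hypothetical examples.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A Boolean rule: all `required` terms present AND none of `forbidden` present."""
    category: str
    required: list
    forbidden: list = field(default_factory=list)

    def matches(self, tokens):
        return all(t in tokens for t in self.required) and not any(
            t in tokens for t in self.forbidden
        )


# Hypothetical ordered rules; an earlier rule takes precedence over later ones.
RULES = [
    Rule("demolition", ["demolish"]),
    Rule("demolition", ["demo"], forbidden=["interior"]),
    Rule("new_residential", ["new", "single", "family"]),
    Rule("new_residential", ["new", "sfr"]),
]


def tokenize(text):
    """Lowercase, turn non-alphanumeric characters into spaces, return the set of words."""
    return set("".join(c if c.isalnum() else " " for c in text.lower()).split())


def classify(description, rules=RULES, default="other"):
    """Return the category of the first matching rule, or `default` if none match."""
    tokens = tokenize(description)
    for rule in rules:
        if rule.matches(tokens):
            return rule.category
    return default
```

For example, `classify("DEMO existing SFR")` returns `"demolition"` because the second rule fires before the `"sfr"` rule is reached, while `classify("Interior demo, kitchen remodel")` falls through to `"other"` because of the forbidden term. Rule order matters, which is why an ordered rule list can also be read directly as documentation of the classification logic.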

 

Also, in the SAS Global Forum paper "Classifying and Predicting Spam Messages using Text Mining in SAS® Enterprise Miner™", five other predictive models (memory-based reasoning (MBR), logistic regression, decision tree, random forest, and neural network) were built, and their performance was compared with the Text Rule Builder model. The best model was then used to classify messages as spam or ham (non-spam).
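The comparison step amounts to scoring every candidate model on labeled records it was not trained on and keeping the best one. A minimal Python sketch of that selection step, assuming each model is simply a function from a description to a label and using misclassification rate as the criterion (only one possible choice; the paper and Enterprise Miner may use other statistics). The two toy models, the holdout records, and the helper names are hypothetical.

```python
def misclassification_rate(model, holdout):
    """Fraction of (description, label) pairs the model labels incorrectly."""
    wrong = sum(1 for text, label in holdout if model(text) != label)
    return wrong / len(holdout)


def select_best(models, holdout):
    """Score every model on the same holdout set.

    Returns the name of the model with the lowest rate and a dict of all rates.
    """
    scores = {name: m_rate for name, m_rate in
              ((name, misclassification_rate(m, holdout)) for name, m in models.items())}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical candidate 1: always predicts the most common class.
def always_other(text):
    return "other"


# Hypothetical candidate 2: a simple keyword model.
def keyword_model(text):
    t = text.lower()
    if "demolish" in t or "demo " in t:
        return "demolition"
    if "new" in t and ("sfr" in t or "single family" in t):
        return "new_residential"
    return "other"


# Hypothetical manually labeled records held out from any training.
holdout = [
    ("Demolish detached garage", "demolition"),
    ("NEW SFR 2-story", "new_residential"),
    ("Reroof, no structural work", "other"),
    ("New single family residence", "new_residential"),
]

best, scores = select_best(
    {"always_other": always_other, "keyword_model": keyword_model}, holdout
)
```

Here the keyword model wins with a rate of 0.0 against 0.75 for the majority-class guess. With real permit data the holdout set would need to be much larger, and it should be kept separate from the records used to build any of the models.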

 

Hope this helps!

-Ann
