Some things to try:
1. Try the Text Profile node using the job group as the target variable.
2. Try the Text Topic node here. Be sure to work on a good stop list to remove terms that might strongly influence a topic but are not relevant to your goal.
3. There are any number of things you can try. One straightforward approach is to create topics or clusters on one set and then score the other set to see which docs from the second set are relevant to the first and which are not. Even better, after you have investigated both sets, refine your topics and turn them into user topics (rather than the multiterm topics). That way you can really control what you're looking for. For instance, you can define your own subtopics for various aspects of financial analysis, with a weighted list of terms for each one, and score every description and training document you have against those topics.
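To make the idea of weighted user topics concrete, here is a minimal sketch in plain Python. It is not how Text Miner computes topic scores internally; the topic names, terms, weights, and the cutoff are all made-up assumptions for illustration. Each document's score for a topic is the weighted count of topic terms divided by the document length.

```python
import re
from collections import Counter

# Hypothetical user-defined topics: each maps terms to hand-assigned weights.
USER_TOPICS = {
    "valuation": {"valuation": 1.0, "dcf": 0.8, "equity": 0.5},
    "reporting": {"reporting": 1.0, "gaap": 0.8, "audit": 0.6},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_document(text, topics=USER_TOPICS):
    """Return a length-normalized weighted term score for each topic."""
    counts = Counter(tokenize(text))
    n_tokens = sum(counts.values()) or 1  # avoid dividing by zero on empty text
    return {
        name: sum(weight * counts[term] for term, weight in terms.items()) / n_tokens
        for name, terms in topics.items()
    }


def assign_topics(text, cutoff=0.05, topics=USER_TOPICS):
    """Return the topics whose score meets the (assumed) cutoff, sorted by name."""
    scores = score_document(text, topics)
    return sorted(name for name, score in scores.items() if score >= cutoff)
```

You would run score_document over every job description and every training document, then compare which topics each one hits. The cutoff plays the same role as a topic cutoff in the tool: raise it to make assignments stricter.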
Russ
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
Russ,
This is very helpful. Now at least I have an idea of where to focus my effort at figuring out what to use in Text Miner. I'm sure I'll have more questions, but this is enough to get me started for now. Thanks!
For a hierarchy, you typically build a different model for each split of your hierarchy. You need enough data available at each split as you work your way down the tree, so that may not be feasible. The Text Rule Builder node should be useful if you're building a predictive model for this kind of hierarchy.
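As a rough illustration of "a different model for each split", here is a small Python sketch for a two-level hierarchy. The TermVoteClassifier is a toy stand-in I made up so the example runs without any libraries; it is not the Text Rule Builder algorithm, and the group and subgroup labels in the usage note are hypothetical. The point is only the structure: one model picks the group, then a separate model trained on that group's documents picks the subgroup.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TermVoteClassifier:
    """Toy model: predicts the label whose training terms best overlap the input."""

    def fit(self, texts, labels):
        self.profiles = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.profiles[label].update(tokenize(text))
        return self

    def predict(self, text):
        tokens = tokenize(text)

        def overlap(label):
            profile = self.profiles[label]
            total = sum(profile.values()) or 1
            return sum(profile[t] for t in tokens) / total

        # sorted() makes ties resolve to the alphabetically first label
        return max(sorted(self.profiles), key=overlap)


def train_hierarchy(rows):
    """rows: (text, group, subgroup) tuples. Returns (top model, {group: child model})."""
    top = TermVoteClassifier().fit([r[0] for r in rows], [r[1] for r in rows])
    by_group = defaultdict(list)
    for text, group, subgroup in rows:
        by_group[group].append((text, subgroup))
    # Each child model sees only its own group's documents.
    children = {
        group: TermVoteClassifier().fit([t for t, _ in items], [s for _, s in items])
        for group, items in by_group.items()
    }
    return top, children


def classify(text, top, children):
    """Predict the group first, then the subgroup with that group's own model."""
    group = top.predict(text)
    return group, children[group].predict(text)
```

Usage would look like top, children = train_hierarchy(rows) followed by classify(new_text, top, children). Notice that each child model is trained on only a slice of the data, which is exactly why the data can run thin as you move down the tree.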
If you're building a predictive model, you need training data, ideally hundreds or more job descriptions. You then score your 2 new college programs with the model that you built.
If you do not have training data, try building user-defined topics based on your domain knowledge and use the topic assignment as your classification. SAS also has a product called Content Categorization that is explicitly designed for this.
Sorry, no immediate papers on "hierarchical classification", but you can google that phrase to find the challenges of it and the approaches people use.