Text mining and content categorization

Need some high-level guidance on what approach to use in SAS Text Miner

New Contributor
Posts: 3


I was wondering if someone could point me in the right direction with SAS Text Miner. Are there any similar examples, white papers, or training materials that would help me choose the right nodes and models for what I'm trying to do?
 
Basically, what I'm looking to do is:
 
1.  Read a set of job descriptions and determine the range of skills and abilities most associated with a group of jobs (e.g. if I pull a set of Financial Analyst jobs, how strongly would skills like Excel, Cognos, SAS, or PowerPoint come up?).
 
2.  Next, read in a set of post-secondary programs (e.g. their program objectives, course objectives, course outlines, etc.) and determine/classify the types of skills that they develop.
 
3.  Take the two profiles, one from each corpus of documents (i.e. "in demand" market skills and "in supply" learning/program skills), and determine how well aligned they are: which skill sets required in jobs have high concordance with skill sets developed in the programs, and which have weak concordance.
 
 
I'm not that familiar with the tools and capabilities in Text Miner (I've only "played" with it a bit and have taken the intro course), so I was wondering if you could point me in the right direction, or know someone who could recommend which particular text processing nodes I should use for these three steps.
 

 

SAS Employee
Posts: 29

Re: Need some high-level guidance on what approach to use in SAS Text Miner

Some things to try:

 

1. Try the Text Profile node, using the job group as the target variable.

2. Use the Text Topic node here. Be sure to build a good stop list to remove terms that might strongly influence a topic but are not relevant to your goal.

3. There are any number of things you can try. One straightforward option is to create topics or clusters on one set and then score the other set to see which documents from the second set are relevant to the first and which are not. Even better, after you have investigated both sets, refine your topics and turn them into user topics (rather than the multi-term topics), so you can really control what you're looking for. For instance, you can define your own subtopics for various aspects of financial analysis, with a weighted list of terms for each one, and score every job description and training document you have against those topics.
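
To make the idea in point 3 concrete outside of Text Miner, here is a minimal Python sketch of scoring documents against user-defined topics with weighted term lists. It is not how the Text Topic node computes its scores; the topic names, terms, and weights are all made up for illustration, and the score is just a weighted count of matching terms.

```python
import re
from collections import Counter

# Hypothetical user-defined topics: each topic maps a term to a weight.
# The names, terms, and weights are illustrative assumptions only.
USER_TOPICS = {
    "spreadsheet_modeling": {"excel": 1.0, "pivot": 0.6, "vlookup": 0.6},
    "statistical_software": {"sas": 1.0, "regression": 0.5},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_document(text, topics=USER_TOPICS):
    """Return a weighted term-count score for each topic."""
    counts = Counter(tokenize(text))
    return {
        name: sum(weight * counts[term] for term, weight in terms.items())
        for name, terms in topics.items()
    }


# The same topics can score both corpora (job postings and course outlines).
job_scores = score_document("Build Excel models and pivot tables in Excel")
course_scores = score_document("Intro to regression using SAS")
```

Because both corpora are scored against the same topic definitions, the per-topic scores can be compared side by side to see which skills are in demand but weakly covered by the programs.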

 

Russ

New Contributor
Posts: 3

Re: Need some high-level guidance on what approach to use in SAS Text Miner

Russ,

 

This is very helpful. Now at least I have an idea of where to focus my effort in figuring out what to use in Text Miner. I'm sure I'll have more questions, but this is enough to get me started for now. Thanks!

New Contributor
Posts: 3

Re: Need some high-level guidance on what approach to use in SAS Text Miner

Russ,

I have somewhat of a hierarchical structure. In simple terms:

For the jobs datasets:

An industry area has many job postings, and a job posting has many required skills or experience.

For the skills:

A college program has many courses, each of which has many skill outcomes.



If I developed clustering and scoring to compare the fit of skills between course outcomes and required job skills, would I be able to assess the fit while respecting the hierarchy? For example, could I assess how adequately a program or a set of courses satisfies either particular job positions or a particular range of positions within an industry area? I'm just uncertain how the clustering would consider the levels and hierarchical dependencies of programs to course outcomes, or of jobs to skill requirements.



SAS Employee
Posts: 29

Re: Need some high-level guidance on what approach to use in SAS Text Miner

For a hierarchy, you typically build a different model for each split of your hierarchy. You also need enough data available as you work your way down the tree, so that may not be feasible. The Text Rule Builder node should be useful if you're building a predictive model for this kind of hierarchy.

New Contributor
Posts: 3

Re: Need some high-level guidance on what approach to use in SAS Text Miner

Thanks Russ,

So, here is a simple case of what I'm trying to do.

Let's say I have a job description for a Support Worker, and in that job description the essential skills are: needs to know how to take blood, has experience with elder care, etc.


And I have two college programs:


One in Social Work that has a course on elder care, and another in Nursing that teaches about taking blood.


In simplistic terms, I would like Text Miner to deduce that each of these programs would satisfy 50% of this type of job's requirements.
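
As a rough illustration of that 50% figure, here is a minimal Python sketch of the matching step. It assumes the skills have already been extracted and normalized to common labels (for example via topic assignment); the skill labels and course codes are hypothetical, and rolling course skills up to the program level with a simple union is just one possible choice.

```python
def program_skills(courses):
    """Roll course-level skills up to the program level (set union)."""
    skills = set()
    for course_skills in courses.values():
        skills |= set(course_skills)
    return skills


def coverage(job_skills, offered_skills):
    """Fraction of the job's required skills that the program develops."""
    job_skills = set(job_skills)
    if not job_skills:
        return 0.0
    return len(job_skills & set(offered_skills)) / len(job_skills)


# Hypothetical, already-normalized skill labels for the Support Worker job.
support_worker = {"take blood", "elder care"}

# Program -> course -> skills; the course codes are made up.
programs = {
    "Social Work": {"SW101": {"elder care"}},
    "Nursing": {"NU101": {"take blood"}},
}

fit = {
    name: coverage(support_worker, program_skills(courses))
    for name, courses in programs.items()
}
print(fit)  # {'Social Work': 0.5, 'Nursing': 0.5}
```

The same coverage function could be applied at the course level, or to the pooled skills of all postings in an industry area, which is one way to compare fit at different levels of the hierarchy.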


So there's a content categorization approach (essential job skills and essential education skills), a hierarchy process (these skills belong to this job, and these skills are taught in these programs or courses), and a matching process (this program develops the most skills to support these types of jobs in this type of industry or company).

So if I understand you correctly, I would need to create a matching process at all 3 (?) levels and somehow combine them to determine the program(s) of best fit for a set of industry job descriptions?


If you know of any case studies that are attempting to do this, it would be
great to be able to see an example.

Appreciate your help!

SAS Employee
Posts: 29

Re: Need some high-level guidance on what approach to use in SAS Text Miner

If you're building a predictive model, you need training data, hopefully hundreds or more of job descriptions. You then score your two new college programs with the model you built.

 

If you do not have training data, try building user-defined topics based on your domain knowledge and use the topic assignment as your classification. SAS also has a product called Content Categorization that is explicitly designed for this.

 

Sorry, I have no papers on "hierarchical classification" at hand, but you can google that phrase to find its challenges and the approaches people use.

 

 
