TET_34
Fluorite | Level 6

Hello everyone,

 

I have a question about Text Mining.

 

We have one historical data set containing complete job applications with candidate information, and another containing complete job postings with employer information. These data sets will become live data in the future, but for now we are working from a snapshot taken at a specific reporting date.

 

For job postings we have two fields:

Employer ID
Job Description


We built Job Description as a single text column by combining the following values:

Gender
Location
Military Status
Educational Status
Social Status (Disabled, Veteran etc.)
Marital Status
Driver's License
Foreign Language

 

For job applications we have two fields:

Employee ID
Candidate Information


We built Candidate Information as a single text column by combining the following values:

Gender
Location
Military Status
Educational Status
Social Status (Disabled, Veteran etc.)
Marital Status
Driver's License
Foreign Language

 

As you know, the Text Miner nodes available to me in Enterprise Miner are:

Import, Parsing, Filter, Topic, Cluster, Profile and Rule Builder

 

What we want to achieve with Text Mining:
  1. Measure how well each Candidate Information text matches each Job Description text.
  2. Assign a score based on the match rate.
  3. Keep the top 30 candidates with the best scores for each job posting, without duplicate rows.
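
To make the "match rate" concrete, here is a minimal Python sketch of one possible scoring rule: the percentage of Job Description terms that also appear in the Candidate Information text, compared case-insensitively. This is only an assumption about what the score could mean, not a Text Miner feature; the function name `match_score` and the whitespace tokenization are hypothetical choices for illustration.

```python
def match_score(job_description: str, candidate_information: str) -> float:
    """Return the share (0-100) of job-description terms found in the candidate text.

    Assumption: terms are separated by whitespace and compared case-insensitively.
    """
    job_terms = set(job_description.lower().split())
    candidate_terms = set(candidate_information.lower().split())
    if not job_terms:
        # An empty job description cannot be matched; score it as 0 by convention.
        return 0.0
    return 100.0 * len(job_terms & candidate_terms) / len(job_terms)


if __name__ == "__main__":
    # Three of the five job terms (a, b, c) appear in the candidate text.
    print(match_score("A B c d e", "A B C"))
```

Under this rule, extra terms on the candidate side do not lower the score; a symmetric measure such as Jaccard similarity would behave differently.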
 
As a result we want to get an output similar to the table below.
 
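KEY | RANK | Employer_ID | JOB_DESCRIPTION (TEXT) | Employee_ID | CANDIDATE_INFORMATION (TEXT) | MATCH SCORE
--- | --- | --- | --- | --- | --- | ---
1 | 1 | 1 | A B c d e | 10 | A B C D E F | 100
2 | 2 | 1 | A B c d e | 20 | A B C D E | 100
3 | 3 | 1 | A B c d e | 30 | A B C | 60
4 | 4 | 1 | A B c d e | 70 | C UK | 20
: | : | 1 | : | : | : | :
30 | 30 | 1 | A B c d e | 40 | A | 20
31 | 1 | 2 | TR uK usA ITA | 90 | TR UK USA ITA | 100
32 | 2 | 2 | TR uK usA ITA | 50 | TR UK | 50
33 | 3 | 2 | TR uK usA ITA | 60 | TR USA | 50
34 | 4 | 2 | TR uK usA ITA | 80 | TR | 25
35 | : | 2 | : | : | : | :
: | 30 | 2 | TR uK usA ITA | 70 | C UK | 25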
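
For illustration, the shape of this output (a running KEY, a RANK that restarts for each Employer_ID, and at most 30 rows per posting) could be produced along the lines of the Python sketch below. It is not a Text Miner flow. The names `MatchRow`, `coverage_score` and `top_candidates` are hypothetical, the score is assumed to be the percentage of job-description terms found in the candidate text, duplicates are assumed to mean repeated Employee_ID values, and ties are broken by Employee_ID purely as an assumption.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class MatchRow(NamedTuple):
    key: int            # running row number across all employers
    rank: int           # rank within one employer, starting at 1
    employer_id: int
    employee_id: int
    match_score: float


def coverage_score(job_text: str, candidate_text: str) -> float:
    """Assumed score: percentage of job terms present in the candidate text."""
    job_terms = set(job_text.lower().split())
    if not job_terms:
        return 0.0
    candidate_terms = set(candidate_text.lower().split())
    return 100.0 * len(job_terms & candidate_terms) / len(job_terms)


def top_candidates(
    jobs: Dict[int, str],
    applications: Iterable[Tuple[int, str]],
    top_n: int = 30,
) -> List[MatchRow]:
    """Score every candidate against every job and keep the best top_n per job."""
    # Deduplicate by Employee_ID; if an ID repeats, the last text wins (assumption).
    candidates = dict(applications)
    rows: List[MatchRow] = []
    for employer_id, job_text in jobs.items():
        scored = sorted(
            ((coverage_score(job_text, text), employee_id)
             for employee_id, text in candidates.items()),
            # Highest score first; ties broken by Employee_ID (assumption).
            key=lambda pair: (-pair[0], pair[1]),
        )
        for rank, (score, employee_id) in enumerate(scored[:top_n], start=1):
            rows.append(MatchRow(len(rows) + 1, rank, employer_id, employee_id, score))
    return rows


if __name__ == "__main__":
    jobs = {1: "A B c d e", 2: "TR uK usA ITA"}
    applications = [(10, "A B C D E F"), (30, "A B C"), (70, "C UK"),
                    (10, "A B C D E F"), (90, "TR UK USA ITA")]
    for row in top_candidates(jobs, applications, top_n=2):
        print(row)
```

Scoring every candidate against every posting grows with the product of the two table sizes, so a real data set would likely need some pre-filtering before ranking.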
 
 

We would appreciate guidance on how to approach this case with Text Miner.
