BookmarkSubscribeRSS Feed
Fluorite | Level 6

I have a data set with job titles and job codes that are 1 to 8 digits long. I have another data set with a study population that have job codes from 10 to 12 digits long. The first 1 to 8 digits will match the job codes from the first data set. My plan is to write code to match the first 8 digits. If it matches, done. If not, match the first 7 digits. If it matches, done. If not, match the first 6 digits. And keep going until it's done. My question is how to write this in SAS. Would the "or" operator achieve this way of matching on the highest number digits, stopping if it finds a match, and looking for the next number of digits if not? Should the code look like this?


proc sql;
	create table job_titles as
	select A.*, B.job_title
	from study_population as A
	left join occupation_data as B
    	on A.industry = B.industry
		and A.seniority = A.seniority 
		and (upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,8))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,7))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,6))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,5))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,4))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,3))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,2))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,1)) );


You'll notice that I am matching by industry and seniority first because the code will depend on those 2 things as well.

Super User
	and (upcase(A.job_code) =: upcase(substr(B.Occupation_Code,1,8))

I think you can do just that in SAS (colon addition) but not 100% sure it works in SQL. Definitely test it out. 



Diamond | Level 26

=: does not work in SQL, but eqt does work in SQL to see if the start of strings match.


To @daufoi why do you need the upcase() function if the strings are digits?


... and a.job_code eqt b.occupation_code
Paige Miller
Meteorite | Level 14

So you want to find the best match. In that case, you will have to calculate the quality of the match in an inner query (code not tested):


proc sql;
  create table job_titles as
  select *
    select A.*,b.Jobtitle,
        when upcase(substr(A.jobcode,1,8))=upcase(substr(B.Occupation_Code,1,8)) then 8
        when upcase(substr(A.jobcode,1,7))=upcase(substr(B.Occupation_Code,1,7)) then 7        
        when upcase(substr(A.jobcode,1,6))=upcase(substr(B.Occupation_Code,1,6)) then 6
        when upcase(substr(A.jobcode,1,5))=upcase(substr(B.Occupation_Code,1,5)) then 5
        when upcase(substr(A.jobcode,1,4))=upcase(substr(B.Occupation_Code,1,4)) then 4
        when upcase(substr(A.jobcode,1,3))=upcase(substr(B.Occupation_Code,1,3)) then 3
        when upcase(substr(A.jobcode,1,2))=upcase(substr(B.Occupation_Code,1,2)) then 2
        when upcase(substr(A.jobcode,1,1))=upcase(substr(B.Occupation_Code,1,1)) then 1
        else 0 
        end as match        
	from study_population as A
	left join occupation_data as B
    	on A.industry = B.industry
	    and A.seniority = A.seniority 
  where match>0
  group by industry,seniority
  having match=max(match)

This will also put the MATCH column on the output, if you do not want that you will have to name the columns explicitly in the outer query, or use a DROP= option on the output table.


Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 4 in conversation