BookmarkSubscribeRSS Feed
daufoi
Fluorite | Level 6

I have a data set with job titles and job codes that are 1 to 8 digits long. I have another data set with a study population that have job codes from 10 to 12 digits long. The first 1 to 8 digits will match the job codes from the first data set. My plan is to write code to match the first 8 digits. If it matches, done. If not, match the first 7 digits. If it matches, done. If not, match the first 6 digits. And keep going until it's done. My question is how to write this in SAS. Would the "or" operator achieve this way of matching on the highest number digits, stopping if it finds a match, and looking for the next number of digits if not? Should the code look like this?

 

proc sql;
	create table job_titles as
	select A.*, B.job_title
	from study_population as A
	left join occupation_data as B
    	on A.industry = B.industry
		and A.seniority = A.seniority 
		and (upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,8))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,7))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,6))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,5))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,4))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,3))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,2))
			or upcase(A.job_code) = upcase(substr(B.Occupation_Code,1,1)) );
quit;

 

You'll notice that I am matching by industry and seniority first because the code will depend on those 2 things as well.

3 REPLIES 3
Reeza
Super User
	and (upcase(A.job_code) =: upcase(substr(B.Occupation_Code,1,8))
			

I think you can do just that in SAS (colon addition) but not 100% sure it works in SQL. Definitely test it out. 

 

 

PaigeMiller
Diamond | Level 26

=: does not work in SQL, but eqt does work in SQL to see if the start of strings match.

 

To @daufoi why do you need the upcase() function if the strings are digits?

 

... and a.job_code eqt b.occupation_code
--
Paige Miller
s_lassen
Meteorite | Level 14

So you want to find the best match. In that case, you will have to calculate the quality of the match in an inner query (code not tested):

 

proc sql;
  create table job_titles as
  select *
    from(
    select A.*,b.Jobtitle,
      case 
        when upcase(substr(A.jobcode,1,8))=upcase(substr(B.Occupation_Code,1,8)) then 8
        when upcase(substr(A.jobcode,1,7))=upcase(substr(B.Occupation_Code,1,7)) then 7        
        when upcase(substr(A.jobcode,1,6))=upcase(substr(B.Occupation_Code,1,6)) then 6
        when upcase(substr(A.jobcode,1,5))=upcase(substr(B.Occupation_Code,1,5)) then 5
        when upcase(substr(A.jobcode,1,4))=upcase(substr(B.Occupation_Code,1,4)) then 4
        when upcase(substr(A.jobcode,1,3))=upcase(substr(B.Occupation_Code,1,3)) then 3
        when upcase(substr(A.jobcode,1,2))=upcase(substr(B.Occupation_Code,1,2)) then 2
        when upcase(substr(A.jobcode,1,1))=upcase(substr(B.Occupation_Code,1,1)) then 1
        else 0 
        end as match        
	from study_population as A
	left join occupation_data as B
    	on A.industry = B.industry
	    and A.seniority = A.seniority 
     )
  where match>0
  group by industry,seniority
  having match=max(match)
  ;
quit;

This will also put the MATCH column on the output, if you do not want that you will have to name the columns explicitly in the outer query, or use a DROP= option on the output table.

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 463 views
  • 2 likes
  • 4 in conversation