BookmarkSubscribeRSS Feed
PhilfromGermany
Fluorite | Level 6

Hi,

I'm trying to solve the following problem by merging multiple data sets.

I have a data set that contains information about employees:

Department IDEmp. IDRole
123450012IT Lead
123450013SE
234560014SE

And another data set that contains information regarding their salary:

Department IDSalaryRole
1234570,000Project Lead
1234555,000Software Engineer
2345680,000SE

I want to combine those two data sets and remove the ID column in the first data set and the role column in the second data set. The resulting data set should only contain one column that contains both salary and role. I call this column "Attribute".

Department IDIDAttribute
123450012IT Lead
12345001270,000
123450013SE
12345001355,000
234560014SE
23456001480,000
...

Roles within the departments are unique. Emp IDs are unique, as well. The problem is that the second data set does not contain the employee ID column. However, we know that there is only one IT Lead in the department and that his emp ID is 0012. That information is given in the first data set.

Another problem is that in the second data set, there are 5 possible values to choose from for each role, i.d. SE could be Software Engineer, SE, Soft-Eng, S.E., or S-Eng. These synonyms are predefined. The roles in the first data set are already standardized and do not vary.

I would really appreciate your help.

Regards,

Phil

6 REPLIES 6
ballardw
Super User

Do you already have a table or something that matches the "Project Lead" to "IT Lead", "Software Engineer" to "SE" and any other pairs of "Roles" that have different values but mean the same? If not, that's going to be a needed piece unless you only have the three values shown.

PhilfromGermany
Fluorite | Level 6

Hi,

yes, that table exists. It looks like this:

RoleSynonym
IT LeadIT Lead, ITL, Lead, Project Lead, Program Manager
SESoftware Engineer, Soft.-Eng., S-eng, S-Engineer, SE
......
yeshwanth
Fluorite | Level 6

Hello

We need to use Merge and Append to get the result.

Try this

data y1;
input deptno empid role $;
cards;
1234 1 lead
1234 2 lead1
9999 1 it
;
run;

data y2;
input deptno salary $;
cards;
1234 1000
1234 2000
;
run;

proc sort data=y1;by deptno;run;
proc sort data=y2;by deptno;run;

data y3;
merge y1(in=a) y2(in=b);
by deptno;
if a and b;
run;

data y4;
set y3;
rename role=attribute;
run;

data y5;
set y3;
rename salary=attribute;
run;

data final(drop= salary role);
set y4 y5;
run;

PhilfromGermany
Fluorite | Level 6

Thank you for your answer, yeshwanth. However, your code does not solve the problem, primarily because it ignores the role and only matches based on sorting

Ksharp
Super User

OK.

data emp;
input DepartmentID $     EmpID $     Role & $20.;
cards;
12345     0012     IT Lead
12345     0013     SE
23456     0014     SE
;
run;
data dic;
infile cards truncover;
input Synonym $100.;
cards;
IT Lead     IT Lead, ITL, Lead, Project Lead, Program Manager
SE     Software Engineer, Soft.-Eng., S-eng, S-Engineer, SE
;
run;
data salary;
input DepartmentID     $ Salary $     Role & $20.;
cards;
12345     70,000     Project Lead
12345     55,000     Software Engineer
23456     80,000     SE
;
run;

proc sql;
 create table x as
  select *
   from emp,dic
    where Synonym contains strip(Role) ;
quit;
data want;
 if _n_ eq 1 then do;
  if 0 then set salary(rename=(role=_role));
  declare hash ha(dataset:'salary(rename=(role=_role))');
   ha.definekey('DepartmentID','_Role');
   ha.definedata('Salary');
   ha.definedone();
end;
set x; 
do i=1 to countw(Synonym,',');
 _role=strip(scan(Synonym,i,','));
 if ha.find()=0 then do; output;call missing(salary);leave;end;
end;
run;

Ksharp

UrvishShah
Fluorite | Level 6

Hello,

How about this one...

data one;

   input dept_id emp_id role $12.;

   cards4;

12345 0012 IT Lead

12345 0013 SE

23456 0014 SE

;;;;

data two;

  input dept_id salary role $12.;

  cards4;

12345 70000 Project Lead

12345 55000 Software Eng.

23456 80000 SE

;;;;

data both(drop = salary role);

    merge one two(drop = role);

    by dept_id;

  dept_id = dept_id;

  ID = emp_id;

  attributes = compress(put(salary,8.));

    output;

  attributes = compress(role);

    output;

run;

-Urvish

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1086 views
  • 0 likes
  • 5 in conversation