Hi,
I'm trying to solve the following problem by merging multiple data sets.
I have a data set that contains information about employees:
Department ID | Emp. ID | Role |
---|---|---|
12345 | 0012 | IT Lead |
12345 | 0013 | SE |
23456 | 0014 | SE |
And another data set that contains information regarding their salary:
Department ID | Salary | Role |
---|---|---|
12345 | 70,000 | Project Lead |
12345 | 55,000 | Software Engineer |
23456 | 80,000 | SE |
I want to combine those two data sets and remove the ID column in the first data set and the role column in the second data set. The resulting data set should only contain one column that contains both salary and role. I call this column "Attribute".
Department ID | ID | Attribute |
---|---|---|
12345 | 0012 | IT Lead |
12345 | 0012 | 70,000 |
12345 | 0013 | SE |
12345 | 0013 | 55,000 |
23456 | 0014 | SE |
23456 | 0014 | 80,000 |
... |
Roles within the departments are unique. Emp IDs are unique, as well. The problem is that the second data set does not contain the employee ID column. However, we know that there is only one IT Lead in the department and that his emp ID is 0012. That information is given in the first data set.
Another problem is that in the second data set, there are 5 possible values to choose from for each role, i.d. SE could be Software Engineer, SE, Soft-Eng, S.E., or S-Eng. These synonyms are predefined. The roles in the first data set are already standardized and do not vary.
I would really appreciate your help.
Regards,
Phil
Do you already have a table or something that matches the "Project Lead" to "IT Lead", "Software Engineer" to "SE" and any other pairs of "Roles" that have different values but mean the same? If not, that's going to be a needed piece unless you only have the three values shown.
Hi,
yes, that table exists. It looks like this:
Role | Synonym |
---|---|
IT Lead | IT Lead, ITL, Lead, Project Lead, Program Manager |
SE | Software Engineer, Soft.-Eng., S-eng, S-Engineer, SE |
... | ... |
Hello
We need to use Merge and Append to get the result.
Try this
data y1;
input deptno empid role $;
cards;
1234 1 lead
1234 2 lead1
9999 1 it
;
run;
data y2;
input deptno salary $;
cards;
1234 1000
1234 2000
;
run;
proc sort data=y1;by deptno;run;
proc sort data=y2;by deptno;run;
data y3;
merge y1(in=a) y2(in=b);
by deptno;
if a and b;
run;
data y4;
set y3;
rename role=attribute;
run;
data y5;
set y3;
rename salary=attribute;
run;
data final(drop= salary role);
set y4 y5;
run;
Thank you for your answer, yeshwanth. However, your code does not solve the problem, primarily because it ignores the role and only matches based on sorting
OK.
data emp; input DepartmentID $ EmpID $ Role & $20.; cards; 12345 0012 IT Lead 12345 0013 SE 23456 0014 SE ; run; data dic; infile cards truncover; input Synonym $100.; cards; IT Lead IT Lead, ITL, Lead, Project Lead, Program Manager SE Software Engineer, Soft.-Eng., S-eng, S-Engineer, SE ; run; data salary; input DepartmentID $ Salary $ Role & $20.; cards; 12345 70,000 Project Lead 12345 55,000 Software Engineer 23456 80,000 SE ; run; proc sql; create table x as select * from emp,dic where Synonym contains strip(Role) ; quit; data want; if _n_ eq 1 then do; if 0 then set salary(rename=(role=_role)); declare hash ha(dataset:'salary(rename=(role=_role))'); ha.definekey('DepartmentID','_Role'); ha.definedata('Salary'); ha.definedone(); end; set x; do i=1 to countw(Synonym,','); _role=strip(scan(Synonym,i,',')); if ha.find()=0 then do; output;call missing(salary);leave;end; end; run;
Ksharp
Hello,
How about this one...
data one;
input dept_id emp_id role $12.;
cards4;
12345 0012 IT Lead
12345 0013 SE
23456 0014 SE
;;;;
data two;
input dept_id salary role $12.;
cards4;
12345 70000 Project Lead
12345 55000 Software Eng.
23456 80000 SE
;;;;
data both(drop = salary role);
merge one two(drop = role);
by dept_id;
dept_id = dept_id;
ID = emp_id;
attributes = compress(put(salary,8.));
output;
attributes = compress(role);
output;
run;
-Urvish
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.