DATA Step, Macro, Functions and more

Merging Data Sets

Reply
Contributor
Posts: 39

Merging Data Sets

Hi,

I'm trying to solve the following problem by merging multiple data sets.

I have a data set that contains information about employees:

Department IDEmp. IDRole
123450012IT Lead
123450013SE
234560014SE

And another data set that contains information regarding their salary:

Department IDSalaryRole
1234570,000Project Lead
1234555,000Software Engineer
2345680,000SE

I want to combine those two data sets and remove the ID column in the first data set and the role column in the second data set. The resulting data set should only contain one column that contains both salary and role. I call this column "Attribute".

Department IDIDAttribute
123450012IT Lead
12345001270,000
123450013SE
12345001355,000
234560014SE
23456001480,000
...

Roles within the departments are unique. Emp IDs are unique, as well. The problem is that the second data set does not contain the employee ID column. However, we know that there is only one IT Lead in the department and that his emp ID is 0012. That information is given in the first data set.

Another problem is that in the second data set, there are 5 possible values to choose from for each role, i.d. SE could be Software Engineer, SE, Soft-Eng, S.E., or S-Eng. These synonyms are predefined. The roles in the first data set are already standardized and do not vary.

I would really appreciate your help.

Regards,

Phil

Super User
Posts: 10,497

Re: Merging Data Sets

Do you already have a table or something that matches the "Project Lead" to "IT Lead", "Software Engineer" to "SE" and any other pairs of "Roles" that have different values but mean the same? If not, that's going to be a needed piece unless you only have the three values shown.

Contributor
Posts: 39

Re: Merging Data Sets

Hi,

yes, that table exists. It looks like this:

RoleSynonym
IT LeadIT Lead, ITL, Lead, Project Lead, Program Manager
SESoftware Engineer, Soft.-Eng., S-eng, S-Engineer, SE
......
Contributor
Posts: 22

Re: Merging Data Sets

Hello

We need to use Merge and Append to get the result.

Try this

data y1;
input deptno empid role $;
cards;
1234 1 lead
1234 2 lead1
9999 1 it
;
run;

data y2;
input deptno salary $;
cards;
1234 1000
1234 2000
;
run;

proc sort data=y1;by deptno;run;
proc sort data=y2;by deptno;run;

data y3;
merge y1(in=a) y2(in=b);
by deptno;
if a and b;
run;

data y4;
set y3;
rename role=attribute;
run;

data y5;
set y3;
rename salary=attribute;
run;

data final(drop= salary role);
set y4 y5;
run;

Contributor
Posts: 39

Re: Merging Data Sets

Thank you for your answer, yeshwanth. However, your code does not solve the problem, primarily because it ignores the role and only matches based on sorting

Super User
Posts: 9,676

Re: Merging Data Sets

OK.

data emp;
input DepartmentID $     EmpID $     Role & $20.;
cards;
12345     0012     IT Lead
12345     0013     SE
23456     0014     SE
;
run;
data dic;
infile cards truncover;
input Synonym $100.;
cards;
IT Lead     IT Lead, ITL, Lead, Project Lead, Program Manager
SE     Software Engineer, Soft.-Eng., S-eng, S-Engineer, SE
;
run;
data salary;
input DepartmentID     $ Salary $     Role & $20.;
cards;
12345     70,000     Project Lead
12345     55,000     Software Engineer
23456     80,000     SE
;
run;

proc sql;
 create table x as
  select *
   from emp,dic
    where Synonym contains strip(Role) ;
quit;
data want;
 if _n_ eq 1 then do;
  if 0 then set salary(rename=(role=_role));
  declare hash ha(dataset:'salary(rename=(role=_role))');
   ha.definekey('DepartmentID','_Role');
   ha.definedata('Salary');
   ha.definedone();
end;
set x; 
do i=1 to countw(Synonym,',');
 _role=strip(scan(Synonym,i,','));
 if ha.find()=0 then do; output;call missing(salary);leave;end;
end;
run;

Ksharp

Regular Contributor
Posts: 195

Re: Merging Data Sets

Hello,

How about this one...

data one;

   input dept_id emp_id role $12.;

   cards4;

12345 0012 IT Lead

12345 0013 SE

23456 0014 SE

;;;;

data two;

  input dept_id salary role $12.;

  cards4;

12345 70000 Project Lead

12345 55000 Software Eng.

23456 80000 SE

;;;;

data both(drop = salary role);

    merge one two(drop = role);

    by dept_id;

  dept_id = dept_id;

  ID = emp_id;

  attributes = compress(put(salary,8.));

    output;

  attributes = compress(role);

    output;

run;

-Urvish

Ask a Question
Discussion stats
  • 6 replies
  • 283 views
  • 0 likes
  • 5 in conversation