Calcite | Level 5

## Help with combining and merging data sets

I have three separate data sets below. I want to use a data step to show for each of the new employees and interns who their manager is and which department they will work in. Also, which department has the most new people. I don't know how to sort the new_employee and new_intern data sets to then have them merge with the manager data set.

```data New_Employee;
length name \$15;
input name \$ team training_course work_experience;
datalines;
Bohr,Neils 1 0 3
Zook,Carla 2 2 2
Penrose,Roger 1 3 1
Martinez,Maria 3 1 2
Orfali,Philip 2 1 1
;
run;

data New_Intern;
length name \$15 major \$20;
input name \$ team training_course major \$;
datalines;
Capalleti,Jimmy 3 1 FashionDesign
Cannon,Annie 1 0 Mathematics
Einstein,Albert 1 1 ComputerScience
;
run;

data Manager;
length mname \$15 department \$10;
input mname \$ team department \$;
datalines;
Wilson,Kenneth 1 Operation
Bardeen,John 2 Marketing
Sorrell,Joseph 3 Design
;
run;```

5 REPLIES 5
Diamond | Level 26

## Re: Help with combining and merging data sets

The only variable you can possibly merge by is TEAM, as that exists in all three data sets.

Therefore, you would have to sort all three data sets by TEAM before doing the merge.

--
Paige Miller
Meteorite | Level 14

## Re: Help with combining and merging data sets

Something like this?

``````proc sort data=New_Employee(keep=name team) out=emp;
by team name;
run;

proc sort data=New_Intern(keep=name team) out=int;
by team name;
run;

data all;
set emp int;
by team name;
run;

proc sort data=Manager;
by team;
run;

data want;
if 0 then set all manager; /* just to get the right order of variables */
do count=1 by 1 until(last.team);
set all;
by team;
end;
do until(last.team);
merge all manager;
by team;
output;
end;
run;
``````
Opal | Level 21

## Re: Help with combining and merging data sets

Below using SQL.

``````proc sql;
select
m.*
,emp.emp_type
,emp.name
,emp.training_course
,emp.work_experience
,emp.major
from manager m

left join
(
select
team
,'new_employee' as emp_type
,name
,training_course
,work_experience
,'' as major
from new_employee
union corr all
select
team
,'new_intern' as emp_type
,name
,training_course
,major
,. as work_experience
from new_intern
) emp
on m.team=emp.team

order by team, name
;
quit;``````
Calcite | Level 5

## Re: Help with combining and merging data sets

I am trying to make a process run faster, that uses 3 years worth of hospital data. My goal is to just run the past 3 months of data, and append it to the existing large dataset. If a patient identifier appears in both tables, I want to only use the data from the new table. For example: data have_a; input ID amount; datalines; 1 10 3 15 4 20 7 10 7 15 9 12 10 14 ; run; data have_b; input ID amount; datalines; 2 15 3 20 4 10 4 15 5 12 7 20 8 15 9 10 11 20 ; run; data test; merge have_a have_b; by ID; run; My goal: If have_a and have_b contain any ID# in common, it should only keep the values from have_b. However, as it stands, ID#7 has 2 rows in the final dataset (with amounts 20 and 15) when it should only have 1. This is a very simplified version of my real data, which contains 20+ columns and millions of rows.
Super User

## Re: Help with combining and merging data sets

How does this relate to the original question about combining 3 data sets???

This looks like it should be a completely new topic as your requirements are very different from the original question starting this thread.

Also, you seem not to actually want an Append (add records) as an UPDATE, which would replace the values in the master table from a new (transaction) data set based on matching criteria, likely the ID. An update will add new records if the Id is new.

If you have corrupted your master data set by creating records with duplicate ids then you need to recover the old version before you start.

Discussion stats
• 5 replies
• 840 views
• 0 likes
• 6 in conversation