BookmarkSubscribeRSS Feed
hovicke
Calcite | Level 5

I have three separate data sets below. I want to use a data step to show for each of the new employees and interns who their manager is and which department they will work in. Also, which department has the most new people. I don't know how to sort the new_employee and new_intern data sets to then have them merge with the manager data set.

 

data New_Employee;
length name $15;
input name $ team training_course work_experience;
datalines;
Bohr,Neils 1 0 3
Zook,Carla 2 2 2
Penrose,Roger 1 3 1
Martinez,Maria 3 1 2
Orfali,Philip 2 1 1
;
run;


data New_Intern;
length name $15 major $20;
input name $ team training_course major $;
datalines;
Capalleti,Jimmy 3 1 FashionDesign
Chen,Len 2 0 BusinessAnalytics
Cannon,Annie 1 0 Mathematics
Davis,Brad 3 0 Art
Einstein,Albert 1 1 ComputerScience
;
run;


data Manager;
length mname $15 department $10;
input mname $ team department $;
datalines;
Wilson,Kenneth 1 Operation
Bardeen,John 2 Marketing
Sorrell,Joseph 3 Design
;
run;

 

5 REPLIES 5
PaigeMiller
Diamond | Level 26

The only variable you can possibly merge by is TEAM, as that exists in all three data sets.

 

Therefore, you would have to sort all three data sets by TEAM before doing the merge.

--
Paige Miller
s_lassen
Meteorite | Level 14

Something like this?

proc sort data=New_Employee(keep=name team) out=emp;                                                                                    
  by team name;                                                                                                                         
run;                                                                                                                                    
                                                                                                                                        
proc sort data=New_Intern(keep=name team) out=int;                                                                                      
  by team name;                                                                                                                         
run;                                                                                                                                    
                                                                                                                                        
data all;                                                                                                                               
  set emp int;                                                                                                                          
  by team name;                                                                                                                         
run;                                                                                                                                    
                                                                                                                                        
proc sort data=Manager;                                                                                                                 
  by team;                                                                                                                              
run;                                                                                                                                    
                                                                                                                                        
data want;                                                                                                                              
  if 0 then set all manager; /* just to get the right order of variables */                                                                                                           
  do count=1 by 1 until(last.team);                                                                                                     
    set all;                                                                                                                            
    by team;                                                                                                                            
    end;                                                                                                                                
  do until(last.team);                                                                                                                  
    merge all manager;                                                                                                                  
    by team;                                                                                                                            
    output;                                                                                                                             
    end;                                                                                                                                
run;
Patrick
Opal | Level 21

Below using SQL.

proc sql;
  select 
    m.*
    ,emp.emp_type
    ,emp.name
    ,emp.training_course
    ,emp.work_experience
    ,emp.major
  from manager m

  left join 
  (
    select 
      team
      ,'new_employee' as emp_type
      ,name
      ,training_course
      ,work_experience
      ,'' as major
    from new_employee
    union corr all
    select
      team
      ,'new_intern' as emp_type
      ,name
      ,training_course
      ,major
      ,. as work_experience
    from new_intern 
  ) emp
  on m.team=emp.team

  order by team, name
  ;
quit;
ichisdeno
Calcite | Level 5
I am trying to make a process run faster, that uses 3 years worth of hospital data. My goal is to just run the past 3 months of data, and append it to the existing large dataset. If a patient identifier appears in both tables, I want to only use the data from the new table. For example: data have_a; input ID amount; datalines; 1 10 3 15 4 20 7 10 7 15 9 12 10 14 ; run; data have_b; input ID amount; datalines; 2 15 3 20 4 10 4 15 5 12 7 20 8 15 9 10 11 20 ; run; data test; merge have_a have_b; by ID; run; My goal: If have_a and have_b contain any ID# in common, it should only keep the values from have_b. However, as it stands, ID#7 has 2 rows in the final dataset (with amounts 20 and 15) when it should only have 1. This is a very simplified version of my real data, which contains 20+ columns and millions of rows.
ballardw
Super User

@ichisdeno 

How does this relate to the original question about combining 3 data sets???

 

This looks like it should be a completely new topic as your requirements are very different from the original question starting this thread.

Also, you seem not to actually want an Append (add records) as an UPDATE, which would replace the values in the master table from a new (transaction) data set based on matching criteria, likely the ID. An update will add new records if the Id is new.

If you have corrupted your master data set by creating records with duplicate ids then you need to recover the old version before you start.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1192 views
  • 0 likes
  • 6 in conversation