Re: merging data

Archibald2 · Posted 07-18-2018 12:01 PM

Hello,

I need to merge the DATASET 1 with DATASET 2 to get to a final dataset "DATA WANT" and I need some guidance.

please note that dataset 1 is a matched pair data of students and dataset 2 contains additional variables on the matched pairs data.

DATASET 1

groupA_id	groupB_id	groupA_grade	groupB_grade	groupA_sex	groupB_sex
00000-000001	00000-000200	2	2	1	1
00000-000001	00000-000201	2	2	1	1
00000-000001	00000-000202	2	2	1	1
00000-000002	00000-000203	2	2	1	1
00000-000002	00000-000204	2	2	1	1
00000-000002	00000-000205	2	2	1	1
00000-000003	00000-000206	2	2	1	1
00000-000003	00000-000207	2	2	1	1
00000-000003	00000-000208	2	2	1	1

DATASET 2

id	Siblings	economic status
00000-000001	Yes	low
00000-000001	yes	low
00000-000001	yes	low
00000-000002	yes	low
00000-000002	yes	low
00000-000002	yes	low
00000-000003	no	high
00000-000003	no	high
00000-000003	no	high
00000-000200	yes	low
00000-000201	no	low
00000-000202	no	high
00000-000203	no	medium
00000-000204	no	high
00000-000205	yes	low
00000-000206	no	medium
00000-000207	yes	high
00000-000208	yes	high

DATA WANT

id	grade	sex	Siblings	economic status
00000-000001	2	1	Yes	low
00000-000002	2	1	yes	low
00000-000003	2	1	no	high
00000-000002	2	1	yes	low
00000-000200	2	1	yes	low
00000-000201	2	1	no	low
00000-000202	2	1	no	high
00000-000203	2	1	no	medium
00000-000204	2	1	no	high
00000-000205	2	1	yes	low
00000-000206	2	1	no	medium
00000-000207	2	1	yes	high
00000-000208	2	1	yes	high

Any help would be appreciated. Thanks

hashman · Posted 07-18-2018 02:53 PM

Your sample output is inconsistent in terms of keeping ID duplicates because ID=00000-000002 is duplicated while the rest are not. Assuming that it's a typo and you don't want any dupes but only the first ID encountered in the data set 2, you can proceed as follows:

data want (drop = group: rc) ;                                                 
  if _n_ = 1 then do ;                                                         
    dcl hash h (ordered:"a") ;                                                 
    h.defineKey ("id") ;                                                       
    h.defineData ("id", "grade", "sex") ;                                      
    h.defineDone () ;                                                          
    do until (z) ;                                                             
      set ds1 end = z ;                                                        
      id = groupa_id ; grade = groupa_grade ; sex = groupa_sex ; rc = h.add() ;
      id = groupb_id ; grade = groupb_grade ; sex = groupb_sex ; rc = h.add() ;
    end ;                                                                      
    h.output (dataset:"hash") ;                                                
    dcl hash u () ;                                                            
    u.defineKey ("id") ;                                                       
    u.defineDone () ;                                                          
  end ;                                                                        
  set ds2 ;                                           
  if u.check() ne 0 ;                                                          
  u.add() ;                                                                    
  if h.find() ne 0 then call missing (grade, sex) ;                            
run ;

If you want to keep the dupes from DS2, get rid of the hash table U and the related statements with the method calls referencing U.

Archibald2 · Posted 07-18-2018 05:08 PM

Many thanks hashman

merging data

Re: merging data

Re: merging data

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away