Solved: help with merge/join to get only new ssid

cm3 · Posted 09-26-2019 02:37 PM

Hi

I have two datasets like below

DATA T1;
INPUT ssid gender $ grade;
DATALINES ;
100 M 08
101 M 08
200 F 08
300 F 08

400 M 08
;
RUN;

DATA T2;
INPUT old new tmp $;
DATALINES ;
100 101 abc
200 201 bcd
400 401 def
;
RUN;

I want to merge these two data sets to get the below output

ssid gender grade

101 M 08

201 F 08

300 F 08

401 M 08

appreciate your help on this code, without "WARNING: Variable already exists on file" in the log as my data is large

thanks

hashman · Posted 09-27-2019 12:52 AM

@cm3:

If I understand your specs correctly, you want to update SSID in T1 to NEW from T2 if SSID=OLD but don't want duplicate records in the output. If so, the hash object is your friend:

data t1 ;                                                                                                                                                                                                                                                       
  input ssid gender $ grade ;                                                                                                                                                                                                                                   
  cards ;                                                                                                                                                                                                                                                       
100 m 08                                                                                                                                                                                                                                                        
101 m 08                                                                                                                                                                                                                                                        
200 f 08                                                                                                                                                                                                                                                        
300 f 08                                                                                                                                                                                                                                                        
400 m 08                                                                                                                                                                                                                                                        
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data t2 ;                                                                                                                                                                                                                                                       
  input old new tmp $ ;                                                                                                                                                                                                                                         
  cards ;                                                                                                                                                                                                                                                       
100 101 abc                                                                                                                                                                                                                                                     
200 201 bcd                                                                                                                                                                                                                                                     
400 401 def                                                                                                                                                                                                                                                     
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want (keep = ssid gender grade) ;                                                                                                                                                                                                                          
  if _n_ = 1 then do ;                                                                                                                                                                                                                                          
    if 0 then set t2 ;                                                                                                                                                                                                                                          
    dcl hash h (dataset:"t2") ;                                                                                                                                                                                                                                 
    h.definekey ("old") ;                                                                                                                                                                                                                                       
    h.definedata ("new") ;                                                                                                                                                                                                                                      
    h.definedone () ;                                                                                                                                                                                                                                           
    dcl hash x () ;                                                                                                                                                                                                                                             
    x.definekey ("ssid", "gender", "grade") ;                                                                                                                                                                                                                   
    x.definedone () ;                                                                                                                                                                                                                                           
  end ;                                                                                                                                                                                                                                                         
  set t1 ;                                                                                                                                                                                                                                                      
  if h.find (key:ssid) = 0 then ssid = new ;                                                                                                                                                                                                                    
  if x.check() ne 0 ;                                                                                                                                                                                                                                           
  x.add() ;                                                                                                                                                                                                                                                     
run ;

The hash table H is used to effectuate the updates; the hash table X is used to exclude duplicates.

Kind regards

Paul D.

View solution in original post

Shmuel · Posted 09-26-2019 02:47 PM

Two methods: (1) using sql (2) using merge datastep:

/*1*/
proc sql;
  create table want as select *
  from t1  where
    ssid not in (select old from t2);
quit;

/*2 - assuming data are sorted */
data want;
  merge t1(in=in1)
        t2(in=in2  rename=(ols=ssid));
   by ssid;
        if in1 and not in2;
run;

cm3 · Posted 09-26-2019 03:25 PM

Appreciate your quick response.

I can't use subquery option as I have another variable, old to new update which I did not mention in my example.

And with the merge logic , new 201 and 401 records are missing in output data set.

thank you

Shmuel · Posted 09-26-2019 04:48 PM

Its not clear what table T2 represent and how to deal with.

Where from do you get the 201, 401 records' attributes if they are not exist in T1 tabel ?

hashman · Posted 09-27-2019 12:52 AM

@cm3:

If I understand your specs correctly, you want to update SSID in T1 to NEW from T2 if SSID=OLD but don't want duplicate records in the output. If so, the hash object is your friend:

data t1 ;                                                                                                                                                                                                                                                       
  input ssid gender $ grade ;                                                                                                                                                                                                                                   
  cards ;                                                                                                                                                                                                                                                       
100 m 08                                                                                                                                                                                                                                                        
101 m 08                                                                                                                                                                                                                                                        
200 f 08                                                                                                                                                                                                                                                        
300 f 08                                                                                                                                                                                                                                                        
400 m 08                                                                                                                                                                                                                                                        
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data t2 ;                                                                                                                                                                                                                                                       
  input old new tmp $ ;                                                                                                                                                                                                                                         
  cards ;                                                                                                                                                                                                                                                       
100 101 abc                                                                                                                                                                                                                                                     
200 201 bcd                                                                                                                                                                                                                                                     
400 401 def                                                                                                                                                                                                                                                     
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want (keep = ssid gender grade) ;                                                                                                                                                                                                                          
  if _n_ = 1 then do ;                                                                                                                                                                                                                                          
    if 0 then set t2 ;                                                                                                                                                                                                                                          
    dcl hash h (dataset:"t2") ;                                                                                                                                                                                                                                 
    h.definekey ("old") ;                                                                                                                                                                                                                                       
    h.definedata ("new") ;                                                                                                                                                                                                                                      
    h.definedone () ;                                                                                                                                                                                                                                           
    dcl hash x () ;                                                                                                                                                                                                                                             
    x.definekey ("ssid", "gender", "grade") ;                                                                                                                                                                                                                   
    x.definedone () ;                                                                                                                                                                                                                                           
  end ;                                                                                                                                                                                                                                                         
  set t1 ;                                                                                                                                                                                                                                                      
  if h.find (key:ssid) = 0 then ssid = new ;                                                                                                                                                                                                                    
  if x.check() ne 0 ;                                                                                                                                                                                                                                           
  x.add() ;                                                                                                                                                                                                                                                     
run ;

The hash table H is used to effectuate the updates; the hash table X is used to exclude duplicates.

Kind regards

Paul D.

cm3 · Posted 09-27-2019 11:08 AM

Thanks Paul D, It worked.

Sorry @Shmuel, my specifications were not clear

I figured out the below code which worked as well.

proc sql;
create table want as
select A.*, B.*
from T1 A left outer join T2 B
on A.SSID = B.old
or A.SSID = B.new;
quit;

data want (drop= old new tmp);
set want;
if SSID=new then delete;
if old ^= . then do;
SSID=new;
end;
run;

Regards,

cm3

help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Registration is open

Call for Content EXTENDED

help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Re: help with merge/join to get only new ssid

Registration is open

Call for Content EXTENDED

SAS Training: Just a Click Away