BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cm3
Fluorite | Level 6 cm3
Fluorite | Level 6

Hi 

I have two datasets like below

DATA T1;
    
INPUT ssid gender $ grade;
     DATALINES ;
100 M 08
101 M 08
200 F 08
300 F 08

400 M 08
;
RUN;

 

DATA T2;
    
INPUT old new tmp $;
     DATALINES ;
100 101 abc
200 201 bcd
400 401 def
;
RUN;

 

I want to merge these two data sets to get the below output

ssid gender grade

101 M 08

201 F 08

300 F 08

401 M 08

 

appreciate your help on this code, without "WARNING: Variable already exists on file" in the log as my data is large

thanks

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
hashman
Ammonite | Level 13

@cm3:

If I understand your specs correctly, you want to update SSID in T1 to NEW from T2 if SSID=OLD but don't want duplicate records in the output. If so, the hash object is your friend:

data t1 ;                                                                                                                                                                                                                                                       
  input ssid gender $ grade ;                                                                                                                                                                                                                                   
  cards ;                                                                                                                                                                                                                                                       
100 m 08                                                                                                                                                                                                                                                        
101 m 08                                                                                                                                                                                                                                                        
200 f 08                                                                                                                                                                                                                                                        
300 f 08                                                                                                                                                                                                                                                        
400 m 08                                                                                                                                                                                                                                                        
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data t2 ;                                                                                                                                                                                                                                                       
  input old new tmp $ ;                                                                                                                                                                                                                                         
  cards ;                                                                                                                                                                                                                                                       
100 101 abc                                                                                                                                                                                                                                                     
200 201 bcd                                                                                                                                                                                                                                                     
400 401 def                                                                                                                                                                                                                                                     
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want (keep = ssid gender grade) ;                                                                                                                                                                                                                          
  if _n_ = 1 then do ;                                                                                                                                                                                                                                          
    if 0 then set t2 ;                                                                                                                                                                                                                                          
    dcl hash h (dataset:"t2") ;                                                                                                                                                                                                                                 
    h.definekey ("old") ;                                                                                                                                                                                                                                       
    h.definedata ("new") ;                                                                                                                                                                                                                                      
    h.definedone () ;                                                                                                                                                                                                                                           
    dcl hash x () ;                                                                                                                                                                                                                                             
    x.definekey ("ssid", "gender", "grade") ;                                                                                                                                                                                                                   
    x.definedone () ;                                                                                                                                                                                                                                           
  end ;                                                                                                                                                                                                                                                         
  set t1 ;                                                                                                                                                                                                                                                      
  if h.find (key:ssid) = 0 then ssid = new ;                                                                                                                                                                                                                    
  if x.check() ne 0 ;                                                                                                                                                                                                                                           
  x.add() ;                                                                                                                                                                                                                                                     
run ;                                     

The hash table H is used to effectuate the updates; the hash table X is used to exclude duplicates.

 

Kind regards

Paul D.

View solution in original post

5 REPLIES 5
Shmuel
Garnet | Level 18

Two methods: (1) using sql  (2) using merge datastep:

/*1*/
proc sql;
  create table want as select *
  from t1  where
    ssid not in (select old from t2);
quit;

/*2 - assuming data are sorted */
data want;
  merge t1(in=in1)
        t2(in=in2  rename=(ols=ssid));
   by ssid;
        if in1 and not in2;
run;
cm3
Fluorite | Level 6 cm3
Fluorite | Level 6

Appreciate your quick response.

I can't use subquery option as I have another variable, old to new update which I did not mention in my example.

 

And with the merge logic ,  new 201 and 401 records are missing in output data set.

 

thank you

Shmuel
Garnet | Level 18

Its not clear what table T2 represent and how to deal with.

Where from do you get the 201, 401 records' attributes if they are not exist in T1 tabel ?

hashman
Ammonite | Level 13

@cm3:

If I understand your specs correctly, you want to update SSID in T1 to NEW from T2 if SSID=OLD but don't want duplicate records in the output. If so, the hash object is your friend:

data t1 ;                                                                                                                                                                                                                                                       
  input ssid gender $ grade ;                                                                                                                                                                                                                                   
  cards ;                                                                                                                                                                                                                                                       
100 m 08                                                                                                                                                                                                                                                        
101 m 08                                                                                                                                                                                                                                                        
200 f 08                                                                                                                                                                                                                                                        
300 f 08                                                                                                                                                                                                                                                        
400 m 08                                                                                                                                                                                                                                                        
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data t2 ;                                                                                                                                                                                                                                                       
  input old new tmp $ ;                                                                                                                                                                                                                                         
  cards ;                                                                                                                                                                                                                                                       
100 101 abc                                                                                                                                                                                                                                                     
200 201 bcd                                                                                                                                                                                                                                                     
400 401 def                                                                                                                                                                                                                                                     
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want (keep = ssid gender grade) ;                                                                                                                                                                                                                          
  if _n_ = 1 then do ;                                                                                                                                                                                                                                          
    if 0 then set t2 ;                                                                                                                                                                                                                                          
    dcl hash h (dataset:"t2") ;                                                                                                                                                                                                                                 
    h.definekey ("old") ;                                                                                                                                                                                                                                       
    h.definedata ("new") ;                                                                                                                                                                                                                                      
    h.definedone () ;                                                                                                                                                                                                                                           
    dcl hash x () ;                                                                                                                                                                                                                                             
    x.definekey ("ssid", "gender", "grade") ;                                                                                                                                                                                                                   
    x.definedone () ;                                                                                                                                                                                                                                           
  end ;                                                                                                                                                                                                                                                         
  set t1 ;                                                                                                                                                                                                                                                      
  if h.find (key:ssid) = 0 then ssid = new ;                                                                                                                                                                                                                    
  if x.check() ne 0 ;                                                                                                                                                                                                                                           
  x.add() ;                                                                                                                                                                                                                                                     
run ;                                     

The hash table H is used to effectuate the updates; the hash table X is used to exclude duplicates.

 

Kind regards

Paul D.

cm3
Fluorite | Level 6 cm3
Fluorite | Level 6

Thanks Paul D, It worked. 

 

Sorry  @Shmuel, my specifications were not clear

 

I figured out the below code which worked as well.

 

proc sql;
create table want as
select A.*, B.*
from T1 A left outer join T2 B
on A.SSID = B.old
or A.SSID = B.new;
quit;

data want (drop= old new tmp);
set want;
if SSID=new then delete;
    if old ^= . then do;
        SSID=new;
    end;
run;

 

Regards,

cm3

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1026 views
  • 0 likes
  • 3 in conversation