DATA Step, Macro, Functions and more

How to create a variable based on the matching values of other two variables in the same dataset?

Accepted Solution Solved
Reply
Regular Learner
Posts: 1
Accepted Solution

How to create a variable based on the matching values of other two variables in the same dataset?

[ Edited ]

Hello everyone!

 

I'm using SAS Studio.

I have the following dataset with three variables: ID is the identification of the respondent, PARTNERID is the identification of the partner of the respondent and YEARSEDU is the number of years of education of the respondent.

 

 

ID                          PARTNERID           YEARSEDU        

AT-000327-01       AT-000327-02         15                      

AT-000327-02       AT-000327-01         20                      

AT-000328-01       .                              10                       

AT-000329-01       AT-000329-02           8                      

AT-000329-02       AT-000329-01         13                        

 

I'd like to create another variable that represents the years of education of the partner of the respondent so that I can obtain the following dataset:

 

ID                          PARTNERID           YEARSEDU        YEARSEDU_P  

AT-000327-01       AT-000327-02         15                      20

AT-000327-02       AT-000327-01         20                      15

AT-000328-01       .                              10                       .

AT-000329-01       AT-000329-02           8                      13

AT-000329-02       AT-000329-01         13                        8

 

I know I should compare the value of ID with every value of PARTNERID and if I find a match use the value of YEARSEDU in the matched PARTNERID row as value for YEARSEDU_P in the ID row. 

I really have no idea on how to do that, can you please help me?

 

Many thanks.

Regards,

Sara


Accepted Solutions
Solution
a month ago
Valued Guide
Posts: 631

Re: How to create a variable based on the matching values of other two variables in the same dataset

Posted in reply to saramanzella

If the dataset is not to large, using a hash-object can solve the problem:

 

data work.have;
   length id partnerId $ 15 yearsEdu 8;
   input id partnerId yearsEdu;
   datalines;
AT-000327-01 AT-000327-02 15 
AT-000327-02 AT-000327-01 20 
AT-000328-01 . 10 
AT-000329-01 AT-000329-02 8 
AT-000329-02 AT-000329-01 13 
;
run;

data want;   
   set work.have;

   length yearsEdu_P 8;

   if _n_ = 1 then do;
      declare hash h(dataset: 'work.have(keep=partnerId yearsEdu rename=(yearsEdu = yearsEdu_P))');
      h.defineKey('partnerId');
      h.defineData('yearsEdu_P');
      h.defineDone();
   end;

   if h.find(key: id) ^= 0 then do;
      yearsEdu_P = .;
   end;
run;

View solution in original post


All Replies
Super User
Super User
Posts: 9,866

Re: How to create a variable based on the matching values of other two variables in the same dataset

Posted in reply to saramanzella

This may not be the most efficient method (and not tested as no test data in datastep):

proc sql;
  create table want as 
  select  a.*,
          b.yearsedu_p
  from    have a
  left join (select id,sum(yearsedu) as yearsedu_p from have group by id) b
  on      a.partnerid=b.id;
quit;
Solution
a month ago
Valued Guide
Posts: 631

Re: How to create a variable based on the matching values of other two variables in the same dataset

Posted in reply to saramanzella

If the dataset is not to large, using a hash-object can solve the problem:

 

data work.have;
   length id partnerId $ 15 yearsEdu 8;
   input id partnerId yearsEdu;
   datalines;
AT-000327-01 AT-000327-02 15 
AT-000327-02 AT-000327-01 20 
AT-000328-01 . 10 
AT-000329-01 AT-000329-02 8 
AT-000329-02 AT-000329-01 13 
;
run;

data want;   
   set work.have;

   length yearsEdu_P 8;

   if _n_ = 1 then do;
      declare hash h(dataset: 'work.have(keep=partnerId yearsEdu rename=(yearsEdu = yearsEdu_P))');
      h.defineKey('partnerId');
      h.defineData('yearsEdu_P');
      h.defineDone();
   end;

   if h.find(key: id) ^= 0 then do;
      yearsEdu_P = .;
   end;
run;
PROC Star
Posts: 1,410

Re: How to create a variable based on the matching values of other two variables in the same dataset

Posted in reply to saramanzella

I think @andreas_ldss solution is probably the most efficient you will find if the data fits in memory.

 

Here is an alternative data step approach without the hash object.

 

data work.have;
   length id partnerId $ 15 yearsEdu 8;
   input id partnerId yearsEdu;
   datalines;
AT-000327-01 AT-000327-02 15 
AT-000327-02 AT-000327-01 20 
AT-000328-01 . 10 
AT-000329-01 AT-000329-02 8 
AT-000329-02 AT-000329-01 13 
;
run;

data want(drop=compPartnerID);
   set have nobs=nobs;
   do _N_=1 to nobs;
      set have(rename=(partnerId=compPartnerID yearsEdu=yearsEdu_P)) point=_N_;
      if id=partnerId then output;
   end;
   if missing(partnerId) then do; 
      call missing(yearsEdu_P);
      output;
   end;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 92 views
  • 3 likes
  • 4 in conversation