Hello everyone!
I'm using SAS Studio.
I have the following dataset with three variables: ID is the identification of the respondent, PARTNERID is the identification of the partner of the respondent and YEARSEDU is the number of years of education of the respondent.
ID PARTNERID YEARSEDU
AT-000327-01 AT-000327-02 15
AT-000327-02 AT-000327-01 20
AT-000328-01 . 10
AT-000329-01 AT-000329-02 8
AT-000329-02 AT-000329-01 13
I'd like to create another variable that represents the years of education of the partner of the respondent so that I can obtain the following dataset:
ID PARTNERID YEARSEDU YEARSEDU_P
AT-000327-01 AT-000327-02 15 20
AT-000327-02 AT-000327-01 20 15
AT-000328-01 . 10 .
AT-000329-01 AT-000329-02 8 13
AT-000329-02 AT-000329-01 13 8
I know I should compare the value of ID with every value of PARTNERID and if I find a match use the value of YEARSEDU in the matched PARTNERID row as value for YEARSEDU_P in the ID row.
I really have no idea on how to do that, can you please help me?
Many thanks.
Regards,
Sara
If the dataset is not to large, using a hash-object can solve the problem:
data work.have;
length id partnerId $ 15 yearsEdu 8;
input id partnerId yearsEdu;
datalines;
AT-000327-01 AT-000327-02 15
AT-000327-02 AT-000327-01 20
AT-000328-01 . 10
AT-000329-01 AT-000329-02 8
AT-000329-02 AT-000329-01 13
;
run;
data want;
set work.have;
length yearsEdu_P 8;
if _n_ = 1 then do;
declare hash h(dataset: 'work.have(keep=partnerId yearsEdu rename=(yearsEdu = yearsEdu_P))');
h.defineKey('partnerId');
h.defineData('yearsEdu_P');
h.defineDone();
end;
if h.find(key: id) ^= 0 then do;
yearsEdu_P = .;
end;
run;
This may not be the most efficient method (and not tested as no test data in datastep):
proc sql; create table want as select a.*, b.yearsedu_p from have a left join (select id,sum(yearsedu) as yearsedu_p from have group by id) b on a.partnerid=b.id; quit;
If the dataset is not to large, using a hash-object can solve the problem:
data work.have;
length id partnerId $ 15 yearsEdu 8;
input id partnerId yearsEdu;
datalines;
AT-000327-01 AT-000327-02 15
AT-000327-02 AT-000327-01 20
AT-000328-01 . 10
AT-000329-01 AT-000329-02 8
AT-000329-02 AT-000329-01 13
;
run;
data want;
set work.have;
length yearsEdu_P 8;
if _n_ = 1 then do;
declare hash h(dataset: 'work.have(keep=partnerId yearsEdu rename=(yearsEdu = yearsEdu_P))');
h.defineKey('partnerId');
h.defineData('yearsEdu_P');
h.defineDone();
end;
if h.find(key: id) ^= 0 then do;
yearsEdu_P = .;
end;
run;
I think @andreas_ldss solution is probably the most efficient you will find if the data fits in memory.
Here is an alternative data step approach without the hash object.
data work.have;
length id partnerId $ 15 yearsEdu 8;
input id partnerId yearsEdu;
datalines;
AT-000327-01 AT-000327-02 15
AT-000327-02 AT-000327-01 20
AT-000328-01 . 10
AT-000329-01 AT-000329-02 8
AT-000329-02 AT-000329-01 13
;
run;
data want(drop=compPartnerID);
set have nobs=nobs;
do _N_=1 to nobs;
set have(rename=(partnerId=compPartnerID yearsEdu=yearsEdu_P)) point=_N_;
if id=partnerId then output;
end;
if missing(partnerId) then do;
call missing(yearsEdu_P);
output;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.