Hi,
If I have two datasets:
dataset ONE:
ID fruits
1 apple
1 peach
2 orange
3 pineapple
dataset TWO:
ID
a
b
c
d
e
dataset WANT:
ID fruits
a apple
a peach
b orange
c pineapple
So, I am trying to replace all the values in dataset ONE for column ID from the column ID in dataset TWO. However, the difficult part for me is that there are multiple rows with the same ID in dataset ONE. For those repeating IDs, they need to be replaced with the same ID from dataset TWO. Like the example shown above, in dataset WANT, there should be two a's since there are two 1's. How do I do that?
Thanks!
Assuming the you want to replace the first old id with the first new id and second with second etc then simple SET statements should do it.
Assuming there are enough observations in TWO and the observations in ONE are grouped (even if not strictly sorted) then just read in a new id when the old one changes.
data one ;
input ID fruits $;
cards;
1 apple
1 peach
2 orange
3 pineapple
;
data two;
input id $ ;
cards;
a
b
c
d
e
;
data want;
set one(rename=(id=oldid));
if oldid ne lag(oldid) then set two;
run;
Result
Obs oldid fruits id 1 1 apple a 2 1 peach a 3 2 orange b 4 3 pineappl c
Assuming the you want to replace the first old id with the first new id and second with second etc then simple SET statements should do it.
Assuming there are enough observations in TWO and the observations in ONE are grouped (even if not strictly sorted) then just read in a new id when the old one changes.
data one ;
input ID fruits $;
cards;
1 apple
1 peach
2 orange
3 pineapple
;
data two;
input id $ ;
cards;
a
b
c
d
e
;
data want;
set one(rename=(id=oldid));
if oldid ne lag(oldid) then set two;
run;
Result
Obs oldid fruits id 1 1 apple a 2 1 peach a 3 2 orange b 4 3 pineappl c
I am going to assume that your real problem you have more complex values of ID, such as an account number in one file and the name of the account holder in the other.
Do you have a data set that has the connection between the "account number" and "account holder name" or equivalent? If you do you can use that a link in the process:
data one ; input ID fruits $; cards; 1 apple 1 peach 2 orange 3 pineapple ; data two; input id $ ; cards; a b c d e ; data details; input idnum idtext $; datalines; 1 a 2 b 3 c ; Proc sql; create table want as select c.id,a.fruits from one as a left join details as b on a.id=b.idnum left join two as c on b.idtext=c.id ; quit;
See how the two id values provide the connection from the number in one set to the text in the other?
Really, if you don't have such a linkage data set then you will have to expend a lot of words describing how different values are to be considered the same.
Such as data set could also be used to create a custom format:
data cntlin; set details; fmtname='Idnum2Idtext'; start=idnum; label=idtext; run; proc format library=work cntlin=cntlin; run; data want; set one (rename=(id=idnumber)); id=put(idnumber,Idnum2Idtext. -L); drop idnumber; run;
Or even just use the Format and don't change the values:
Proc print data=one noobs; format id idnum2idtext.; run;
A format like this will be honored by analysis, reporting and for most purposes in graphs.
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.