BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
cosmid
Lapis Lazuli | Level 10

Hi,

 

If I have two datasets:

dataset ONE:
ID fruits

1    apple

1    peach

2   orange

3   pineapple

 

dataset TWO:

ID

a

b

c

d

e

 

dataset WANT:

ID fruits

a    apple

a   peach

b   orange

c   pineapple

 

So, I am trying to replace all the values in dataset ONE for column ID from the column ID in dataset TWO. However, the difficult part for me is that there are multiple rows with the same ID in dataset ONE. For those repeating IDs, they need to be replaced with the same ID from dataset TWO. Like the example shown above, in dataset WANT, there should be two a's since there are two 1's. How do I do that?

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Assuming the you want to replace the first old id with the first new id and second with second etc then simple SET statements should do it.

Assuming there are enough observations in TWO and the observations in ONE are grouped (even if not strictly sorted) then just read in a new id when the old one changes.

data one ;
  input ID fruits $;
cards;
1    apple
1    peach
2   orange
3   pineapple
;

data two;
  input id $ ;
cards;
a
b
c
d
e
;

data want;
  set one(rename=(id=oldid));
  if oldid ne lag(oldid) then set two;
run;

Result

Obs    oldid    fruits      id

 1       1      apple       a
 2       1      peach       a
 3       2      orange      b
 4       3      pineappl    c


View solution in original post

5 REPLIES 5
Tom
Super User Tom
Super User

Assuming the you want to replace the first old id with the first new id and second with second etc then simple SET statements should do it.

Assuming there are enough observations in TWO and the observations in ONE are grouped (even if not strictly sorted) then just read in a new id when the old one changes.

data one ;
  input ID fruits $;
cards;
1    apple
1    peach
2   orange
3   pineapple
;

data two;
  input id $ ;
cards;
a
b
c
d
e
;

data want;
  set one(rename=(id=oldid));
  if oldid ne lag(oldid) then set two;
run;

Result

Obs    oldid    fruits      id

 1       1      apple       a
 2       1      peach       a
 3       2      orange      b
 4       3      pineappl    c


cosmid
Lapis Lazuli | Level 10
Sorry, my example should contain more cases. The repeating ID's in dataset ONE is not limited to two, for example, it can be:
dataset ONE:
ID fruits
1 apple
1 green appels
1 yellow apples
1 purple apples
2 peach
3 watermelon
3 big watermelon

It can be 1, 2, ..., 10. There is no pattern
ballardw
Super User

I am going to assume that your real problem you have more complex values of ID, such as an account number in one file and the name of the account holder in the other.

 

Do you have a data set that has the connection between the "account number" and "account holder name" or equivalent? If you do you can use that a link in the process:

 

data one ;
  input ID fruits $;
cards;
1    apple
1    peach
2   orange
3   pineapple
;

data two;
  input id $ ;
cards;
a
b
c
d
e
;

data details;
   input idnum idtext $;
datalines;
1 a
2 b
3 c
;

Proc sql;
   create table want as
   select c.id,a.fruits
   from one as a
        left join
        details as b
        on a.id=b.idnum
        left join
        two as c
        on b.idtext=c.id
   ;
quit;

See how the two id values provide the connection from the number in one set to the text in the other?

 

Really, if you don't have such a linkage data set then you will have to expend a lot of words describing how different values are to be considered the same.

Such as data set could also be used to create a custom format:

data cntlin;
   set details;
   fmtname='Idnum2Idtext';
   start=idnum;
   label=idtext;
run;

proc format library=work cntlin=cntlin;
run;

data want;
   set one (rename=(id=idnumber));
   id=put(idnumber,Idnum2Idtext. -L);
   drop idnumber;
run;

Or even just use the Format and don't change the values:

Proc print data=one noobs;
   format id idnum2idtext.;
run;

A format like this will be honored by analysis, reporting and for most purposes in graphs.

cosmid
Lapis Lazuli | Level 10
There is no connection between the two datasets at all. My task was to replace the IDs in dataset ONE with the IDs with dataset TWO. The solution Tom provided worked 100%. I just didn't realize it at the time. I appreciate your comments. I am very weak on joins and formats. So this is going to be very useful for me. Thank you!
cosmid
Lapis Lazuli | Level 10
My apologies! I didn't test the code. I thought the lag function would check only the previous record and this solution would only work if there are two identical IDs. So I went to thinking about using the lagn function. After awhile, I just realized, if you are comparing against the previous IDs, then it would work regardless the number of repeating IDs. I am the worst SAS programmer in the world. Thanks again for your help Tom!

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1429 views
  • 2 likes
  • 3 in conversation