Solved: How to replace a value from a column with the value of column from ano...

cosmid · Posted 09-13-2024 02:16 PM

Hi,

If I have two datasets:

dataset ONE:
ID fruits

1 apple

1 peach

2 orange

3 pineapple

dataset TWO:

ID

a

b

c

d

e

dataset WANT:

ID fruits

a apple

a peach

b orange

c pineapple

So, I am trying to replace all the values in dataset ONE for column ID from the column ID in dataset TWO. However, the difficult part for me is that there are multiple rows with the same ID in dataset ONE. For those repeating IDs, they need to be replaced with the same ID from dataset TWO. Like the example shown above, in dataset WANT, there should be two a's since there are two 1's. How do I do that?

Thanks!

Tom · Posted 09-13-2024 02:24 PM

Assuming the you want to replace the first old id with the first new id and second with second etc then simple SET statements should do it.

Assuming there are enough observations in TWO and the observations in ONE are grouped (even if not strictly sorted) then just read in a new id when the old one changes.

data one ;
  input ID fruits $;
cards;
1    apple
1    peach
2   orange
3   pineapple
;

data two;
  input id $ ;
cards;
a
b
c
d
e
;

data want;
  set one(rename=(id=oldid));
  if oldid ne lag(oldid) then set two;
run;

Result

Obs    oldid    fruits      id

 1       1      apple       a
 2       1      peach       a
 3       2      orange      b
 4       3      pineappl    c

View solution in original post

Tom · Posted 09-13-2024 02:24 PM

Assuming the you want to replace the first old id with the first new id and second with second etc then simple SET statements should do it.

Assuming there are enough observations in TWO and the observations in ONE are grouped (even if not strictly sorted) then just read in a new id when the old one changes.

data one ;
  input ID fruits $;
cards;
1    apple
1    peach
2   orange
3   pineapple
;

data two;
  input id $ ;
cards;
a
b
c
d
e
;

data want;
  set one(rename=(id=oldid));
  if oldid ne lag(oldid) then set two;
run;

Result

Obs    oldid    fruits      id

 1       1      apple       a
 2       1      peach       a
 3       2      orange      b
 4       3      pineappl    c

cosmid · Posted 09-13-2024 02:28 PM

Sorry, my example should contain more cases. The repeating ID's in dataset ONE is not limited to two, for example, it can be:
dataset ONE:
ID fruits
1 apple
1 green appels
1 yellow apples
1 purple apples
2 peach
3 watermelon
3 big watermelon

It can be 1, 2, ..., 10. There is no pattern

ballardw · Posted 09-13-2024 02:56 PM

I am going to assume that your real problem you have more complex values of ID, such as an account number in one file and the name of the account holder in the other.

Do you have a data set that has the connection between the "account number" and "account holder name" or equivalent? If you do you can use that a link in the process:

data one ;
  input ID fruits $;
cards;
1    apple
1    peach
2   orange
3   pineapple
;

data two;
  input id $ ;
cards;
a
b
c
d
e
;

data details;
   input idnum idtext $;
datalines;
1 a
2 b
3 c
;

Proc sql;
   create table want as
   select c.id,a.fruits
   from one as a
        left join
        details as b
        on a.id=b.idnum
        left join
        two as c
        on b.idtext=c.id
   ;
quit;

See how the two id values provide the connection from the number in one set to the text in the other?

Really, if you don't have such a linkage data set then you will have to expend a lot of words describing how different values are to be considered the same.

Such as data set could also be used to create a custom format:

data cntlin;
   set details;
   fmtname='Idnum2Idtext';
   start=idnum;
   label=idtext;
run;

proc format library=work cntlin=cntlin;
run;

data want;
   set one (rename=(id=idnumber));
   id=put(idnumber,Idnum2Idtext. -L);
   drop idnumber;
run;

Or even just use the Format and don't change the values:

Proc print data=one noobs;
   format id idnum2idtext.;
run;

A format like this will be honored by analysis, reporting and for most purposes in graphs.

cosmid · Posted 09-13-2024 05:11 PM

There is no connection between the two datasets at all. My task was to replace the IDs in dataset ONE with the IDs with dataset TWO. The solution Tom provided worked 100%. I just didn't realize it at the time. I appreciate your comments. I am very weak on joins and formats. So this is going to be very useful for me. Thank you!

cosmid · Posted 09-13-2024 02:54 PM

My apologies! I didn't test the code. I thought the lag function would check only the previous record and this solution would only work if there are two identical IDs. So I went to thinking about using the lagn function. After awhile, I just realized, if you are comparing against the previous IDs, then it would work regardless the number of repeating IDs. I am the worst SAS programmer in the world. Thanks again for your help Tom!

How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

Re: How to replace a value from a column with the value of column from another dataset

The 2025 SAS Hackathon has begun!