Hi All,
I'm trying to write a data step with IF THEN ELSE statements.
I have 2 columns and I'm trying to create a third column based on these two columns.
table:
column 1 column2 new_column
A 1 X
A 4 X
A 5 X
B 2 Y
B 3 Y
B 8 Y
C 1 X,Y
C 2 X,Y
C 7 X,Y
IF a value in column 1 has value '1' in column 2 then the new column should be X and all the other rows should be X related to that column1 value A.
IF a value in column 1 has value '2' in column 2 then the new column should be Y and all the other rows should be Y related to that column1 value B.
IF a value in column 1 has value ('1','2') in column 2 then the new column should be (X,Y) and all the other rows should be X related to that column1 value C.
Could you help me with the syntax for this logic. The data set I'm working on had more than 300,000 rows. Thank you.
This is a little more difficult, because you have to base the result on more than one observation. But then you have to go back and insert the proper result on previous observation(s).
Here's one way:
proc sort data=have;
by column1 column2;
run;
data want;
length new_column $ 3;
do until (last.column1);
set have;
by column1;
if column2=1 then new_column='X';
else if column2=2 then do;
if new_column=' ' then new_column='Y';
else if new_column='X' then new_column = 'X,Y';
end;
end;
do until (last.column1);
set have;
by column1;
output;
end;
run;
The top loop examines all records for a value of COLUMN1, and creates NEW_COLUMN. The bottom loop reads those same records back in, and outputs (including the value for NEW_COLUMN).
This is a little more difficult, because you have to base the result on more than one observation. But then you have to go back and insert the proper result on previous observation(s).
Here's one way:
proc sort data=have;
by column1 column2;
run;
data want;
length new_column $ 3;
do until (last.column1);
set have;
by column1;
if column2=1 then new_column='X';
else if column2=2 then do;
if new_column=' ' then new_column='Y';
else if new_column='X' then new_column = 'X,Y';
end;
end;
do until (last.column1);
set have;
by column1;
output;
end;
run;
The top loop examines all records for a value of COLUMN1, and creates NEW_COLUMN. The bottom loop reads those same records back in, and outputs (including the value for NEW_COLUMN).
Thank you so much.
Best.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.