Hello there,
I used "if ... then do; ... end;" statements to create a new dataset, but the newly created variables all have missing values. I have already used code in the same format to generate another, similar dataset with no errors.
Here is my code:
data my_OR; set OR;
if effect="LDL_10 at gluc_10=8.80" then do group=1; gluc=88; end;
if effect="LDL_10 at gluc_10=9.90" then do group=1; gluc=99; end;
if effect="LDL_10 at gluc_10=12.50" then do group=1; gluc=125; end;
if effect="Gluc_10 at LDL_10=9.05" then do group=2; LDL=90.5; end;
if effect="Gluc_10 at LDL_10=13.51" then do group=2; LDL=135.1; end;
if effect="Gluc_10 at LDL_10=18.75" then do group=2; LDL=187.5; end;
run;
The log looks fine.
NOTE: There were 6 observations read from the data set WORK.OR.
NOTE: The data set WORK.MY_OR has 6 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.03 seconds
Here is the output:
Thank you so much!
How about running this code:
data junk;
set OR;
newvar = quote(effect);
run;
proc print data=junk noobs;
var newvar;
run;
and show us the result by pasting into a text box opened on the forum using the </> icon.
I suspect one or more spelling problems in the values of effect used in the IF statements. The sneakiest to find are often leading spaces, or spaces in the middle of the value, where proportional fonts make embedded spaces hard to see. The QUOTE function used above places double quotes around the value, so a leading space (or two) should be noticeable.
Proc Print and most other procedures, such as Proc Freq, left-align text and "lose" a leading space, so you may not see it in ordinary output.
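If the quoted values still look right, a byte-level look can help. The following is only a sketch: the dataset OR and the variable effect come from the original post, while the dataset name check, the variables effect_clean and changed, and the $HEX60. width (which assumes effect is at most 30 bytes long) are placeholders chosen for illustration.

```sas
/* Sketch only: check, effect_clean, changed and the $HEX60. width
   are assumptions for illustration, not part of the original post. */
data check;
  set OR;
  /* remove leading/trailing blanks, then collapse repeated blanks */
  effect_clean = compbl(strip(effect));
  /* flag rows where cleaning changed the stored value */
  changed = (effect ne effect_clean);
  /* write each byte in hex to the log; on ASCII systems 20 is a space */
  put _n_= changed= / effect $hex60.;
run;
```

Any row with changed=1 has a leading or doubled blank. A byte in the hex dump that is neither a visible character nor 20 (for example 09, a tab, in an ASCII session) would also explain why the equality tests never match.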
Your IF statements are not formatted correctly. To do multiple assignments in each IF condition, you need a semicolon after the DO.
data my_OR;
set OR;
if effect="LDL_10 at gluc_10=8.80" then do; group=1; gluc=88; end;
else if effect="LDL_10 at gluc_10=9.90" then do; group=1; gluc=99; end;
else if effect="LDL_10 at gluc_10=12.50" then do; group=1; gluc=125; end;
else if effect="Gluc_10 at LDL_10=9.05" then do; group=2; LDL=90.5; end;
else if effect="Gluc_10 at LDL_10=13.51" then do; group=2; LDL=135.1; end;
else if effect="Gluc_10 at LDL_10=18.75" then do; group=2; LDL=187.5; end;
run;
It also seems like you could simplify this drastically:
/* every value of effect contains "LDL", so test where it starts instead */
if find(effect, "LDL") = 1 then do;
group = 1;
gluc = input(scan(effect, 2, "="), 8.)*10;
end;
else do;
group = 2;
LDL = input(scan(effect, 2, "="), 8.)*10;
end;
/*or this way, which is likely what I would do*/
if find(effect, "LDL") = 1 then group=1;
else group=2;
measure = input(scan(effect, 2, "="), 8.)*10;
Although @Reeza said you should put semi-colons after DO (and I agree for purposes of good form and clarity), that is not the reason your program fails. I used it unchanged below and it works as expected.
In this particular case the absence of a semi-colon works just fine, because the statement
if effect="LDL_10 at gluc_10=8.80" then do group=1; gluc=88; end;
is effectively
if effect="LDL_10 at gluc_10=8.80" then do group=1 while (group=1); gluc=88; end;
Edit: (added the "while (group=1)" component above). The prior "do group=1 to 1" ended up incrementing group to 2.
You probably thought of GROUP as just one of the variables to be assigned a value inside a DO group (I know I did at first), but SAS thinks GROUP is a do loop index.
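To see how the index variable is treated, here is a minimal sketch in a throwaway DATA _NULL_ step. The variables a and b are placeholders that only give each loop a body; nothing here comes from the original data.

```sas
/* Sketch only: a and b are placeholder variables for illustration. */
data _null_;
  /* single-item list: the body runs once and group is not incremented */
  do group = 1;
    a = group;
  end;
  put "after list form: " group=;

  /* TO form: the index is incremented past the stop value on exit */
  do group = 1 to 1;
    b = group;
  end;
  put "after TO form:   " group=;
run;
```

The list form leaves group at the value from the list, which is why the original code still ends up with the intended group values, while the TO form leaves the index one step past the stop value.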
As I said the problem is not in the code - it has to be in the data. Your code seems to work unchanged here:
data OR;
length effect $30;
effect="LDL_10 at gluc_10=8.80" ; output;
effect="LDL_10 at gluc_10=9.90" ; output;
effect="LDL_10 at gluc_10=12.50"; output;
effect="Gluc_10 at LDL_10=9.05" ; output;
effect="Gluc_10 at LDL_10=13.51"; output;
effect="Gluc_10 at LDL_10=18.75"; output;
run;
data my_OR; set OR;
if effect="LDL_10 at gluc_10=8.80" then do group=1; gluc=88; end;
if effect="LDL_10 at gluc_10=9.90" then do group=1; gluc=99; end;
if effect="LDL_10 at gluc_10=12.50" then do group=1; gluc=125; end;
if effect="Gluc_10 at LDL_10=9.05" then do group=2; LDL=90.5; end;
if effect="Gluc_10 at LDL_10=13.51" then do group=2; LDL=135.1; end;
if effect="Gluc_10 at LDL_10=18.75" then do group=2; LDL=187.5; end;
put (_all_) (=);
run;
which produces populated values for group, gluc and LDL in the log.
Right:
do group=1; ... end;
is an example of this syntax:
do group=1,5,10; ... end;
but with only one item in the list of values.
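As a rough illustration of the list form, the sketch below writes one observation per item in the list. The dataset name demo is a placeholder, not something from the thread.

```sas
/* Sketch only: demo is a placeholder dataset name. */
data demo;
  /* the body runs once for each value in the comma-separated list */
  do group = 1, 5, 10;
    output;
  end;
run;

proc print data=demo noobs;
run;
```

With a single item in the list the body runs exactly once, so the loop behaves like a plain DO group that also assigns the index variable.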