Dear All,
It just occurred to me that I still have a question about the double DO loop (2x DO-loops), regardless of how many times I have used it.
For example:
data have;
infile cards missover;
input YEAR patient $ Mortality_in365days $;
cards;
1990 221
1991 221
1991 221
1993 221
1995 221 *
2000 789
2001 789
2002 789 *
2001 965
2005 965 *
;
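data want;
  /* Pass 1: read one patient's BY group and keep the last year as the death year */
  do until (last.patient);
    set have;
    by patient year;
    Death_year=ifn(last.patient,year,death_year);
    retain death_year;
  end;
  /* Pass 2: a second SET rereads the same BY group; compute and output each row */
  do until (last.patient);
    set have;
    by patient year;
    Years_Before_death=death_year-year;
    output;
  end;
run;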
My question is: by the time the first loop reaches the end of 'have', the DATA step should stop, so the final output should be missing the last group of data; that is, patient=965 should not appear in the final output. Instead, the DATA step keeps moving forward and finishes only after the second SET reaches its end. I am really glad that the DATA step acts smart on this one and the 2xDUL gets to hold its charm. But why?
Thanks in advance for all of your input and answers!
Haikuo
I can't write a full paper here to explain the concept but, fortunately, Paul Dorfman and Koen Vyverman already have. Take a good look at http://support.sas.com/resources/papers/proceedings09/038-2009.pdf.
I think it will provide all of the explanations you are seeking.
Wow, it is a GREAT paper! I probably need more time to dwell on it, but Thanks, Art!
Hi Hai.kuo,
I looked at the paper Art recommended. It is a very good paper. I made some changes to one of your posts for practice. Thank you!
data have;
input naics4 $ taxable1-taxable5;
cards;
1 20 30 40 50 60
1 25 35 45 55 65
1 30 40 50 60 70
2 20 30 40 50 60
2 25 35 45 55 65
3 30 40 50 60 70
;
run;
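data want (drop=tax: );
  do until (last.naics4);
    set have;
    by naics4;
    array tax(*) taxable1-taxable5;
    array st(*) sum_tax1-sum_tax5;
    /* add each taxable column to its running group total */
    do _n_=1 to dim(tax);
      st(_n_)=sum(st(_n_),tax(_n_));
    end;
  end;
  /* the implicit OUTPUT here writes one row per naics4 group */
run;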
proc print;run;
Haikuo,
The problem is not with the DO loop. It has to do with the behavior of the SET statement.
"Each time the SET statement is executed, SAS reads one observation into the program data vector. SET reads all variables and all observations from the input data sets unless you tell SAS to do otherwise. A SET statement can contain multiple data sets; a DATA step can contain multiple SET statements. See Combining and Modifying SAS Data Sets: Examples."
By design, the second SET statement starts over at the beginning of the dataset. There are examples in the documentation that cover the behavior that you observed.
Doc Muhlbaier
Duke
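To make the point above concrete, here is a minimal sketch; the data set name demo and the variables id and x are made up for illustration and are not from the original posts. Each SET statement keeps its own read position in the input, so the second loop rereads the same BY group that the first loop just finished. A DATA step also stops only when a SET statement tries to read past the end of its input, not when it reads the last observation. After the first loop reads the final row, the second loop still runs for that group; the step ends on the next iteration, when the first SET finds nothing left to read.

```sas
/* Hypothetical three-row input with two BY groups */
data demo;
  input id $ x;
  cards;
A 1
A 2
B 3
;
run;

data _null_;
  /* Pass 1: the first SET reads one whole BY group */
  do until (last.id);
    set demo;
    by id;
    put 'pass 1: ' id= x=;
  end;
  /* Pass 2: the second SET has its own pointer and rereads that group */
  do until (last.id);
    set demo;
    by id;
    put 'pass 2: ' id= x=;
  end;
run;
```

The log should show the pass 1 lines for group A, then the pass 2 lines for group A, and then the same pattern for group B, including its single row in pass 2. That is why patient 965 still appears in the output of the original step.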