Solved: 2X DWL or DUL dilemma?

Haikuo · Posted 02-07-2012 10:45 AM

Dear All,

It just occurred to me that I still have this question about 2x DO loops, regardless of how many times I have used them.

For example:

data have;
   infile cards missover;
   input YEAR patient $ Mortality_in365days $;
   cards;
1990 221
1991 221
1993 221
1995 221 *
2000 789
2001 789
2002 789 *
2001 965
2005 965 *
;

data want;
   do until (last.patient);
      set have;
      by patient year;
      Death_year=ifn(last.patient,year,death_year);
      retain death_year;
   end;
   do until (last.patient);
      set have;
      by patient year;
      Years_Before_death=death_year-year;
      output;
   end;
run;

My question is: by the time the first loop reaches the end of 'have' through the first SET, the DATA step should stop, so the final output should lack the last group of data; that is, patient=965 should not appear in the final output. Instead, the DATA step keeps moving forward and only finishes after the second SET reaches its end. I am really glad that the DATA step acts smart on this one, so the 2xDUL gets to keep its charm. But why?

Thanks in advance for all of your inputs and Answers!

Haikuo

art297 · Posted 02-07-2012 10:58 AM

I can't write a full paper here to explain the concept but, fortunately, Paul Dorfman and Koen Vyverman already have. Take a good look at http://support.sas.com/resources/papers/proceedings09/038-2009.pdf.

I think it will provide all of the explanations you are seeking.

Haikuo · Posted 02-07-2012 11:08 AM

Wow, it is a GREAT paper! I probably need more time to dwell on it, but Thanks, Art!

Linlin · Posted 02-07-2012 08:23 PM

Hi Hai.kuo,

I looked at the paper Art recommended. It is a very good paper. I made some changes to one of your posts for practice. Thank you!

data have;
   input naics4 $ taxable1-taxable5;
   cards;
1 20 30 40 50 60
1 25 35 45 55 65
1 30 40 50 60 70
2 20 30 40 50 60
2 25 35 45 55 65
3 30 40 50 60 70
;
run;

data want (drop=tax: );
   do until (last.naics4);
      set have;
      by naics4;
      array tax(*) taxable1-taxable5;
      array st(*) sum_tax1-sum_tax5;
      /* add this row's values to the running totals for the group */
      do _n_=1 to dim(tax);
         st(_n_)=sum(st(_n_),tax(_n_));
      end;
   end; /* closes do until (last.naics4); one row per group is written implicitly */
run;

proc print;run;

Doc_Duke · Posted 02-07-2012 11:05 AM

Haikuo,

The problem is not with the DO loop. It has to do with the behavior of the SET statement.

"What SET Does

Each time the SET statement is executed, SAS reads one observation into the program data vector. SET reads all variables and all observations from the input data sets unless you tell SAS to do otherwise. A SET statement can contain multiple data sets; a DATA step can contain multiple SET statements. See Combining and Modifying SAS Data Sets: Examples."

By design, the second SET statement starts over at the beginning of the dataset. There are examples in the documentation that cover the behavior that you observed.
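
To make that concrete, here is a minimal sketch. It is only an illustration, not something taken from the documentation: the cut-down data set have and the flag names eof1 and eof2 are assumptions. Each SET statement keeps its own read pointer, and, as far as I understand it, the DATA step stops only when a SET statement tries to read past the last observation. The first loop exits on last.patient right after reading the final row, so it does not make that attempt until the next iteration of the step, by which time the second loop has already processed the last group.

```sas
/* Cut-down version of the data in the question (assumed for illustration) */
data have;
   input year patient $;
   cards;
1990 221
1995 221
2001 965
2005 965
;

data _null_;
   /* Pass 1: read one BY group; the loop ends on the group's last row */
   do until (last.patient);
      set have end=eof1;   /* eof1 is a hypothetical flag name */
      by patient;
      put 'pass 1: ' _n_= patient= year= eof1=;
   end;
   /* Pass 2: a separate SET with its own pointer re-reads the same group */
   do until (last.patient);
      set have end=eof2;   /* eof2 is a hypothetical flag name */
      by patient;
      put 'pass 2: ' _n_= patient= year= eof2=;
   end;
run;
```

In the log, eof1=1 shows up on the last pass 1 line for patient 965, yet the pass 2 lines for 965 are still written. The step ends only when it loops back to the top and the first SET finds nothing left to read.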

Doc Muhlbaier

Duke
