## do loop

Solved
Super Contributor
Posts: 1,041

Hi Team,
Data-related questions:
In the data below, what is the variable named last_cd4?
Why does the cd4date variable appear twice in the data?
Why does the cd4 column at the far right have negative values?

Question:
For instance, for id 20, I want to calculate the difference between cd4=332 (first date) and cd4=71 (20Nov2007), because that date is the one nearest to one year after the first test.

Also, for id 40, the difference between cd4=20 (30Jan2003) and cd4=239 (19Jun2003), because that date is the one nearest to one year after the first test.

Code-related questions:
Why is the DO loop placed before the SET statement?
In the delay variable we are doing the interval check.
We are checking the days, that is, the difference between the first date and the CD4 date.
Why are we subtracting 365 from that expression?
Could you explain the code from there?

Thanks.

Obs  id  CD4COUNT  CD4DATE    cd4  cd4date    last_cd4  cd4date    12month     cd4
  1  20       332  29MAY2006  332  29MAY2006         6  03NOV2009  29MAY2007   326
  2  20       267  12JUN2006  332  29MAY2006         6  03NOV2009  29MAY2007   326
  3  20       207  05DEC2006  332  29MAY2006         6  03NOV2009  29MAY2007   326
  4  20        71  20NOV2007  332  29MAY2006         6  03NOV2009  29MAY2007   326
  5  20        15  24JUN2008  332  29MAY2006         6  03NOV2009  29MAY2007   326
  6  20         8  07AUG2008  332  29MAY2006         6  03NOV2009  29MAY2007   326
  7  20         3  02JUN2009  332  29MAY2006         6  03NOV2009  29MAY2007   326
  8  20         6  03NOV2009  332  29MAY2006         6  03NOV2009  29MAY2007   326
  9  40        20  30JAN2003   20  30JAN2003       326  14DEC2010  30JAN2004  -306
 10  40        10  13MAR2003   20  30JAN2003       326  14DEC2010  30JAN2004  -306
 11  40       300  08MAY2003   20  30JAN2003       326  14DEC2010  30JAN2004  -306
 12  40       239  19JUN2003   20  30JAN2003       326  14DEC2010  30JAN2004  -306
run;

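proc sort data=have; by id CD4DATE; run;

data want;
    /* DOW loop: read every row of one id before the implicit OUTPUT */
    do until(last.id);
        set have; by id;
        if first.id then do;
            /* Baseline: the earliest test for this id */
            firstDate = CD4DATE;
            firstCD4 = CD4COUNT;
        end;
        else do;
            /* Distance, in days, of this test from the one-year mark */
            delay = abs(intck("DAY", firstDate, CD4DATE) - 365);
            /* minDelay starts out missing; MIN ignores missing values */
            minDelay = min(delay, minDelay);
            /* Keep the row that is closest so far (a tie goes to the later date) */
            if minDelay = delay then do;
                yearCD4 = CD4COUNT;
                lastDate = CD4DATE;
            end;
        end;
    end;
    format firstDate lastDate date9.;
    keep id firstDate firstCD4 yearCD4 lastDate;
run;

proc print; run;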

Accepted Solutions
Solution
‎10-04-2012 04:23 PM
PROC Star
Posts: 8,167

## Re: do loop

Karun,

First, please refrain from asking specific people to answer your questions.  The purpose of a forum is so that we can all learn and share.  No one person is always going to be best at answering all questions.

The best person to ask about a particular data set is the person who created that dataset.  The data you showed didn't just have 2 instances of cd4date, it had at least three.  There were other variables repeated as well.

Also, the best person to ask regarding "why" certain code was used is the person who wrote the program.  Anyone else, including us, can only help you understand "what" the code is doing.  But without knowing what the data are, or what they represent, we would only be guessing as to "why" it was used.

As for your code related questions:

Why is the DO loop placed before the SET statement?

It is called a DOW loop and you can read about it at: http://analytics.ncsu.edu/sesug/2010/BB13.Dorfman.pdf

In your case, the data step outputs only one record per id, written after the last record for that id has been read, and that record has access to the variables populated while the earlier records were processed.
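
To make the DOW idea concrete, here is a minimal sketch on made-up data. The `visits` data set and the variables `value`, `n` and `total` are hypothetical and not part of the original program. Because the SET statement sits inside DO UNTIL(LAST.ID), one iteration of the data step consumes a whole BY group, and the implicit OUTPUT at the bottom writes one row per id.

```sas
/* Hypothetical toy data, already sorted by id */
data visits;
    input id value;
    datalines;
1 10
1 20
1 30
2 5
2 7
;
run;

data summary;
    /* One pass through this loop reads every row of one id */
    do until (last.id);
        set visits;
        by id;
        n     = sum(n, 1);        /* n is missing at the start of each id; SUM ignores missing */
        total = sum(total, value);
    end;
    /* Implicit OUTPUT happens here, once per id */
    keep id n total;
run;

proc print data=summary; run;
```

With this toy input the result should have two rows: id 1 with n=3 and total=60, and id 2 with n=2 and total=12. No RETAIN is needed, because n and total only have to survive within a single data step iteration.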

In the delay variable we are doing the interval check.

You are asking a "why" question.  Ask the person who wrote the code.

We are checking the days, that is, the difference between the first date and the CD4 date.
Why are we subtracting 365 from that expression?

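Again, you are asking a "why" question.  As for the "what": you left out that the code takes the absolute value of that difference, so delay measures how many days a test falls from the 365-day mark, in either direction.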
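
A small sketch of that arithmetic, using the first test date of id 20 and two of its later test dates from the listing. The date 29MAY2007 is not a test date in the data; it is added here only as an assumption to show the exact one-year mark.

```sas
data _null_;
    firstDate = '29MAY2006'd;
    do testDate = '05DEC2006'd, '29MAY2007'd, '20NOV2007'd;
        days  = intck("DAY", firstDate, testDate);  /* days elapsed since the first test */
        delay = abs(days - 365);                    /* distance from the one-year mark   */
        put testDate= date9. days= delay=;
    end;
run;
```

If I have counted the days correctly, the log shows days of 190, 365 and 540, giving delays of 175, 0 and 175. So 05DEC2006 and 20NOV2007 are equally far from the one-year mark for id 20, and in the posted program the later row wins such a tie, because the test `minDelay = delay` is true again when that row is read.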

Could you explain the code from there?

The rest of the code looks simple enough.  However, again, I think you are asking "why" rather than "what".

All Replies
Super User
Posts: 6,785

## Re: do loop

karun,

Just some of the answers here ...

The report does not match the program.  The output data set contains just the 5 variables listed in the KEEP statement, but the report shows a different set of columns.

There are two ways a report could contain CD4DATE twice.  One is that PROC PRINT did not produce the report, but PROC REPORT did instead.  The other possibility is that one of the variables has a different variable name, but has the label CD4DATE (or perhaps cd4date).  PROC PRINT can use the LABEL option to print the variable label instead of the variable name.
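
The label possibility can be sketched as follows. This is only an illustration: the `demo` data set is hypothetical, and attaching the label 'cd4date' to `firstDate` is an assumption about how such a report might have come about, not something shown in the thread.

```sas
data demo;
    CD4DATE   = '29MAY2006'd;
    firstDate = '29MAY2006'd;
    label firstDate = 'cd4date';      /* hypothetical label, chosen to mimic the report */
    format CD4DATE firstDate date9.;
run;

/* LABEL makes PROC PRINT show variable labels as column headings where they exist */
proc print data=demo label;
run;
```

The printed report would then carry two columns headed CD4DATE and cd4date, even though the underlying variable names are different.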

Finally (for my portion of the answer) last_cd4 is the final CD4COUNT for the particular ID.  In your data, ID=20 has 8 observations, and the 8th and final one has CD4COUNT=6.  That value is copied to last_cd4.  ID=40 has more than the 4 observations shown in your sample report, and the final one has CD4COUNT=326.
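
The thread does not show the code that built that report, so the following is only one guess at how a column like last_cd4 could be copied onto every row of an id. It uses two DOW loops over an abridged, hypothetical copy of `have` (so the final value for id 40 here is 239, not the 326 seen in the report).

```sas
/* Abridged sample rows, assumed sorted by id and CD4DATE */
data have;
    input id CD4COUNT CD4DATE :date9.;
    format CD4DATE date9.;
    datalines;
20 332 29MAY2006
20 71 20NOV2007
20 6 03NOV2009
40 20 30JAN2003
40 239 19JUN2003
;
run;

data with_last;
    /* First pass over the id: after the loop, last_cd4 holds the final CD4COUNT */
    do until (last.id);
        set have;
        by id;
        last_cd4 = CD4COUNT;
    end;
    /* Second pass over the same rows: write each one with last_cd4 attached */
    do until (last.id);
        set have;
        by id;
        output;
    end;
run;

proc print data=with_last; run;
```

Each SET statement keeps its own position in `have`, so the second loop rereads exactly the rows the first loop just scanned.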

Sorry, but work beckons!

Good luck.

Super Contributor
Posts: 1,041

## Re: do loop

Thanks a lot Astounding for the detailed info.

I am also seeking help understanding the code

Regards

Super Contributor
Posts: 1,041

## Re: do loop

Hi Arthur,

Firstly, thank you so very much for your time and effort in framing the answer. I am extremely sorry for asking specific people. All this while I was thinking that if the same person answered, it could save other authors the time of reading someone else's question. But now I understand that it is a combined effort for all of us to learn. I will definitely not direct questions to a particular person in the forum.

Cheers

🔒 This topic is solved and locked.