Dear All,
I would describe my knowledge and abilities in programing and macros as basic. This is my problem:
I have a dataset. I will refer to it as "history". In history, each observation is a summary of a report, including a date of issue and various variables. Reports are issued at different times, i.e. the difference in months between a report and a previous report could be any value between 1 and 24.
I would like to create a new dataset. In this dataset the first observation would be the most recent report in history. The second observation would be the next and closest report in history, conditional on the fact it was issued at least 6 months later than the report in the FIRST observation. The third observation would now be the next report in history, conditional on the fact it was issued at least 6 months later than the report in the SECOND observation, and so forth. In other words, I have a reference date that needs to be updated.
1. Can this be done without a macro? How?
2. Must this be done with a macro? How?
Any help would be highly appreciated.
Michael
Dr van Straten, didn't you broadcast on LBC some years ago? What a surprise!
Try this:
data aborthist;
set aborthist;
by descending startdate;
retain last_kept; /* remember its value from row to row */
drop last_kept; /* but do not bother to write it to the output */
if _n_=1 then do; /* definitely want the first one */
output;
last_kept = startdate; /* keep a note of the date we just wrote */
end;
else if last_kept - startdate ge 180 /* compare current date to last one written */
then do;
output;
last_kept = startdate; /* and this is the one we compare next time */
end;
run;
1. Yes. Sort your data on report id and descending report date. Use a data step with BY report date. Use retain (with firs. logic) or lag() to compare report date between observations. Keep records that meets your criteria.
2. No.
Thank you. I'll try that.
Sorry, this solution is not working;
I used:
data aborthist;
set aborthist;
by descending startdate;
if _n_=1 OR (lag(startdate)-startdate ge 180) then output;
run;
Apparently, the lag function queues the value of the previous observation. However, the previous observation is not necessarily the observation I need for the reference date. The reference date is the date belonging to the previous observation THAT WAS KEPT. For example, I have the dates:
May 2014
Dec 2013
Jun 2013
Mar 2013
Jan 2013
Aug 2012
Mar 2012
Jan 2011
Sep 2011
Mar 2011
The following reports SHOULD be extracted: May 2014, Jun 2013, Aug 2012, Dec 2012, Mar 2011.
However, only the following reports ARE extracted: May 2014, Jun 2013, March 2011.
Dr van Straten, didn't you broadcast on LBC some years ago? What a surprise!
Try this:
data aborthist;
set aborthist;
by descending startdate;
retain last_kept; /* remember its value from row to row */
drop last_kept; /* but do not bother to write it to the output */
if _n_=1 then do; /* definitely want the first one */
output;
last_kept = startdate; /* keep a note of the date we just wrote */
end;
else if last_kept - startdate ge 180 /* compare current date to last one written */
then do;
output;
last_kept = startdate; /* and this is the one we compare next time */
end;
run;
Sorry to disappoint you dkb, I'm not THE Michael van Straten (Health Journalist and much more), I'm just a Michael van Straten (Vet).
Thank you very much for your solution. I'll check it out but am already quite confident it will work as you have addressed the key problem. So I know for the future: if one refers to a variable AFTER using the output command (like you did above), will the value of the variable always be that belonging to the observation that was "output"?
The OUTPUT statement does not change any variable values at all; it just writes the contents of the PDV to the dataset.
Variable values change:
- when an input dataset is read (SET or MERGE statement)
- when values are assigned through direct assignment (=), incrementation (like x + 1;) or by a "call" subroutine
- when a new data step iteration begins (all variables not RETAINED are set to missing)
Worked perfect! Thanks.
Since the first observation is the most recent report, by definition there can be no later report. So I guess you meant "earlier"?
Yes, sorry. Earlier of course.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.