DATA Step, Macro, Functions and more

How can I keep only the last 10 observations once an if condition is met?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

How can I keep only the last 10 observations once an if condition is met?

I would like to modify the below code so that in addition to keeping the observation that meets the if condition it also keeps the 10 observations preceding it. How can I do this? Thanks.

Data want;
set have;
if TrendDown=1 and ck1=1 and ck2=1 then Trend='Down';
if TrendUp=1 and ck1=1 and ck2=1 then Trend='Up';
if Trend='Up' or Trend='Down';
run;


Accepted Solutions
Solution
‎08-18-2014 03:27 PM
Trusted Advisor
Posts: 1,228

Re: How can I keep only the last 10 observations once an if condition is met?

data want;
set have;
cnt+1;
a=1;
if ck1=1 and ck2=1 and (TrendDown=1 or TrendUp=1) then do;
flag=1;
if cnt>0 and flag=1 then cutoff=cnt;
end;
lg=cnt-lag(cutoff);
if lg=1 then grp+a;
run;

proc sql;
create table done(drop=a lg cnt flag cutoff)  as
select * from want
group by grp
having cnt between max(cutoff)-10 and max(cutoff)
order by grp,cnt;
quit;

proc print data=done;
run;

View solution in original post


All Replies
Super Contributor
Posts: 297

Re: How can I keep only the last 10 observations once an if condition is met?

Hi BettyA,

although the question sounds relative simple, it is not that easy because a datastep goes throug af dataset row-wise, and it is cumbersome to construct code that remember the last 10 observations.

Though, it is possible with some tricks. In this example I make a dataset with one variables that Counts from 1 to 100. if Count is between 20 and 30 or between 60 and 70 then I keep the observations, and also I keep it if at least one of the 10 preceding observations fulfull the same criteria.

The trick I use is to use the "open" function to point onto the same dataset as I read from with set. Then I get the possibility to point on the observation I want to without keeping in memory the last 10 observations. There may be other methods that are easier. You can easily change the criteria to the one you have in your example.

data count_to_100;
  do count =1 to 100;output;
  end;
run;

data test;
  set count_to_100 ;
  retain dsid;
  if _N_=1 then dsid=open('count_to_100');
  keepobs=0;
  do i=max(1,_N_-11) to _N_;
    obsid=fetchobs(dsid,i);
if (20<=getvarn(dsid,1)<30)or(60<=getvarn(dsid,1)<70) then keepobs=1;
  end;
  if keepobs;
  drop i obsid dsid;
run;

Note: I have edited this a bit after I published it, to make it simpler.

Super User
Super User
Posts: 7,942

Re: How can I keep only the last 10 observations once an if condition is met?

Per JacobSimonsen, it is not straightforward as there is no logical way to grab "previous 10 records".  One other option is to assign an N number to the data, and reverse sort the data outputting until your ad-hoc number of rows is reached, something like (note its mostly test data and not the most efficient code, so just use as example):

data have;
  grp=1; trend=0; ck=1; abc="A"; output;
  grp=1; trend=0; ck=1; abc="B"; output;
  grp=1; trend=0; ck=1; abc="C"; output;
  grp=1; trend=0; ck=1; abc="D"; output;
  grp=1; trend=0; ck=1; abc="E"; output;
  grp=1; trend=1; ck=1; abc="F"; output;
  grp=2; trend=0; ck=1; abc="A"; output;
  grp=2; trend=0; ck=1; abc="B"; output;
  grp=2; trend=0; ck=1; abc="C"; output;
  grp=2; trend=1; ck=1; abc="D"; output;
  grp=2; trend=0; ck=1; abc="E"; output;
  grp=2; trend=0; ck=1; abc="F"; output;
run;

data want;
  set have;
  by grp;
  retain n;
  if first.grp then n=1;
  else n=n+1;
  if trend=1 and ck=1 then flag="Y";
run;
proc sort data=want;
  by grp descending n;
run;
data want2 (drop=on cnt flag);
  set want;
  by grp;
  retain on cnt;
  if first.grp then on="N";
  if flag="Y" then do;
    on="Y";
    cnt=1;
    output;
  end;
  else if on="Y" and cnt <=3 then do;
    cnt=cnt+1;
    output;
  end;
run;

Super User
Super User
Posts: 7,039

Re: How can I keep only the last 10 observations once an if condition is met?

You could probably use what is know as the DOW  (do while) loop.  Or in this case a a double DOW loop.

The first loop will read until your trend condition is met and the second will re-read the same records and conditionally output them.

data want;

   do count1=1 by 1 until (ck1=1 and ck2=1 and (TrendDown=1 or TrendUp=1));

     set have;

   end;

  if TrendDown=1 then Trend='Down';

  else Trend='Up';

  do count2=1 to count1 ;

     set have;

     if count2 > count1 - 10 then output;

  end;

run;

Super User
Posts: 5,499

Re: How can I keep only the last 10 observations once an if condition is met?

Here's some code you could add just before the RUN statement:

do _i_=max(1, _n_-10) to _n_;

   set have point=_i_;

   output;

end;

Note that you may be outputting the same observation many times.  If TREND is "Down" for one observation, it may very well be "Down" again on the next observation.  How many times do you want to output the same observation?

Respected Advisor
Posts: 3,156

Re: How can I keep only the last 10 observations once an if condition is met?

I agree it will take a seasoned SAS programmer to come up an old school data step solution as you may have already see. However, if you don't mind getting some Hash, it is actually pretty straightforward, at least concept-wise.  The idea is to rolling 10 obs into Hash objects, when condition met, then output the Hash along with the rest of the obs. See example below (To grab 3 obs before name='Mary' and forth):

proc sql;

     select quote(strip(name)) into :name separated by ',' from dictionary.columns

           where LIBNAME='SASHELP' AND MEMNAME='CLASS';

     select nobs into :nobs from dictionary.tables where lIBNAME='SASHELP' AND MEMNAME='CLASS';

QUIT;

data want;

     retain i &nobs;

     if _n_=1 then

           do;

                declare hash h();

                h.definekey('n');

                h.definedata(&NAME.);

                H.DEFINEDONE();

                DECLARE HITER HI('H');

           END;

     set sashelp.class;

     n=MOD(_N_,4);

     if _N_ LE I THEN RC=H.REPLACE();

     if name='Mary' then

           do;

                i=_n_;

                do rc=hi.first() by 0 while (rc=0);

                     output;

                     rc=hi.next();

                end;

                RC=H.DELETE();

           end;

     if _n_ gt i then

           output;

     drop i n rc;

run;


Good luck,

Haikuo

Solution
‎08-18-2014 03:27 PM
Trusted Advisor
Posts: 1,228

Re: How can I keep only the last 10 observations once an if condition is met?

data want;
set have;
cnt+1;
a=1;
if ck1=1 and ck2=1 and (TrendDown=1 or TrendUp=1) then do;
flag=1;
if cnt>0 and flag=1 then cutoff=cnt;
end;
lg=cnt-lag(cutoff);
if lg=1 then grp+a;
run;

proc sql;
create table done(drop=a lg cnt flag cutoff)  as
select * from want
group by grp
having cnt between max(cutoff)-10 and max(cutoff)
order by grp,cnt;
quit;

proc print data=done;
run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 900 views
  • 13 likes
  • 7 in conversation