DATA Step, Macro, Functions and more

How to extract specific ranges of obs from a data set

Reply
N/A
Posts: 1

How to extract specific ranges of obs from a data set

I have a large sequential data set. I have identified a small subset of extreme observations in the set. What I want to do is extract the extreme observations and the 10 observation before and after each extreme observations. If that is possible, I would also like to ensure that duplicates are removed as well. And ideally, since I have by variables, I would love to be able to abide by the first.var and last.var parameters so as not to include an obs from a different by variable within the range. Can anyone help?
Valued Guide
Posts: 634

Re: How to extract specific ranges of obs from a data set

Here is one solution. It requires two passes of the primary data set :-( . If this is too big of an inefficiency there is probably a HASH table solution. try it and let us know.
Art
[pre]
* build set of extreme values - you already have these;
data select(keep=region subsidiary product stores);
set sashelp.shoes(where=(stores>30));
run;

* First pass of data to determine obs numbers of the extreme values;
data cnt(keep=region subsidiary product stores cnt
rename=(region=cregion subsidiary=csubsidiary product=cproduct stores=cstores));
merge sashelp.shoes(in=inshoes)
select(in=inselect);
by region subsidiary product;
cnt+1;
if inshoes and inselect;
run;

* select leading and trailing obs;
* retrieve +/- 3 obs;
data surrounded(keep=region subsidiary product stores sales );
* array to hold used obs numbers - dimension depends on the total number of obs;
array used {100000] $1 _temporary_;
set cnt;
do point=(cnt-3) to (cnt+3);
if 1 le point le nobs then do;
set sashelp.shoes point=point nobs=nobs;
* PRODUCT is not included in the comparison so duplicate obs are possible without checking array;
if used{point}=' ' & region=cregion & subsidiary=csubsidiary then do;
output surrounded;
used{point} = 'x';
end;
end;
end;
run;
[/pre] Message was edited by: ArtC
Super User
Posts: 10,041

Re: How to extract specific ranges of obs from a data set

Hi. Can you give some what your dataset look like with some dummy data,
and what output you would like to be . giving a example would help understand what your purpose.
Super Contributor
Posts: 359

Re: How to extract specific ranges of obs from a data set

Try this. You will want to check for end of file etc to avoid warnings, but this works in the simple example I wrote.


Data new;
set old;
counter+1;
chkvar = byvar;
if (condition met) then do pv = counter -10 to counter + 10;
set old point = pv;
if byvar = chkvar then output;
end;
run;
Ask a Question
Discussion stats
  • 3 replies
  • 163 views
  • 0 likes
  • 4 in conversation