Hello,
Please help in this.
I would like to select 5 rows before and 5 row after a particular observation in a data set. I have tried and searched for the appropriated codes but had no any answer. As the picture below
Basing on 'edate', whenever edate takes a value (for here is 21-APR-2011) I want to take 5 rows before that point and 5 rows after that point. Any suggestion? Thank you.
The solution suggested by @FreelanceReinh looks good, but it can be made a little simpler and safer:
data want;
set have;
if edate>. then do p=max(_n_-5,1) to min(_n_+5,nobs);
set have point=p nobs=nobs;
output;
end;
run;
- using MAX and MIN functions is the easy way to make sure that you do not try to read non-existing observations.
One word of warning: if your table contains deleted observations (e.g. because they have been edited with FSEDIT, or somebody used an SQL DELETE FROM statement on it), this will not work correctly. In that case you should copy your input to a temporary table first, like
data Work.temp;
set mylib.have;
run;
so that the table can be accessed with POINT= without problems.
One question: If you have two valid Edates within 10 rows from each other, do you then want some of the observations output twice?
If not, you may want to do something like this:
data want;
set have;
if edate>. then do p=max(_n_-5,1,p+1) to min(_n_+5,nobs);
set have point=p nobs=nobs;
output;
end;
run;
so that the same observation is not output twice. The point variable (P) is automatically initialized to 0 and retained.
Hello @PhuongNguyen and welcome to the SAS Support Communities!
Here is a basic code example for your task:
data want;
set have;
if edate>. then do p=_n_-5 to _n_+5;
if 1<=p<=n then do;
set have point=p nobs=n;
output;
end;
end;
run;
You may want to extend the code, for example, in order to exclude the middle observation (and keep only the 5+5, if desired), to avoid duplicate observations if two ranges of +/-5 overlap or to ensure that the 10 or 11 selected observations of a range belong to the same RIC.
PS: In future posts please provide sample data in the form of a datastep (see How to convert datasets to data steps) because it is very difficult, even for SAS, to read data from screenshots.
Hi @FreelanceReinh,
I add 'by RIC' to distinguish among the RIC, and it says the POINT= option is incompatible with the BY statement. Can you help?
Sure. The idea is not to use a BY statement, but to compare RIC values:
data want;
set have;
if edate>. then do p=max(_n_-5,1) to min(_n_+5,nobs);
set have(rename=(ric=ric1)) point=p nobs=nobs;
if ric=ric1 then output;
end;
drop ric1;
run;
Thanks so much @FreelanceReinh
The solution suggested by @FreelanceReinh looks good, but it can be made a little simpler and safer:
data want;
set have;
if edate>. then do p=max(_n_-5,1) to min(_n_+5,nobs);
set have point=p nobs=nobs;
output;
end;
run;
- using MAX and MIN functions is the easy way to make sure that you do not try to read non-existing observations.
One word of warning: if your table contains deleted observations (e.g. because they have been edited with FSEDIT, or somebody used an SQL DELETE FROM statement on it), this will not work correctly. In that case you should copy your input to a temporary table first, like
data Work.temp;
set mylib.have;
run;
so that the table can be accessed with POINT= without problems.
One question: If you have two valid Edates within 10 rows from each other, do you then want some of the observations output twice?
If not, you may want to do something like this:
data want;
set have;
if edate>. then do p=max(_n_-5,1,p+1) to min(_n_+5,nobs);
set have point=p nobs=nobs;
output;
end;
run;
so that the same observation is not output twice. The point variable (P) is automatically initialized to 0 and retained.
Thanks @s_lassen, good points!
I think to avoid duplicates without skipping observations the DO loop should start at
max(_n_-5,1,p)
because before leaving the previous DO loop p was incremented once more.
Many thanks to @FreelanceReinh , @s_lassen, and @Ksharp
The codes work perfect. They are even more than what I expected. I will do datastep next times. Thank you.
Hello FreelanceReinhard,
I have a similar issue but the code provided is not solving it. I want the 2 records before and 2 records bracketing a non missing FIRSTE value. So if there aren't two records before or two records after the non missing value of FIRSTE then all the records for that ID get deleted. In the data below only the records from C and D should be in the final dataset but with the code
data have;
input ID $ AGE RATE TYPE $ FIRSTE;
datalines ;
A 14.1 -4.19560471 N .
A 15.1 -4.186550471 E -4.186550471
A 15.3 -4.041576561 E .
A 15.6 -3.984641999 E .
A 15.8 -4.079763059 E .
A 16.2 -4.198425079 E .
B 6.2 -1.118603619 N .
B 12.25 -3.251832999 N .
B 12.4 -3.158418124 N .
B 14.8 -3.739770279 N .
B 19.1 -3.980687937 E -3.980687937
B 19.6 -3.901730725 E .
C 4.3 2.1339735869 N .
C 5.63 2.1571201461 N .
C 5.6 2.1571201461 E 2.1571201461
C 6.1 2.6609995525 E .
C 6.7 2.1939735869 E .
C 7.1 2.6991891929 E .
D 2.4 -0.35998761 N .
D 2.6 -0.650544267 N .
D 3.2 -0.817670577 E -0.817670577
D 3.7 -0.659657975 E .
D 4.2 -0.370159291 E .
;
run;
data want;
set have;
if firste>. then do p=max(_n_-2,1,p) to min(_n_+2,nobs);
set have point=p nobs=nobs;
output;
end;
run;
How can I eliminate records which do not match my condition?
thanks
I was able to get to my desired results by adding another step
proc sql;
create table final as
select *, count(id) as cnt
from want
group by id
having calculated cnt >=5
;
quit;
Does anyone know if it is possible to do it in one step?
Hello s_lassen,
I'm using the code you provided here for a similar problem as the one shown here. However, I have to add a statement to make sure that the -5/+5 rows selected still have the same RIC. If the RIC changes after the +3rd obs, SAS should stop and then only select -5/+3. How do I have to adjust your code to make this happen?
Thank you in advance.
BR
Nici
What if the there is a row before or after 21-APR-2011 and the result shows both rows. But, I want the result to show only 21-APR-2011, so how can I adjusted the codes?
Thank you
Post your data as SAS data step, not picture.Nobody would like type it for you .
Hello,
has anyone had an issue with the code:
if firste>. then do p=max(_n_-2,1,p) to min(_n_+2,nobs);
set have point=p nobs=nobs;
I am having issue with the record ID E.
data have;
input ID $ AGE RATE TYPE $ FIRSTE;
datalines ;
A 14.1 -4.19560471 N .
A 15.1 -4.186550471 E -4.186550471
A 15.3 -4.041576561 E .
A 15.6 -3.984641999 E .
A 15.8 -4.079763059 E .
A 16.2 -4.198425079 E .
B 6.2 -1.118603619 N .
B 12.25 -3.251832999 N .
B 12.4 -3.158418124 N .
B 14.8 -3.739770279 N .
B 19.1 -3.980687937 E -3.980687937
B 19.6 -3.901730725 E .
C 4.3 2.1339735869 N .
C 5.4 2.1571201461 N .
C 5.6 2.1571201461 E 2.1571201461
C 6.1 2.6609995525 E .
C 6.7 2.1939735869 E .
C 7.1 2.6991891929 E .
D 2.4 -0.35998761 N .
D 2.6 -0.650544267 N .
D 3.2 -0.817670577 E -0.817670577
D 3.7 -0.659657975 E .
D 4.2 -0.370159291 E .
E 15.6 -3.679862707 N .
E 16.4 -3.248757219 E -3.248757219
E 16.9 -3.128994865 E .
E 17.4 -2.892552007 E .
E 17.9 -2.741253675 E .
E 18.4 -2.591966458 E .
E 19.3 -2.42630517 E .
;
run;
data want;
set have;
if firste>. then do p=max(_n_-2,1,p) to min(_n_+2,nobs);
set have point=p nobs=nobs;
output;
end;
run;
When I run the code on a subset of my dataset , I get the right records.
E 16.4 -3.248757219 E -3.248757219
E 16.9 -3.128994865 E .
E 17.4 -2.892552007 E .
E 17.9 -2.741253675 E .
But when I run the code on the whole set , for ID E I get one extra observation
E 16.4 -3.248757219 E -3.248757219
E 16.9 -3.128994865 E .
E 17.4 -2.892552007 E .
E 17.9 -2.741253675 E .
E 19.3 -2.42630517 E .
I am not able to figure out why this extra row gets added. It is only happening for this one record when the full dataset is used (1000 records)
Anyone has a suggestion?
Thanks
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.