BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PhuongNguyen
Obsidian | Level 7

Hello,

Please help in this.

I would like to select 5 rows before and 5 row after a particular observation in a data set. I have tried and searched for the appropriated codes but had no any answer.  As the picture below

data.png

Basing on 'edate', whenever edate takes a value (for here is 21-APR-2011)  I want to take 5 rows before that point and 5 rows after that point. Any suggestion? Thank you.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

The solution suggested by @FreelanceReinh looks good, but it can be made a little simpler and safer:

data want;
set have;
if edate>. then do p=max(_n_-5,1) to min(_n_+5,nobs);
  set have point=p nobs=nobs;
  output;
  end;
run;

- using MAX and MIN functions is the easy way to make sure that you do not try to read non-existing observations.

 

One word of warning: if your table contains deleted observations (e.g. because they have been edited with FSEDIT, or somebody used an SQL DELETE FROM statement on it), this will not work correctly. In that case you should copy your input to a temporary table first, like

data Work.temp;
  set mylib.have;
run;

so that the table can be accessed with POINT= without problems.

 

One question: If you have two valid Edates within 10 rows from each other, do you then want some of the observations output twice?

If not, you may want to do something like this:

data want;
set have;
if edate>. then do p=max(_n_-5,1,p+1) to min(_n_+5,nobs);
  set have point=p nobs=nobs;
  output;
  end;
run;

so that the same observation is not output twice. The point variable (P) is automatically initialized to 0 and retained.

View solution in original post

13 REPLIES 13
FreelanceReinh
Jade | Level 19

Hello @PhuongNguyen and welcome to the SAS Support Communities!

 

Here is a basic code example for your task:

data want;
set have;
if edate>. then do p=_n_-5 to _n_+5;
  if 1<=p<=n then do;
    set have point=p nobs=n;
    output;
  end;
end;
run;

You may want to extend the code, for example, in order to exclude the middle observation (and keep only the 5+5, if desired), to avoid duplicate observations if two ranges of +/-5 overlap or to ensure that the 10 or 11 selected observations of a range belong to the same RIC.

 

 

PS: In future posts please provide sample data in the form of a datastep (see How to convert datasets to data steps) because it is very  difficult, even for SAS, to read data from screenshots.

PhuongNguyen
Obsidian | Level 7

Hi @FreelanceReinh,

I add 'by RIC' to distinguish among the RIC, and it says the POINT= option is incompatible with the BY statement. Can you help?

FreelanceReinh
Jade | Level 19

Sure. The idea is not to use a BY statement, but to compare RIC values:

data want;
set have;
if edate>. then do p=max(_n_-5,1) to min(_n_+5,nobs);
  set have(rename=(ric=ric1)) point=p nobs=nobs;
  if ric=ric1 then output;
end;
drop ric1;
run;

 

 

s_lassen
Meteorite | Level 14

The solution suggested by @FreelanceReinh looks good, but it can be made a little simpler and safer:

data want;
set have;
if edate>. then do p=max(_n_-5,1) to min(_n_+5,nobs);
  set have point=p nobs=nobs;
  output;
  end;
run;

- using MAX and MIN functions is the easy way to make sure that you do not try to read non-existing observations.

 

One word of warning: if your table contains deleted observations (e.g. because they have been edited with FSEDIT, or somebody used an SQL DELETE FROM statement on it), this will not work correctly. In that case you should copy your input to a temporary table first, like

data Work.temp;
  set mylib.have;
run;

so that the table can be accessed with POINT= without problems.

 

One question: If you have two valid Edates within 10 rows from each other, do you then want some of the observations output twice?

If not, you may want to do something like this:

data want;
set have;
if edate>. then do p=max(_n_-5,1,p+1) to min(_n_+5,nobs);
  set have point=p nobs=nobs;
  output;
  end;
run;

so that the same observation is not output twice. The point variable (P) is automatically initialized to 0 and retained.

FreelanceReinh
Jade | Level 19

Thanks @s_lassen, good points!

 

I think to avoid duplicates without skipping observations the DO loop should start at

max(_n_-5,1,p)

because before leaving the previous DO loop p was incremented once more.

PhuongNguyen
Obsidian | Level 7

Many thanks to @FreelanceReinh , @s_lassen, and @Ksharp 

The codes work perfect. They are even more than what I expected. I will do datastep next times. Thank you.

Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

Hello FreelanceReinhard,

 

I have a similar issue but the code provided is not solving it. I want the 2 records before and 2 records bracketing a non missing FIRSTE value. So if there aren't  two records before or two records after the non missing value of FIRSTE then all the records for that ID get deleted. In the data below only the records from C and D should be in the final dataset but with the code

 


data have;
input ID $ AGE RATE TYPE $ FIRSTE;
datalines ;
A 14.1 -4.19560471 N .
A 15.1 -4.186550471 E -4.186550471
A 15.3 -4.041576561 E .
A 15.6 -3.984641999 E .
A 15.8 -4.079763059 E .
A 16.2 -4.198425079 E .
B 6.2 -1.118603619 N .
B 12.25 -3.251832999 N .
B 12.4 -3.158418124 N .
B 14.8 -3.739770279 N .
B 19.1 -3.980687937 E -3.980687937
B 19.6 -3.901730725 E .
C 4.3 2.1339735869 N .
C 5.63 2.1571201461 N .
C 5.6 2.1571201461 E 2.1571201461
C 6.1 2.6609995525 E .
C 6.7 2.1939735869 E .
C 7.1 2.6991891929 E .
D 2.4 -0.35998761 N .
D 2.6 -0.650544267 N .
D 3.2 -0.817670577 E -0.817670577
D 3.7 -0.659657975 E .
D 4.2 -0.370159291 E .
;
run;

data want;
set have;
if firste>. then do p=max(_n_-2,1,p) to min(_n_+2,nobs);
set have point=p nobs=nobs;
output;
end;
run;

 

 

How can I eliminate records which do not match my condition?

thanks

Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

I was able to get to my desired results by adding another step

proc sql;
create table final as
select *, count(id) as cnt
from want
group by id
having calculated cnt >=5
;
quit;

 

Does anyone know if it is possible to do it in one step?

Nici
Obsidian | Level 7

Hello s_lassen,

 

I'm using the code you provided here for a similar problem as the one shown here. However, I have to add a statement to make sure that the -5/+5 rows selected still have the same RIC. If the RIC changes after the +3rd obs, SAS should stop and then only select -5/+3. How do I have to adjust your code to make this happen?

 

Thank you in advance.

 

BR

Nici

omkana
Calcite | Level 5

What if the there is a row before or after 21-APR-2011 and the result shows both rows. But, I want the result to show only 21-APR-2011, so how can I adjusted the codes?

 

Thank you 

Ksharp
Super User

Post your data as SAS data step, not picture.Nobody would like type it for you .

Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

Hello,

 

has anyone had an issue with the code:

if firste>. then do p=max(_n_-2,1,p) to min(_n_+2,nobs);
set have point=p nobs=nobs;

 

I am having issue with the record ID E. 

data have;
input ID $ AGE RATE TYPE $ FIRSTE;
datalines ;
A 14.1 -4.19560471 N .
A 15.1 -4.186550471 E -4.186550471
A 15.3 -4.041576561 E .
A 15.6 -3.984641999 E .
A 15.8 -4.079763059 E .
A 16.2 -4.198425079 E .
B 6.2 -1.118603619 N .
B 12.25 -3.251832999 N .
B 12.4 -3.158418124 N .
B 14.8 -3.739770279 N .
B 19.1 -3.980687937 E -3.980687937
B 19.6 -3.901730725 E .
C 4.3 2.1339735869 N .
C 5.4 2.1571201461 N .
C 5.6 2.1571201461 E 2.1571201461
C 6.1 2.6609995525 E .
C 6.7 2.1939735869 E .
C 7.1 2.6991891929 E .
D 2.4 -0.35998761 N .
D 2.6 -0.650544267 N .
D 3.2 -0.817670577 E -0.817670577
D 3.7 -0.659657975 E .
D 4.2 -0.370159291 E .
E 15.6 -3.679862707 N .
E 16.4 -3.248757219 E -3.248757219
E 16.9 -3.128994865 E .
E 17.4 -2.892552007 E .
E 17.9 -2.741253675 E .
E 18.4 -2.591966458 E .
E 19.3 -2.42630517 E .
;
run;

 

data want;
set have;
if firste>. then do p=max(_n_-2,1,p) to min(_n_+2,nobs);
set have point=p nobs=nobs;
output;
end;
run;

 

When I run the code on a subset of my dataset , I get the right records. 

E 16.4 -3.248757219 E -3.248757219
E 16.9 -3.128994865 E .
E 17.4 -2.892552007 E .
E 17.9 -2.741253675 E .

But when I run the code on the whole set , for ID E I get one extra observation

E 16.4 -3.248757219 E -3.248757219
E 16.9 -3.128994865 E .
E 17.4 -2.892552007 E .
E 17.9 -2.741253675 E .

E 19.3 -2.42630517 E .

 

I am not able to figure out why this extra row gets added. It is only happening  for this one record when the full dataset is used (1000 records)

 

Anyone has a suggestion?

Thanks

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 9858 views
  • 6 likes
  • 7 in conversation