BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AK100
Pyrite | Level 9

Hello Guys,|

Simple question. I have a dataset which looks like below (in reality much bigger). I want to remove the first 3 seconds and the last 3 seconds of every measurement. How can that easily be done?

Value   -   DateTime

5          -    0.0

15        -    1.5

74       -     2.2

19       -    3.1

13       -    4.2

71       -     4.9

40       -    5.6

31       -    6.8

87        -     7.8

10       -    8.4

11       -    9.3

47      -   10.1

9       -    0.0

11       -    0.9

7        -     1.9

10       -    2.8

11       -    4.5

74       -     5.2

10       -    6.6

11       -    7.1

71       -     7.8

80       -    8.9

16       -    9.9

15        -     10.9


So I actually want the BOLD ones removed.. Would like to hear. Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

The first step copies the data to your WORK library, which (usually) resides on your local disk, or a local disk within a server. "Local can" also be a disk storage box attached through a SAN, and these tend to be mighty fast nowadays, faster than a network connection. You can safely assume that you will get the best I/O performance out of WORK, if the SAS administrator is worth his/her pay.

 

Next, in order to use a BY statement in a data step (or any other step that supports a BY), the data has to be sorted accordingly; that's why the extra PROC SORT step is necessary.

 

By using BY in a data step, we get two additional automatic variables for each variable in the BY: first. and last., which signal the first observation in a group and the last one.

 

So a double DO works like this: first, it reads all observations from a group (until last.); during this, we can record certain things, here the tjid value of the first observation of a uniekid group. After the first loop has finished, we have the contents of the last observation of the group in the PDV (Program Data Vector); so I store the tjid value in variable end. We can now use those two stored values to make decisions in the second DO loop, that reads exactly the same observations that the first loop read.

 

The double DO loop is very good when you need to make decisions on observations based on values stored in other observations within the same group. By double-reading a group in succession, most of the time the second read happens out of the disk cache, and not the storage medium itself, which translates to a single sequential physical read of the data, the fastest way to process data from (disk) storage, even in the days of SSD drives.

View solution in original post

40 REPLIES 40
PeterClemmensen
Tourmaline | Level 20

Why the first 4 in the second measurement?

 

Do you have an ID for each of your measurements?

AK100
Pyrite | Level 9

I need the first and last 3 seconds removed because don't actually add something. The client doesn't need them. Yes, I forgot to say but they also have an ID yes. So record 1 till 12 have the same ID record 13 till 24 to etc..

PeterClemmensen
Tourmaline | Level 20

Ok. Do all IDs have 12 observations exactly? 

PeterClemmensen
Tourmaline | Level 20

If so, you can do it in a single pass:

 

data have;
input ID Value DateTime;
datalines;
1 5 0.0
1 15 1.5
1 74 2.2
1 19 3.1
1 13 4.2
1 71 4.9
1 40 5.6
1 31 6.8
1 87 7.8
1 10 8.4
1 11 9.3
1 47 10.1
2 9 0.0
2 11 0.9
2 7 1.9
2 10 2.8
2 11 4.5
2 74 5.2
2 10 6.6
2 11 7.1
2 71 7.8
2 80 8.9
2 16 9.9
2 15 10.9
;

data want;
   do _N_ = 1 by 1 until (last.ID);
      set have;
      by ID;
      if _N_ not in (1:3, 10:12) then output;
   end;
run;
Kurt_Bremser
Super User

Use a double DO loop. In the first loop, record first and last time. In the second loop, only OUTPUT when time is between start + 3 and end - 3.

For code, provide data in usable form.

AK100
Pyrite | Level 9

@Kurt_Bremser 
See usable data below:

data have;
input ID Value DateTime;
datalines;
1 5 0.0
1 15 1.5
1 74 2.2
1 19 3.1
1 13 4.2
1 71 4.9
1 40 5.6
1 31 6.8
1 87 7.8
1 10 8.4
1 11 9.3
1 47 10.1
2 9 0.0
2 11 0.9
2 7 1.9
2 10 2.8
2 11 4.5
2 74 5.2
2 10 6.6
2 11 7.1
2 71 7.8
2 80 8.9
2 16 9.9
2 15 10.9
;

 Please note that not every measurements has 12 observations. 

PeterClemmensen
Tourmaline | Level 20
data want(drop=i j);
   do i = 1 by 1 until (last.ID);
      set have;
      by ID;
   end;
   do j = 1 by 1 until (last.ID);
      set have;
      by ID;
      if 3 < j < i-2 then output;
   end;
run;

 

Result:

 

ID  Value DateTime 
1   19    3.1 
1   13    4.2 
1   71    4.9 
1   40    5.6 
1   31    6.8 
1   87    7.8 
2   10    2.8 
2   11    4.5 
2   74    5.2 
2   10    6.6 
2   11    7.1 
2   71    7.8 

 

Kurt_Bremser
Super User
data want;
do until (last.id);
  set have;
  by id;
  if first.id then start = datetime;
  if last.id then end = datetime;
end;
do until (last.id);
  set have;
  by id;
  if start + 3 le datetime le end - 3 then output;
end;
drop start end;
run;
FreelanceReinh
Jade | Level 19

Hello @AK100,

 

If performance is an issue, you may want to try a single-pass solution.

 

EDIT: The code below counts observations, not seconds. Assuming that the first 3 seconds correspond to 0<=DateTime<=3 (although these don't seem to be SAS datetime values), this could be adapted, but without a known end time the definition of the "last 3 seconds" seems doubtful to me.

data want(drop=_:);
do _i=1 by 1 until(last.id);
  set have;
  by id;
  if _i>3 then do; /* conditional only for performance improvement */
    _v=lag2(value);
    _dt=lag2(datetime);
  end;
  if _i>5 & last.id=0 then do;
    value=_v;
    datetime=_dt;
    output;
  end;
end;
run;

 

AK100
Pyrite | Level 9
@FreelanceReinhard.

So this is the same as above, but much faster?
FreelanceReinh
Jade | Level 19

The result is the same (EDIT: as obtained with draycut's solution), but the code should run somewhat faster -- how much faster depends, as usual, on hardware and software conditions, but also on the data (here, for example: the distribution of the number of observations per ID).

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 40 replies
  • 1016 views
  • 11 likes
  • 4 in conversation