Solved: How to remove the first 3 seconds and last 3 seconds of a measurement ...

AK100 · Posted 11-16-2020 03:46 AM

Hello Guys,|

Simple question. I have a dataset which looks like below (in reality much bigger). I want to remove the first 3 seconds and the last 3 seconds of every measurement. How can that easily be done?

Value - DateTime

5 - 0.0

15 - 1.5

74 - 2.2

19 - 3.1

13 - 4.2

71 - 4.9

40 - 5.6

31 - 6.8

87 - 7.8

10 - 8.4

11 - 9.3

47 - 10.1

9 - 0.0

11 - 0.9

7 - 1.9

10 - 2.8

11 - 4.5

74 - 5.2

10 - 6.6

11 - 7.1

71 - 7.8

80 - 8.9

16 - 9.9

15 - 10.9

So I actually want the BOLD ones removed.. Would like to hear. Thank you

Kurt_Bremser · Posted 11-17-2020 06:33 AM

The first step copies the data to your WORK library, which (usually) resides on your local disk, or a local disk within a server. "Local can" also be a disk storage box attached through a SAN, and these tend to be mighty fast nowadays, faster than a network connection. You can safely assume that you will get the best I/O performance out of WORK, if the SAS administrator is worth his/her pay.

Next, in order to use a BY statement in a data step (or any other step that supports a BY), the data has to be sorted accordingly; that's why the extra PROC SORT step is necessary.

By using BY in a data step, we get two additional automatic variables for each variable in the BY: first. and last., which signal the first observation in a group and the last one.

So a double DO works like this: first, it reads all observations from a group (until last.); during this, we can record certain things, here the tjid value of the first observation of a uniekid group. After the first loop has finished, we have the contents of the last observation of the group in the PDV (Program Data Vector); so I store the tjid value in variable end. We can now use those two stored values to make decisions in the second DO loop, that reads exactly the same observations that the first loop read.

The double DO loop is very good when you need to make decisions on observations based on values stored in other observations within the same group. By double-reading a group in succession, most of the time the second read happens out of the disk cache, and not the storage medium itself, which translates to a single sequential physical read of the data, the fastest way to process data from (disk) storage, even in the days of SSD drives.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

PeterClemmensen · Posted 11-16-2020 03:50 AM

Why the first 4 in the second measurement?

Do you have an ID for each of your measurements?

The DATA to DATA Step Macro
Blog: SASnrd

AK100 · Posted 11-16-2020 03:59 AM

I need the first and last 3 seconds removed because don't actually add something. The client doesn't need them. Yes, I forgot to say but they also have an ID yes. So record 1 till 12 have the same ID record 13 till 24 to etc..

PeterClemmensen · Posted 11-16-2020 04:08 AM

Ok. Do all IDs have 12 observations exactly?

The DATA to DATA Step Macro
Blog: SASnrd

AK100 · Posted 11-16-2020 04:13 AM

@PeterClemmensen No they don't all have 12 observations.

PeterClemmensen · Posted 11-16-2020 04:10 AM

If so, you can do it in a single pass:

data have;
input ID Value DateTime;
datalines;
1 5 0.0
1 15 1.5
1 74 2.2
1 19 3.1
1 13 4.2
1 71 4.9
1 40 5.6
1 31 6.8
1 87 7.8
1 10 8.4
1 11 9.3
1 47 10.1
2 9 0.0
2 11 0.9
2 7 1.9
2 10 2.8
2 11 4.5
2 74 5.2
2 10 6.6
2 11 7.1
2 71 7.8
2 80 8.9
2 16 9.9
2 15 10.9
;

data want;
   do _N_ = 1 by 1 until (last.ID);
      set have;
      by ID;
      if _N_ not in (1:3, 10:12) then output;
   end;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Kurt_Bremser · Posted 11-16-2020 04:09 AM

Use a double DO loop. In the first loop, record first and last time. In the second loop, only OUTPUT when time is between start + 3 and end - 3.

For code, provide data in usable form.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

AK100 · Posted 11-16-2020 04:19 AM

@Kurt_Bremser
See usable data below:

data have;
input ID Value DateTime;
datalines;
1 5 0.0
1 15 1.5
1 74 2.2
1 19 3.1
1 13 4.2
1 71 4.9
1 40 5.6
1 31 6.8
1 87 7.8
1 10 8.4
1 11 9.3
1 47 10.1
2 9 0.0
2 11 0.9
2 7 1.9
2 10 2.8
2 11 4.5
2 74 5.2
2 10 6.6
2 11 7.1
2 71 7.8
2 80 8.9
2 16 9.9
2 15 10.9
;

Please note that not every measurements has 12 observations.

PeterClemmensen · Posted 11-16-2020 04:28 AM

data want(drop=i j);
   do i = 1 by 1 until (last.ID);
      set have;
      by ID;
   end;
   do j = 1 by 1 until (last.ID);
      set have;
      by ID;
      if 3 < j < i-2 then output;
   end;
run;

Result:

ID  Value DateTime 
1   19    3.1 
1   13    4.2 
1   71    4.9 
1   40    5.6 
1   31    6.8 
1   87    7.8 
2   10    2.8 
2   11    4.5 
2   74    5.2 
2   10    6.6 
2   11    7.1 
2   71    7.8

The DATA to DATA Step Macro
Blog: SASnrd

AK100 · Posted 11-16-2020 04:57 AM

Thank you. I will check it

Kurt_Bremser · Posted 11-16-2020 04:30 AM

data want;
do until (last.id);
  set have;
  by id;
  if first.id then start = datetime;
  if last.id then end = datetime;
end;
do until (last.id);
  set have;
  by id;
  if start + 3 le datetime le end - 3 then output;
end;
drop start end;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

AK100 · Posted 11-16-2020 04:58 AM

Thank you. I will check it.

FreelanceReinh · Posted 11-16-2020 05:18 AM

Hello @AK100,

If performance is an issue, you may want to try a single-pass solution.

EDIT: The code below counts observations, not seconds. Assuming that the first 3 seconds correspond to 0<=DateTime<=3 (although these don't seem to be SAS datetime values), this could be adapted, but without a known end time the definition of the "last 3 seconds" seems doubtful to me.

data want(drop=_:);
do _i=1 by 1 until(last.id);
  set have;
  by id;
  if _i>3 then do; /* conditional only for performance improvement */
    _v=lag2(value);
    _dt=lag2(datetime);
  end;
  if _i>5 & last.id=0 then do;
    value=_v;
    datetime=_dt;
    output;
  end;
end;
run;

AK100 · Posted 11-16-2020 05:22 AM

@FreelanceReinhard.

So this is the same as above, but much faster?

FreelanceReinh · Posted 11-16-2020 05:32 AM

The result is the same (EDIT: as obtained with draycut's solution), but the code should run somewhat faster -- how much faster depends, as usual, on hardware and software conditions, but also on the data (here, for example: the distribution of the number of observations per ID).

How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Classroom Training Available!