BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AK100
Pyrite | Level 9
Where does the 5 stand for in your code?
FreelanceReinh
Jade | Level 19

The LAG2 values in the 5th observation of an ID would come from the 3rd observation (because 5-2=3), hence would not qualify for output if the first three observations are to be excluded. Therefore, _i>5 is a necessary condition for the OUTPUT statement to be executed.

AK100
Pyrite | Level 9
@FreelanceReinhard. So what if I wanted to remove the first and last 8 seconds.
FreelanceReinh
Jade | Level 19

Then you'd change the IF conditions and the length of the LAG queues as follows:

...
  if _i>8 then do; /* conditional only for performance improvement */
    _v=lag7(value);
    _dt=lag7(datetime);
  end;
  if _i>15 & last.id=0 then do;
...

To make the code flexible, you could use macro expressions involving a macro variable, say, d=8, in these four places. But, again, this code doesn't care for seconds, only for observations.

FreelanceReinh
Jade | Level 19

I fully agree with KurtBremser. But then you need to explain why the 9th observation of your sample data (with Value 87 and DateTime 7.8) did not qualify for removal, although it is clearly within the last 3 seconds -- if these "3 seconds" are measured from the last DateTime for the ID, i.e. 10.1 in this case.

AK100
Pyrite | Level 9

@FreelanceReinh @Kurt_Bremser @PeterClemmensen 

I tried all your suggestions. Im sure they all would work but the point is that it takes hours to run ... I did not wait for the result because my SAS EG started to crash and it took me to long.

 

In the usable data I showed you above I indeed asked for the first 3 and last 3 seconds, but in the original data 1 measurement is much much longer (approx. 40 seconds and it has around 1500 records). And every day has maybe hundreds of measurements and the I have data of more then 1,5 year.. So you can already count how much records I have. Should be millions+. 

Got such a headache from this continually crashing...

FreelanceReinh
Jade | Level 19

@AK100 wrote:

@FreelanceReinh @Kurt_Bremser @PeterClemmensen 

I tried all your suggestions. Im sure they all would work ...


Well, they yield different results even on your sample data and neither result matches your desired outcome (as stated in the initial post). So, I think you should clarify the requirements before we go on and address performance issues.

PeterClemmensen
Tourmaline | Level 20

The data below has just short of 150M obs. @Kurt_Bremsers code takes about 2 minutes to run on my laptop.

 

Try running this code and see what happens. Otherwise, post a representable sample of your actual data.

 

data have(drop = n i);
   do ID = 1 to (1000 * 365 * 2);
      n = 100 + floor(rand("uniform") * (1 + 300 - 100));
      do i = 1 to n;
         DateTime = sum(DateTime, round(rand('uniform'), .1));
         value = ceil(rand('uniform') * 100);
         output;
      end;
   end;
run;

data want;
do until (last.id);
  set have;
  by id;
  if first.id then start = datetime;
  if last.id then end = datetime;
end;
do until (last.id);
  set have;
  by id;
  if start + 3 le datetime le end - 3 then output;
end;
drop start end;
run;
Kurt_Bremser
Super User

I have found a little potential for performance improvement:

data want;
do until (last.id);
  set have;
  by id;
  if first.id then start = datetime;
end;
end = datetime;
do until (last.id);
  set have;
  by id;
  if start + 3 le datetime le end - 3 then output;
end;
drop start end;
run;

One comparison less in the first loop.

 

@AK100 If that code causes a crash, then something else is the issue. Run a PROC CONTENTS on your real "have" dataset and show us the output.

AND post the code you really use on that dataset.

AK100
Pyrite | Level 9

@Kurt_Bremser 

 

AK100_0-1605604329078.png

@Kurt_Bremser This is the code I tried and its still running now for few minutes...

data remove8seconds;
do until (last.UniekID);
  set tms.datameting;
  by UniekID;
  if first.UniekID then start = Tijd;
end;
end = Tijd;
do until (last.UniekID);
  set tms.datameting;
  by UniekID;
  if start + 8 le Tijd le end - 8 then output;
end;
drop start end;
run;
Kurt_Bremser
Super User

Ouch. You are running this against a DBMS table through ODBC. No miracle this performs like watching the paint dry.

Transfer the table to your SAS environment first, then run the code against it.

AK100
Pyrite | Level 9

@Kurt_Bremser Isn't it already in my SAS environment? 

AK100_0-1605605808366.png

 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 40 replies
  • 2805 views
  • 11 likes
  • 4 in conversation