BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AK100
Pyrite | Level 9
Where does the 5 stand for in your code?
FreelanceReinh
Jade | Level 19

The LAG2 values in the 5th observation of an ID would come from the 3rd observation (because 5-2=3), hence would not qualify for output if the first three observations are to be excluded. Therefore, _i>5 is a necessary condition for the OUTPUT statement to be executed.

AK100
Pyrite | Level 9
@FreelanceReinhard. So what if I wanted to remove the first and last 8 seconds.
FreelanceReinh
Jade | Level 19

Then you'd change the IF conditions and the length of the LAG queues as follows:

...
  if _i>8 then do; /* conditional only for performance improvement */
    _v=lag7(value);
    _dt=lag7(datetime);
  end;
  if _i>15 & last.id=0 then do;
...

To make the code flexible, you could use macro expressions involving a macro variable, say, d=8, in these four places. But, again, this code doesn't care for seconds, only for observations.

FreelanceReinh
Jade | Level 19

I fully agree with KurtBremser. But then you need to explain why the 9th observation of your sample data (with Value 87 and DateTime 7.8) did not qualify for removal, although it is clearly within the last 3 seconds -- if these "3 seconds" are measured from the last DateTime for the ID, i.e. 10.1 in this case.

AK100
Pyrite | Level 9

@FreelanceReinh @Kurt_Bremser @PeterClemmensen 

I tried all your suggestions. Im sure they all would work but the point is that it takes hours to run ... I did not wait for the result because my SAS EG started to crash and it took me to long.

 

In the usable data I showed you above I indeed asked for the first 3 and last 3 seconds, but in the original data 1 measurement is much much longer (approx. 40 seconds and it has around 1500 records). And every day has maybe hundreds of measurements and the I have data of more then 1,5 year.. So you can already count how much records I have. Should be millions+. 

Got such a headache from this continually crashing...

FreelanceReinh
Jade | Level 19

@AK100 wrote:

@FreelanceReinh @Kurt_Bremser @PeterClemmensen 

I tried all your suggestions. Im sure they all would work ...


Well, they yield different results even on your sample data and neither result matches your desired outcome (as stated in the initial post). So, I think you should clarify the requirements before we go on and address performance issues.

PeterClemmensen
Tourmaline | Level 20

The data below has just short of 150M obs. @Kurt_Bremsers code takes about 2 minutes to run on my laptop.

 

Try running this code and see what happens. Otherwise, post a representable sample of your actual data.

 

data have(drop = n i);
   do ID = 1 to (1000 * 365 * 2);
      n = 100 + floor(rand("uniform") * (1 + 300 - 100));
      do i = 1 to n;
         DateTime = sum(DateTime, round(rand('uniform'), .1));
         value = ceil(rand('uniform') * 100);
         output;
      end;
   end;
run;

data want;
do until (last.id);
  set have;
  by id;
  if first.id then start = datetime;
  if last.id then end = datetime;
end;
do until (last.id);
  set have;
  by id;
  if start + 3 le datetime le end - 3 then output;
end;
drop start end;
run;
Kurt_Bremser
Super User

I have found a little potential for performance improvement:

data want;
do until (last.id);
  set have;
  by id;
  if first.id then start = datetime;
end;
end = datetime;
do until (last.id);
  set have;
  by id;
  if start + 3 le datetime le end - 3 then output;
end;
drop start end;
run;

One comparison less in the first loop.

 

@AK100 If that code causes a crash, then something else is the issue. Run a PROC CONTENTS on your real "have" dataset and show us the output.

AND post the code you really use on that dataset.

AK100
Pyrite | Level 9

@Kurt_Bremser 

 

AK100_0-1605604329078.png

@Kurt_Bremser This is the code I tried and its still running now for few minutes...

data remove8seconds;
do until (last.UniekID);
  set tms.datameting;
  by UniekID;
  if first.UniekID then start = Tijd;
end;
end = Tijd;
do until (last.UniekID);
  set tms.datameting;
  by UniekID;
  if start + 8 le Tijd le end - 8 then output;
end;
drop start end;
run;
Kurt_Bremser
Super User

Ouch. You are running this against a DBMS table through ODBC. No miracle this performs like watching the paint dry.

Transfer the table to your SAS environment first, then run the code against it.

AK100
Pyrite | Level 9

@Kurt_Bremser Isn't it already in my SAS environment? 

AK100_0-1605605808366.png

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 40 replies
  • 1113 views
  • 11 likes
  • 4 in conversation