Solved: Re: How to remove the first 3 seconds and last 3 seconds of a measurem... - Page 3

Kurt_Bremser · Posted 11-17-2020 05:09 AM

Not really. The Host/Engine Dependent Information shows the DBMS-specifics (schema) and the ODBC driver (the .dll on the bottom)

Run

data datameting;
set tms.datameting;
run;

first, and then the code against the dataset in WORK:

data remove8seconds;
do until (last.UniekID);
  set datameting;
/* and so on */

The first step will show you the raw time to simply get the data from the SQL server.

And if your UniekID groups are large enough to drop data out of the buffer, my data step will force a double read of the complete dataset.

With local data (WORK), the operating system will still have previously read dataset pages in the cache, and that alone will speed things up. Not even mentioning the performance difference between the network connection to your SQL server and the I/O bus to your internal disks, which should be substantial.

Please try the above first step, and let us know how long it took. Then run my data step.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

AK100 · Posted 11-17-2020 05:19 AM

I just did this:
`data datameting;
set tms.datameting;
run;`
Its running for 6 minutes now and still running...

Kurt_Bremser · Posted 11-17-2020 05:30 AM

So you can see that the simple sequential transfer of data alone is taking so much time.

One of the problems here is that SAS cannot push processing down to the database (implicit pass-through), to only retrieve the results.

One additional try:

proc sql;
create table remove8seconds as
  select
    UniekID,
    waarde,
    tjid
  from tms.datameting
  group by UniekID
  having min(tjid) + 8 <= tjid and tjid <= max(tjid) - 8
;
quit;

This might enable SAS to push some of the processing to SQL Server, and it should be usable in explicit pass-through, if that is possible through an ODBC connection (I have no experience with that).

But if the result is still a large number of observations, you will still feel the network bottleneck.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

AK100 · Posted 11-17-2020 05:36 AM

@Kurt_Bremser Okay So it finished running the first part (it took about 12 miutes I guess). I now have a new datameting. So now you want me to run the following: Is that correct?

data remove8seconds;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if first.UniekID then start = Tijd;
end;
end = Tijd;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if start + 8 le Tijd le end - 8 then output;
end;
drop start end;
run;

Kurt_Bremser · Posted 11-17-2020 05:58 AM

Yes, try that next.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 11-17-2020 05:59 AM

PS it would be nice to see the log from that 12-minute transfer step.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

AK100 · Posted 11-17-2020 06:05 AM

@Kurt_Bremser There you go:

I must say that below code works super fast:

data remove8seconds;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if first.UniekID then start = Tijd;
end;
end = Tijd;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if start + 8 le Tijd le end - 8 then output;
end;
drop start end;
run;

So it gives output data but it also gives the following error:

Kurt_Bremser · Posted 11-17-2020 06:09 AM

You need to sort the data first; I guess that was implicitly pushed to the DB when you ran the code against the table there.

Run

proc sort data=datameting;
by uniekid tjid;
run;

before running the data step again.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

AK100 · Posted 11-17-2020 06:16 AM

@Kurt_Bremser Okay, finally got it working. Appreciate your help.

It seemed that it copied the table from the ODBC to WORK. Is that correct? If so whats the benefit of this and should I always start with below code on every table? It seemd to make things 1000X faster.

data datameting;
set tms.datameting;
run;

And of course also interested in:

proc sort data=datameting;
by UniekID Tijd;
run;

data removeseconds;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if first.UniekID then start = Tijd;
end;
end = Tijd;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if start + 8 le Tijd le end - 8 then output;
end;
drop start end;
run;

Kurt_Bremser · Posted 11-17-2020 06:33 AM

The first step copies the data to your WORK library, which (usually) resides on your local disk, or a local disk within a server. "Local can" also be a disk storage box attached through a SAN, and these tend to be mighty fast nowadays, faster than a network connection. You can safely assume that you will get the best I/O performance out of WORK, if the SAS administrator is worth his/her pay.

Next, in order to use a BY statement in a data step (or any other step that supports a BY), the data has to be sorted accordingly; that's why the extra PROC SORT step is necessary.

By using BY in a data step, we get two additional automatic variables for each variable in the BY: first. and last., which signal the first observation in a group and the last one.

So a double DO works like this: first, it reads all observations from a group (until last.); during this, we can record certain things, here the tjid value of the first observation of a uniekid group. After the first loop has finished, we have the contents of the last observation of the group in the PDV (Program Data Vector); so I store the tjid value in variable end. We can now use those two stored values to make decisions in the second DO loop, that reads exactly the same observations that the first loop read.

The double DO loop is very good when you need to make decisions on observations based on values stored in other observations within the same group. By double-reading a group in succession, most of the time the second read happens out of the disk cache, and not the storage medium itself, which translates to a single sequential physical read of the data, the fastest way to process data from (disk) storage, even in the days of SSD drives.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 11-17-2020 06:35 AM

PS you can combine the first two steps:

proc sort
  data=tms.datameting
  out=datameting
;
by UniekID Tijd;
run;

which should be a little faster than the two separate steps combined.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Re: How to remove the first 3 seconds and last 3 seconds of a measurement in SAS?

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away