BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kurt_Bremser
Super User

Not really. The Host/Engine Dependent Information shows the DBMS-specifics (schema) and the ODBC driver (the .dll on the bottom)

Run

data datameting;
set tms.datameting;
run;

first, and then the code against the dataset in WORK:

data remove8seconds;
do until (last.UniekID);
  set datameting;
/* and so on */

The first step will show you the raw time to simply get the data from the SQL server.

And if your UniekID groups are large enough to drop data out of the buffer, my data step will force a double read of the complete dataset.

With local data (WORK), the operating system will still have previously read dataset pages in the cache, and that alone will speed things up. Not even mentioning the performance difference between the network connection to your SQL server and the I/O bus to your internal disks, which should be substantial.

Please try the above first step, and let us know how long it took. Then run my data step.

AK100
Pyrite | Level 9
I just did this:
`data datameting;
set tms.datameting;
run;`
Its running for 6 minutes now and still running...
Kurt_Bremser
Super User

So you can see that the simple sequential transfer of data alone is taking so much time.

One of the problems here is that SAS cannot push processing down to the database (implicit pass-through), to only retrieve the results.

One additional try:

proc sql;
create table remove8seconds as
  select
    UniekID,
    waarde,
    tjid
  from tms.datameting
  group by UniekID
  having min(tjid) + 8 <= tjid and tjid <= max(tjid) - 8
;
quit;
  

This might enable SAS to push some of the processing to SQL Server, and it should be usable in explicit pass-through, if that is possible through an ODBC connection (I have no experience with that).

But if the result is still a large number of observations, you will still feel the network bottleneck.

AK100
Pyrite | Level 9

@Kurt_Bremser Okay So it finished running the first part (it took about 12 miutes I guess). I now have a new datameting. So now you want me to run the following: Is that correct?

data remove8seconds;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if first.UniekID then start = Tijd;
end;
end = Tijd;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if start + 8 le Tijd le end - 8 then output;
end;
drop start end;
run;
AK100
Pyrite | Level 9

@Kurt_Bremser  There you go:

AK100_0-1605610961048.png

I must say that below code works super fast:

data remove8seconds;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if first.UniekID then start = Tijd;
end;
end = Tijd;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if start + 8 le Tijd le end - 8 then output;
end;
drop start end;
run;

So it gives output data but it also gives the following error:

AK100_1-1605611104972.png

 



Kurt_Bremser
Super User

You need to sort the data first; I guess that was implicitly pushed to the DB when you ran the code against the table there.

Run

proc sort data=datameting;
by uniekid tjid;
run;

before running the data step again.

AK100
Pyrite | Level 9

@Kurt_Bremser Okay, finally got it working. Appreciate your help.

 

It seemed that it copied the table from the ODBC to WORK. Is that correct? If so whats the benefit of this and should I always start with below code on every table? It seemd to make things 1000X faster.

data datameting;
set tms.datameting;
run;

And of course also interested in:

proc sort data=datameting;
by UniekID Tijd;
run;

data removeseconds;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if first.UniekID then start = Tijd;
end;
end = Tijd;
do until (last.UniekID);
  set work.datameting;
  by UniekID;
  if start + 8 le Tijd le end - 8 then output;
end;
drop start end;
run;

 

Kurt_Bremser
Super User

The first step copies the data to your WORK library, which (usually) resides on your local disk, or a local disk within a server. "Local can" also be a disk storage box attached through a SAN, and these tend to be mighty fast nowadays, faster than a network connection. You can safely assume that you will get the best I/O performance out of WORK, if the SAS administrator is worth his/her pay.

 

Next, in order to use a BY statement in a data step (or any other step that supports a BY), the data has to be sorted accordingly; that's why the extra PROC SORT step is necessary.

 

By using BY in a data step, we get two additional automatic variables for each variable in the BY: first. and last., which signal the first observation in a group and the last one.

 

So a double DO works like this: first, it reads all observations from a group (until last.); during this, we can record certain things, here the tjid value of the first observation of a uniekid group. After the first loop has finished, we have the contents of the last observation of the group in the PDV (Program Data Vector); so I store the tjid value in variable end. We can now use those two stored values to make decisions in the second DO loop, that reads exactly the same observations that the first loop read.

 

The double DO loop is very good when you need to make decisions on observations based on values stored in other observations within the same group. By double-reading a group in succession, most of the time the second read happens out of the disk cache, and not the storage medium itself, which translates to a single sequential physical read of the data, the fastest way to process data from (disk) storage, even in the days of SSD drives.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 40 replies
  • 1165 views
  • 11 likes
  • 4 in conversation