SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

The SAS URL engine seems a bit slow

New Contributor
Posts: 4

I experimented with reading a plain vanilla text file of numbers in fixed-width format (lrecl=65557, 1000 records) through the SAS URL engine, reading in just one variable at the beginning of each record. The data itself is about 64MB (a slice of a much bigger file) and is stored in an MS OneDrive space, with a URL to reference the data set. The OneDrive files also live on the same university network, so I don't have to battle the entire Internet to get at the data.

 

It took about 4 wall-clock minutes to chew through the data set but just under 5 CPU seconds, so I know that the holdup is I/O. I don't suspect heavy network traffic to be the culprit; I am more inclined to point the finger either at the unknown shims necessary to make OneDrive work or at the SAS URL engine. I know that the efficiency of SAS engines can vary significantly, so I am wondering where the holdup is. If I read the same data set off a remote PC through Citrix, over 100 Mbps wifi and battling other animals for bandwidth, it takes about 2 minutes to read the file.
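To put a rough number on that, here is a back-of-the-envelope sketch that uses only the figures quoted above (about 64MB in about 4 minutes of wall-clock time). The variable names are made up for illustration, and the inputs are approximations, so treat the result as an order-of-magnitude estimate only.

```sas
/* Rough effective throughput from the figures quoted in the post. */
data _null_;
  size_mb = 64;          /* approximate file size, in MB */
  elapsed = 4 * 60;      /* approximate wall-clock time, in seconds */
  mb_per_sec   = size_mb / elapsed;
  mbit_per_sec = mb_per_sec * 8;   /* 8 bits per byte */
  put mb_per_sec= 8.2 mbit_per_sec= 8.2;
run;
```

That works out to roughly 0.27 MB/s, or a little over 2 Mbit/s, which is far below what a 100 Mbps link could carry in principle.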

 

Anyone have any experience with this type of issue?

 

cheers,

max

 

Super User
Posts: 19,860

Re: The SAS URL engine seems a bit slow

One variable with 1000 lines? 

 

I just read 700k+ records from a 102MB file in under 30 seconds using SAS Studio...

 

Can you post your code so we can replicate it? I can upload that file I'm working with now to my public OneDrive and see how slow it is for me. 

New Contributor
Posts: 4

Re: The SAS URL engine seems a bit slow

The code couldn't be simpler. I removed the actual URL for security purposes, but here is the code:

 

filename bigdata url "url here redacted for security purposes";
data mydata;
infile bigdata lrecl=65576;
input
resp_id 1-7;
run;
proc contents data=mydata;
run;

Super User
Posts: 19,860

Re: The SAS URL engine seems a bit slow

Well, let's try some very basic fixes first. If you're only reading the first 7 chars, can you change LRECL? Apparently I can't test these at home :(

 

filename bigdata url "url here redacted for security purposes";

data myData;
 infile bigdata lrecl=7;
 input resp_id 1-7;
run;

proc contents data=mydata;
run;

 

New Contributor
Posts: 4

Re: The SAS URL engine seems a bit slow

Thanks for the initial advice and that would work nicely if the problem was a bit simpler but...

 

Well, the idea is that eventually we will be reading in vars from across the entire width of the data set, and the one-variable experiment was just that - an experiment to see 1) whether you could read in a data file from OneDrive and 2) what kind of times we are talking about. So reducing the data file or LRECL would be self-defeating in the end.
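For reference, a minimal sketch of what the eventual full-width read might look like. Apart from resp_id, the variable names and all column positions are hypothetical placeholders, not the real layout. OPTIONS FULLSTIMER is switched on so that the log reports real time next to CPU time for the step, which makes the I/O wait easy to see.

```sas
options fullstimer;   /* log real time and CPU time for each step */

filename bigdata url "url here redacted for security purposes";

data mydata;
  /* LRECL has to cover the right-most column that is read;  */
  /* 65576 is the value used in the posted code.             */
  infile bigdata lrecl=65576 truncover;
  input resp_id   1-7
        var_mid   30001-30008     /* hypothetical variable and columns */
        var_end   65500-65507;    /* hypothetical variable and columns */
run;
```

If real time stays near 4 minutes while CPU time stays at a few seconds, the number of variables read is probably not what drives the run time.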

Super User
Posts: 19,860

Re: The SAS URL engine seems a bit slow

Does OneDrive not work similarly to Box/Dropbox, where you have a local copy and it syncs them?

That's what I use daily with no issues. 

New Contributor
Posts: 4

Re: The SAS URL engine seems a bit slow

Typically yes, you are correct that it is similar to Dropbox in terms of syncing to a local copy. The difference here is that the data file is shared by another user, so the Citrix client does not read a remote copy off a PC; it reads the file straight off the OneDrive storage instead.


cheers,

max

Super User
Posts: 5,437

Re: The SAS URL engine seems a bit slow

As I see it, there are three points of concern:
- one drive service/server
- network
- SAS
Move this file outside SAS to the same host where SAS executes, e.g. copy it in Windows Explorer. If it takes a similar amount of time, you need to direct this to the OneDrive/network crew.
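A SAS-side variant of the same isolation test, as a rough sketch only: download the file to local disk with PROC HTTP, then read the local copy, timing each part separately. This is not the Explorer copy itself. The local path is a hypothetical placeholder, and it is an assumption that the shared OneDrive link can be fetched with a plain GET; it may need authentication or follow redirects.

```sas
/* Step 1: time the raw download, with no parsing involved. */
filename local "C:\temp\bigdata.txt";   /* hypothetical local path */

%let t0 = %sysfunc(datetime());
proc http
  url="url here redacted for security purposes"
  method="GET"
  out=local;
run;
%put NOTE: download took %sysevalf(%sysfunc(datetime()) - &t0) seconds;

/* Step 2: time the read from local disk. */
%let t0 = %sysfunc(datetime());
data mydata;
  infile local lrecl=65576;
  input resp_id 1-7;
run;
%put NOTE: local read took %sysevalf(%sysfunc(datetime()) - &t0) seconds;
```

If the download alone accounts for most of the 4 minutes, the bottleneck is the OneDrive service or the network; if the download is quick, the URL engine's record-by-record reading becomes the more likely suspect.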
Data never sleeps