mkilger
Calcite | Level 5

I tried an experiment: reading a plain vanilla text file of numbers in fixed-width format (lrecl=65557, 1000 records) with the SAS URL engine, reading in just one variable at the beginning of each record. The data itself is about 64MB (a slice of a much bigger file) and is stored in a Microsoft OneDrive space, with a URL to reference the data set. The OneDrive files also live on the same university network, so I don't have to battle the entire Internet to get at the data.

 

It took about 4 minutes of wall-clock time to chew through the data set but just under 5 CPU seconds, so I know the holdup is I/O. I don't suspect heavy network traffic to be the culprit; I am more inclined to point the finger either at the unknown shims needed to make OneDrive work or at the SAS URL engine. I know that the efficiency of SAS engines can vary significantly, so I am wondering where the holdup is. If I read the same data set off a remote PC through Citrix, over 100 Mbps Wi-Fi and battling other animals for bandwidth, it takes about 2 minutes to read the file.
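The real-time versus CPU-time split described above is what the FULLSTIMER system option prints in the log for every step. As a local baseline that takes OneDrive and the network out of the picture, here is a minimal sketch that writes a synthetic file of the same shape (1000 records of 65,576 bytes, matching the LRECL in the code posted later in this thread) to a temporary file and reads it back with the same one-variable INPUT. The record content, the fileref `synth` and the data set `localtest` are hypothetical.

```sas
/* Print real time, user/system CPU time and memory for each step */
options fullstimer;

/* Hypothetical temporary file standing in for the OneDrive file */
filename synth temp;

/* Write 1000 records, each 65,576 bytes wide:                  */
/* a zero-padded ID in columns 1-7 and a marker in the last column */
data _null_;
  file synth lrecl=65576;
  do i = 1 to 1000;
    put @1 i z7. @65576 'X';
  end;
run;

/* Same read as the URL test: only the first 7 columns */
data localtest;
  infile synth lrecl=65576;
  input resp_id 1-7;
run;
```

If this local read finishes in a few seconds, most of the roughly 4 minutes seen with the URL engine is time spent getting the bytes off OneDrive rather than parsing them.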

 

Anyone have any experience with this type of issue?

 

cheers,

max

 

7 replies
Reeza
Super User

One variable with 1000 lines? 

 

I just read 700k+ records from a 102MB file in under 30 seconds using SAS Studio...

 

Can you post your code so we can replicate it? I can upload that file I'm working with now to my public OneDrive and see how slow it is for me. 

mkilger
Calcite | Level 5

The code couldn't be simpler. I removed the actual URL reference for security purposes, but here is the code:

 

filename bigdata url "url here redacted for security purposes";

data mydata;
  /* each record is 65,576 bytes wide; only columns 1-7 are read */
  infile bigdata lrecl=65576;
  input resp_id 1-7;
run;

proc contents data=mydata;
run;

Reeza
Super User

Well, let's try some very basic fixes first. If you're only reading the first 7 characters, can you change LRECL? Apparently I can't test these at home 😞

 

filename bigdata url "url here redacted for security purposes";

data myData;
 infile bigdata lrecl=7;
 input resp_id 1-7;
run;

proc contents data=mydata;
run;

 

mkilger
Calcite | Level 5

Thanks for the initial advice. That would work nicely if the problem were a bit simpler, but...

 

The idea is that eventually we will be reading in variables from across the entire width of the record, and the one-variable run was just that: an experiment to see (1) whether you could read a data file from OneDrive at all and (2) what kind of read times we are talking about. Reducing the data file or the LRECL would be self-defeating in the end.

Reeza
Super User

Doesn't OneDrive work similarly to Box/Dropbox, where you have a local copy that gets synced?

That's what I use daily with no issues. 

mkilger
Calcite | Level 5
Typically, yes, you are correct that it is similar to Dropbox in terms of syncing to a local copy. The difference here is that the data file is shared by another user, so instead of the Citrix client reading the copy off a PC, the file is read directly from OneDrive storage.


cheers,

max

LinusH
Tourmaline | Level 20
As I see it, there are three points of concern:
- the OneDrive service/server
- the network
- SAS
Copy this file outside SAS to the same host where SAS executes, e.g. with Windows Explorer. If that takes a similar amount of time, you need to direct this to the OneDrive/network crew.
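A complementary check can also be run inside SAS: download the file once with PROC HTTP into a temporary fileref, then run the identical DATA step against the local copy. With FULLSTIMER on, the log then separates transfer time (the PROC HTTP step) from parsing time (the DATA step). This is a minimal sketch, assuming PROC HTTP is available in your SAS 9.4 release and that the redacted URL is a direct-download link that needs no extra authentication; the fileref `localcp` and data set `mydata_local` are hypothetical.

```sas
/* Log real time vs. CPU time for each step */
options fullstimer;

/* Hypothetical temporary local file for the downloaded copy */
filename localcp temp;

/* Step 1: transfer only (same URL as the FILENAME URL statement, redacted) */
proc http
  url="url here redacted for security purposes"
  method="GET"
  out=localcp;
run;

/* Step 2: parse only, the same read but from local disk */
data mydata_local;
  infile localcp lrecl=65576;
  input resp_id 1-7;
run;
```

If PROC HTTP brings the file down much faster than the roughly 4 minutes the URL engine takes, that would point at how the URL engine reads the stream rather than at OneDrive or the network.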
Data never sleeps
