06-14-2017 06:59 PM
I experimented with reading a plain vanilla text file containing numbers in fixed-width format, reading in just one variable at the beginning of each record (lrecl=65557, number of records = 1000) using the SAS URL engine. The data itself is about 64MB (a slice of a much bigger file) and is stored on MS ONE DRIVE space, with a URL to reference the data set. The ONE DRIVE files also live on the same university network, so I don't have to battle the entire Internet to get at the data.
It took about 4 wall-clock minutes to chew through the data set but just under 5 CPU seconds, so I know that the hang-up is I/O. I don't suspect heavy network traffic to be the culprit; I am more inclined to point the finger either at the unknown shims necessary to make ONE DRIVE work or at the SAS URL engine. I know that the efficiency of SAS engines can vary significantly, so I am wondering where the hold-up is. If I read the same data set off a remote PC through CITRIX, using 100 Mbps wifi and battling other animals for bandwidth, it takes about 2 minutes to read the file.
Anyone have any experience with this type of issue?
06-14-2017 07:11 PM
One variable with 1000 lines?
I just read 700k+ records from a 102MB file in under 30 seconds using SAS Studio...
Can you post your code so we can replicate it? I can upload that file I'm working with now to my public OneDrive and see how slow it is for me.
06-14-2017 07:26 PM
The code couldn't be simpler. I removed the actual URL reference for security purposes, but here is the code:
filename bigdata url "url here redacted for security purposes";
data mydata;
infile bigdata lrecl=65576;
input resp_id 1-7;
run;
proc contents data=mydata;
run;
06-14-2017 07:41 PM
Well, let's try some very basic fixes first. If you're only reading the first 7 chars, can you change LRECL? Apparently I can't test these at home.
filename bigdata url "url here redacted for security purposes";
data myData;
infile bigdata lrecl=7;
input resp_id 1-7;
run;
proc contents data=mydata;
run;
06-14-2017 07:59 PM
Thanks for the initial advice and that would work nicely if the problem was a bit simpler but...
The idea is that eventually we will be reading in variables from across the entire width of the data set. The one-variable experiment was just that, an experiment to see 1) whether you could read in a data file from ONE DRIVE and 2) what kind of times we are talking about. So reducing the data file or the LRECL would be self-defeating in the end.
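For anyone wanting to replicate the full-width case, here is a minimal sketch of what such a read might look like. It is not the poster's actual program. The variable names mid_var and last_var and their column positions are made up for illustration, and the URL is left redacted as in the thread. FULLSTIMER is turned on so the log reports real time and CPU time separately for the step, which is the comparison being made above.

```sas
/* Sketch only: mid_var, last_var and their column ranges are hypothetical. */
options fullstimer;   /* log real (wall-clock) time and CPU time for each step */

filename bigdata url "url here redacted for security purposes";

data mydata;
    /* full record length from the posted code; truncover guards against short records */
    infile bigdata lrecl=65576 truncover;
    /* column input pulling fields from the start, middle, and end of each record */
    input resp_id   1-7
          mid_var   30001-30007
          last_var  65570-65576;
run;

proc contents data=mydata;
run;
```

If the real time in the log stays at minutes while CPU time stays at seconds, that would point at the transfer rather than the parsing, whichever columns are read.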
06-15-2017 09:57 AM
06-19-2017 01:55 AM