Random Access within flat file. (Pointer Concept)

Hi Expert,

Is there any function in SAS that offers parallel access to a flat file?

I am looking for functionality exactly like the POINT=variable option of the SET statement.

Please help!


Regards,
Abhishek

Re: Random Access within flat file. (Pointer Concept)

Not to my knowledge.
But why do you need it?
Describe your case in more detail; there may be an alternative solution that would be acceptable for you.
/Linus
Data never sleeps

Re: Random Access within flat file. (Pointer Concept)

I believe random access is possible with some file types on z/OS; beyond that, not to my knowledge either.

Although it is possible to travel "backward" through the file, access to external files is always sequential.

Check the documentation on FPOINT, FNOTE, DROPNOTE and FREWIND:
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000209714.htm
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000209721.htm
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000211377.htm
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000211061.htm
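As a rough illustration of the bookmark behaviour described above, the sketch below uses the DATA step file functions FOPEN, FREAD, FNOTE, FGET, FPOINT, DROPNOTE and FCLOSE. It assumes a small temporary file (fileref rawfile, a hypothetical name) standing in for the real data. Note that FNOTE can only mark a record that has already been read, so this lets you go back to a known position; it does not jump ahead to an arbitrary record.

```sas
/* Hypothetical 5-line file so the sketch runs on its own */
filename rawfile temp;
data _null_;
  file rawfile;
  do i = 1 to 5;
    put @1 'REC' @4 i z3.;              /* writes REC001 ... REC005 */
  end;
run;

/* Bookmark record 1, read on to record 2, then jump back */
data bookmarks;
  length rec $ 20;
  fid = fopen('rawfile', 'I');          /* open the fileref for input */
  if fid > 0 then do;
    rc = fread(fid);                    /* record 1 -> File Data Buffer */
    note = fnote(fid);                  /* bookmark for record 1 */
    rc = fread(fid);                    /* record 2 */
    rc = fget(fid, rec, 20);            /* copy the buffer into REC */
    output;                             /* REC002 */
    rc = fpoint(fid, note);             /* reposition to the bookmark */
    rc = fread(fid);                    /* record 1 again */
    rc = fget(fid, rec, 20);
    output;                             /* REC001 */
    rc = dropnote(fid, note);           /* free the bookmark */
    rc = fclose(fid);
  end;
  keep rec;
run;
```

FREWIND(fid) would similarly return the pointer to the first record without needing a note.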

Cheers from Portugal.

Daniel Santos @ www.cgd.pt

Re: Random Access within flat file. (Pointer Concept)

The flat file is several GB in size. Reading it sequentially will definitely hurt performance. Instead, I would like to split the file virtually into multiple parts and read each part in parallel, as if there were multiple files whose combined record count equals the record count of the original flat file. I suppose this would improve performance considerably.

Please help!


Abhishek

Re: Random Access within flat file. (Pointer Concept)

Consider code similar to this:

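data test;
  length line $ 200;
  infile rawfile lrecl=200 firstobs=100 obs=150;  /* rawfile and 200 are placeholders */
  input;                                         /* load the record into the input buffer */
  line = _infile_;                               /* keep the raw text of records 100-150 */
run;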

In the code above, the output contains only records 100 to 150, but during execution SAS still reads the first 99 records into the buffer, unlike the POINT=variable option of the SET statement.

Re: Random Access within flat file. (Pointer Concept)

I assume that you will search this data multiple times?
Then I think it's better to load the data into SAS/SPDE once and then use random access. It's probably worth the small overhead of importing.
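A minimal sketch of this suggestion, assuming a hypothetical fixed-width layout (an 8-digit id in columns 1-8 and an amount in columns 9-18): pay for one sequential pass to load the file into a SAS data set, then use SET with POINT= for direct access by observation number. WORK.BIG stands in for a table in an SPDE library; the fileref rawfile, the library path in the comment and the variable names are illustrative only.

```sas
/* Self-contained stand-in for the large fixed-width file (hypothetical layout) */
filename rawfile temp;
data _null_;
  file rawfile;
  do id = 1 to 1000;
    amount = id * 1.5;
    put @1 id z8. @9 amount 10.2;      /* cols 1-8: id, cols 9-18: amount */
  end;
run;

/* Step 1: one sequential pass to load the flat file into a SAS data set.
   For SPDE, assign a libref with the SPDE engine instead, e.g.
   libname spd spde '/some/path';   (path is hypothetical) */
data work.big;
  infile rawfile truncover;
  input @1 id 8. @9 amount 10.2;
run;

/* Step 2: direct access by observation number with POINT= */
data work.slice;
  do p = 100 to 150;
    set work.big point=p;              /* reads observation p directly */
    output;
  end;
  stop;                                /* POINT= never reaches end-of-file, so STOP is required */
run;
```

Once the data is in a SAS data set, POINT= gives exactly the behaviour the original question asked for, at the cost of the initial load.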
/Linus
Data never sleeps

Re: Random Access within flat file. (Pointer Concept)

I believe a flat file is, as the name says, 'flat', so it doesn't carry the metadata that would allow random access (i.e. number of observations, variables, ...).

Without this information, how could SAS possibly read record 100 without first reading records 1-99? The only definition of what record 100 is will be a count of end-of-line indicators.

Re: Random Access within flat file. (Pointer Concept)

I take your point.

Going back to the root cause: please suggest best practices for loading a large file of more than 10 GB. The file is a fixed-width file.

Should I first split the file and then load all the parts in parallel?

Also, which INFILE options can I use to read a large flat file efficiently?

Thanks,
Abhishek

Re: Random Access within flat file. (Pointer Concept)

I understand some languages allow you to read from an address within the file; I think the terminology on Windows is "offset within the file".
The nearest thing SAS appears to offer is unbuffered data (LRECL greater than 32K, or RECFM=N), read with something like INPUT @10000000 @;
That should be able to address the data starting at the ten-millionth byte.
When all the data must be read, this approach may not be best, because "unbuffered" usually means slower. However, it might be worth trying.
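For a fixed-width file the position of a record can be computed rather than counted, which is what makes an @-style jump meaningful. A minimal derivation, assuming every record is exactly L bytes of data followed by a t-byte line terminator (t = 1 for LF, t = 2 for CRLF, t = 0 if there are no terminators) and that the whole file is read as one byte stream, as with RECFM=N: record k (counting from 1) starts at the 1-based byte position b_k, which is the value you would give to the @ pointer.

```latex
b_k = (k - 1)(L + t) + 1
\qquad \text{e.g. } L = 200,\; t = 2,\; k = 1\,000\,001
\;\Rightarrow\; b_k = 1\,000\,000 \times 202 + 1 = 202\,000\,001
```

The values of L, t and k in the example are hypothetical; with variable-length records no such formula exists, which is the point made earlier about counting end-of-line indicators.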

good luck
PeterC

Re: Random Access within flat file. (Pointer Concept)

Splitting the file probably won't help you, since SAS would likely read the data as fast as your splitting program would.
If you plan to do this only once, I would say don't bother about performance.
If you plan to import this file on a regular basis, you might need to design a (simple) ETL flow, where you could optimize by importing only the changed data in some way.
I don't think there are any options that will affect performance that much.
SAS is considered very fast at importing flat files, even compared with the various bulk loaders offered by competitors.

/Linus
Data never sleeps

Re: Random Access within flat file. (Pointer Concept)

Agree with Linus.

Parallel processing can be very hardware-dependent. It can depend on anything from how the file is fragmented across the disks to the system load at execution time.

There are still some system/SAS parameters that you could fine-tune, but I probably wouldn't bother with them unless you are seeing suspicious I/O performance when accessing the file.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt

Re: Random Access within flat file. (Pointer Concept)

What OS platform is running SAS for this application? There are also parallel-processing considerations, such as how you would thread your batch processes/jobs and whether a comprehensive job-scheduling facility is available. What other pre-processing (sort package) utilities are available to you? What challenges do you have with intermediate data storage resources that may influence when you run your parallel processes?

It's quite possible that your SAS application does not need to reference all of your flat-file contents. If so, you could determine which specific "unique values" are needed and filter the file as it is being loaded. Your needs could be date-related, in which case, again, an input-side data filter may be suitable.
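A sketch of such an input-side filter, assuming a hypothetical layout with a date in columns 1-8 (yyyymmdd) and an amount in columns 9-16. The trailing @ holds the record after reading only the date, so records outside the wanted range are discarded before the remaining fields are parsed. The fileref rawfile and the cutoff date are arbitrary choices for illustration.

```sas
/* Hypothetical stand-in file: 10 records, dates 30 days apart */
filename rawfile temp;
data _null_;
  file rawfile;
  do i = 1 to 10;
    txn_date = '01JAN2010'd + (i - 1) * 30;
    amount = i * 10;
    put @1 txn_date yymmddn8. @9 amount 8.;   /* e.g. 20100101 in cols 1-8 */
  end;
run;

data work.recent;
  infile rawfile truncover;
  input @1 txn_date yymmdd8. @;               /* read only the key field, hold the line */
  if txn_date < '01JUN2010'd then delete;     /* discard without parsing the rest */
  input @9 amount 8.;                         /* parse remaining fields for kept records only */
  format txn_date date9.;
run;
```

The held line is released automatically at the end of each DATA step iteration, so DELETE simply moves on to the next record.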

Certainly your SAS application design process will want to take these factors into consideration.

Scott Barry
SBBWorks, Inc.

Re: Random Access within flat file. (Pointer Concept)

Data storage is not a concern; we have enough space and memory for parallel execution (64 GB RAM).

Also, the complete data is required for the analysis, so no column values or records can be skipped. Since the values are fixed-length, missing values or empty space are not expected.

Re: Random Access within flat file. (Pointer Concept)

Storage space is not the problem, but the infrastructure setup may be, and there are many things to consider there (local disk or SAN? RAID type? Striping? Direct I/O? etc.)

Can you tell us the typical transfer speed (MB/s) to import the data from the file you are getting?
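One way to get such a figure, sketched under assumptions: the FULLSTIMER system option adds real time, CPU time and memory to the log for each step, and the step below also accumulates the bytes read (the INFILE LENGTH= variable plus an assumed 1-byte line terminator) to print a rough MB/s value. The file, the fileref rawfile and the layout are hypothetical stand-ins, and the rate includes parsing time, so treat it as an approximation only.

```sas
options fullstimer;                  /* log shows real time, CPU time and memory per step */

/* Hypothetical test file: 100,000 fixed-width records */
filename rawfile temp;
data _null_;
  file rawfile;
  do i = 1 to 100000;
    put @1 i z10. @11 'payload';
  end;
run;

/* Read the file and estimate throughput inside the step */
data work.big;
  infile rawfile truncover length=reclen end=eof;
  retain t0;
  if _n_ = 1 then t0 = datetime();
  input @1 id 10. @11 text $char7.;
  bytes + (reclen + 1);              /* assumes a 1-byte LF terminator; use 2 for CRLF */
  if eof then do;
    secs = max(datetime() - t0, 0.001);
    call symputx('mbps', bytes / 1024**2 / secs);
  end;
  drop t0 bytes secs;
run;

%put NOTE: approximate read rate: &mbps MB/s;
```

Against the real file, comparing this rate with the real time reported by FULLSTIMER gives a first indication of whether I/O or parsing dominates.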

Cheers from Portugal.

Daniel Santos @ www.cgd.pt

Re: Random Access within flat file. (Pointer Concept)

The fixed-width layout helps with performance: it requires no variable-length search for end-of-field or end-of-line.
Assuming your multi-GB platform can cache the whole file and share its memory cache between applications, you could have parallel processors pull the content, each starting from a different "address" within the memory image. That would allow multiple processors on your platform to act independently, each reading its own part. When all parts are read, the remaining steps start with an SQL UNION or a DATA step SET (interleaved, if the flat file has a useful order).
Of course, sharing memory in that way seems very OS-dependent.
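A sketch of the slice-and-combine part of this idea, with assumptions stated: a hypothetical macro %read_slice reads one range of records with INFILE FIRSTOBS= and OBS=, and the slices are then interleaved with SET and BY. Here the calls run one after another in a single session; real parallelism would require each call to run in its own SAS session, which is platform-dependent and not shown. As noted earlier in the thread, FIRSTOBS= still reads past the preceding records, so this is a swapped-in, simpler technique and does not implement the shared-memory addressing described above.

```sas
/* Hypothetical stand-in file: 400 records keyed by id */
filename rawfile temp;
data _null_;
  file rawfile;
  do id = 1 to 400;
    put @1 id z8. @9 'payload';
  end;
run;

/* Hypothetical macro: read records &first through &last into WORK.PART&part */
%macro read_slice(part=, first=, last=);
  data work.part&part;
    infile rawfile truncover firstobs=&first obs=&last;   /* OBS= is the last record number */
    input @1 id 8. @9 text $char7.;
  run;
%mend read_slice;

/* In a parallel setup, each call would run in a separate SAS session */
%read_slice(part=1, first=1,   last=100)
%read_slice(part=2, first=101, last=200)
%read_slice(part=3, first=201, last=300)
%read_slice(part=4, first=301, last=400)

/* Interleave the slices back into one data set ordered by id */
data work.final;
  set work.part1 work.part2 work.part3 work.part4;
  by id;
run;
```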
Good luck
PeterC