DATA Step, Macro, Functions and more

Creating multiple csv files of size between 0.5GB to 1 GB in UNIX based on input dataset size

Occasional Contributor
Posts: 11

Creating multiple csv files of size between 0.5GB to 1 GB in UNIX based on input dataset size

We have a requirement to create multiple CSV files, each between 0.5 GB and 1 GB in size, based on the size of the input dataset.

 

We have achieved this through DATA steps and PROC EXPORT, but when we check the file size in UNIX it is much smaller than 0.5 GB.

 

Example: if my dataset is 1.4 GB, I need to create 2 files of 0.7 GB each. SAS creates 2 files as expected, but when I check the file size in UNIX each file comes out as only 0.06 GB instead of 0.7 GB, which is not correct.

 

Kindly help us with this.

Please let me know if more details are required.

 

Trusted Advisor
Posts: 1,141

Re: Creating multiple csv files of size between 0.5GB to 1 GB in UNIX based on input dataset size

Hi,

 

I would expect that the number of observations/rows in each file is also not correct, but could you please check?

Could you find any errors in the SAS logs while creating the CSVs? Is the number of observations as expected?

 

If the number of expected observations/rows is correct, maybe the problem is just a matter of understanding your filesystem.

 

 

Super User
Posts: 7,413

Re: Creating multiple csv files of size between 0.5GB to 1 GB in UNIX based on input dataset size

There are a number of different elements which can affect the size in bytes on UNIX compared to the size in bytes on Windows; for instance, DOS/Windows line endings are twice the size of the UNIX variety:

http://www.cs.toronto.edu/~krueger/csc209h/tut/line-endings.html
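The line-ending difference is easy to verify directly on any UNIX box (file names here are illustrative):

```shell
# A DOS/Windows line ending is CR+LF (2 bytes); UNIX uses LF alone (1 byte).
printf 'a\r\n' > dos.txt    # 1 data byte + CR + LF = 3 bytes
printf 'a\n'   > unix.txt   # 1 data byte + LF      = 2 bytes
wc -c dos.txt unix.txt
```

So a text file moved between the two platforms changes size by roughly one byte per line, which matters for very wide files with many rows.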

 

May I ask why you need the CSV size to be limited?  It sounds like you have a file size limitation, e.g. for sending via email.  My suggestion would be not to limit the underlying CSV file size, but to use proper file compression software (WinRAR, 7-Zip, WinZip) to compress the file - this will shrink the file size anyway, and these tools all offer the option of splitting the archive into separate chunks of a given size, removing the need for you to split the CSV at all.  Use the right tool for the job.

Occasional Contributor
Posts: 11

Re: Creating multiple csv files of size between 0.5GB to 1 GB in UNIX based on input dataset size

Hi,

 

Thanks for replying!

 

I am creating the files from an input SAS dataset. Currently the code does the following:

 

1. Take the file size in MB of the source input dataset.

2. Divide the file size by 900 to get the number of files to be created, then create that many work tables, each with the correct number of observations.

3. Use PROC EXPORT to write each work table out to a CSV file.
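A minimal sketch of these three steps, assuming the source dataset is WORK.BIG and that DICTIONARY.TABLES exposes its FILESIZE column (available in SAS 9.4); the dataset name, output path, and the 900 MB divisor are all illustrative:

```sas
/* Steps 1-2: dataset size divided by 900 MB gives the file count. */
proc sql noprint;
    select ceil(filesize / (900 * 1024 * 1024))
        into :nfiles trimmed
        from dictionary.tables
        where libname = 'WORK' and memname = 'BIG';
quit;

%macro split_export;
    %local i;
    /* Spread the rows evenly across &nfiles work tables. */
    data %do i = 1 %to &nfiles; part&i %end; ;
        set work.big nobs = n;
        chunk = ceil(_n_ * &nfiles / n);   /* 1 .. &nfiles */
        %do i = 1 %to &nfiles;
            if chunk = &i then output part&i;
        %end;
        drop chunk;
    run;

    /* Step 3: export each work table to its own CSV. */
    %do i = 1 %to &nfiles;
        proc export data = part&i
            outfile = "/tmp/outfile&i..csv"
            dbms = csv replace;
        run;
    %end;
%mend split_export;
%split_export
```

Note that this divides by *observation count*, so the resulting CSVs are only approximately equal in bytes, and (as discussed below) their byte size will generally differ from the SAS dataset's.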

 

The requirement is to create CSV files sized between 0.5 and 1 GB; a file must not exceed that range, and the dataset's records are to be split among the files.

 

 

Trusted Advisor
Posts: 1,141

Re: Creating multiple csv files of size between 0.5GB to 1 GB in UNIX based on input dataset size

Hi,

 

I think you should keep in mind that a SAS table is not the same as a CSV file, in either size or file type (a SAS table is binary, a CSV file is plain text/ASCII).

 

Therefore the sizes will most likely be different.

 

I am not aware of any reliable ratio for forecasting/estimating the size of a CSV exported from a SAS table, sorry.

 

 

Super User
Posts: 7,413

Re: Creating multiple csv files of size between 0.5GB to 1 GB in UNIX based on input dataset size

[ Edited ]

Yes, I am able to read the text: 

The requirement is to create CSV files sized between 0.5 and 1 GB; a file must not exceed that range, and the dataset's records are to be split among the files.

 

However, my question is why you have this requirement, because it does not make sense to me.  CSV (comma-separated values) files are plain text delimited files, which are read sequentially.  Unless the recipient's HDD is only 0.5 GB in size and so can only store files smaller than that, there is no point splitting these plain text files.  What I think you are faced with is a restriction on sending files, either by email, FTP, or some other method - a restriction on the size of file which can be transmitted, whatever its type.  So to solve that problem I propose that you use compression tools to zip your text data up and split the archive into files of the required size.  Simple, and it's what most people do when transferring data.  If there are reasons why the recipient cannot handle CSV files of arbitrary size, please post them.

Super User
Posts: 6,963

Re: Creating multiple csv files of size between 0.5GB to 1 GB in UNIX based on input dataset size

If you have character variables of considerable length (which are rarely completely filled) in your SAS dataset and don't use COMPRESS=YES, then your output .csv files will automatically shrink, as the empty padding space is discarded and only the non-blank bytes are written.

 

I'd rather let SAS write one large file, which I'd then split with either operating-system tools or a separate SAS step.
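For the operating-system route, GNU `split` can cut a file into chunks without breaking a line in half. A small runnable demo (the file names and the tiny 4 KB chunk size are illustrative - for the 0.7 GB target you would use something like `-C 700m`):

```shell
# Build a small sample CSV (header + 1000 data rows) to split.
printf 'id,value\n' > big.csv
for i in $(seq 1 1000); do printf '%d,row%d\n' "$i" "$i" >> big.csv; done

# Split into chunks of at most 4 KB each, on line boundaries only.
#   -C SIZE             at most SIZE bytes of complete lines per chunk
#   -d                  numeric suffixes (part_00, part_01, ...)
#   --additional-suffix keeps the .csv extension on each chunk
split -C 4k -d --additional-suffix=.csv big.csv part_

wc -c part_*.csv
```

Note that only the first chunk carries the header row; if every chunk needs its own header, that has to be added separately.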

Alternatively, you could use the FILEVAR= option in a DATA step, accumulate the number of bytes written on each iteration, and switch to a new output file when a threshold is reached.
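A sketch of that FILEVAR= approach; the dataset name (WORK.BIG), the variable list, the output path, and the 0.7 GB threshold are all illustrative:

```sas
data _null_;
    set work.big;
    length outfile $ 200 line $ 32767;
    retain filenum 1;

    /* Build one CSV record; ID, NAME, VALUE stand in for your variables. */
    line = catx(',', id, name, value);

    /* Running byte count; +1 accounts for the UNIX LF line ending. */
    bytes + length(line) + 1;
    if bytes > 700 * 1024 * 1024 then do;  /* 0.7 GB threshold reached */
        filenum = filenum + 1;             /* switch to the next file  */
        bytes = length(line) + 1;
    end;

    /* When OUTFILE changes value, FILEVAR= closes the current file
       and opens the new one automatically. */
    outfile = cats('/tmp/part', filenum, '.csv');
    file csvout filevar = outfile lrecl = 32767;
    put line;
run;
```

Because the byte count is accumulated as the records are written, each output file lands just over the threshold rather than overshooting by a whole table; header rows, if needed, would have to be written whenever FILENUM changes.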

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 6 replies
  • 425 views
  • 4 likes
  • 4 in conversation