ath_123
Calcite | Level 5

We have a requirement to create multiple CSV files, each between 0.5 GB and 1 GB in size, based on the size of the input dataset.

 

We have achieved this with DATA steps and PROC EXPORT, but when we check the file sizes in Unix they are much smaller than 0.5 GB.

 

Example: if my dataset is 1.4 GB, I need to create 2 files of 0.7 GB each. SAS creates 2 files as expected, but in Unix each file shows as only 0.06 GB instead of 0.7 GB, which is not correct.

 

Kindly help us with this.

Please let me know if more details are required.

 

JuanS_OCS
Amethyst | Level 16

Hi,

 

I would expect the number of observations/rows in each file to be incorrect as well. Could you please check?

Did you find any errors in the SAS logs while creating the CSVs? Is the number of observations as expected?

If the number of observations/rows is as expected, maybe the problem is just a matter of understanding your filesystem.

 

 

RW9
Diamond | Level 26

A number of different factors could affect the size in bytes on Unix compared to the size in bytes on Windows. For instance, DOS line endings (CR+LF, two bytes) are twice the size of the Unix variety (LF, one byte):

http://www.cs.toronto.edu/~krueger/csc209h/tut/line-endings.html
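
To make the line-ending effect concrete, here is a small sketch in Python (not SAS, and the sample rows are made up) that encodes the same records with DOS and Unix line endings and compares the byte counts. The difference is one byte per record.

```python
# Hypothetical sample rows; any CSV content behaves the same way.
rows = ["1,alice,10.5", "2,bob,7.25", "3,carol,3.0"]

def encoded_size(lines, newline):
    """Return the number of bytes the lines occupy with the given line ending."""
    return len("".join(line + newline for line in lines).encode("ascii"))

crlf_size = encoded_size(rows, "\r\n")  # DOS/Windows: two bytes per line ending
lf_size = encoded_size(rows, "\n")      # Unix: one byte per line ending

print(crlf_size, lf_size, crlf_size - lf_size)  # 39 36 3
```

Since the gap grows only with the number of rows, line endings alone would account for a small share of a file made of long records.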

 

May I ask why you need the CSV size to be limited? It sounds like you have a file size limitation, for instance when sending via email. My suggestion would be not to limit the underlying CSV file size, but to use proper file compression software (WinRAR, 7Zip, WinZip) to compress the file. This will shrink the file size anyway, and these tools all offer the option of splitting the archive into separate chunks of a given size, removing the need for you to do it at all. Use the right tool for the job.

ath_123
Calcite | Level 5

Hi,

 

Thanks for replying 🙂

 

I am creating files from an input SAS dataset. Currently, the code does the following:

 

1. Take the file size in MB of the source input dataset.

2. Divide the file size by 900 to get the number of files that need to be created. Create as many work tables as there are files to be created, each with the correct number of observations.

3. Use PROC EXPORT to export the work tables to CSV files.

 

The requirement is to create CSV files between 0.5 and 1 GB in size; the file size shouldn't fall outside that range, and the dataset records will be split among the files.
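
The arithmetic in steps 1 and 2 above can be sketched as follows. This is Python rather than SAS, purely to show the calculation; the function name, the rounding up, and the sample observation count are assumptions, since the post does not say how the division is rounded.

```python
import math

def plan_split(dataset_mb, n_obs, target_mb=900):
    """Return (number of files, observations per file) for a dataset-size-based split."""
    n_files = max(1, math.ceil(dataset_mb / target_mb))
    obs_per_file = math.ceil(n_obs / n_files)
    return n_files, obs_per_file

# A 1.4 GB dataset (about 1434 MB) with a hypothetical 10 million observations.
print(plan_split(1434, 10_000_000))  # (2, 5000000)
```

Note that this plan is driven by the size of the dataset on disk, not by the size of the text that PROC EXPORT will write.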

 

 

JuanS_OCS
Amethyst | Level 16

Hi,

 

I think you should keep in mind that a SAS table is not the same as a CSV file, neither in size nor in file type (a SAS table is a binary file, a CSV file is a text/ASCII file).

Therefore the sizes will most likely be different.

I am not aware of any ratio for forecasting/estimating the size of a CSV exported from a SAS table, sorry.
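
One possible workaround, offered here only as an assumption and not as a SAS feature, is to write out a sample of rows as CSV, measure the bytes, and extrapolate to the full row count. The sketch below shows the idea in Python; the function name and the sample rows are hypothetical, and the estimate is only as good as the sample is representative.

```python
import csv
import io

def estimate_csv_bytes(sample_rows, total_rows, newline="\n"):
    """Extrapolate total CSV size from the encoded size of a sample of rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator=newline)
    writer.writerows(sample_rows)
    sample_bytes = len(buffer.getvalue().encode("utf-8"))
    return sample_bytes / len(sample_rows) * total_rows

# Hypothetical sample: three rows standing in for a random sample of the dataset.
sample = [[1, "alice", 10.5], [2, "bob", 7.25], [3, "carol", 3.0]]
estimate = estimate_csv_bytes(sample, total_rows=1_000_000)
print(round(estimate))  # 12000000
```

Dividing such an estimate by the target file size would give a file count based on the text output rather than on the dataset size.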

 

 

RW9
Diamond | Level 26

Yes, I am able to read the text: 

The requirement is to create CSV files between 0.5 and 1 GB in size; the file size shouldn't fall outside that range, and the dataset records will be split among the files.

 

However, my question is why you have this requirement. It does not make sense. CSV (comma-separated values) files are plain-text delimited files, which are read sequentially. Unless the recipient's HDD is only 0.5 GB large and so can only store files smaller than that, there is no point in splitting these base text files. What I think you are faced with is a restriction on sending files, whether by email, FTP, or some other method. That is a restriction on the size of file which can be transmitted, and it applies to files of any type. So to solve that problem I propose that you use compression tools to zip your text data up and split the archive into files of the required size. Simple, and it's what most people do when transferring data. If there are reasons why the recipient cannot handle CSV files of any size, please post them.

Kurt_Bremser
Super User

If you have character variables of considerable length (which are rarely completely filled) in your SAS dataset, and don't use compress=yes, then your output .csv files will automatically shrink, as the empty space is discarded and only the non-blank bytes are written.
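
A rough illustration of that effect, in Python rather than SAS and with made-up values: a character variable declared with a fixed length occupies that many bytes per row in uncompressed storage, while delimited text output carries only the non-blank characters. The declared length of 200 is an assumption chosen for the example.

```python
# Hypothetical layout: one character variable stored with a fixed length of 200 bytes.
STORED_LENGTH = 200
values = ["abc", "a longer comment", ""]

# Fixed-width storage pads every value with blanks up to the declared length.
stored_bytes = sum(len(v.ljust(STORED_LENGTH).encode("ascii")) for v in values)

# Delimited text output writes only the non-blank characters plus a line ending.
csv_bytes = sum(len(v.rstrip().encode("ascii")) + 1 for v in values)

print(stored_bytes, csv_bytes)  # 600 22
```

With mostly empty wide columns, a ratio like the 0.7 GB to 0.06 GB reported above is plausible without any rows being lost.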

 

I'd rather let SAS write one large file, which I'd then split with either operating system tools or a separate step in SAS.

Alternatively, you could use the FILEVAR= option in the data step, accumulate the number of bytes written in each iteration, and switch the output file when a threshold is reached.
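
A minimal sketch of that last idea, written in Python only to illustrate the logic (in SAS the file switch would be done through FILEVAR=). The function name, file prefix, and tiny threshold are hypothetical, and no header row is written.

```python
import csv
import io
import os
import tempfile

def write_split_csv(rows, out_dir, max_bytes, prefix="part"):
    """Write rows to numbered CSV files, starting a new file before max_bytes is exceeded."""
    paths, handle, written = [], None, 0
    for row in rows:
        # Format the row in memory first so its exact byte length is known.
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerow(row)
        data = buffer.getvalue().encode("utf-8")
        # Open the first file, or switch files when this row would cross the threshold.
        if handle is None or written + len(data) > max_bytes:
            if handle is not None:
                handle.close()
            paths.append(os.path.join(out_dir, f"{prefix}_{len(paths) + 1}.csv"))
            handle, written = open(paths[-1], "wb"), 0
        handle.write(data)
        written += len(data)
    if handle is not None:
        handle.close()
    return paths

# Hypothetical data and a tiny threshold so the switch is easy to observe.
rows = [[i, f"name{i}", i * 1.5] for i in range(100)]
out_dir = tempfile.mkdtemp()
paths = write_split_csv(rows, out_dir, max_bytes=500)
sizes = [os.path.getsize(p) for p in paths]
print(len(paths), max(sizes))
```

Because the threshold is checked against the bytes actually written, each output file stays at or below the limit regardless of how much padding the source dataset carries. A single row longer than the limit would still be written whole to its own file.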
