Solved: export large dataset to multiple csv files - Page 2

Tom · Posted 10-18-2013 02:19 PM

CSV files are trivial to write with a SAS datastep. If you ignore the header record you just need:

data _null_;

set MYDATA ;

file MYFILE dsd dlm=',' lrecl=32000 ;

put (_all_) (:) ;

run;

To make multiple files you can use the FILEVAR option on the FILE statement to control the name of the target file with data step logic.

You can build macro variables with the list of variable names from the metadata (SASHELP.VCOLUMN or use PROC CONTENTS) to drive the put statement. You can also build a macro variable with valid code to put the variable names (or labels) as the header.

You can use a double DOW loop to count how many records you have per by group before you start the group so that you can decide if you should start a new file before the current BY value.

So something like this: (UNTESTED)

%let varlist=A B C;

%let titles="A" ',' "B" ',' "C" ;

%let maxobs=10000 ;

data _null_;

do nobs=1 by 1 until (last.group);

set mydata ;

by group ;

end;

do until (last.group) ;

set mydata ;

by group ;

if nrecs=0 or (nrecs + nobs > &maxobs) then do;

filenumber+1;

nrecs=0;

filename = cats('myfile',filenumber,'.csv');

end;

file dummy filevar=filename dsd dlm=',' lrecl=32000 ;

nrecs+1;

if nrecs=1 then put &titles;

put &varlist ;

end;

run;

jdmarino · Posted 10-18-2013 02:32 PM

This blows my mind. I don't know how to think about iterating over set and by statements. Are you looping over the input multiple times?

Tom · Posted 10-18-2013 02:38 PM

It is reading the input dataset twice. But it does it in one pass with two separate pointers (one for each SET statement). Unless your BY groups have a lot of records disk cache or operating system buffering will make it use the same number of physical disk reads as a single pass.

Peter_C · Posted 10-28-2013 07:22 PM

it is worth getting over the apparent complexity (of Tom's method), when you have 47M rows that cause a lot of difficulty for the apparently "simpler" ODS approach

jdmarino · Posted 10-30-2013 01:52 PM

Yep.

I've got the basics working (had problems when the BY group was larger than the row threshold for a file). I'm about to research dataset introspection so I can build a list of the fields to output. _all_ is no good for me because of the extra fields I added.

UrvishShah · Posted 10-21-2013 05:00 AM

Hi,

If you split the data in logical way so that it won't misinterpreat any values, it will be tedious task as in a big data user can't easily make the group based on values...

So my try is to use the attached SAS Macro which creates the one CSV file with a different sheets correspond to each unique values of the variable...It will work for any variable (Charcter or Numeric)...

If this one is not meeting requirement, please let us know the approx observation so that we can think about new logic...

-Urvish

jdmarino · Posted 10-21-2013 08:43 AM

Thanks, Urvish; it's generous of you to share. I don't yet know the split logic (I'm helping a friend and he hasn't yet provided me the spec). But more importantly, I'm not exporting to excel files, but to actual .CSVs, thus the multisheet approach does not apply. (A .csv is a pure text file; there are no sheets.)

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Catch up on SAS Innovate 2026

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Re: export large dataset to multiple csv files

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away