I need to export a very large dataset to csv, but I need to break the output into multiple, smaller files for the downstream process. The split need to incorporate logic so that certain groups of records are not split across files. Can anyone suggest ways to do this?
My current plan: look at what proc export sticks in the log (it seems to be a macro, despite the name), scrape that code and doctor it do the split logic. I'm pretty sure this will work (I haven't seen the spec for the logic yet), but there may be a less hacky way. If so, RSVP.export
CSV files are trivial to write with a SAS datastep. If you ignore the header record you just need:
data _null_;
set MYDATA ;
file MYFILE dsd dlm=',' lrecl=32000 ;
put (_all_) (:) ;
run;
To make multiple files you can use the FILEVAR option on the FILE statement to control the name of the target file with data step logic.
You can build macro variables with the list of variable names from the metadata (SASHELP.VCOLUMN or use PROC CONTENTS) to drive the put statement. You can also build a macro variable with valid code to put the variable names (or labels) as the header.
You can use a double DOW loop to count how many records you have per by group before you start the group so that you can decide if you should start a new file before the current BY value.
So something like this: (UNTESTED)
%let varlist=A B C;
%let titles="A" ',' "B" ',' "C" ;
%let maxobs=10000 ;
data _null_;
do nobs=1 by 1 until (last.group);
set mydata ;
by group ;
end;
do until (last.group) ;
set mydata ;
by group ;
if nrecs=0 or (nrecs + nobs > &maxobs) then do;
filenumber+1;
nrecs=0;
filename = cats('myfile',filenumber,'.csv');
end;
file dummy filevar=filename dsd dlm=',' lrecl=32000 ;
nrecs+1;
if nrecs=1 then put &titles;
put &varlist ;
end;
run;
The support.sas.com site has several macros that will split files by groups.
38430 - Exporting BY groups from a SAS data set to multiple Microsoft Excel or text files
I don't know if you have a true "BY" group though without more details.
I think this code can also be modified to export to CSV simply enough:
I do a lot of macro programming, so this approach is, at least, familiar territory. I think I'm going to try the ODS solution, but I worry that I will have mygrp in the output (and it wasn't in the original input).
Hi: Rather than tinkering with PROC EXPORT, I would use the NEWFILE=BYGROUP with ODS CSV. There's a code example below. In the first example, a simple BY AGE was used in PROC PRINT to create a separate output file for each age. In the second example, I used a DATA step program to make some groups using IF logic, then I use the new group variable in my BY statement. In the second example, you should get 3 output files.
cynthia
proc sort data=sashelp.class;
by age;
run;
title; footnote;
** 1) will get 1 csv file for each age;
ods csv file='c:\temp\age11.csv'
newfile=bygroup;
proc print data=sashelp.class;
by age;
run;
ods csv close;
** 2) use my logic to make groups;
data newfile;
** create new groups using logic;
set sashelp.class;
if age in (11,12) then
mygrp = 1;
else if age in (13, 14) then
mygrp = 2;
else mygrp = 3;
run;
proc sort data=newfile;
by mygrp;
run;
** will make mgrp1, mgrp2, mgrp3 files;
ods csv file='c:\temp\mgrp1.csv'
newfile=bygroup;
proc print data=newfile;
by mygrp;
run;
ods csv close;
I've never used ODS, and this seems like the newthink. When I learn more about the split logic, I'll give this a try.
Without a VAR statement all variables except the ones listed in the BY statement will appear.
Use OPTIONS NOBYLINE to suppress any BY line that PROC PRINT might generate.
I've never done this. I've always used drop= or keep= on the input dataset to get the columns I want. (Any way, I will have to enumerate the columns I want, which means being aware of them. It would be nice to avoid that: drop= would only need to know about the columns I create. But I can't BY process on a dropped column.)
It's always good to post here and get help.
Cynthia,
I'm trying your solution because it seems require the least coding on my part. I have to say, however, that ODS is a multi-headed monster, and I can't find anything in the help about "ods csv". (I can find "ods csvall", but that's a different beast.) Can you point me to some documentation?
Second question: I'd like NULLs in my dataset to be represented by consecutive delimiters (i.e. an empty field), but I'm getting a '.' (missing). Can you tell me how to turn off the dot?
One way regarding how to "turn off" the dots:
options missing=' ';
Some documentation: I got a number of related hits doing a Google search on:
ods csv sas
Thanks.
I'd like to write a general purpose macro (although that's a secondary priority),so I'd certainly prefer to only turn off the dots for the proc print and not in general. Maybe I can detect the state of that option and restore it after I do my work.
edit:
%let missopt = %sysfunc(getoption(missing,keyword)); /* store the existing option away */
options missing='';
...
options &missopt;
This seems to be what I wanted.
Interesting about the google search; I've been searching both my local help file and support.sas.com and not getting good results. (I get results, just not relevant one.)
Well, I was disappointed today. I tested my code on a 100K row dataset, splitting into 10k chunks but not splitting across a special set of vars. It worked.
BUT, when I ran it on the real 47M row dataset, I got
ERROR: The SAS System stopped processing this step because of insufficient memory.
on a machine with 24Gig of RAM and a Terabyte of drive space.
Now, to try Tom's approach.
Just out of curiosity I have two questions: (1) Did you sort your real dataset before running the code? AND (2)does your work directory happen to be on a drive that has very limited memory?
1. No: I used an index
2. No. Plenty of drive space
ods csv involves far mode than you need
Other ODS destinations take better advantage of the overheads of the destination approach.
To generate a csv file requires very little memory
Hello, This looks like something that I can use, but I need some help refining the script. I'd like to use a version of the following script: ** 1) will get 1 csv file for each age; ods csv file='c:\temp\age11.csv' newfile=bygroup; proc print data=sashelp.class; by age; run; ods csv close; but I'd like for each exported csv file to have a specific name in the form "Q_age_value.csv", where "age-value" is the value of the variable age. Can someone suggest a way to do this?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.