10-18-2013 01:19 PM
I need to export a very large dataset to csv, but I need to break the output into multiple, smaller files for the downstream process. The split need to incorporate logic so that certain groups of records are not split across files. Can anyone suggest ways to do this?
My current plan: look at what proc export sticks in the log (it seems to be a macro, despite the name), scrape that code and doctor it do the split logic. I'm pretty sure this will work (I haven't seen the spec for the logic yet), but there may be a less hacky way. If so, RSVP.export
10-18-2013 02:19 PM
CSV files are trivial to write with a SAS datastep. If you ignore the header record you just need:
set MYDATA ;
file MYFILE dsd dlm=',' lrecl=32000 ;
put (_all_) ( ;
To make multiple files you can use the FILEVAR option on the FILE statement to control the name of the target file with data step logic.
You can build macro variables with the list of variable names from the metadata (SASHELP.VCOLUMN or use PROC CONTENTS) to drive the put statement. You can also build a macro variable with valid code to put the variable names (or labels) as the header.
You can use a double DOW loop to count how many records you have per by group before you start the group so that you can decide if you should start a new file before the current BY value.
So something like this: (UNTESTED)
%let varlist=A B C;
%let titles="A" ',' "B" ',' "C" ;
%let maxobs=10000 ;
do nobs=1 by 1 until (last.group);
set mydata ;
by group ;
do until (last.group) ;
set mydata ;
by group ;
if nrecs=0 or (nrecs + nobs > &maxobs) then do;
filename = cats('myfile',filenumber,'.csv');
file dummy filevar=filename dsd dlm=',' lrecl=32000 ;
if nrecs=1 then put &titles;
put &varlist ;
10-18-2013 01:44 PM
The support.sas.com site has several macros that will split files by groups.
I don't know if you have a true "BY" group though without more details.
I think this code can also be modified to export to CSV simply enough:
10-18-2013 02:34 PM
I do a lot of macro programming, so this approach is, at least, familiar territory. I think I'm going to try the ODS solution, but I worry that I will have mygrp in the output (and it wasn't in the original input).
10-18-2013 01:52 PM
Hi: Rather than tinkering with PROC EXPORT, I would use the NEWFILE=BYGROUP with ODS CSV. There's a code example below. In the first example, a simple BY AGE was used in PROC PRINT to create a separate output file for each age. In the second example, I used a DATA step program to make some groups using IF logic, then I use the new group variable in my BY statement. In the second example, you should get 3 output files.
proc sort data=sashelp.class;
** 1) will get 1 csv file for each age;
ods csv file='c:\temp\age11.csv'
proc print data=sashelp.class;
ods csv close;
** 2) use my logic to make groups;
** create new groups using logic;
if age in (11,12) then
mygrp = 1;
else if age in (13, 14) then
mygrp = 2;
else mygrp = 3;
proc sort data=newfile;
** will make mgrp1, mgrp2, mgrp3 files;
ods csv file='c:\temp\mgrp1.csv'
proc print data=newfile;
ods csv close;
10-18-2013 02:41 PM
Without a VAR statement all variables except the ones listed in the BY statement will appear.
Use OPTIONS NOBYLINE to suppress any BY line that PROC PRINT might generate.
10-18-2013 02:47 PM
I've never done this. I've always used drop= or keep= on the input dataset to get the columns I want. (Any way, I will have to enumerate the columns I want, which means being aware of them. It would be nice to avoid that: drop= would only need to know about the columns I create. But I can't BY process on a dropped column.)
It's always good to post here and get help.
10-28-2013 12:47 PM
I'm trying your solution because it seems require the least coding on my part. I have to say, however, that ODS is a multi-headed monster, and I can't find anything in the help about "ods csv". (I can find "ods csvall", but that's a different beast.) Can you point me to some documentation?
Second question: I'd like NULLs in my dataset to be represented by consecutive delimiters (i.e. an empty field), but I'm getting a '.' (missing). Can you tell me how to turn off the dot?
10-28-2013 01:06 PM
One way regarding how to "turn off" the dots:
options missing=' ';
Some documentation: I got a number of related hits doing a Google search on:
ods csv sas
10-28-2013 01:13 PM
I'd like to write a general purpose macro (although that's a secondary priority),so I'd certainly prefer to only turn off the dots for the proc print and not in general. Maybe I can detect the state of that option and restore it after I do my work.
%let missopt = %sysfunc(getoption(missing,keyword)); /* store the existing option away */
This seems to be what I wanted.
Interesting about the google search; I've been searching both my local help file and support.sas.com and not getting good results. (I get results, just not relevant one.)
10-28-2013 05:24 PM
Well, I was disappointed today. I tested my code on a 100K row dataset, splitting into 10k chunks but not splitting across a special set of vars. It worked.
BUT, when I ran it on the real 47M row dataset, I got
ERROR: The SAS System stopped processing this step because of insufficient memory.
on a machine with 24Gig of RAM and a Terabyte of drive space.
Now, to try Tom's approach.
10-28-2013 05:31 PM
Just out of curiosity I have two questions: (1) Did you sort your real dataset before running the code? AND (2)does your work directory happen to be on a drive that has very limited memory?
10-28-2013 07:20 PM
ods csv involves far mode than you need
Other ODS destinations take better advantage of the overheads of the destination approach.
To generate a csv file requires very little memory
01-11-2016 02:35 PM
Hello, This looks like something that I can use, but I need some help refining the script. I'd like to use a version of the following script: ** 1) will get 1 csv file for each age; ods csv file='c:\temp\age11.csv' newfile=bygroup; proc print data=sashelp.class; by age; run; ods csv close; but I'd like for each exported csv file to have a specific name in the form "Q_age_value.csv", where "age-value" is the value of the variable age. Can someone suggest a way to do this?
Need further help from the community? Please ask a new question.