DATA Step, Macro, Functions and more

export large dataset to multiple csv files

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 78
Accepted Solution

export large dataset to multiple csv files

I need to export a very large dataset to csv, but I need to break the output into multiple, smaller files for the downstream process.  The split need to incorporate logic so that certain groups of records are not split across files.  Can anyone suggest ways to do this?

My current plan: look at what proc export sticks in the log (it seems to be a macro, despite the name), scrape that code and doctor it do the split logic.  I'm pretty sure this will work (I haven't seen the spec for the logic yet), but there may be a less hacky way.  If so, RSVP.export


Accepted Solutions
Solution
‎10-18-2013 02:19 PM
Super User
Super User
Posts: 6,498

Re: export large dataset to multiple csv files

CSV files are trivial to write with a SAS datastep.  If you ignore the header record you just need:

data _null_;

  set MYDATA ;

  file MYFILE dsd dlm=',' lrecl=32000 ;

  put (_all_) (Smiley Happy ;

run;

To make multiple files you can use the FILEVAR option on the FILE statement to control the name of the target file with data step logic.

You can build macro variables with the list of variable names from the metadata (SASHELP.VCOLUMN or use PROC CONTENTS) to drive the put statement.  You can also build a macro variable with valid code to put the variable names (or labels) as the header.

You can use a double DOW loop to count how many records you have per by group before you start the group so that you can decide if you should start a new file before the current BY value.

So something like this: (UNTESTED)

%let varlist=A B C;

%let titles="A" ',' "B" ',' "C" ;

%let maxobs=10000 ;

data _null_;

  do nobs=1 by 1 until (last.group);

     set mydata ;

     by group ;

  end;

  do until (last.group) ;

     set mydata ;

     by group ;

     if nrecs=0 or (nrecs + nobs > &maxobs) then do;

        filenumber+1;

        nrecs=0;

        filename = cats('myfile',filenumber,'.csv');

     end;

     file dummy filevar=filename dsd dlm=',' lrecl=32000 ;

     nrecs+1;

     if nrecs=1 then put &titles;

     put &varlist ;

  end;

run;

View solution in original post


All Replies
Super User
Posts: 17,750

Re: export large dataset to multiple csv files

The support.sas.com site has several macros that will split files by groups.

38430 - Exporting BY groups from a SAS data set to multiple Microsoft Excel or text files

I don't know if you have a true "BY" group though without more details.

I think this code can also be modified to export to CSV simply enough:

Split Data into Subsets - sasCommunity

Frequent Contributor
Posts: 78

Re: export large dataset to multiple csv files

I do a lot of macro programming, so this approach is, at least, familiar territory.  I think I'm going to try the ODS solution, but I worry that I will have mygrp in the output (and it wasn't in the original input).

SAS Super FREQ
Posts: 8,740

Re: export large dataset to multiple csv files

Hi: Rather than tinkering with PROC EXPORT, I would use the NEWFILE=BYGROUP with ODS CSV. There's a code example below. In the first example, a simple BY AGE was used in PROC PRINT to create a separate output file for each age. In the second example, I used a DATA step program to make some groups using IF logic, then I use the new group variable in my BY statement. In the second example, you should get 3 output files.

cynthia

proc sort data=sashelp.class;
by age;
run;
title; footnote;
 
** 1) will get 1 csv file for each age;
ods csv file='c:\temp\age11.csv'
    newfile=bygroup;

    proc print data=sashelp.class;
    by age;
    run;
ods csv close;

   

** 2) use my logic to make groups;
data newfile;
** create new groups using logic;
  set sashelp.class;
  if age in (11,12) then
     mygrp = 1;
  else if age in (13, 14) then
     mygrp = 2;
  else mygrp = 3;
run;
 
proc sort data=newfile;
  by mygrp;
run;
 
** will make mgrp1, mgrp2, mgrp3 files;
ods csv file='c:\temp\mgrp1.csv'
    newfile=bygroup;

    proc print data=newfile;
    by mygrp;
    run;
ods csv close;

Frequent Contributor
Posts: 78

Re: export large dataset to multiple csv files

I've never used ODS, and this seems like the newthink.  When I learn more about the split logic, I'll give this a try.

Super User
Super User
Posts: 6,498

Re: export large dataset to multiple csv files

Without a VAR statement all variables except the ones listed in the BY statement will appear.

Use OPTIONS NOBYLINE to suppress any BY line that PROC PRINT might generate.

Frequent Contributor
Posts: 78

Re: export large dataset to multiple csv files

I've never done this.  I've always used drop= or keep= on the input dataset to get the columns I want.  (Any way, I will have to enumerate the columns I want, which means being aware of them.  It would be nice to avoid that: drop= would only need to know about the columns I create.  But I can't BY process on a dropped column.)

It's always good to post here and get help.

Frequent Contributor
Posts: 78

Re: export large dataset to multiple csv files

Cynthia,

I'm trying your solution because it seems require the least coding on my part.  I have to say, however, that ODS is a multi-headed monster, and I can't find anything in the help about "ods csv".  (I can find "ods csvall", but that's a different beast.)  Can you point me to some documentation?

Second question: I'd like NULLs in my dataset to be represented by consecutive delimiters (i.e. an empty field), but I'm getting a '.' (missing).  Can you tell me how to turn off the dot?

PROC Star
Posts: 7,356

Re: export large dataset to multiple csv files

One way regarding how to "turn off" the dots:

options missing=' ';

Some documentation: I got a number of related hits doing a Google search on:

ods csv sas

Frequent Contributor
Posts: 78

Re: export large dataset to multiple csv files

Thanks.

I'd like to write a general purpose macro (although that's a secondary priority),so I'd certainly prefer to only turn off the dots for the proc print and not in general.  Maybe I can detect the state of that option and restore it after I do my work.

edit:

    %let missopt = %sysfunc(getoption(missing,keyword)); /* store the existing option away */

    options missing='';

...

    options &missopt;

This seems to be what I wanted.

Interesting about the google search; I've been searching both my local help file and support.sas.com and not getting good results.  (I get results, just not relevant one.)

Frequent Contributor
Posts: 78

Re: export large dataset to multiple csv files

Well, I was disappointed today.  I tested my code on a 100K row dataset, splitting into 10k chunks but not splitting across a special set of vars.  It worked. 

BUT, when I ran it on the real 47M row dataset, I got

ERROR: The SAS System stopped processing this step because of insufficient memory.

on a machine with 24Gig of RAM and a Terabyte of drive space.

Now, to try Tom's approach.

PROC Star
Posts: 7,356

Re: export large dataset to multiple csv files

Just out of curiosity I have two questions: (1) Did you sort your real dataset before running the code? AND (2)does your work directory happen to be on a drive that has very limited memory?

Frequent Contributor
Posts: 78

Re: export large dataset to multiple csv files

1. No: I used an index

2. No.  Plenty of drive space

Valued Guide
Posts: 2,174

Re: export large dataset to multiple csv files

ods csv involves far mode than you need

Other ODS destinations take better advantage of the overheads of the destination approach.

To generate a csv file requires very little memory

Learner
Posts: 1

Re: export large dataset to multiple csv files

Hello, This looks like something that I can use, but I need some help refining the script. I'd like to use a version of the following script: ** 1) will get 1 csv file for each age; ods csv file='c:\temp\age11.csv' newfile=bygroup; proc print data=sashelp.class; by age; run; ods csv close; but I'd like for each exported csv file to have a specific name in the form "Q_age_value.csv", where "age-value" is the value of the variable age. Can someone suggest a way to do this?

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 21 replies
  • 4962 views
  • 4 likes
  • 8 in conversation