BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jdmarino
Fluorite | Level 6

I need to export a very large dataset to csv, but I need to break the output into multiple, smaller files for the downstream process.  The split need to incorporate logic so that certain groups of records are not split across files.  Can anyone suggest ways to do this?

My current plan: look at what proc export sticks in the log (it seems to be a macro, despite the name), scrape that code and doctor it do the split logic.  I'm pretty sure this will work (I haven't seen the spec for the logic yet), but there may be a less hacky way.  If so, RSVP.export

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

CSV files are trivial to write with a SAS datastep.  If you ignore the header record you just need:

data _null_;

  set MYDATA ;

  file MYFILE dsd dlm=',' lrecl=32000 ;

  put (_all_) (:) ;

run;

To make multiple files you can use the FILEVAR option on the FILE statement to control the name of the target file with data step logic.

You can build macro variables with the list of variable names from the metadata (SASHELP.VCOLUMN or use PROC CONTENTS) to drive the put statement.  You can also build a macro variable with valid code to put the variable names (or labels) as the header.

You can use a double DOW loop to count how many records you have per by group before you start the group so that you can decide if you should start a new file before the current BY value.

So something like this: (UNTESTED)

%let varlist=A B C;

%let titles="A" ',' "B" ',' "C" ;

%let maxobs=10000 ;

data _null_;

  do nobs=1 by 1 until (last.group);

     set mydata ;

     by group ;

  end;

  do until (last.group) ;

     set mydata ;

     by group ;

     if nrecs=0 or (nrecs + nobs > &maxobs) then do;

        filenumber+1;

        nrecs=0;

        filename = cats('myfile',filenumber,'.csv');

     end;

     file dummy filevar=filename dsd dlm=',' lrecl=32000 ;

     nrecs+1;

     if nrecs=1 then put &titles;

     put &varlist ;

  end;

run;

View solution in original post

21 REPLIES 21
Reeza
Super User

The support.sas.com site has several macros that will split files by groups.

38430 - Exporting BY groups from a SAS data set to multiple Microsoft Excel or text files

I don't know if you have a true "BY" group though without more details.

I think this code can also be modified to export to CSV simply enough:

Split Data into Subsets - sasCommunity

jdmarino
Fluorite | Level 6

I do a lot of macro programming, so this approach is, at least, familiar territory.  I think I'm going to try the ODS solution, but I worry that I will have mygrp in the output (and it wasn't in the original input).

Cynthia_sas
SAS Super FREQ

Hi: Rather than tinkering with PROC EXPORT, I would use the NEWFILE=BYGROUP with ODS CSV. There's a code example below. In the first example, a simple BY AGE was used in PROC PRINT to create a separate output file for each age. In the second example, I used a DATA step program to make some groups using IF logic, then I use the new group variable in my BY statement. In the second example, you should get 3 output files.

cynthia

proc sort data=sashelp.class;
by age;
run;
title; footnote;
 
** 1) will get 1 csv file for each age;
ods csv file='c:\temp\age11.csv'
    newfile=bygroup;

    proc print data=sashelp.class;
    by age;
    run;
ods csv close;

   

** 2) use my logic to make groups;
data newfile;
** create new groups using logic;
  set sashelp.class;
  if age in (11,12) then
     mygrp = 1;
  else if age in (13, 14) then
     mygrp = 2;
  else mygrp = 3;
run;
 
proc sort data=newfile;
  by mygrp;
run;
 
** will make mgrp1, mgrp2, mgrp3 files;
ods csv file='c:\temp\mgrp1.csv'
    newfile=bygroup;

    proc print data=newfile;
    by mygrp;
    run;
ods csv close;

jdmarino
Fluorite | Level 6

I've never used ODS, and this seems like the newthink.  When I learn more about the split logic, I'll give this a try.

Tom
Super User Tom
Super User

Without a VAR statement all variables except the ones listed in the BY statement will appear.

Use OPTIONS NOBYLINE to suppress any BY line that PROC PRINT might generate.

jdmarino
Fluorite | Level 6

I've never done this.  I've always used drop= or keep= on the input dataset to get the columns I want.  (Any way, I will have to enumerate the columns I want, which means being aware of them.  It would be nice to avoid that: drop= would only need to know about the columns I create.  But I can't BY process on a dropped column.)

It's always good to post here and get help.

jdmarino
Fluorite | Level 6

Cynthia,

I'm trying your solution because it seems require the least coding on my part.  I have to say, however, that ODS is a multi-headed monster, and I can't find anything in the help about "ods csv".  (I can find "ods csvall", but that's a different beast.)  Can you point me to some documentation?

Second question: I'd like NULLs in my dataset to be represented by consecutive delimiters (i.e. an empty field), but I'm getting a '.' (missing).  Can you tell me how to turn off the dot?

art297
Opal | Level 21

One way regarding how to "turn off" the dots:

options missing=' ';

Some documentation: I got a number of related hits doing a Google search on:

ods csv sas

jdmarino
Fluorite | Level 6

Thanks.

I'd like to write a general purpose macro (although that's a secondary priority),so I'd certainly prefer to only turn off the dots for the proc print and not in general.  Maybe I can detect the state of that option and restore it after I do my work.

edit:

    %let missopt = %sysfunc(getoption(missing,keyword)); /* store the existing option away */

    options missing='';

...

    options &missopt;

This seems to be what I wanted.

Interesting about the google search; I've been searching both my local help file and support.sas.com and not getting good results.  (I get results, just not relevant one.)

jdmarino
Fluorite | Level 6

Well, I was disappointed today.  I tested my code on a 100K row dataset, splitting into 10k chunks but not splitting across a special set of vars.  It worked. 

BUT, when I ran it on the real 47M row dataset, I got

ERROR: The SAS System stopped processing this step because of insufficient memory.

on a machine with 24Gig of RAM and a Terabyte of drive space.

Now, to try Tom's approach.

art297
Opal | Level 21

Just out of curiosity I have two questions: (1) Did you sort your real dataset before running the code? AND (2)does your work directory happen to be on a drive that has very limited memory?

jdmarino
Fluorite | Level 6

1. No: I used an index

2. No.  Plenty of drive space

Peter_C
Rhodochrosite | Level 12

ods csv involves far mode than you need

Other ODS destinations take better advantage of the overheads of the destination approach.

To generate a csv file requires very little memory

DanW
Calcite | Level 5

Hello, This looks like something that I can use, but I need some help refining the script. I'd like to use a version of the following script: ** 1) will get 1 csv file for each age; ods csv file='c:\temp\age11.csv' newfile=bygroup; proc print data=sashelp.class; by age; run; ods csv close; but I'd like for each exported csv file to have a specific name in the form "Q_age_value.csv", where "age-value" is the value of the variable age. Can someone suggest a way to do this?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 11153 views
  • 4 likes
  • 8 in conversation