BookmarkSubscribeRSS Feed
dfanella
Calcite | Level 5

Hello,

I'm struggling with figuring out a solution for this. I have a large dataset that can contain several million records (it will vary each week). I need to export this dataset to multiple .txt files, each containing 250,000 records (with the remainder also being placed into a .txt file). So, for example, if I have 2,001,000 records in the original dataset then I would need 8 .txt files of 250,000 records and 1 .txt file with 1,000 records. Also, the .txt files should have incremented names e.g. texfile1, textfile2, textfile3, etc...

Any help would be appreciated.

8 REPLIES 8
LinusH
Tourmaline | Level 20

Create som pre code that calculates how many txt you need to create.

Then create a %do loop that will generate dynamically output statements, which are using the value _n_.

Data never sleeps
dfanella
Calcite | Level 5

Unfortunately I'm fairly new to SAS and I'm not sure how to do that. I found an example online that sounds like it does what I need and I'm trying to tweak it to fit my needs but I'm a bit lost...

%MACRO makecsv (dsn, nvials=300)

* Determine number of records in &DSN *;

%LET dsid=%SYSFUNC(OPEN(&dsn));

%LET numobs=%SYSFUNC(ATTRN(&dsid, NLOBS));

%LET rc=%SYSFUNC(CLOSE(&dsid));

%LET k = 0;

%DO i = 1 %TO &numobs %BY &nvials ;

%LET k = %EVAL(&k+1);

PROC EXPORT DATA=&dsn (FIRSTOBS=&i OBS=%EVAL(&i +&nvials -1))

OUTFILE = "batch&dsn&k..&SYSDATE..csv"

DBMS = CSV REPLACE;

RUN;

%END;

%MEND makecsv;

bwasicak
Calcite | Level 5

I'm thinking you are very close.

proc sql;

  select nobs/250000 into :amax            

  from DICTIONARY.TABLES 

  where (memtype='DATA')

    and memname='CARS'

    and libname='SASHELP';

quit;

OR

dsid=%sysfunc(open(SASHELP.CARS));

num=%sysfunc(attrn(&dsid,nlobs));

rc=%sysfunc(close(&dsid));


Then just iterate thru with a do loop

%put &amax   #

%macro do_loop;

  %do i = 1 %to &amax+1;

    data tmp;

      set work.mybigdsn;

      if 1=%eval(&1) and _n_ le 250K;

      else (&I-1)*250K le _n_ le (&i*250k);

    run;

    proc export data=tmp outfile="C:\output\mytextfile&i..txt"

    run;

    

    proc datasets

       delete tmp;

    quit;

   %end;

%mend do_loop;

do_loop;

         

RickAster
Obsidian | Level 7

If you want to do this in a single data step without macro programming, look up the FILEVAR= option in the FILE statement.

KatZen
Calcite | Level 5

Hi dfanella,

The easy way to do this is to use multiple output statements to create multiple datasets. Let me give you a really simple example.

Using a Data statement you can:

Data USA Australia Other;                              Creates three data sets that you will fill with data

set Kathy.AllSavings;                                      from the AllSavings data set

select(upcase(Country));                                 This just capitalizes the Country code
If Country="US" then output USA;                    This looks for the variable I chose to define the dataset USA

else if Country="AU" then output Austrialia;      This looks for the variable I chose to define the dataset Austrialia

else output other;                                           This is the dataset that data goes to if it does not land in any other dataset.

run;

                                                                    You should always have an "Other" dataset, and you can break it out to as

                                                                    many datasets as you like.

SveinE
Calcite | Level 5

Maybe this could this be an variant using the filevar option.

filename myfile 'tmp'; /*File name or path does not matter. Only virtual and no used*/

data _null_;
  retain rec_per_file 100;
  new_file_count = _N_ - 1;
  if mod( new_file_count, rec_per_file) = 0 then do;
    out_file = '/data3/sas01/workarea/svein/tmp_' || strip(( new_file_count+rec_per_file)/100) || '_file.txt';
    put out_file;
  end;
  file myfile filevar=out_file  dlm=";";
  set sashelp.cars;
  put _N_ make model type;
run;

Tom
Super User Tom
Super User

Why not just use DO loops?

Here is an example using SASHELP.CLASS dataset splitting with 5 observations per output file.

%let dir=~/temp ;

%let nobs=5 ;

data _null_;

  do fileno=1 by 1 ;

    filevar = catx('/',"&dir",cats('file',fileno,'.txt'));

    file out filevar=filevar dsd dlm='|' lrecl=2000000 ;

    do _n_=1 to &nobs ;

      set sashelp.class ;

      put (name -- weight) (:);

    end;

  end;

run;

pangea17
Quartz | Level 8

/* Create text files for Whatever Purpose */
/* Specify the input dataset and the number of rows per file */
%let input_dataset = your_dataset_name;
%let rows_per_file = 250000; /* you can change this if your file size needs change */

 

/* Create a macro variable for the total number of records in the dataset */
proc sql noprint;
select count(*) into :total_records
from &input_dataset;
quit;

%put &total_records;

 

/* Calculate the number of files needed */
%let num_files = %sysevalf((&total_records/&rows_per_file),integer);
%put &num_files;

 

%macro export_to_text_files;
%do i = 1 %to &num_files+1;

/* Calculate the starting and ending record numbers for each file */
%let start_record = %eval((&i - 1) * &rows_per_file + 1);
%let end_record = %eval(&i * &rows_per_file);

 

        /* Create a data step to export a subset of the dataset to a text file */

        data _null_;

            file "output_file_&i..txt";  /** ad your directory here in front of the file name **/

            set &input_dataset (firstobs=&start_record obs=&end_record);

            put (_all_)(=);  /* this makes sure only the variables you created are part of the output file */

        run;

 

    %end;

%mend;

 

/* Call the macro to export the dataset to text files */

%export_dataset_to_text_files;

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 6676 views
  • 0 likes
  • 8 in conversation