BookmarkSubscribeRSS Feed
dfanella
Calcite | Level 5

Hello,

I'm struggling with figuring out a solution for this. I have a large dataset that can contain several million records (it will vary each week). I need to export this dataset to multiple .txt files, each containing 250,000 records (with the remainder also being placed into a .txt file). So, for example, if I have 2,001,000 records in the original dataset then I would need 8 .txt files of 250,000 records and 1 .txt file with 1,000 records. Also, the .txt files should have incremented names e.g. texfile1, textfile2, textfile3, etc...

Any help would be appreciated.

8 REPLIES 8
LinusH
Tourmaline | Level 20

Create som pre code that calculates how many txt you need to create.

Then create a %do loop that will generate dynamically output statements, which are using the value _n_.

Data never sleeps
dfanella
Calcite | Level 5

Unfortunately I'm fairly new to SAS and I'm not sure how to do that. I found an example online that sounds like it does what I need and I'm trying to tweak it to fit my needs but I'm a bit lost...

%MACRO makecsv (dsn, nvials=300)

* Determine number of records in &DSN *;

%LET dsid=%SYSFUNC(OPEN(&dsn));

%LET numobs=%SYSFUNC(ATTRN(&dsid, NLOBS));

%LET rc=%SYSFUNC(CLOSE(&dsid));

%LET k = 0;

%DO i = 1 %TO &numobs %BY &nvials ;

%LET k = %EVAL(&k+1);

PROC EXPORT DATA=&dsn (FIRSTOBS=&i OBS=%EVAL(&i +&nvials -1))

OUTFILE = "batch&dsn&k..&SYSDATE..csv"

DBMS = CSV REPLACE;

RUN;

%END;

%MEND makecsv;

bwasicak
Calcite | Level 5

I'm thinking you are very close.

proc sql;

  select nobs/250000 into :amax            

  from DICTIONARY.TABLES 

  where (memtype='DATA')

    and memname='CARS'

    and libname='SASHELP';

quit;

OR

dsid=%sysfunc(open(SASHELP.CARS));

num=%sysfunc(attrn(&dsid,nlobs));

rc=%sysfunc(close(&dsid));


Then just iterate thru with a do loop

%put &amax   #

%macro do_loop;

  %do i = 1 %to &amax+1;

    data tmp;

      set work.mybigdsn;

      if 1=%eval(&1) and _n_ le 250K;

      else (&I-1)*250K le _n_ le (&i*250k);

    run;

    proc export data=tmp outfile="C:\output\mytextfile&i..txt"

    run;

    

    proc datasets

       delete tmp;

    quit;

   %end;

%mend do_loop;

do_loop;

         

RickAster
Obsidian | Level 7

If you want to do this in a single data step without macro programming, look up the FILEVAR= option in the FILE statement.

KatZen
Calcite | Level 5

Hi dfanella,

The easy way to do this is to use multiple output statements to create multiple datasets. Let me give you a really simple example.

Using a Data statement you can:

Data USA Australia Other;                              Creates three data sets that you will fill with data

set Kathy.AllSavings;                                      from the AllSavings data set

select(upcase(Country));                                 This just capitalizes the Country code
If Country="US" then output USA;                    This looks for the variable I chose to define the dataset USA

else if Country="AU" then output Austrialia;      This looks for the variable I chose to define the dataset Austrialia

else output other;                                           This is the dataset that data goes to if it does not land in any other dataset.

run;

                                                                    You should always have an "Other" dataset, and you can break it out to as

                                                                    many datasets as you like.

SveinE
Calcite | Level 5

Maybe this could this be an variant using the filevar option.

filename myfile 'tmp'; /*File name or path does not matter. Only virtual and no used*/

data _null_;
  retain rec_per_file 100;
  new_file_count = _N_ - 1;
  if mod( new_file_count, rec_per_file) = 0 then do;
    out_file = '/data3/sas01/workarea/svein/tmp_' || strip(( new_file_count+rec_per_file)/100) || '_file.txt';
    put out_file;
  end;
  file myfile filevar=out_file  dlm=";";
  set sashelp.cars;
  put _N_ make model type;
run;

Tom
Super User Tom
Super User

Why not just use DO loops?

Here is an example using SASHELP.CLASS dataset splitting with 5 observations per output file.

%let dir=~/temp ;

%let nobs=5 ;

data _null_;

  do fileno=1 by 1 ;

    filevar = catx('/',"&dir",cats('file',fileno,'.txt'));

    file out filevar=filevar dsd dlm='|' lrecl=2000000 ;

    do _n_=1 to &nobs ;

      set sashelp.class ;

      put (name -- weight) (:);

    end;

  end;

run;

pangea17
Quartz | Level 8

/* Create text files for Whatever Purpose */
/* Specify the input dataset and the number of rows per file */
%let input_dataset = your_dataset_name;
%let rows_per_file = 250000; /* you can change this if your file size needs change */

 

/* Create a macro variable for the total number of records in the dataset */
proc sql noprint;
select count(*) into :total_records
from &input_dataset;
quit;

%put &total_records;

 

/* Calculate the number of files needed */
%let num_files = %sysevalf((&total_records/&rows_per_file),integer);
%put &num_files;

 

%macro export_to_text_files;
%do i = 1 %to &num_files+1;

/* Calculate the starting and ending record numbers for each file */
%let start_record = %eval((&i - 1) * &rows_per_file + 1);
%let end_record = %eval(&i * &rows_per_file);

 

        /* Create a data step to export a subset of the dataset to a text file */

        data _null_;

            file "output_file_&i..txt";  /** ad your directory here in front of the file name **/

            set &input_dataset (firstobs=&start_record obs=&end_record);

            put (_all_)(=);  /* this makes sure only the variables you created are part of the output file */

        run;

 

    %end;

%mend;

 

/* Call the macro to export the dataset to text files */

%export_dataset_to_text_files;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 5004 views
  • 0 likes
  • 8 in conversation