Desktop productivity for business analysts and programmers

Splitting dataset on total no of observations

Reply
Frequent Contributor
Posts: 120

Splitting dataset on total no of observations

Hi

i have a sas dataset which has 3 million records but I want to split to 1 million each

can anyone let me know the process and I have to export each 1 million dataset results into a text file ( comma separated )

Super Contributor
Posts: 251

Re: Splitting dataset on total no of observations

Use FIRSTOBS = and OBS =

Like: FIRSTOBS = 1 OBS = 1000000

Firstobs = 1000001 OBS = 2000000

Fiirstobs = 2000001 OBS = 3000000

Esteemed Advisor
Esteemed Advisor
Posts: 7,232

Re: Splitting dataset on total no of observations

Hi,

Assuming your base data is in work.xyz:

data _null_;

     set sashelp.vtable (where=(libname="WORK" and memname="XYZ"));

     do I=1 to floor(obs / 1000000) + 1;

          call execute('data xyz'||strip(put(I,best.))||'; set work.xyz; if '||strip(put((I-1) * 1000000))||' <= _n_ < '||strip(put(I *1000000))||' then output; run;');

     end;

run;

Haven't tested, but something like the above should work.

Frequent Contributor
Posts: 120

Re: Splitting dataset on total no of observations

HI

this also I don't understand but can we use _n_ <=1million then output x

else if _n_. >1million <=2million then output y

can I use something like this

Esteemed Advisor
Esteemed Advisor
Posts: 7,232

Re: Splitting dataset on total no of observations

You can, but then you need to manually do each one.  I had the loop over 1 to num obs / 1mil so its automatic.

Respected Advisor
Posts: 3,775

Re: Splitting dataset on total no of observations

You don't really need to create new data sets if you just want to create CSV files.  This is a generic program that will create K CSV files for any input data set.  You will need to insure that LRECL if large enough to accommodate the width of your data.  And of course depending on you OS you need to adjust the value of &FILE.



%let data=sashelp.heart;
%let file=~/heart;
%let k=3; *parts;

data _null_;
  
if _N_ eq 2 then stop;
  
link main;
   if 0 then set &data;
putvals:
  
put (_all_)(Smiley Happy;
   return;
putnames:
  
length _NAME_ $32;
  
retain _OK_ 1;
  
do while(1);
      call vnext(_NAME_);
      if missing(_NAME_) then leave;
      if upcase(_NAME_) eq '_NAME_' then _OK_=0;
     
if _OK_  then put _NAME_ @;
      end;
  
put;
   _OK_=
1;
  
return;
main:  
  
putlog "NOTE: &data has " _NOBS_ 'observations.';
  
do _n_ = 1 by 1 while(not _EOF_);
      set &data end=_EOF_ nobs=_NOBS_;
      if mod(_n_,ceil(_NOBS_/&k)) eq 1 then do;
         _FILEINDEX_+
1;
         _FILEVAR_ = cats(
"&file",vvalue(_FILEINDEX_),'.csv');
         putlog _FILEVAR_=;
         file dummy filevar=_FILEVAR_ dsd lrecl=256 /*adjust lrecl as needed*/;
        
link putnames;
         end;
     
link putvals;
      end;
  
stop;
  
format _FILEINDEX_ z2.;
  
run;


You should see something like this in the log.


1-22-2015 10-39-27 AM.png

Message was edited by: data _null_ It occurred to me that if I was to call this program a generic solution I needed to at least make an effort to guard against variable name clashes with the user's data.  I changed the program's "internal" variables to be _variable-name_, which should suffice for most situations.

Frequent Contributor
Posts: 120

Re: Splitting dataset on total no of observations

hi

oh its difficult to understand

Respected Advisor
Posts: 3,775

Re: Splitting dataset on total no of observations

You can always RTM.

Ask a Question
Discussion stats
  • 7 replies
  • 415 views
  • 1 like
  • 4 in conversation