BookmarkSubscribeRSS Feed
victorleehc0
Calcite | Level 5

I am trying to split a big data file into several subsets. The subsets only have two repeated observations. Why my code does not work? How come the two duplicated records on each subsets?  Thank you for being so helpful!

 
options mlogic ;
options MPRINT;
options SYMBOLGEN;
data tt;
  do i=1 to 1e7;
  KEY = int(ranuni(0)*1e5);
  output;
  end;
  run;
 
 
%macro split_1(N=3, data=tt);
%do i=1 %to &N;
    data or_sub&i;
    IF i < &N  THEN  do;
  %let   a = %eval(10000*&i);
  %let b = %eval(10000+10000*&i);
         set tt(FirstObs=&a obs=&b  );
   end;
  
  run;
 %put ERROR: &i;
%end;
%mend split_1;
%split_1(N=3, data=Tt)
2 REPLIES 2
Tom
Super User Tom
Super User

What SAS code do you think that macro is going to generate?

Let's reformat the code so it is easier to tell the difference between the macro statements and the SAS statements it is going to generate. 

Indenting the macro statements and SAS statements independently will make it clearer the scope of your code blocks make it harder to confuse the macro code for the SAS code.

Let's also move the macro statements to where they are actually going to execute relative to when the generated SAS code will execute.

%macro split_1(N=3, data=tt);
  %do i=1 %to &N;
    %let a = %eval(10000*&i);
    %let b = %eval(10000+10000*&i);
data or_sub&i;
  IF i < &N  THEN  do;
    set tt(FirstObs=&a obs=&b  );
  end;
run;
    %put ERROR: &i;
  %end;
%mend split_1;
%split_1(N=3, data=Tt)

What do you think that IF statement is doing?  

Why do you write an ERROR message to the log every time the %DO loop executes?

 

After you work that out then look at the LOGIC you appear to be trying to generate.  It looks like perhaps you meant to use the macro to generate SAS code like this:

data or_sub1;
  set tt(FirstObs=10000 obs=20000  );
run;
data or_sub2;
  set tt(FirstObs=20000 obs=30000  );
run;
data or_sub3;
  set tt(FirstObs=30000 obs=40000  );
run;

Why did you skip the first 9,999 observations?  Why did you write the 10,000th observation to both OR_SUB1 and to OR_SUB2?

 

What is the actual goal of the macro?  Split it HOW?

 

Let's assume the goal is to split into datasets with no more than 10,000 observations each. A much faster way to do that would be something like this that only reads the dataset once creating all of the subsets at once.

data or_sub1 or_sub2 or_sub3 ;
  set tt;
  select;
    when (_n_ < 10001) output or_sub1;
    when (_n_ < 20001) output or_sub2;
    when (_n_ < 30001) output or_sub3;
    otherwise do; put 'ERROR: More observations than expected.'; stop; end;
  end;
run;

Example:

67   data class1 class2 class3 ;
68     set sashelp.class;
69     select;
70       when (_n_ < 6)  output class1;
71       when (_n_ < 11) output class2;
72       when (_n_ < 16) output class3;
73       otherwise do; put 'ERROR: More observations than expected.'; stop; end;
74     end;
75   run;

ERROR: More observations than expected.
NOTE: There were 16 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS1 has 5 observations and 5 variables.
NOTE: The data set WORK.CLASS2 has 5 observations and 5 variables.
NOTE: The data set WORK.CLASS3 has 5 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds

So to generate that step you will need two separate %DO loops.  One to make the list of datasets in the DATA statement. And the second to generate the series of WHEN () statements for the SELECT block.

PaigeMiller
Diamond | Level 26

@victorleehc0 wrote:

I am trying to split a big data file into several subsets.


Why? How are you going to analyze this big data when it is separated into many data sets? Depending on your answer, there may be much better solutions than using a macro.

--
Paige Miller

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 490 views
  • 1 like
  • 3 in conversation