Hi all,
My problem is to break up a large data-set into smaller ones as below.
I have a data set containing about ~1000 variables. I converted each of these variable names to macro-variables called: varname1,varname2,,,varname100,...etc.
Now I need to generate smaller data-sets , each containing only 5 variables. For example, I tried the following SAS macro code but I get an error.
%macro batch_Chk(start,fin);
%do i =&start %to &fin by 5;
%let res=%evalf(&i. + 4);
%do j =&i. %to &res. by 1;
data words_&i.; /* Words_i are the sub-data-sets with 5 variables each */
set transf_vars; /* Transf_vars is the Original data */
%put &res; /* check res value...*/
keep &&varname&i.- &&varname&res.;
%end;
run;
%end;
%mend;
%batch_Chk (1,10);
SYMBOLGEN: Macro variable START resolves to 1
SYMBOLGEN: Macro variable FIN resolves to 10
ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric
operand is required. The condition was: &fin by 5
ERROR: The %TO value of the %DO I loop is invalid.
ERROR: The macro BATCH_CHK will stop executing.
My goal is to have the 1000 variables (along with their content) broken into smaller data-sets with just 5 variables each.
I initially thought I could change the variable names to macro-variables and manipulate them inside a DO loop - but I am not getting the results I hoped for. Would really appreciate any advise or help about this problem!
Thanks very much!
Now that can be solved with an inner loop:
%macro batch_Chk(start,fin);
%do i = &start %to &fin %by 5;
%let res=%eval(&i. + 4);
data words_&i.; /* words_i is a subset data with 5 variables*/
set transf_vars; /* transf_vars is the Original Data w/all Variables*/
keep
%do j = &i %to &res;
&&varname&j.
%end;
;
run;
%end;
%mend;
Proper visual formatting will make solving your issues easier:
%macro batch_Chk(start,fin);
%do i = &start %to &fin by 5;
%let res=%evalf(&i. + 4);
%do j =&i. %to &res. by 1;
data words_&i.; /* words_i is a subset data with 5 variables*/
set transf_vars; /* transf_vars is the Original Data w/all Variables*/
%put &res;
%put &res;
keep &&varname&i.- &&varname&res.;
%end;
run;
%mend;
Now you can immediately see that a %end is missing.
But your macro already coughs up at "by", as that is not a macro keyword.
Fixing that:
%macro batch_Chk(start,fin);
%do i = &start %to &fin %by 5;
%let res=%evalf(&i. + 4);
%do j =&i. %to &res. by 1;
%put &res;
data words_&i.; /* words_i is a subset data with 5 variables*/
set transf_vars; /* transf_vars is the Original Data w/all Variables*/
keep &&varname&i.- &&varname&res.;
run;
%end;
%end;
%mend;
Moved the %put &res; to a place where it is easier to understand, and removed the second identical statement.
But where do you get the 1000 macro variables varname1 to varname1000 from?
That's because there is no %evalf function, you probably meant %sysevalf, but that is overkill, as we only work with integer values here:
%macro batch_Chk(start,fin);
%do i = &start %to &fin %by 5;
%let res=%eval(&i. + 4);
%put &res;
data words_&i.; /* words_i is a subset data with 5 variables*/
set transf_vars; /* transf_vars is the Original Data w/all Variables*/
keep &&varname&i.- &&varname&res.;
run;
%end;
%mend;
Thanks for the help!
Your version runs, though it throws an error which I should have anticipated:
SYMBOLGEN: Macro variable VARNAME5 resolves to AVGEARN_NSA
ERROR: Missing numeric suffix on a numbered variable list (AUTO_REPO_DIRECT-AVGEARN_NSA).
The problem is that though I have the macro-variables names set up as varname1, varname2,...,varname1000,
when SAS dereferences them, the ordering is lost -the actual names inside varname1 , varname2 , etc. are not similar to each other at all.
Thanks again for your help , but if you have any ideas how I go about creating these sus-data-set form the original data (actually it has exactly 726 names), please do help!
Thanks!
Now that can be solved with an inner loop:
%macro batch_Chk(start,fin);
%do i = &start %to &fin %by 5;
%let res=%eval(&i. + 4);
data words_&i.; /* words_i is a subset data with 5 variables*/
set transf_vars; /* transf_vars is the Original Data w/all Variables*/
keep
%do j = &i %to &res;
&&varname&j.
%end;
;
run;
%end;
%mend;
Hi Kurt - thank , this works!
I am now going to try and extend this to my original data-set, which actually has 726 variables (not 1000, and so the last loop may have some issues). Will try to work around or post back here!
Hi Kurt,
I took your solution, but I have to add that that I shifted the indexes to 6 and 5 instead of 5 and 4 - since I actually have 726 variables (divisible by 6 and only one more that 5, so the final report can still be read - just barely), and not 1000.
Thanks very much again! Especially for catching all the small mistakes which I hadn't noticed despite staring at them for a long time!!!
If you think of a better way to break-up a data-set with ~1000 variables into smaller datasets, please let me know on this thread!
Use the SASHELP.VTABLE data which includes the VARNUM so you can control the ordering of your variables.
Ups, just saw that the inner %do loop makes no sense at all:
%macro batch_Chk(start,fin);
%do i = &start %to &fin %by 5;
%let res=%evalf(&i. + 4);
%put &res;
data words_&i.; /* words_i is a subset data with 5 variables*/
set transf_vars; /* transf_vars is the Original Data w/all Variables*/
keep &&varname&i.- &&varname&res.;
run;
%end;
%mend;
The outer loop is supposed to handle the next group of 5 words.
The inner loop indexed by J runs from ` to 5, then the outer index I becomes 6: so that J now runs from 6 to 10.
Obviously eventually I hope to extend it to handle 1000 variables (in my dataset transf_vars).
I'm not sure how I would do the breaking down (keeping adjacent sets of 5 variables together)
@nstdt wrote:
The outer loop is supposed to handle the next group of 5 words.
The inner loop indexed by J runs from ` to 5, then the outer index I becomes 6: so that J now runs from 6 to 10.
Obviously eventually I hope to extend it to handle 1000 variables (in my dataset transf_vars).
I'm not sure how I would do the breaking down (keeping adjacent sets of 5 variables together)
You don't need the "j" loop at all.
The first iteration of the outer loop will create dataset words_1 with variables 1 to 5, the next will create words_6 with vars 6 to 10, and so on.
As posted that code will not compile the macro as there is an unclosed %do statement.
You want %by not by in the loop
Since you aren't actually using the macro variable J any where why is it there? As written it will overwrite the data set words&i 5 times (if the by issue gets fixed)
I strongly suggest that before actually manipulating your data that you just print out the index values and resolved values of the variable names.
I have a hard time seeing a good use for this operation though. Especially since most of the data sets following the logic you attempted would not have identification variables to allow easy realignment.
Tested code would require an example input data set and the macro variables varname1 to varnamennn.
And if the number of variables is not an integer multiple of 5 you'll get errors for the last "set".
You said:
I have a hard time seeing a good use for this operation though. Especially since most of the data sets following the logic you attempted would not have identification variables to allow easy realignment.
You're right, the resolved names of the macro-variables do not have any oredring pattern and I get errors.
So I'm back to Square-One - How do I break down a dataset of 726 variables into smaller data-sets with just 5 variables each?
Why are you doing this in the first place? If there's a valid reason, there's usually a way to organize the data that makes sense contextually. Are you trying to make a fact/dim type structure to reduce the size of the data set?
Breaking things into groups of 5 seems like a strange decision. Why not 10 or 12?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.