BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
dxiao2017
Lapis Lazuli | Level 10

😀Have already had many posts on this topic about the split table macro solution regarding SAS SQL1 and Macro1 practice question. 😀This maybe the last one, s106s04,SAS SQL1,pdf p321, the solution (split table according to levels of one column) is as follows:

dxiao2017_0-1750918048531.png

I think the simpler solution is as follows:

%macro split;
proc sql;
select distinct region
   into :reglist1-
   from sq.globalmetadata;
quit;
%do i=1 %to &sqlobs;
data gm&i;
   set sq.globalmetadata;
   where region="&&reglist&i";
run;
proc print data=gm&i(obs=2);
run;
%end;
%mend split;
%split;

dxiao2017_1-1750918284230.png

dxiao2017_3-1750918351133.png

dxiao2017_5-1750918411569.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
%macro split;
proc sql noprint;
   select distinct region
   into :reglist1-
   from sq.globalmetadata;
quit;

data %do i=1 %to &sqlobs; gm&i %end; ;
   set sq.globalmetadata;
   %do i=1 %to &sqlobs; if region="&&reglist&i" then output gm&i%str(;) %end;
run;
%mend;
%split

 

Your data steps pass through the data &sqlobs times. In my code, the data step passes through the data once. Thus, my code will be faster, particularly if your real world data is huge. So I don't like splitting the data up using many data sets.

 

Your code and my code also creates data sets whose names are gm1 gm2 ... and it may be hard to identify which data set to use when you are looking for "Latin America & Caribbean" data. So I don't like splitting the data up.

 

I realize that this is an assignment of some sort, but in almost all cases, leaving the data in one big data set and then using BY statements seems to be a preferable way to go, rather than macro writing to create many smaller data sets (yes, there may be occasional exceptions to this rule). Reason number three why I don't like splitting the data up.

--
Paige Miller

View solution in original post

13 REPLIES 13
PaigeMiller
Diamond | Level 26
%macro split;
proc sql noprint;
   select distinct region
   into :reglist1-
   from sq.globalmetadata;
quit;

data %do i=1 %to &sqlobs; gm&i %end; ;
   set sq.globalmetadata;
   %do i=1 %to &sqlobs; if region="&&reglist&i" then output gm&i%str(;) %end;
run;
%mend;
%split

 

Your data steps pass through the data &sqlobs times. In my code, the data step passes through the data once. Thus, my code will be faster, particularly if your real world data is huge. So I don't like splitting the data up using many data sets.

 

Your code and my code also creates data sets whose names are gm1 gm2 ... and it may be hard to identify which data set to use when you are looking for "Latin America & Caribbean" data. So I don't like splitting the data up.

 

I realize that this is an assignment of some sort, but in almost all cases, leaving the data in one big data set and then using BY statements seems to be a preferable way to go, rather than macro writing to create many smaller data sets (yes, there may be occasional exceptions to this rule). Reason number three why I don't like splitting the data up.

--
Paige Miller
dxiao2017
Lapis Lazuli | Level 10
I think you are right. I'll use this technique when need to split dataset. The syntax is a bit more complex comparing to mine (and I was not familiar with a loop within a statement and I once put extra semicolon and also put the semicolon in wrong place). Thanks a lot for reply me!
Tom
Super User Tom
Super User

The %MACRO is doing nothing in the example code in your photograph.  Once you have the set of commands in the macro variable ALLSTEPS you can just expand it

&allsteps;

There is not need to first wrap it into a macro defintion and then call the macro.

 

If you did want to define a macro it should probable include the SQL step and take as input parameters the source dataset and variable names (what for some reason it is calling table and column). 

dxiao2017
Lapis Lazuli | Level 10

Hi @Tom , thanks for reply! The code in the screen capture consumes a lot of eyesight and also it was a bit too complex for me😀, so I did not run and test it while learning the material. I did not follow that solution (the code in the screen capture) in the material and wrote my own answer (the code in the text box) to that practice question.

 

I think you are right that, to resolve macro variable &allsteps, the %runSteps macro was not necessary, just use %put &allsteps; (or the %put is also unnecessary maybe) and one can solve it. And after reading your reply, I look at the code (in the screen capture) again, I can see that the code was intended to generate the code for a repeated data step through the proc sql step. Perhaps it is a technique worth learning , in case of some unusual and rare scenarios (I hope not😀) you just need to use sql to generate character strings for a data step.

Tom
Super User Tom
Super User

There are at least three problems with that approach.

 

The code being generated into the macro variable ALLSTEPS might exceed the limit of 64K bytes that a single macro variable can hold.  And cutting that down to only 500 bytes  by using the LENGTH= option makes that even more likely.

 

The values of the variable named in the COLUMN macro variable might not contain values that can be used to name the new datasets.

 

And finally as @PaigeMiller showed it is making multiple passes through the source datasets to generate the target datasets.

 

To solve the second problem you could use numeric suffixes for the new dataset names.

 

In general I find it much easier to use a data step to write code. Then you can take advantage of the power of the data step and especially the power of the PUT statement to generate the desired code.  And you can even make the code human readable by adding indentations.

 

dxiao2017
Lapis Lazuli | Level 10

Hi @Tom , thanks for this further explain! So using sql to generate code for data step's character string is totally(or absolutely) unstable and unreliable approach. In any case when I can use data step, never try that approach. In simpler words, when I need to write data step's code, I just write data step, do not write other steps such as proc sql or marcos😀.

PaigeMiller
Diamond | Level 26

@dxiao2017 wrote:

Hi @Tom , thanks for this further explain! So using sql to generate code for data step's character string is totally(or absolutely) unstable and unreliable approach. In any case when I can use data step, never try that approach. In simpler words, when I need to write data step's code, I just write data step, do not write other steps such as proc sql or marcos [sic] 😀.


I'm not sure Tom said anything about it being unstable and unreliable, although your limit of 500 characters may be a problem on larger real world data sets. Your code is neither unstable or unreliable for this problem, it is simply inefficient. There's also nothing wrong with using SQL or macros in the right situation.

 

You seem to be ignoring the comments by myself and @Kurt_Bremser that there is rarely a reason to split data up like this. The first programming task in this situation is to determine in your mind if BY statements can handle the work better than separate data sets.

--
Paige Miller
dxiao2017
Lapis Lazuli | Level 10

Hi @PaigeMiller , thanks a lot for your comments!😀 Although I do not quiet understand what you mean😀

PaigeMiller
Diamond | Level 26

There is rarely a need to split data sets like this. I know this is an assignment you have to work on, but some day you are going to be in the real world and there will be a problem just like this, and what I am saying (and I think @Kurt_Bremser is saying) is that you shouldn't split the data set. Why? Because (except for rare cases) the BY statement does exactly what this macro splitting the data sets does and then some; also the BY statement works in most PROCs, so you can perform the task separately for each category (in your data set, the category was REGION) without splitting the data set.

 

Here's an example of not splitting the data set and making use of the BY statement.

 

proc sort data=sashelp.cars out=cars;
    by origin;
run;

proc freq data=cars;
    by origin;
    table type drivetrain make;
run;

proc means data=cars;
    by origin;
    var msrp invoice horsepower weight wheelbase length;
run;

 

Here's the same analysis, if you have to split the data and you don't use the BY statement. Please examine the code above and the code below and determine which code is shorter and easier.

 

%macro split;
proc sql noprint;
   select distinct origin
   into :origlist1-
   from sashelp.cars;
quit;

data %do i=1 %to &sqlobs; gm&i %end; ;
   set sashelp.cars;
   %do i=1 %to &sqlobs; if origin="&&origlist&i" then output gm&i%str(;) %end;
run;
%mend;
%split

proc freq data=gm1;
    table type drivetrain make;
run;

proc freq data=gm2;
    table type drivetrain make;
run;

proc freq data=gm3;
    table type drivetrain make;
run;

proc means data=gm1;
    var msrp invoice horsepower weight wheelbase length;
run;

proc means data=gm2;
    var msrp invoice horsepower weight wheelbase length;
run;

proc means data=gm3;
    var msrp invoice horsepower weight wheelbase length;
run;

 

 

Unfortunately, the person who designed your assignment created an assignment that teaches the wrong lesson. After you finish the assignment, and you go out into the real world and you want to analyze data split by some category, a person who completed this assignment knows: "I can write a macro and split the data set". That is the wrong lesson. The right lesson to learn is that you can do this without splitting the data set (except for rare cases) by using the BY statement to split and analyze the data.

--
Paige Miller
dxiao2017
Lapis Lazuli | Level 10

Hi Paige, thanks a lot for reply, so in most cases, split table is not necessary, unless one really need to do so.

dxiao2017
Lapis Lazuli | Level 10
Hi Tom, I think this thread (your 2nd thread) is the best answer for my post (solves major problems regarding the codings in my post and my thinking and question in mind). PaigeMiller replied me first and offered a very good solution and possible improvement for my answer (and also the loop within a statement is a technique I just need to learn) and I marked as the answer.
Kurt_Bremser
Super User

To provide some perspective:

In my professional career spanning more than two decades of SAS work, I did a split like this exactly once.

The reason was this:

  • the batch job program needed to be frequently adapted by data warehouse end users
  • DWH users had limits on storage which were not applied to the the batch job user
  • a dataset reached a size during processing where the utility file of PROC SORT (or SQL) would crack the quota on the UTILLOC location

So I split the dataset along the primary sort key, sorted each sub-dataset on the secondary keys, and created a view combining all sub-datasets.

dxiao2017
Lapis Lazuli | Level 10

Hi @Kurt_Bremser , thanks a lot for reply and your comments! Although I do not quiet understand what you said, I appreciate 😀!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Autotuning Deep Learning Models Using SAS

Follow along as SAS’ Robert Blanchard explains three aspects of autotuning in a deep learning context: globalized search, localized search and an in parallel method using SAS.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 2562 views
  • 6 likes
  • 4 in conversation