BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
pangea17
Quartz | Level 8

 

Using Enterprise Guide 7.1 and SAS 9.4.  I have a dataset with 3 variables one of them being UPC7 which signifies different product types.  I want to create a dataset named for each the product type. The types may change from month to month, so I want it based on what is in the main dataset.  So each dataset will only have several observations of that product type. 

 

&MACRO sing (solar);

 data test_&solar;

set perm.jupiter;

if UPC7 = &solar;

RUN;

%MEND; %sing (upc7name); 

instead of listing each product code

%sing(AA);  %sing(AB); %sing(CG);  %sing(BA); etc

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Why are you splitting the data?

 

Do you want to select the records where the variable UPC7 is equal to the variable AA?  Or has the value "AA"?

%macro sing(solar);
data test_&solar;
  set perm.jupiter;
  if UPC7 = "&solar";
run;
%mend;

To make the splitting dynamic get the list of possible values and use that to generate the macro calls.

proc sort data=perm.jupiter(keep=upc7) out=list nodupkey;
  by upc7;
run;
data _null_;
  set list;
  call execute('%nrstr(%sing)('||upc7||')');
run;

 

 

View solution in original post

6 REPLIES 6
Astounding
PROC Star

The best way to do this is with hashing.  Might as well listen to the experts:

 

https://communities.sas.com/t5/SAS-Communities-Library/Splitting-a-SAS-data-set-based-on-the-value-o...

 

Another alternative is to leave the data as is, and process using a BY statement.  If absolutely necessary, you can split the data using macro language, but it isn't necessary given the hashing solution.

Tom
Super User Tom
Super User

Why are you splitting the data?

 

Do you want to select the records where the variable UPC7 is equal to the variable AA?  Or has the value "AA"?

%macro sing(solar);
data test_&solar;
  set perm.jupiter;
  if UPC7 = "&solar";
run;
%mend;

To make the splitting dynamic get the list of possible values and use that to generate the macro calls.

proc sort data=perm.jupiter(keep=upc7) out=list nodupkey;
  by upc7;
run;
data _null_;
  set list;
  call execute('%nrstr(%sing)('||upc7||')');
run;

 

 

pangea17
Quartz | Level 8

I am splitting the data to run some proc arima code. I could use a by statement, but that would mean that each by group has to use the same model, which might not be the way to go depending on the data.

Tom
Super User Tom
Super User
In that case just use a WHERE statement in your model.
Kurt_Bremser
Super User

If you need to perform analysis on single subsets of the data, create an index on the variable in question to speed up where conditions.

If you need analysis grouped on that variable (across the whole or most of the dataset), use by or class in the respective procedures. Splitting is not needed in most cases, needs extra steps (and therefore degrades performance), and increases storage consumption.

Astounding
PROC Star

OK, with a few warnings ... it may not be right to split up the data at all, hashing is a better way, and macro language contains pitfalls ... here is a long-winded explanation that eventually illustrates a macro language solution.

 

First, look at the objective you are setting up.  You are trying to set up 100 or so DATA steps along these lines:

 

data test_AA;
   set perm.jupiter;
   if UPC7 = "AA";
run;

data test_AB;
   set perm.jupiter;
   if UPC7 = "AB";
run;

data test_CG;
   set perm.jupiter;
   if UPC7 = "CG";
run;

That plan illustrates one of the pitfalls of macro language:  it's easy to abuse.  Carried out, that plan would have to read in all the data from perm.jupiter 100 times.  It would be much faster to read in the data once, along these lines:

 

data test_AA test_AB test_CG;
   set perm.jupiter;
   if UPC7 = "AA" then output test_AA;
   else if UPC7 = "AB" then output test_AB;
   else if UPC7 = "CG" then output test_CG;
run;

Since your data sets are small, it might not make a difference to you.  Perhaps the flawed version with 100 DATA steps takes a minute to run, while the one-data-step version takes a couple of seconds to run.  However, now is the time to practice good habits.  If your data sets become larger, perhaps you are taking an hour to run instead of a minute or two.  So practice when you have small data sets and can afford the time to experiment.

 

On a side note, switching your original DATA steps from an IF statement to a WHERE statement would probably run faster as well.  But that's another topic for another day.

 

So how do you get macro language to generate one DATA step?  Here's an approach.

 

Macro language can get a list of all UPC7 codes, and store that list in a macro variable.  It can process the list to generate the required one-data-step program.

 

%macro many_upc7;

%local k next_code;

proc sql;
   select distinct(upc7) into : upc7_list  separated by " "  from perm.jupiter;
quit;

data 
   %do k=1 %to %sysfunc(countw(&upc7_list));
      %let next_code = %scan(&upc7_list, &k);
      test_&next_code
   %end;
   ;
   set perm.jupiter;
   %do k=1 %to %sysfunc(countw(&upc7_list));
      %let next_code = %scan(&upc7_list, &k);
      %if &k > 1 %then else;
      if upc7 = "&next_code" then output test_&next_code;
   %end;
run;

%mend many_upc7;

%many_upc7

Admittedly, it uses some sophisticated coding.  (It is untested, since you have access to the data.)  On the other hand, it's realistic in terms of the tools that would be used in real life.  So if you have questions about it, fire away.

 

Also note, SQL counts how many observations it returns.  It should be possible to use the automatic variable &SQLOBS instead of counting with %sysfunc(countw(&upc7_list))

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1330 views
  • 3 likes
  • 4 in conversation