BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
danielchoi626
Calcite | Level 5

I've been trying to split a master dataset into several smaller datasets based on a category variable, problem is this category variable contains over 50 different categories and I would like to keep my code as simple as possible using a loop. To demonstrate what I mean, this is the method I would've used if I were using Python:

df_A = []
for i in df_1["Category"].unique():
    df_A.append(df_1[df_1["Category"] == i])

Is there a way to do something similar in SAS using loops? I've tried the following code to no success.

%MACRO DATA_SEPARATOR(CATEGORY = );
	DATA DUMMY_&CATEGORY;
	SET DUMMY;
			IF CATEGORY = "&CATEGORY";
	RUN;
%MEND;
					

DATA DUMMY;
INPUT INDEX CATEGORY $4. FIGURES;
CARDS;
0	A	2744
1	A	2874
2	A	823
3	A	1411
4	A	2168
5	A	2816
6	A	1212
7	A	2294
8	A	433
9	B	1137
10	B	2857
11	B	2417
12	B	2348
13	B	762
14	B	836
15	B	684
16	B	1869
17	B	912
18	B	2159
19	B	1388
20	B	1477
21	C	836
22	C	2846
23	C	1173
24	C	1138
;
RUN;

%DO I = "A", "B", "C";
%DATA_SEPARATOR(CATEGORY = &I);
%END;

On a related note, is there a way to create a unique table or list of values in the "Category" column for use in the loop? I'm well aware that PROC SQL or PROC SORT NODUPKEY can be used for this purpose, I'm just not sure how to incorporate said values into the loop from a table, which is why I manually typed out the unique values in the category variable

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

Hello @danielchoi626 ,

There are much shorter and more performant ways than the below (in fact it can be done in just one data-step with as much output datasets as there are categories), but the below has the advantage of being clear and readable :

PROC FREQ data=DUMMY noprint;
 tables CATEGORY / missing out=work.count_CATEGORY;
run;

data _NULL_;
 if 0 then set work.count_CATEGORY nobs=count;
 call symputx('number_of_categories',put(count,8.));
 STOP;
run;
%PUT &=number_of_categories;

%MACRO LOOP_OVER_CATEGORIES;
%DO i=1 %TO &number_of_categories.;
 data _NULL_;
  set work.count_CATEGORY(firstobs=&i. obs=&i.);
  call symputx('current_category',CATEGORY);
 run;
 %PUT &=current_category;

 data work.dummy_category_&current_category.;
  set work.DUMMY;
  if category="&current_category." then output;
  else delete;
 run;
%END; %MEND LOOP_OVER_CATEGORIES; %LOOP_OVER_CATEGORIES QUIT; /* end of program */

Koen

View solution in original post

4 REPLIES 4
sbxkoenk
SAS Super FREQ

Hello @danielchoi626 ,

There are much shorter and more performant ways than the below (in fact it can be done in just one data-step with as much output datasets as there are categories), but the below has the advantage of being clear and readable :

PROC FREQ data=DUMMY noprint;
 tables CATEGORY / missing out=work.count_CATEGORY;
run;

data _NULL_;
 if 0 then set work.count_CATEGORY nobs=count;
 call symputx('number_of_categories',put(count,8.));
 STOP;
run;
%PUT &=number_of_categories;

%MACRO LOOP_OVER_CATEGORIES;
%DO i=1 %TO &number_of_categories.;
 data _NULL_;
  set work.count_CATEGORY(firstobs=&i. obs=&i.);
  call symputx('current_category',CATEGORY);
 run;
 %PUT &=current_category;

 data work.dummy_category_&current_category.;
  set work.DUMMY;
  if category="&current_category." then output;
  else delete;
 run;
%END; %MEND LOOP_OVER_CATEGORIES; %LOOP_OVER_CATEGORIES QUIT; /* end of program */

Koen

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @danielchoi626 

 

You cannot use %do in open code, the statement is only valid in a macro definition. The simplest way to get a list of categories is to use Proc SQL with Select Into.

 

The following macro is coded to accept any input dataset and any variable containing the categories, as long as the combined length of the input dataset name + longest category does not exceed 31, because the output data set  names cannot be longer than 32 chars. It works on your test data set.

 

%MACRO DATA_SEPARATOR(input_ds, CategoryVar);
	%* Get list of different Categories, set count of Categories;
	proc sql noprint;
		select distinct &CategoryVar into: CatList separated by ';' from &input_ds;
		select count(distinct &CategoryVar) into: CatCount from &input_ds;
	quit;

	%* Loop over Categories;
	%do i = 1 %to &CatCount;

		%* Pick Category [loop_index] from list;
		%let ThisCat = %scan(&CatList,&i,%str(;));

		%* Create new dataset with Category-subset from input; 
		data &input_ds._&thiscat;
			set &input_ds (where=(&CategoryVar = "&ThisCat"));
		run;
	%end;
%MEND;
%DATA_SEPARATOR(dummy,CATEGORY);

 

 

 

 

 

 

Patrick
Opal | Level 21

@danielchoi626 

With SAS it's normally better and much more efficient to use by-group processing instead of splitting data into multiple tables. 

The following code should do the job for splitting the data into multiple tables as long as you can fit the data for a single category into memory. I'm using a hash table because this allows to create output data sets within a SAS data step "on-the-fly".

DATA DUMMY;
  INPUT INDEX CATEGORY $4. FIGURES;
CARDS;
0 A 2744
1 A 2874
2 A 823
3 A 1411
4 A 2168
5 A 2816
6 A 1212
7 A 2294
8 A 433
9 B 1137
10 B 2857
11 B 2417
12 B 2348
13 B 762
14 B 836
15 B 684
16 B 1869
17 B 912
18 B 2159
19 B 1388
20 B 1477
21 C 836
22 C 2846
23 C 1173
24 C 1138
;

proc sql;
  create view dummy_by_cat as
  select *
  from dummy
  order by category
  ;
quit;

data _null_;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'DUMMY(obs=0)', ordered:'y', multidata:'y');
      h1.defineKey('INDEX');
      h1.defineData(all:'y');
      h1.defineDone();
    end;

  set dummy_by_cat;
  by category;
  _rc=h1.add();

  if last.category then
    do;
      _rc=h1.output(dataset:cats('dummy_',category));
      _rc=h1.clear();
    end;
run;

 

As for creating a distinct list of categories: Using a Hash table allows you to create such a list without the need to sort the source data.

data _null_;
  if 0 then set DUMMY(keep=category);
  dcl hash h1(dataset:'DUMMY(keep=category)', ordered:'y');
  h1.defineKey('category');
  h1.defineData('category');
  h1.defineDone();

  h1.output(dataset:'Distinct_Categories');
  stop;
run;

  

PaigeMiller
Diamond | Level 26

@danielchoi626 

 

Have you considered not splitting the data set like this and using BY statements in your PROCs to do analysis over all groups?

 

Have you considered not splitting the data set like this and use WHERE statements in your PROCs to limit the analysis to specific groups?

--
Paige Miller

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 510 views
  • 6 likes
  • 5 in conversation