BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
hellohere
Pyrite | Level 9

I write one macro which works fine to split data into 2-subgroups.

Logic is 1) sort by target var 2) based on weight/wt, split into 2 subgroups.

 

BUT how to split into N-Subgroups with N as a parameter, say N=3 or 7?!

 

%macro bisect(ds, byvar, cutvar, note);
proc sort data=&ds.; by &byvar.; run;quit;
 
data &ds.;
set  &ds.; 
retain wt_sum;
wt_sum=coalesce(wt_sum,0)+&cutvar.;
run;quit;
 
data  &ds.(drop=wt_sum);
set  &ds.; 
if wt_sum<=50 then &note.=1;
else &note.=0;
run;quit;
%mend;
1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Onyx | Level 15

Try this:

data have;
  do kg=1 to 100;
    output;
  end;
run;


%macro N_sect(ds, byvar, cutlist);
proc sort data=&ds.; 
  by &byvar.; 
run;quit;

%local N i; 
%let N=%eval(%sysfunc(countw(&cutlist.))+1);

data  
%do i=1 %to &n.;
  result_&i.
%end;
;
set  &ds.; 
select;
  %do i=1 %to %eval(&n.-1);
    when( &byvar. < %scan(&cutlist.,&i.,%str( )) ) output result_&i.;
  %end;
  otherwise output result_&n.;
end;
run;
%mend;

%N_sect(have, kg, 27 42 99)
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

13 REPLIES 13
Patrick
Opal | Level 21

Your explanations and code leave some unanswered questions for me.

Suggest you post sample data, desired result and explain us the logic to populate the grouping variable. This allows us to provide tested code that creates the desired result.

hellohere
Pyrite | Level 9
data _temp;
do i=1 to 1000;
	val=round(ranuni(i)*100,0.001);
	_wt=ranuni(i-1);
	output;
end;
run;quit;

proc sql noprint;
select sum(_wt) into: wttot from _temp;
quit;

data _temp(drop=_wt);
set _temp;
wt=round(_wt/&wttot.*100,0.0001);
run;quit;

sample data below:

Patrick
Opal | Level 21

@hellohere 

And what's the desired result for this sample data?

hellohere
Pyrite | Level 9

Desired Results, say N=3,

 

1) sort by val

2)cumsum(wt)

3) if <100/3.0 then group_=1; else if <100*2/3.0 then group_=2; else group_=3;

 

Note, N is passed through as a parameter. It can be 5/7 ... 

One quick way is group_=floor(cumsum_wt/N)+1; BUT, this is not exactly I need, because I need 2-tier split/sect, split conditional on another group_label.
Patrick
Opal | Level 21

It would really help if you could explain a bit better what you really got and what you're trying to achieve. 

For example if you just want to assign randomly a group or if you want to group your rows based on some variable value (or the result of an expression) like quartiles etc. 

 

Here how to assign rows randomly to 4 groups. Btw: Start using rand() instead of ranuni() which is "obsolete".

%let n=4;
data demo;
  do i=1 to 1000;
    val=rand('uniform',1,100);
    group=rand('integer',1,&n);
    output;
  end;
run;

 

hellohere
Pyrite | Level 9

The macro below splits into N groups (cutvar=wt). 

 

But this does not work on 2-tier split, conditional on group_1st_tier_flag. 

 &note.=floor(wt_sum/(100/&N.))+1; cannot apply to to 2nd-tier split. 

 

%macro bisectN(ds, byvar, cutvar=wt,N, note);
	proc sort data=&ds.; by &byvar.; run;quit;

	data &ds.;
	set  &ds.; 
	retain wt_sum;
	wt_sum=coalesce(wt_sum,0)+&cutvar.;
	run;quit;
				
	data  &ds.(drop=wt_sum);
	set  &ds.; 
	&note.=floor(wt_sum/(100/&N.))+1;
	run;quit;
%mend;
yabwon
Onyx | Level 15

Try this:

data have;
  do kg=1 to 100;
    output;
  end;
run;


%macro N_sect(ds, byvar, cutlist);
proc sort data=&ds.; 
  by &byvar.; 
run;quit;

%local N i; 
%let N=%eval(%sysfunc(countw(&cutlist.))+1);

data  
%do i=1 %to &n.;
  result_&i.
%end;
;
set  &ds.; 
select;
  %do i=1 %to %eval(&n.-1);
    when( &byvar. < %scan(&cutlist.,&i.,%str( )) ) output result_&i.;
  %end;
  otherwise output result_&n.;
end;
run;
%mend;

%N_sect(have, kg, 27 42 99)
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



hellohere
Pyrite | Level 9
Thanks, this surely works. But the cutlist is typically now know forehand.
ballardw
Super User

@hellohere wrote:
Thanks, this surely works. But the cutlist is typically now know forehand.

If the "cutlist" is known beforehand what does it look like? Details matter.

 

If the groups are to be based on the value of a single variable then a FORMAT using the values in the cutlist may be the most flexible. But you have not shown what your "cutlist" or rules involving a more than 2 values at this time.

 

If your cutlist is all related to single variable then the format is easiest as just changing the name of format will perform different analysis using the groups created by the format. Here is an example that you can run as you should have the SASHELP.CLASS dataset supplied by SAS. The analysis isn't terribly interesting as the same size is small but it demonstrates the principal of using different formats with the same variable.

Proc format library=work;
value group_one
11,12,13 = "11 to 13"
14-high="14 and older"
;
value group_two
12,14,16 = '12,14,16'
other= 'everything else'
;
value group_three
11,12='11 & 12'
13,14='13 & 14'
15,16='15 & 16'
;

run;

proc freq data=sashelp.class; 
   tables sex* age/chisq;
   format age group_one.;
   title 'Analysis with format group_one';
run;title;

proc freq data=sashelp.class; 
   tables sex* age/chisq;
   format age group_two.;
   title 'Analysis with format group_two';
run;title;

proc freq data=sashelp.class; 
   tables sex* age/chisq;
   format age group_three.;
   title 'Analysis with format group_three';
run;title;

The groups created by formats will be honored by almost every analysis or modeling procedure, report procedures and most graphing procedures.

 

Also note that the syntax for formats may be easier than for a bunch of if/then/else statements.

 

 

 

 

PaigeMiller
Diamond | Level 26

Why do this at all?  What use will you make of your seven separate data sets? How is seven data sets better for any situation, compared to one data set where you have a group number assigned 1 through 7, and then doing analysis with a BY variable for the groups 1 through 7, or with a WHERE statement? (And since you didn't ask for random sampling to make the 7 groups, I can't even imagine any use).

--
Paige Miller
hellohere
Pyrite | Level 9

ee, this is for research.

In research, you slice and dice to see any improvement, and what is the driving force...

Before slice and dice, you donot know much. 

PaigeMiller
Diamond | Level 26

@hellohere wrote:

ee, this is for research.

In research, you slice and dice to see any improvement, and what is the driving force...

Before slice and dice, you donot know much. 


This is not my understanding of how research or data analysis works. You don't slice and dice arbitrarily based upon some artificial group number. You can slice and dice at random to see if the random slices are different and to see how variable the slices are. You can slice and dice purposefully based upon some attribute, for example are the farms in Southwest Getzville more productive than farms in Northwest Getzville. Simply assigning a number (not at random) 1 through 7 for slicing and dicing doesn't fit my understanding of "research" and doesn't fit my understanding of statistics or data analysis.


But anyway, even if you are right and I am wrong about research, this type of slicing and dicing does not require separate data sets and in fact separate data sets makes programming more difficult.

--
Paige Miller
ballardw
Super User

With thinking as shown in your code you really need to provide a small worked example.

 


@hellohere wrote:

I write one macro which works fine to split data into 2-subgroups.

Logic is 1) sort by target var 2) based on weight/wt, split into 2 subgroups.

 

BUT how to split into N-Subgroups with N as a parameter, say N=3 or 7?!

 
data &ds.;
set  &ds.; 
retain wt_sum;
wt_sum=coalesce(wt_sum,0)+&cutvar.;
run;quit;
 

That highlighted line really needs much clarification. The value of &cutvar would never change so basically you have two point scale regardless. Second at some point  you might be considering comparisons between wt_sum=0 and Cutvar=4 with another set wt_sum=1 and cutvar=3. Both would have a value of 4 so extremely hard to tell what the "subgroup" difference might be.

 

Really , provide a small dummy example data set. Tell us what the 3 or more cutvar might be. Then demonstrate how the resulting data set(s) are to actually be used.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 1805 views
  • 0 likes
  • 5 in conversation