ciro
Quartz | Level 8

Hi,

I have to allocate a sample of fixed size across strata in proportion to the size of each stratum (the number of units in the population).

I found an example with PROC SURVEYSELECT:

proc surveyselect data=Customers
         n=1000 out=SampleSizes;
      strata State Type / alloc=prop nosample;
   run;

My problem is that I have to do this within groups. I have another dataset with the group variable and the sample size (the "n" in the example) needed for each group.

The problem is that PROC SURVEYSELECT does not have BY processing.

So I would appreciate any suggestion on how to write a macro that simulates BY processing.

Thank you very much in advance.

 

PaigeMiller
Diamond | Level 26

This is a rather brief description, so I can't be 100% sure this is the right answer, but the N= option allows you to specify a SAS data set that contains the GROUP sample sizes. I have never used this, so I leave it up to you to experiment and see whether it does what you want, but I think that if you add GROUP to the STRATA statement and use N=dataset, where the data set contains the group sizes needed, it ought to do what you want (if I am understanding you properly).

--
Paige Miller
ciro
Quartz | Level 8

Thank you, Paige, for the answer.

However, that does not seem to be the case. If I understand the documentation, the procedure handles one of two possibilities: either you provide a dataset with the sample size requested for each stratum (that is, a predetermined allocation among the strata), or you provide a fixed total sample size to allocate among the strata (for instance, proportionally) with the ALLOC= option in the STRATA statement.

My group variable is hierarchically above the strata variable, and I have a fixed sample size for each value of the group variable.

Since some strata within a group have a small number of observations, I cannot use a simple proportional allocation, because it would leave many strata with 0 sample units.

What I need is something that allocates, for each group, a given number of observations among the strata. The allocation should be as close as possible to a proportional allocation, subject to a minimum number of sample units in each stratum. If the population size of a stratum is lower than this minimum, the sample size in that stratum should equal the population size.

I imagine this can also be done with PROC OPTMODEL, but I do not have much time to learn it and the necessary theory.

That's why I thought of calling PROC SURVEYSELECT inside a macro, once per group.
PaigeMiller
Diamond | Level 26

I can't write macro code since I don't have your data. So ... you write an example without macros and without macro variables for two of the groups, hard code PROC SURVEYSELECT for group 1, followed by hard code PROC SURVEYSELECT for group 2, make sure it does what you want, do not skip this step. Then if you show me this code, turning this into a macro ought to be something I can do.

--
Paige Miller
ballardw
Super User

 

Please don't just say GROUP: the example code references the strata variables State and Type, so we cannot tell what role "group" would play in that sample code.

Provide an example of 1) your actual strata variables and 2) your actual needed sample sizes.

Or show your "another dataset with the group variable and the samplesize". PROC SURVEYSELECT allows a SAMPSIZE= data set name to supply the sample sizes needed, provided the data set is structured correctly: exactly the same strata variables (names and values) and the proper name for the sample size variable.

So very likely no macro coding is needed.
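
As a rough illustration of what a correctly structured SAMPSIZE= data set might look like, here is a minimal sketch based on the Customers example from the question. The names CustomersSorted, StratumSizes and Sample, the seed, and the rule "5 units per stratum, or the whole stratum if it is smaller" are all placeholders invented for illustration. As far as I know, the sample size variable must be called _NSIZE_ or SampleSize, and the strata must appear in the same order in both data sets. Building the sizes from the input data itself keeps the strata variables identical in type and length.

```sas
/* STRATA expects the input sorted by the strata variables. */
proc sort data=Customers out=CustomersSorted;
   by State Type;
run;

/* One row per stratum, carrying the same strata variables as the input.
   The size rule (5, or the stratum count if smaller) is a made-up placeholder. */
proc sql;
   create table StratumSizes as
   select State, Type, min(count(*), 5) as _NSIZE_
   from CustomersSorted
   group by State, Type
   order by State, Type;
quit;

/* Draw a simple random sample using the per-stratum sizes. */
proc surveyselect data=CustomersSorted method=srs seed=12345
                  sampsize=StratumSizes out=Sample;
   strata State Type;
run;
```

With a group variable, the same pattern would apply with the group listed first among the strata variables, so that each group-by-stratum cell gets its own row of sizes.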

 

ciro
Quartz | Level 8

Hi,

thank you very much for all the replies.

I attach the input dataset, in which the group variable, the strata variable and the identifier of the unit to be selected are, respectively,

ID1, str, ID2

and the allocation dataset, which sets, for each group, the number of units desired, n.

I have found a possible solution (accepting a minimum of 1 unit in each stratum).

The problem could be solved more easily by allocating one unit to each stratum and then the remaining units proportionally to the population, but I prefer a solution with SURVEYSELECT since I might change the allocation method.

My code follows. Any suggestions for improvements, or pointers to issues, are welcome.

Moreover, I would appreciate a solution using PROC OPTMODEL, so that I can start learning it and use it for other problems.

Thank you in advance.

 

/* Allocate &n units across the strata of group &id
   and append the result to SAMPLESIZES */
%macro alloc(id,n);

proc surveyselect data=input
   n=&n out=samplegroup;
   where id1=&id;
   strata str / alloc=prop nosample allocmin=1;
run;

/* Append this group's allocation; create SAMPLESIZES on the first call */
%if %sysfunc(exist(samplegroup)) %then %do;
   %if %sysfunc(exist(samplesizes)) %then %do;
      data samplesizes;
         set samplesizes samplegroup;
      run;
   %end;
   %else %do;
      data samplesizes;
         set samplegroup;
      run;
   %end;
%end;
%mend alloc;

/* Remove results left over from a previous run */
proc datasets library=work;
   delete samplegroup samplesizes;
run;
quit;

proc sort data=allocds;
   by id1;
run;

/* Call the macro once per group; DOSUBL executes each call immediately */
data _null_;
   set allocds;
   by id1;
   rc = dosubl(cats('%alloc(',id1,',',n,')'));
run;
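
One possible refinement, offered as an untested sketch under assumptions: as far as I can tell, the SAMPLESIZES data set has no ID1 column, because ID1 is not a STRATA variable. Listing ID1 first in the STRATA statement should carry the group key into the allocation output (the WHERE statement leaves a single ID1 value, so the allocation itself should not change), and that output can then be passed to SAMPSIZE= to draw the units. The group value 1, the size 20, the seed, and the names input_sorted, alloc_g1 and sample_g1 are placeholders, and ID1 is assumed to be numeric.

```sas
/* STRATA expects the input sorted by the strata variables. */
proc sort data=input out=input_sorted;
   by id1 str;
run;

/* Step 1: allocation only (NOSAMPLE) for one hypothetical group. */
proc surveyselect data=input_sorted n=20 out=alloc_g1;
   where id1 = 1;
   strata id1 str / alloc=prop allocmin=1 nosample;
run;

/* Step 2: draw the units, reading the allocated SampleSize
   for each stratum from the step 1 output. */
proc surveyselect data=input_sorted method=srs seed=12345
                  sampsize=alloc_g1 out=sample_g1;
   where id1 = 1;
   strata id1 str;
run;
```

Inside the macro, the same two steps would use &id and &n in place of the hard-coded values.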

ballardw
Super User

From your response to @PaigeMiller, I suspect that instead of ALLOCMIN= on the STRATA statement you may want the PROC option SELECTALL, which selects whatever is available when a stratum has fewer units than the desired sample size. The option is not available for all methods but does work for several.

I still think that a proper SAMPSIZE= data set would be the way to go.
