Question
Fluorite | Level 6

Hi, I would like to use a weight variable in PROC SURVEYSELECT instead of n (the number of observations). I know that in PROC TABULATE you can specify a weight instead of N, but I'm not sure how to do it here. The values below are the sum of weight in each age group.

Age        Weight
Under 50    3,093
50 To 60    9,537
60 To 65    1,418
65 To 70    4,361
70 To 75    7,217
75 To 80        0
> 80        6,080

 

proc sort data = test ; by gender age descending weight; run;

 

proc surveyselect data = test out = samp method = srs n = (3093 9537 1418 4361 7217 0 6080 ) seed = 9876; strata gender age; run;

 

Your help would be much appreciated. Thank you

ballardw
Super User

PROC SURVEYSELECT creates sampling weights based on each record's probability of selection; it does not use existing "weights".

 

You can specify proportions with SAMPRATE= (e.g. .27 means select 27% of the records in each stratum) instead of SAMPSIZE=, if that is what you mean. Otherwise you will have to show a manually worked out example of what you expect.

Or are you looking to provide a data set for use with the SAMPSIZE= (or SAMPRATE=) option? You can create a data set that has all the combinations of the strata variables, sorted by the strata variables as they appear on the STRATA statement, with a variable named SampleSize, and name that data set in the N= (or SAMPSIZE=) option in place of a list of values.
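
A minimal sketch of both options, assuming SASHELP.CLASS as stand-in data and Sex as the only stratum variable. The data set names (sortclass, samp_rate, sizes, samp_size), the rate of 0.27, and the per-stratum sizes 3 and 4 are made up for illustration.

```sas
/* Sort by the stratum variable before sampling */
proc sort data=sashelp.class out=sortclass;
   by sex;
run;

/* Option 1: the same sampling rate (27%) in every stratum */
proc surveyselect data=sortclass out=samp_rate
   method=srs samprate=0.27 seed=9876;
   strata sex;
run;

/* Option 2: per-stratum sample sizes read from a data set.
   One row per stratum, sorted like the input data,
   with the size in a variable named SampleSize. */
data sizes;
   length sex $1;   /* match the type and length of Sex in SASHELP.CLASS */
   input sex $ SampleSize;
   datalines;
F 3
M 4
;

proc surveyselect data=sortclass out=samp_size
   method=srs sampsize=sizes seed=9876;
   strata sex;
run;
```

With more than one stratum variable, the sizes data set would need one row for every combination of the STRATA variables, in the same sort order.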

Question
Fluorite | Level 6

Hi,

 

Thank you for your reply. I am actually interested in the volume, not the proportion. Basically, I am working on a survey and the weight represents the number of people in the total population. The estimated number of participants in the survey is shown below (it's an estimate, not exact). But I need to select people based on the total population count. I'm not sure if I am making sense...

 

I don't know the number of participants in each group yet. The weight volume below is my target, and once I have hit the target I will be able to count the number of participants.

 

Age        Number of participants   Weight
Under 50                        2    3,093
50 To 60                        7    9,537
60 To 65                        1    1,418
65 To 70                        3    4,361
70 To 75                        6    7,217
75 To 80                        0        0
> 80                            5    6,080
Total                          24   31,707

 

What I mean is: how can I use the weight instead of N? In the PROC TABULATE step below, for each age group I get the sum of weight (total population) instead of the count of people in the survey. Thank you

 

proc tabulate data=test missing;
   class age;
   var weight;
   table (age ALL), weight;
run;

 

 

ballardw
Super User

Still not following.

 

The weighted frequencies in the output of the survey procs apply the weight to the strata variables so that they represent the population sampled from.

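/* Sort by the stratum variable, as the STRATA statement requires */
proc sort data=sashelp.class
     out=sortclass;
   by sex;
run;

/* Select 4 records per stratum; STATS adds selection probabilities
   and sampling weights to the output data set */
proc surveyselect data=sortclass noprint
   out=selected stats sampsize=4;
   strata sex;
run;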

proc surveyfreq data=selected;
   weight  samplingweight;
   tables sex age;
run;

If you run proc freq on the SASHELP.Class data set you see that the total number of Females is 9 and Males 10, matching the weighted frequency in the output of Surveyfreq for the SEX variable.

 

So the weights should have been calculated at the time the sample was drawn, especially if you had stratified sampling.

If you are trying to add a stratified sample weight AFTER the data is collected, you need to provide a population data set with all the strata and the appropriate number of records, select the sample to create the weights, and then match the weights back to your collected data.

But that still requires a POPULATION data set with all of the Strata variable combinations and the information used to create the sample.

 

Question
Fluorite | Level 6

Hi,

 

Sorry I didn't explain properly.

So in my case the weight variable represents the number of people in the population. The weight is already in my data; I am not recreating it. Basically, I would like to select people based on the weight variable, which represents the number of people in the population (let's say the population of the US).

 

I would like to select Sex=M and Age=12 with sum of weight <= 183,

and Sex=F and Age=12 with sum of weight <= 162, not just use sampsize=2. I am not interested in the actual number of people in the survey, but in the number of people in the US population as specified. Sorry, this is not easy to explain, but we can have a private chat if you wish. Thank you

 

Sex Age Weight
M 12 183
F 12 162

 

Name Sex Age Height Weight
Thomas M 11 57.5 85
James M 12 57.3 83
John M 12 59 100
Robert M 12 64.8 128
Jeffrey M 13 62.5 84
Alfred M 14 69 113
Henry M 14 63.5 103
Ronald M 15 67 133
William M 15 66.5 112
Philip M 16 72 150
Joyce F 11 51.3 51
Jane F 12 59.8 85
Louise F 12 56.3 77
Alice F 13 56.5 84
Barbara F 13 65.3 98
Carol F 14 62.8 103
Judy F 14 64.3 90
Janet F 15 62.5 113
Mary F 15 66.5 112
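
One possible reading of this request, sketched under assumptions: shuffle the records within each Sex/Age stratum, then keep a record only if adding its Weight keeps the running sum within the stratum's target. This is a plain DATA step approach, not a PROC SURVEYSELECT feature. The data set names (targets, candidates, picked), the helper variables u and cumweight, and the seed are hypothetical; the targets are the two stated in the post (Sex=M, Age=12: 183; Sex=F, Age=12: 162).

```sas
/* Hypothetical targets: maximum summed Weight per Sex/Age stratum */
data targets;
   length sex $1;
   input sex $ age target;
   datalines;
M 12 183
F 12 162
;

/* Keep only records in the targeted strata and attach the target */
proc sql;
   create table candidates as
   select c.name, c.sex, c.age, c.weight, t.target
   from sashelp.class as c inner join targets as t
     on c.sex = t.sex and c.age = t.age;
quit;

/* Attach a random sort key so the order within a stratum is random */
data candidates;
   set candidates;
   call streaminit(9876);
   u = rand('uniform');
run;

proc sort data=candidates;
   by sex age u;
run;

/* Walk through each stratum in random order and output a record
   only if the running sum of Weight stays within the target */
data picked;
   set candidates;
   by sex age;
   if first.age then cumweight = 0;
   if cumweight + weight <= target then do;
      cumweight + weight;   /* sum statement: cumweight is retained */
      output;
   end;
run;
```

Records that would push the running sum past the target are skipped rather than ending the selection, so which records are kept depends on the random order. Whether that matches the intended design is an open question.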

 

 

FreelanceReinh
Jade | Level 19

Hi @Question,

 

Do you want to allocate a given sample size (of participants) such as 24 to the strata so that the proportions of the strata in the sample match (approximately) the corresponding proportions in the total population (N=31,707)? If so, the ALLOC= option of the STRATA statement might be useful.

 

Here's an example using (for simplicity) only Age, not gender, to define the strata:

/* Create test data and format for demonstration */

data participants;
call streaminit(27182818);
do id=1 to 1000;
  Age=rand('uniform',18,99);
  output;
end;
run;

proc format;
value agefmt
low-<50  = 'Under 50'
50 -<60  = '50 To 60'
60 -<65  = '60 To 65'
65 -<70  = '65 To 70'
70 -<75  = '70 To 75'
75 -<80  = '75 To 80'
80 -high = '>=80';
run;

%let n=24; /* Sample size */

/* Create dataset with population weights per age group
   (Note: Age values are arbitrary representatives of the respective age group.) */

data weights;
input age weight; 
cards;
40 3093
50 9537
60 1418
65 4361
70 7217
75 0
80 6080
;

/* Create dataset assigning proportions to age groups */

proc freq data=weights noprint;
weight weight;
format age agefmt.;
tables age / out=proportions(drop=count rename=(percent=_alloc_));
run;

/* Select eligible participants and sort by stratum variable */

proc sql;
create table eligible as
select p.*
from participants p, weights w
where put(p.age,agefmt.)=put(w.age,agefmt.) & w.weight
order by age;
quit;

/* Draw random sample using stratum allocation proportions */

proc surveyselect data=eligible
method=srs n=&n
seed=2718 out=want;
format age agefmt.;
strata age / alloc=proportions;
run;

/* Check frequencies */

proc report data=want completerows headline;
column age n;
define age / group preloadfmt order=internal;
define n / 'Number of participants' width=12;
rbreak after / ol summarize;
run;

PROC REPORT output:

               Number of
       Age  participants
  ----------------------
  Under 50             2
  50 To 60             7
  60 To 65             1
  65 To 70             3
  70 To 75             6
  75 To 80             0
  >=80                 5
            ------------
                      24
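
As a rough check on the allocation above, the unrounded proportional shares n * weight / sum(weight) can be computed directly. This is only a sketch: the data set name weights_check and the column name exact_share are hypothetical, and it says nothing about how PROC SURVEYSELECT rounds the shares to integers.

```sas
%let n = 24;   /* total sample size, as in the example above */

/* Same population weights per age group as in the example */
data weights_check;
   input age weight;
   datalines;
40 3093
50 9537
60 1418
65 4361
70 7217
75 0
80 6080
;

/* Unrounded proportional share of the sample for each age group */
proc sql;
   select age,
          weight,
          &n * weight / sum(weight) as exact_share format=8.3
   from weights_check;
quit;
```

The shares come out at roughly 2.3, 7.2, 1.1, 3.3, 5.5, 0 and 4.6, which can be compared with the integer counts in the PROC REPORT output.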
Question
Fluorite | Level 6

Hi Reinhard,

 

Thank you for your code...

 

It gives me what I need after tweaking it a bit. Basically, my n=24 is actually n=31,707. My target is 31,707 (the population, not the number of survey participants). I know it's not easy to explain, but your code has helped me get roughly what I want 🙂

 

Best wishes

 

 

 

 

 
