I'm trying to divide a large dataset into three smaller groups of unequal size. I used the following code:
PROC SURVEYSELECT data = Original out = sample9 method = SRS seed = 12345678 sampsize = (1000 52175 13044);
strata ID notsorted;
*title;
RUN;
However, I'm getting the following error:
ERROR: The sample size, 1000, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
IID=TV20_2018.
ERROR: The sample size, 52175, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
IID=TV20_2018.
ERROR: The sample size, 13044, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
IID=TV20_2018.
ERROR: The number of values in the SAMPSIZE= list must equal the number of strata. There are more strata than SAMPSIZE=
values.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.SAMPLE9 may be incomplete. When this step was stopped there were 0 observations and 75 variables.
Can you please suggest how I can address this? TIA
Judging by the error messages, I wonder whether your data is GROUPED properly. With the NOTSORTED option, all observations with the same value must be adjacent in the data set; otherwise, each time a repeated value appears it is treated as a "new" stratum. The clue is the repeated mention of the same stratum value:
IID=TV20_2018. (It appears the code you posted uses a different variable than the one in use when the LOG was created.)
I think this code should demonstrate the grouping issue (unless you have somehow sorted the SASHELP.CLASS data set by age):
proc surveyselect data=sashelp.class
out=work.sel
sampsize = (2 3 2 2 2 1)
;
strata age notsorted;
run;
There are 6 different ages in the data, with counts of 2, 5, 3, 4, 4 and 1 for ages 11 to 16. However, the data set is sorted by name by default, so the age groups are not adjacent and you get a log similar to yours:
ERROR: The sample size, 2, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
Age=14.
ERROR: The sample size, 3, is greater than the number of sampling units, 2.
NOTE: The above message was for the following stratum:
Age=13.
NOTE: The sample size equals the number of sampling units. All units are included in the sample.
NOTE: The above message was for the following stratum:
Age=14.
NOTE: The sample size equals the number of sampling units. All units are included in the sample.
NOTE: The above message was for the following stratum:
Age=12.
ERROR: The sample size, 2, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
Age=15.
NOTE: The sample size equals the number of sampling units. All units are included in the sample.
NOTE: The above message was for the following stratum:
Age=13.
ERROR: The number of values in the SAMPSIZE= list must equal the number of strata. There are more
strata than SAMPSIZE= values.
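For comparison, here is a minimal sketch of the same SASHELP.CLASS selection with the grouping issue removed. The output name work.class_by_age and the seed value are placeholders of mine, not anything from the original post. Once the ages are contiguous there are exactly six strata, and each requested size is no larger than its stratum count, so the step should run without the errors above:

```sas
/* Sort a copy so that each age forms one contiguous group. */
proc sort data=sashelp.class out=work.class_by_age;
  by age;
run;

/* Same SAMPSIZE= list as before; sizes are matched to strata */
/* in sorted order (ages 11, 12, 13, 14, 15, 16).             */
proc surveyselect data=work.class_by_age
                  out=work.sel
                  method=srs
                  seed=12345   /* arbitrary, only for repeatability */
                  sampsize=(2 3 2 2 2 1);
  strata age;                  /* NOTSORTED no longer needed */
run;
```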
So either sort your data by the stratum variable (probably best), or re-examine the order of the values and provide a SAMPSIZE= list with one appropriate size for each stratum as it actually occurs in the data.
If there is a serious reason not to sort your source data set Original in place, sort it into a separate data set and run PROC SURVEYSELECT on that copy instead.
Proc sort data=original out=toselectfrom; by id; run;
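Continuing from the sorted copy, the selection step might then look like the sketch below. It assumes the stratum variable is really ID (the log mentions IID, so adjust the name if needed), that it has exactly three distinct values, and that each stratum holds at least as many observations as the size requested for it. The PROC FREQ step is only there to check those assumptions first:

```sas
/* Check how many strata exist and how large each one is. */
proc freq data=toselectfrom;
  tables id / nocum nopercent;
run;

/* Select from the sorted copy; sizes are matched to the strata */
/* in sorted order of ID, so reorder the list if necessary.     */
proc surveyselect data=toselectfrom
                  out=sample9
                  method=srs
                  seed=12345678
                  sampsize=(1000 52175 13044);
  strata id;
run;
```

If PROC FREQ reports more than three values of the stratum variable, the SAMPSIZE= list needs one entry per value, or a single number applied to every stratum.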
I'm not sure that I understood your starting position correctly. You want to divide a dataset into three new datasets using the variable ID, and that variable has only three distinct values. Correct? That sounds strange.