I'm trying to divide a large dataset into three smaller groups of unequal size. I used the following code:
PROC SURVEYSELECT data = Original
                  out = sample9
                  method = SRS
                  seed = 12345678
                  sampsize = (1000 52175 13044);
  strata ID notsorted;
  *title;
RUN;
However, I'm getting the following error:
ERROR: The sample size, 1000, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
IID=TV20_2018.
ERROR: The sample size, 52175, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
IID=TV20_2018.
ERROR: The sample size, 13044, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
IID=TV20_2018.
ERROR: The number of values in the SAMPSIZE= list must equal the number of strata. There are more strata than SAMPSIZE=
values.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.SAMPLE9 may be incomplete. When this step was stopped there were 0 observations and 75 variables.
Can you please suggest how I can address this? TIA
Judging by the way the errors read, I wonder whether your data is grouped properly. With the NOTSORTED option, all rows with the same stratum value must be adjacent in the data set; otherwise, each time a repeated value reappears it is treated as a "new" stratum. The clue is the repeated mention of the same stratum value:
IID=TV20_2018. (It appears the code you posted uses a different variable than the one in use when the log was created.)
I think this code should demonstrate the grouping issue (unless you have somehow sorted the SASHELP.CLASS data set by age):
proc surveyselect data=sashelp.class
out=work.sel
sampsize = (2 3 2 2 2 1)
;
strata age notsorted;
run;
There are 6 different ages in the data, with counts of 2, 5, 3, 4, 4 and 1 for ages 11 to 16, but the data set is sorted by name by default, so the age groups are not adjacent and you get a log similar to yours:
ERROR: The sample size, 2, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
Age=14.
ERROR: The sample size, 3, is greater than the number of sampling units, 2.
NOTE: The above message was for the following stratum:
Age=13.
NOTE: The sample size equals the number of sampling units. All units are included in the sample.
NOTE: The above message was for the following stratum:
Age=14.
NOTE: The sample size equals the number of sampling units. All units are included in the sample.
NOTE: The above message was for the following stratum:
Age=12.
ERROR: The sample size, 2, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
Age=15.
NOTE: The sample size equals the number of sampling units. All units are included in the sample.
NOTE: The above message was for the following stratum:
Age=13.
ERROR: The number of values in the SAMPSIZE= list must equal the number of strata. There are more
strata than SAMPSIZE= values.
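One way to see what NOTSORTED does with unsorted data is to count the contiguous runs of the stratum variable, since each run is handled as its own stratum. The following is only a sketch against SASHELP.CLASS; the data set name work.blocks and the variables block and n_units are names made up for this illustration:

```sas
/* Count contiguous runs of AGE in the stored row order.            */
/* work.blocks, block and n_units are hypothetical names.           */
data work.blocks (keep=block age n_units);
  set sashelp.class;
  by age notsorted;          /* same grouping rule as STRATA ... NOTSORTED */
  if first.age then do;
    block + 1;               /* running number of the current run    */
    n_units = 0;             /* restart the row count for this run   */
  end;
  n_units + 1;               /* rows seen so far in this run         */
  if last.age then output;   /* one output row per contiguous run    */
run;

proc print data=work.blocks noobs;
run;
```

If the listing shows more rows than there are distinct ages (six here), the data is not grouped, and the SAMPSIZE= list would need one value per listed row rather than one per distinct age.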
So either sort your data by the stratum variable (probably best), or re-examine the order of the values and provide a SAMPSIZE= list with one appropriately sized value for each stratum as it actually appears in the data.
If there is a serious reason that your source data set Original should not be sorted in place, then sort it into a separate data set and run PROC SURVEYSELECT on that copy instead:
Proc sort data=original out=toselectfrom; by id; run;
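Applied to the SASHELP.CLASS demonstration, the sort-then-select approach might look like the sketch below. The output name work.class_by_age and the seed value are chosen here only for illustration; the sample sizes are the ones from the demonstration and do not exceed the per-age counts of 2, 5, 3, 4, 4 and 1:

```sas
/* Sort into a copy so that each age forms a single contiguous group. */
/* work.class_by_age is a hypothetical name.                          */
proc sort data=sashelp.class out=work.class_by_age;
  by age;
run;

/* One SAMPSIZE= value per stratum, in sorted order (ages 11 to 16). */
proc surveyselect data=work.class_by_age
                  out=work.sel
                  method=srs
                  seed=12345678
                  sampsize=(2 3 2 2 2 1);
  strata age;                /* NOTSORTED is not needed after sorting */
run;
```

For your own data the same pattern would apply with toselectfrom as the input, but the SAMPSIZE= list must then contain exactly as many values as the stratum variable has distinct values, and no value may exceed the number of rows in its stratum.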
I'm not sure I understood your starting position correctly. Do you want to divide a dataset into three new datasets using the variable ID, and does that variable have only three distinct values? That sounds strange.
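If the goal is really just a random three-way split with no strata involved, which is an assumption since the question does not say so explicitly, one possible approach is to shuffle the rows and cut them by position. In this sketch work.shuffled, work.group1 to work.group3 and the helper variable _u are hypothetical names; the sizes and seed are taken from the posted code:

```sas
/* Attach a random sort key to every row (assumption: no strata needed). */
data work.shuffled;
  set Original;
  if _n_ = 1 then call streaminit(12345678);  /* seed from the posted code */
  _u = rand("uniform");                       /* hypothetical helper variable */
run;

/* Put the rows into random order. */
proc sort data=work.shuffled;
  by _u;
run;

/* Cut the shuffled rows into groups of 1000, 52175 and the remainder. */
data work.group1 work.group2 work.group3;
  set work.shuffled (drop=_u);
  if _n_ <= 1000 then output work.group1;
  else if _n_ <= 1000 + 52175 then output work.group2;
  else output work.group3;
run;
```

The third group receives whatever rows are left, so it contains 13044 rows only if Original has exactly 66219 rows.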