mantubiradar19
Quartz | Level 8

I'm trying to divide a large dataset into 3 smaller, unequal-sized groups. I used the following code:

PROC SURVEYSELECT data = Original
                  out = sample9
                  method = SRS
                  seed = 12345678
                  sampsize = (1000 52175 13044);
   strata ID notsorted;
   *title;
RUN;

However, I'm getting the following error:

ERROR: The sample size, 1000, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
IID=TV20_2018.
ERROR: The sample size, 52175, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
IID=TV20_2018.
ERROR: The sample size, 13044, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
IID=TV20_2018.
ERROR: The number of values in the SAMPSIZE= list must equal the number of strata. There are more strata than SAMPSIZE=
values.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.SAMPLE9 may be incomplete. When this step was stopped there were 0 observations and 75 variables.

Can you please suggest how I can address this? TIA

4 REPLIES
Ksharp
Super User
Add the SELECTALL option:

sampsize = (1000 52175 13044) SELECTALL ;
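A minimal, self-contained sketch of where SELECTALL goes, using SASHELP.CLASS stratified by SEX (9 females, 10 males) in place of the Original data set. SELECTALL is a PROC SURVEYSELECT statement option: for METHOD=SRS it takes every unit in a stratum whose requested sample size exceeds the number of units in that stratum, instead of stopping with an error. It does not, however, change how many strata there are, so it does not by itself fix the separate "number of values in the SAMPSIZE= list must equal the number of strata" error. The data set names WORK.CLASS_BY_SEX and WORK.SEL_ALL are illustrative.

```sas
/* STRATA expects the data to be sorted (or at least grouped) by the stratum variable */
proc sort data=sashelp.class out=work.class_by_sex;
  by sex;
run;

/* Request 5 of the 9 females and 12 of the 10 males.                */
/* Without SELECTALL the second request is an error; with SELECTALL, */
/* all 10 males are selected instead.                                 */
proc surveyselect data=work.class_by_sex
                  out=work.sel_all
                  method=srs
                  seed=12345678
                  sampsize=(5 12)
                  selectall;
  strata sex;
run;
```

WORK.SEL_ALL should end up with 15 rows: 5 sampled females plus all 10 males.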
ballardw
Super User

Considering the way the errors read, I am wondering whether your data is GROUPED properly. With the NOTSORTED option, all like values must be adjacent in the data set; otherwise, each time a value reappears after a different value, it starts a "new" stratum. The clue is the repeated mention of the same stratum value:

IID=TV20_2018. (It appears the code you posted uses a different variable than the one used when the LOG was created.)


I think this code should demonstrate the grouping issue (unless you have somehow sorted the SASHELP.CLASS data set by age):

proc surveyselect data=sashelp.class
                  out=work.sel
                  sampsize = (2 3 2 2 2 1);
   strata age notsorted;
run;

There are 6 different ages in the data, with counts of 2, 5, 3, 4, 4 and 1 for ages 11 to 16, but the data set is stored in order by name, so the age groups are not adjacent and you get a log similar to yours:


ERROR: The sample size, 2, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
Age=14.
ERROR: The sample size, 3, is greater than the number of sampling units, 2.
NOTE: The above message was for the following stratum:
Age=13.
NOTE: The sample size equals the number of sampling units. All units are included in the sample.
NOTE: The above message was for the following stratum:
Age=14.
NOTE: The sample size equals the number of sampling units. All units are included in the sample.
NOTE: The above message was for the following stratum:
Age=12.
ERROR: The sample size, 2, is greater than the number of sampling units, 1.
NOTE: The above message was for the following stratum:
Age=15.
NOTE: The sample size equals the number of sampling units. All units are included in the sample.
NOTE: The above message was for the following stratum:
Age=13.
ERROR: The number of values in the SAMPSIZE= list must equal the number of strata. There are more
strata than SAMPSIZE= values.
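One way to confirm this diagnosis is to count how many strata STRATA ... NOTSORTED actually sees, which is the number of runs of consecutive identical values. A minimal sketch using SASHELP.CLASS and AGE so it runs anywhere; for your data you would substitute Original and your stratum variable. The data set name WORK.RUN_COUNT and the variable NSTRATA are illustrative.

```sas
/* Count runs of consecutive identical AGE values; this is how */
/* STRATA ... NOTSORTED defines its strata.                     */
data work.run_count;
  set sashelp.class end=last;
  by age notsorted;                /* FIRST.AGE = 1 whenever AGE differs from the previous row */
  if first.age then nstrata + 1;   /* sum statement: NSTRATA is retained across rows */
  if last then do;
    put 'NOTE: Number of NOTSORTED strata for AGE: ' nstrata;
    output;
  end;
  keep nstrata;
run;
```

With SASHELP.CLASS in its stored name order this reports 16 strata, far more than the 6 values in the SAMPSIZE= list, which is why the last error appears.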

So either sort your data by the stratum variable (probably best), or re-examine the order of the values and provide a matching number of SAMPSIZE= values, with an appropriate size for each of the existing strata.


If there is a serious reason that your source data set Original should not be sorted, then sort it into a different data set and use that one in PROC SURVEYSELECT.


Proc sort data=original out=toselectfrom;
   by id;
run;
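Applied to the SASHELP.CLASS demonstration above, a sketch of the sort-then-select pattern: once the data is sorted by AGE there are exactly 6 strata, in ascending order 11 to 16, and each SAMPSIZE= value is no larger than its stratum's count (2, 5, 3, 4, 4, 1). The data set name WORK.CLASS_BY_AGE and the seed are illustrative.

```sas
proc sort data=sashelp.class out=work.class_by_age;
  by age;
run;

/* With sorted data NOTSORTED is no longer needed: each AGE value forms one stratum, */
/* and the 6 SAMPSIZE= values map to ages 11, 12, 13, 14, 15 and 16 in that order.   */
proc surveyselect data=work.class_by_age
                  out=work.sel
                  method=srs
                  seed=12345678
                  sampsize=(2 3 2 2 2 1);
  strata age;
run;
```

This should select 12 rows in total; for ages 11 and 16 the requested size equals the stratum size, which only produces a NOTE.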
Reeza
Super User
You're trying to pick 1000 samples from your first ID, 52175 from the second ID and 13044 from your third ID.
Is that what you're trying to do? I suspect your STRATA statement is wrong somehow.
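A quick way to check this is to count the distinct values of the stratum variable and compare that with the number of SAMPSIZE= values (three in the question). A minimal sketch using PROC SQL; it uses SASHELP.CLASS and AGE so it runs anywhere, and for the question you would substitute Original and the stratum variable. The macro variable name N_LEVELS is illustrative.

```sas
proc sql noprint;
  /* number of distinct stratum values = number of strata once the data is sorted */
  select count(distinct age) into :n_levels trimmed
    from sashelp.class;
quit;

%put NOTE: AGE has &n_levels distinct values, so SAMPSIZE= needs &n_levels sizes.;
```

For SASHELP.CLASS this reports 6 distinct ages.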
andreas_lds
Jade | Level 19

Not sure that I understood your starting position correctly. You want to divide a dataset into three new datasets using the variable ID, and that variable has only three different values. Correct? Sounds strange.
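If the goal is instead to randomly split the whole data set into three groups of fixed, unequal sizes (rather than to sample within existing ID values), STRATA is not needed at all. A minimal sketch of one alternative technique, not PROC SURVEYSELECT: shuffle the rows with a seeded random sort key, then cut the shuffled rows at cumulative counts. It uses SASHELP.CLASS with hypothetical sizes 3, 10 and 6; for the question these would be 1000, 52175 and 13044, assuming they add up to the number of rows in Original (any rows beyond the total are simply not output). The data set names are illustrative.

```sas
/* Hypothetical group sizes for the 19-row SASHELP.CLASS */
%let n1 = 3;
%let n2 = 10;
%let n3 = 6;

data work.shuffled;
  set sashelp.class;
  if _n_ = 1 then call streaminit(12345678);  /* fixed seed for reproducibility */
  u = rand('uniform');                         /* random sort key */
run;

proc sort data=work.shuffled;
  by u;
run;

/* Cut the shuffled rows into consecutive blocks of n1, n2 and n3 rows */
data work.grp1 work.grp2 work.grp3;
  set work.shuffled;
  if _n_ <= &n1 then output work.grp1;
  else if _n_ <= &n1 + &n2 then output work.grp2;
  else if _n_ <= &n1 + &n2 + &n3 then output work.grp3;
  drop u;
run;
```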
