leex1514
Calcite | Level 5

Hello all,

I know the parameter values (mean and SD) of the population distribution, and I have one particular year's data from that population (data1). I am trying to use bootstrap sampling to draw a sample from data1 whose mean and SD match the population parameter values. data1 is fairly large, about n = 60,000. Is this possible?

thank you

leex1514

PaigeMiller
Diamond | Level 26

Start here:

https://blogs.sas.com/content/iml/2018/12/12/essential-guide-bootstrapping-sas.html

 

This part doesn't sound like bootstrapping at all: 

to draw a sample from the data 1 to meet the population parameter values of mean and std

and maybe you should explain further.

--
Paige Miller
Ksharp
Super User

Interesting question. 

Calling @Rick_SAS 

Rick_SAS
SAS Super FREQ

By "bootstrap sample" I assume you mean "sample with replacement" with size n.
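As an illustration of that definition, here is a minimal sketch in SAS. PROC SURVEYSELECT with METHOD=URS (unrestricted random sampling, that is, with replacement) and SAMPRATE=1 draws one bootstrap sample of size n. The data set data1 below is made-up stand-in data with a hypothetical variable named ability; substitute your own data set and variable names.

```sas
/* Hypothetical stand-in for data1: 60,000 students with a made-up ability score */
data data1;
   call streaminit(2024);
   do id = 1 to 60000;
      ability = rand('Normal', 0.2, 1.1);
      output;
   end;
run;

/* One bootstrap sample: METHOD=URS draws with replacement, and SAMPRATE=1
   makes the sample size equal to n (60,000). OUTHITS writes one row per draw,
   so a student drawn three times appears three times. */
proc surveyselect data=data1 out=boot1 noprint
                  seed=12345 method=urs samprate=1 outhits;
run;
```

Adding REPS=B to the PROC SURVEYSELECT statement draws B bootstrap samples in one pass, indexed by the Replicate variable.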

 

In general, for small samples, you shouldn't expect to be able to get the exact values. For example, in a sample that has two observations with values 0 and 1, the only possible means are 0, 0.5, and 1.
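The two-observation example can be checked by brute force. This small sketch enumerates all four ordered samples of size 2 drawn with replacement from the values 0 and 1, and PROC FREQ tabulates their means: 0 occurs once, 0.5 twice, and 1 once.

```sas
/* Original sample of size 2: values 0 and 1 */
data pairs;
   array v[2] _temporary_ (0 1);
   do i = 1 to 2;          /* index of the first draw  */
      do j = 1 to 2;       /* index of the second draw */
         x1 = v[i];
         x2 = v[j];
         xbar = (x1 + x2) / 2;
         output;
      end;
   end;
   keep x1 x2 xbar;
run;

/* Only three distinct means are possible */
proc freq data=pairs;
   tables xbar / nocum;
run;
```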

 

However, a sample of size n has C(2n-1, n) distinct bootstrap samples (n^n if you count the order of the draws), so there are a lot of combinations. So, yes, you should be able to get reasonably close to the parameter values, assuming that the sample is representative of the population. 
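To put "a lot of combinations" in numbers: a sample of size n has C(2n-1, n) distinct bootstrap samples when the order of draws is ignored, and n^n ordered sequences of draws. Both numbers overflow for n = 60,000, so this sketch works on the log10 scale, using the LCOMB function (natural log of a binomial coefficient). For n = 2 it recovers 3 distinct samples ({0,0}, {0,1}, {1,1}) and 4 ordered ones.

```sas
data counts;
   do n = 2, 10, 60000;
      /* log10 of the number of distinct resamples (multisets): C(2n-1, n) */
      log10_distinct = lcomb(2*n - 1, n) / log(10);
      /* log10 of the number of ordered resamples: n**n */
      log10_ordered  = n * log10(n);
      output;
   end;
run;

proc print data=counts noobs;
run;
```

For n = 60,000 the count of distinct samples alone has more than 36,000 digits.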

 

But before we talk about possible ways to make this happen, may I ask what you are trying to achieve and why? What is the purpose of manufacturing a new set of data that has exactly the same mean and SD as some parameters? The field of statistics was developed to analyze the data that you have and make inferences about the population parameters. Modifying the data is not required or recommended.

 

leex1514
Calcite | Level 5
I am trying to use this sample to calibrate items (for students with a certain ability distribution). The purpose is to compare the item parameter estimates from my data1 with those from a sample matched to the population ability distribution.
PaigeMiller
Diamond | Level 26

@leex1514 wrote:
I am trying to use this sample to calibrate items (for students with a certain ability distribution). The purpose is to compare the item parameter estimates from my data1 with those from a sample matched to the population ability distribution.

Compare these two things to learn what? Why do you need a sample if you have the entire population?

--
Paige Miller
leex1514
Calcite | Level 5
The items are different each year. Knowing the population ability distribution does not by itself give estimates for the items that students take each year. This is item response theory: item calibration and linking.
PaigeMiller
Diamond | Level 26

Okay, I know nothing about Item Response Theory, but I do know that SAS has PROC IRT. Does that help?

 

The IRT Procedure

Basic Features

The IRT procedure enables you to estimate various item response theory models. The following list summarizes some of the basic features of the IRT procedure:

  • uses the Rasch model; one-, two-, three-, and four-parameter models; graded response model with logistic or probit link; and generalized partial credit model

  • enables different items to have different response models

  • performs multidimensional exploratory and confirmatory analysis

  • performs multiple-group analysis, with fixed values and equality constraints within and between groups

  • estimates factor scores by using maximum likelihood (ML), maximum a posteriori (MAP), and expected a posteriori (EAP) methods
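A minimal PROC IRT sketch, assuming binary items: the DATA step simulates made-up responses for 2,000 students on five items (hypothetical names item1-item5, with a made-up discrimination of 1.2 and difficulties from -1 to 1), and the MODEL statement requests the two-parameter logistic response function. The option names should be checked against the PROC IRT documentation for your release.

```sas
/* Hypothetical response data: 2,000 students, 5 binary items (made-up 2PL parameters) */
data resp;
   call streaminit(7);
   array item[5];
   do id = 1 to 2000;
      theta = rand('Normal', 0, 1);
      do k = 1 to 5;
         /* P(correct) = 1 / (1 + exp(-a*(theta - b))), a = 1.2, b = (k-3)/2 */
         item[k] = rand('Bernoulli', 1 / (1 + exp(-1.2*(theta - (k - 3)/2))));
      end;
      output;
   end;
   drop k;
run;

/* Fit a two-parameter logistic (2PL) model to all five items */
proc irt data=resp;
   var item1-item5;
   model item1-item5 / resfunc=twop;
run;
```

To compare calibrations, the same step could be run once on data1 and once on a matched sample, and the two tables of item parameter estimates compared.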

--
Paige Miller
leex1514
Calcite | Level 5
Thank you, but I know IRT. I just need a calibration sample with the same ability distribution as the population, so that I can see how item parameter estimation is affected when a sample is off the population ability distribution (data1) versus when it matches it (population mean and SD).
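A minimal sketch of the most literal reading of the original question: draw many bootstrap samples from data1 and keep the one whose ability mean and SD are closest to the population values. The target values, the made-up data1, and the variable name ability are hypothetical. Because each resample has n = 60,000 rows, its mean varies only by about SD/sqrt(n) around the data1 mean, so this can only work when data1 is already close to the target. With the shifted data1 below (mean 0.3 versus a target of 0), you should find that even the best replicate stays near 0.3, which is the "representative sample" caveat in practice.

```sas
%let target_mean = 0;   /* hypothetical population mean of ability */
%let target_sd   = 1;   /* hypothetical population SD of ability   */

/* Hypothetical stand-in for data1, deliberately shifted away from the target */
data data1;
   call streaminit(2024);
   do id = 1 to 60000;
      ability = rand('Normal', 0.3, 0.9);
      output;
   end;
run;

/* 50 bootstrap samples of size n, drawn with replacement */
proc surveyselect data=data1 out=boot noprint
                  seed=12345 method=urs samprate=1 outhits reps=50;
run;

/* Mean and SD of ability within each replicate */
proc means data=boot noprint;
   by Replicate;
   var ability;
   output out=stats mean=m std=s;
run;

/* Squared distance to the target moments; the closest replicate sorts first */
data stats;
   set stats;
   dist = (m - &target_mean)**2 + (s - &target_sd)**2;
run;

proc sort data=stats;
   by dist;
run;

proc print data=stats(obs=3) noobs;
   var Replicate m s dist;
run;

/* Keep the rows of the closest replicate as the "matched" sample */
data _null_;
   set stats(obs=1);
   call symputx('best', Replicate);
run;

data matched;
   set boot(where=(Replicate = &best));
run;
```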
SteveDenham
Jade | Level 19

This sounds like a simulation of a data set with a known mean and SD but an unknown distribution. If that is the case, then you can rely on the central limit theorem and do something like:

 

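%let known_mean = 0;   /* placeholder: replace with your population mean */
%let known_sd   = 1;   /* placeholder: replace with your population SD   */

data sample(keep=X);
	call streaminit(123);
	do j = 1 to 60000;
		X = rand('Normal', &known_mean, &known_sd);
		output;
	end;
run;

where the macro variables known_mean and known_sd hold the parameter values you have.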

 

I would also recommend looking at @Rick_SAS 's blog and especially this paper, should you decide on a simulation approach:

https://support.sas.com/resources/papers/proceedings15/SAS1387-2015.pdf 

 

If your data1 looks like a mixture of distributions or something unusual, this paper gives you some approaches.

 

SteveDenham

 

leex1514
Calcite | Level 5
It is not a simulation. I need to draw the sample from my data because each row (person) has item response variables associated with it.
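Because whole rows have to come along, one option (a swapped-in technique, not something proposed in this thread) is weighted resampling: give each student a weight equal to the ratio of the target ability density to the data1 ability density, then draw n rows with replacement with probability proportional to that weight (PROC SURVEYSELECT METHOD=PPS_WR). This sketch assumes that ability is roughly normal in both data1 and the population; the target values, the made-up data1, and the names ability and item1-item5 are all hypothetical. Every selected row keeps its item responses, and the resampled ability mean and SD should land close to, but not exactly on, the target values.

```sas
%let target_mean = 0;   /* hypothetical population mean of ability */
%let target_sd   = 1;   /* hypothetical population SD of ability   */

/* Hypothetical stand-in for data1: ability plus five binary item responses */
data data1;
   call streaminit(2024);
   array item[5];
   do id = 1 to 60000;
      ability = rand('Normal', 0.3, 0.9);
      do k = 1 to 5;
         item[k] = rand('Bernoulli', 1 / (1 + exp(-(ability - (k - 3)/2))));
      end;
      output;
   end;
   drop k;
run;

/* Sample mean and SD of ability in data1, stored in macro variables */
proc means data=data1 noprint;
   var ability;
   output out=d1 mean=m std=s;
run;

data _null_;
   set d1;
   call symputx('d1_mean', m);
   call symputx('d1_sd', s);
run;

/* Density-ratio weight, assuming both distributions are normal:
   w = target normal pdf / data1 normal pdf, computed on the log scale */
data weighted;
   set data1;
   w = exp(logpdf('Normal', ability, &target_mean, &target_sd)
         - logpdf('Normal', ability, &d1_mean, &d1_sd));
run;

/* Draw n whole rows with replacement, probability proportional to w */
proc surveyselect data=weighted out=matched noprint
                  seed=12345 method=pps_wr sampsize=60000 outhits;
   size w;
run;

/* Ability mean and SD of the resampled data, to compare with the target */
proc means data=matched mean std;
   var ability;
run;
```

The weights become unstable when the target SD is much larger than the data1 SD, because a few extreme students then get very large weights and are drawn many times.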
SteveDenham
Jade | Level 19

Then are the means and SDs for each item?

 

SteveDenham

leex1514
Calcite | Level 5
The mean and SD are for student performance, not for each item.
