The IRT Procedure

leex1514 · Posted 08-11-2021 07:10 PM

Hello all,

I have known parameter values (the mean and std of the population distribution) and one particular year's data (data1) from that population. I am trying to use bootstrap sampling to draw a sample from data1 whose mean and std match the population parameter values. data1 is fairly large, about n=60,000. Is this possible?

Thank you

leex1514

PaigeMiller · Posted 08-12-2021 08:36 AM

Start here:

https://blogs.sas.com/content/iml/2018/12/12/essential-guide-bootstrapping-sas.html

This part doesn't sound like bootstrapping at all:

to draw a sample from the data 1 to meet the population parameter values of mean and std

and maybe you should explain further.

--
Paige Miller

Ksharp · Posted 08-12-2021 08:38 AM

Interesting question.

Calling @Rick_SAS

Rick_SAS · Posted 08-12-2021 09:21 AM

By "bootstrap sample" I assume you mean "sample with replacement" with size n.

In general, for small samples, you shouldn't expect to be able to get the exact values. For example, in a sample that has two observations with values 0 and 1, the only possible means are 0, 0.5, and 1.

However, a sample of size n has n^n possible ordered "bootstrap samples," so that's a lot of combinations. So, yes, you should be able to get reasonably close to the parameter values, assuming that the sample is representative of the population.
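
To make the counting concrete, here is a quick sketch, assuming the n data values are distinct and every resample also has size n:

```latex
\[
\#\{\text{ordered resamples}\} = n^{n},
\qquad
\#\{\text{distinct resamples, ignoring order}\} = \binom{2n-1}{n}.
\]
% Check with the two-observation example (values 0 and 1, n = 2):
% ordered resamples: (0,0), (0,1), (1,0), (1,1), so 2^2 = 4
% distinct resamples: {0,0}, {0,1}, {1,1}, so \binom{3}{2} = 3,
% which give the three possible means 0, 0.5, and 1.
```

For n = 60,000 both counts are astronomically large, which is why a resample whose mean and SD are close to the targets is plausible.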

But before we talk about possible ways to make this happen, may I ask what you are trying to achieve and why? What is the purpose of manufacturing a new set of data that has exactly the same mean and SD as some parameters? The field of statistics was developed to analyze the data that you have and make inferences about the population parameters. Modifying the data is not required or recommended.

leex1514 · Posted 08-12-2021 10:03 AM

I am trying to use this sample to calibrate items (for students with a certain ability distribution). The purpose is to compare the item parameter estimates from my data1 with those from a sample matched to the population ability distribution.

PaigeMiller · Posted 08-12-2021 10:32 AM

@leex1514 wrote:
I am trying to use this sample to calibrate items (for students with a certain ability distribution). The purpose is to compare the item parameter estimates from my data1 with those from a sample matched to the population ability distribution.

Compare these two things to learn what? Why do you need a sample if you have the entire population?

--
Paige Miller

leex1514 · Posted 08-12-2021 10:58 AM

The items are different each year. Knowing the population ability distribution does not solve the problem of estimating parameters for the items that students take each year. This is item response theory and item calibration/linking.

PaigeMiller · Posted 08-12-2021 02:16 PM

Okay, I know nothing about Item Response Theory, but I do know that SAS has PROC IRT. Does that help?

The IRT Procedure

Basic Features

The IRT procedure enables you to estimate various item response theory models. The following list summarizes some of the basic features of the IRT procedure:

fits the Rasch model; one-, two-, three-, and four-parameter models; graded response model with logistic or probit link; and generalized partial credit model
enables different items to have different response models
performs multidimensional exploratory and confirmatory analysis
performs multiple-group analysis, with fixed values and equality constraints within and between groups
estimates factor scores by using maximum likelihood (ML), maximum a posteriori (MAP), and expected a posteriori (EAP) methods
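
As a rough illustration of the syntax only, here is a minimal sketch. It assumes a data set named data1 with binary item responses in variables item1-item10; those names are hypothetical and not taken from this thread.

```sas
/* Hypothetical names: data1 holds binary responses item1-item10 */
proc irt data=data1;
   var item1-item10;
   model item1-item10 / resfunc=twop;   /* two-parameter model for every item */
run;
```

Running the same step on two different samples would give two sets of item parameter estimates to compare.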

--
Paige Miller

leex1514 · Posted 08-12-2021 04:17 PM

Thank you, but I know IRT. I just need a calibration sample with the same ability distribution as the population, so that I can see how item parameter estimation is affected when a sample is off the population ability distribution (data1) versus when a sample is on it (population mean and std).

SteveDenham · Posted 08-13-2021 10:57 AM

This sounds something like a simulation of a dataset with known mean and sd, but with an unknown distribution. If that is the case, then you can rely on the central limit theorem and do something like:

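%let known_mean = 0;	/* placeholder: replace with your population mean */
%let known_sd = 1;	/* placeholder: replace with your population SD */

data sample(keep=X);
	call streaminit(123);	/* fix the seed so the draw is reproducible */
	do j=1 to 60000;	/* same size as data1 */
		X = rand('Normal', &known_mean, &known_sd);
		output;
	end;
run;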

where known_mean and known_sd are the parameter values you have.

I would also recommend looking at @Rick_SAS 's blog and especially this paper, should you decide on a simulation approach:

https://support.sas.com/resources/papers/proceedings15/SAS1387-2015.pdf

If your data1 looks like a mixture of distributions or something unusual, this paper gives you some approaches.

SteveDenham

leex1514 · Posted 08-13-2021 06:19 PM

It is not simulation. I need to use my data to draw a sample because there are item response variables associated with each row(person).

SteveDenham · Posted 08-16-2021 08:19 AM

Then the means and SDs are for each item?

SteveDenham

leex1514 · Posted 08-16-2021 04:24 PM

The mean and std are for student performance, not for each item.
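
One possible way to draw such a sample from the existing rows is weighted resampling with replacement, sketched below. This is only an illustration under stated assumptions, not a method confirmed in this thread: the variable name score, the work data set names, and the three macro values are hypothetical placeholders, and the weights assume the performance measure in data1 is roughly normal. The resampled mean and SD will be close to the targets, not exactly equal to them.

```sas
/* Hypothetical names: data1 has one row per student, with a performance
   measure SCORE plus the item response variables. */
%let target_mean = 0;     /* placeholder: known population mean      */
%let target_sd   = 1;     /* placeholder: known population SD        */
%let n_sample    = 5000;  /* placeholder: size of calibration sample */

/* Observed mean and SD of the performance measure in data1 */
proc means data=data1 noprint;
   var score;
   output out=work.stats mean=obs_mean std=obs_sd;
run;

/* Weight each student by (target normal density) / (observed normal density),
   so that students who are under-represented relative to the target
   are more likely to be drawn */
data work.weighted;
   if _n_ = 1 then set work.stats(keep=obs_mean obs_sd);
   set data1;
   if missing(score) then delete;
   w = pdf('Normal', score, &target_mean, &target_sd)
     / pdf('Normal', score, obs_mean, obs_sd);
   if w > 0;               /* sampling weights must be positive */
   drop obs_mean obs_sd;
run;

/* Draw rows with replacement, with probability proportional to w;
   OUTHITS writes one output row per selection, so each selected
   student's item responses stay attached to the row */
proc surveyselect data=work.weighted out=work.matched
                  method=pps_wr sampsize=&n_sample seed=123 outhits;
   size w;
run;

/* Check how close the resampled mean and SD are to the targets */
proc means data=work.matched n mean std;
   var score;
run;
```

If the target distribution is far from the one observed in data1, a few students can receive very large weights and be selected many times, so it is worth inspecting the distribution of w before calibrating on work.matched.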
