Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- monte carlo for contingency tables for a large data set

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

☑ This topic is **solved**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 02-01-2023 02:00 PM
(526 views)

I have this code to run a monte carlo for contingency tables for a large data set.

```
proc freq data=remove_CF;
table Rows*Cols / norow nocol nopct;
exact pchi / MC N=1000; /* set sample size, default is 10k */
run;
```

It evaluates the chi-square statistic on each table for 1000 random tables that have the same row and column sum as the observed table. But I do not want them to have the same row and column sum as the observed table. I want to make 1000 random tables with 1000 random samples from the original data set. Is there a parameter that I can add to this code that would allow me to do that?

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

OK. You didn't reference the blog post, so I didn't know if you had seen it.

The subsamples are not going to be good for much. For example, if you have rare events in the big data, you might have subsamples for which the rare events never appear. Nevertheless, here's how you get the samples that you want: use the previous program that I posted for the Bootstrap, but use N=*number *instead of SAMPRATE=1 on the PROC SURVEYSELECT statement.

```
/* The basic bootstrap in SAS.
https://blogs.sas.com/content/iml/2016/08/10/bootstrap-confidence-interval-sas.html
*/
data Have(keep=x y); /* data for crosstab */
set Sashelp.Cars(rename=(Origin=x Type=y));
if y^='Hybrid';
run;
proc freq data=Have;
tables x*y / chisq norow nocol nopercent;
run;
%let NumSamples = 1000; /* number of resamples */
%let NSubset = 100; /* size of each resample */
/* 2. Generate many bootstrap samples */
proc surveyselect data=Have NOPRINT seed=1
out=BootSamp(rename=(Replicate=SampleID))
method=urs /* resample with replacement */
N=&NSubset /* each bootstrap sample has N observations */
OUTHITS
reps=&NumSamples; /* generate NumSamples bootstrap resamples */
run;
/* 3. Compute the statistic for each bootstrap sample */
ods select none;
proc freq data=BootSamp;
by SampleID;
table x*y / chisq out=FreqOut;
ods output ChiSq=ChiSq(where=(Statistic='Chi-Square'));
run;
ods select all;
/* distribution of chi-square statistics */
proc means data=ChiSq P5 Mean P95;
var Value;
label Value='Chi-Square';
run;
```

4 REPLIES 4

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

For a discussion of exact tests and simulating contingency tables with fixed marginal frequencies, see https://blogs.sas.com/content/iml/2015/10/28/simulation-exact-tables.html

I am unclear how you want to generate random samples. Let's say you have N observations in your big data set. Do you want each of the 1000 random samples to also have N observations (a bootstrap sample), or do you want a smaller number of observations in each random sample? if so, what size is the sample?

And what statistic are you trying to get for these samples? If you use bootstrap samples, you can get the sampling distribution of the chi-square statistic (and a confidence interval). Is that what you want?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Rick_SASthanks for the response. I got the code from that blog post, but it says "generates 10,000 random tables that have the same row and column sum as the observed table." But I want 10,000 random tables with 10,000 random samples each (or whatever number I select). Entering the crosstabs in the proc iml code is not feasible considering the data set.

The data set has 900k observations looking for associations between binary variables. There are 35 total variable combinations that I am looking at, using chi sq test. As expected, running chi sq on all 900k observations will yield significance due to sample size. I am planning to subsample using monte carlo, looking at confidence intervals to determine the number and size of the random tables, and Cramer's V to look at effect size.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

OK. You didn't reference the blog post, so I didn't know if you had seen it.

The subsamples are not going to be good for much. For example, if you have rare events in the big data, you might have subsamples for which the rare events never appear. Nevertheless, here's how you get the samples that you want: use the previous program that I posted for the Bootstrap, but use N=*number *instead of SAMPRATE=1 on the PROC SURVEYSELECT statement.

```
/* The basic bootstrap in SAS.
https://blogs.sas.com/content/iml/2016/08/10/bootstrap-confidence-interval-sas.html
*/
data Have(keep=x y); /* data for crosstab */
set Sashelp.Cars(rename=(Origin=x Type=y));
if y^='Hybrid';
run;
proc freq data=Have;
tables x*y / chisq norow nocol nopercent;
run;
%let NumSamples = 1000; /* number of resamples */
%let NSubset = 100; /* size of each resample */
/* 2. Generate many bootstrap samples */
proc surveyselect data=Have NOPRINT seed=1
out=BootSamp(rename=(Replicate=SampleID))
method=urs /* resample with replacement */
N=&NSubset /* each bootstrap sample has N observations */
OUTHITS
reps=&NumSamples; /* generate NumSamples bootstrap resamples */
run;
/* 3. Compute the statistic for each bootstrap sample */
ods select none;
proc freq data=BootSamp;
by SampleID;
table x*y / chisq out=FreqOut;
ods output ChiSq=ChiSq(where=(Statistic='Chi-Square'));
run;
ods select all;
/* distribution of chi-square statistics */
proc means data=ChiSq P5 Mean P95;
var Value;
label Value='Chi-Square';
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

If you want the basic bootstrap, you study the following program:

```
/* The basic bootstrap in SAS.
https://blogs.sas.com/content/iml/2016/08/10/bootstrap-confidence-interval-sas.html
*/
data Have(keep=x y); /* data for crosstab */
set Sashelp.Cars(rename=(Origin=x Type=y));
if y^='Hybrid';
run;
proc freq data=Have;
tables x*y / chisq norow nocol nopercent;
run;
%let NumSamples = 1000; /* number of bootstrap resamples */
/* 2. Generate many bootstrap samples */
proc surveyselect data=Have NOPRINT seed=1
out=BootSamp(rename=(Replicate=SampleID))
method=urs /* resample with replacement */
samprate=1 /* each bootstrap sample has N observations */
/* OUTHITS option to suppress the frequency var */
reps=&NumSamples; /* generate NumSamples bootstrap resamples */
run;
/* 3. Compute the statistic for each bootstrap sample */
ods select none;
proc freq data=BootSamp;
by SampleID;
weight NumberHits;
table x*y / chisq out=FreqOut;
ods output ChiSq=ChiSq(where=(Statistic='Chi-Square'));
run;
ods select all;
/* distribution of chi-square statistics */
proc means data=ChiSq P5 Mean P95;
var Value;
label Value='Chi-Square';
run;
title "Bootstrap Distribution of Chi-Square Statistic";
%let Est = 30.3003;
proc sgplot data=ChiSq;
label Value='Chi-Square';
histogram value;
/* Optional: draw reference line at observed value and draw 90% CI */
refline &Est / axis=x lineattrs=(color=red)
name="Est" legendlabel="Observed Statistic = &Est";
refline 23.2081132 56.0521988 / axis=x lineattrs=(color=blue)
name="CI" legendlabel="90% CI";
keylegend "Est" "CI";
run;
```

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.