BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TPayne
Fluorite | Level 6

Hello,

 

i have a dataset where regions are listed under the "Name" column, their mean under the "Value" column and their lower limit and upper limit of their confidence interval. I want to take the lower and upper limits of a section of the "Name" column and compare it to another section in the same column to determine statistical significance. The only way I can think of involves way more if/then statements than i'd like to do- I am sure there is a more efficient way. Below is my dataset, where I would like to create a variable "SIGNIFICANCE" that is either 1 if the value of REG_FLU_COMPLETE_MEAN is statistically significantly different than the value of STATE_FLU_COMPLETE_MEAN etc.....

 

Name Value Lower_ Limit Upper_Limit
REG_FLU_COMPLETE_Mean 0.834782609 0.76587853 0.903686689
REG_HIB_COMPLETE_Mean 0.765217391 0.68657518 0.843859607
REG_RTV_COMPLETE_Mean 0.765217391 0.68657518 0.843859607
REG_DTP_COMPLETE_Mean 0.87826087 0.81759321 0.938928527
REG_OPV_COMPLETE_Mean 0.869565217 0.80707988 0.932050559
REG_HBV_COMPLETE_Mean 0.886956522 0.82820701 0.945706036
REG_VAX_COMPLETE_Mean 0.826086957 0.75576202 0.896411889
STATE_FLU_COMPLETE_Mean 0.848092643 0.82971023 0.866475057
STATE_HIB_COMPLETE_Mean 0.801089918 0.78064617 0.821533667
STATE_RTV_COMPLETE_Mean 0.773160763 0.75171283 0.794608698
STATE_DTP_COMPLETE_Mean 0.801089918 0.78064617 0.821533667
STATE_OPV_COMPLETE_Mean 0.910081744 0.89543114 0.924732351
STATE_HBV_COMPLETE_Mean 0.914168937 0.89982306 0.92851481
STATE_VAX_COMPLETE_Mean 0.873297003 0.85626107 0.890332936

 

Thank you!!!

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

So you are actually trying to compare proportions (which can only vary from zero to 1) rather than means (which can vary from negative infinity to positive infinity).

 

It's always good to state the problem you are trying to solve, rather than the SAS output that you have achieved so far.

 

A solution (actually several solutions) is given here: https://blogs.sas.com/content/iml/2017/07/05/test-equality-two-proportions-sas.html

I would recommend that top section entitled "A chi-square test for association in SAS"

--
Paige Miller

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

@TPayne wrote:

Hello,

 

i have a dataset where regions are listed under the "Name" column, their mean under the "Value" column and their lower limit and upper limit of their confidence interval. I want to take the lower and upper limits of a section of the "Name" column and compare it to another section in the same column to determine statistical significance. The only way I can think of involves way more if/then statements than i'd like to do- I am sure there is a more efficient way. Below is my dataset, where I would like to create a variable "SIGNIFICANCE" that is either 1 if the value of REG_FLU_COMPLETE_MEAN is statistically significantly different than the value of STATE_FLU_COMPLETE_MEAN etc.....

 


I don't think you can actually do this statistical test based upon the results you show. To see if the mean of REG_FLU_COMPLETE is statistically different than STATE_FLU_COMPLETE mean, we would need the value of N for each group (number of observations), and the standard deviation of the data in each group, plus the means (of course).

 

Maybe if you went back to the beginning and showed us a portion of the original data, rather than these summary statistics, we could figure out how to get what you want.

--
Paige Miller
TPayne
Fluorite | Level 6

Hi Paige!

 

Thank you for pointing this out - I get these numbers by taking a dataset of boolean variables where each line is a record_id, and 1 is if that record is considered complete for a vaccine or not. I then break that dataset apart by region and run a proc summary for each region, then add that output as a variable to eachbroken apart dataset. I merge datasets back together, transpose, and end up with my originally posted dataset. I will do my best to explain it below:

 

I start with dataset STANDARD:

 

record_id region DTaP OPV
1163 REG  1 1
1198 REG  1 1
1560 REG  0 0
1660 REG  1 1
1678 REG  0 0
1163 STATE 1 1
1198 STATE 1 1
1560 STATE 0 1
1660 STATE 1 1
1678 STATE 1 1

 

and I break it apart to be two datasets: REG and STATE, respectively (based off of region)

 

I then run a proc summary on each of these datasets to that looks like this:

 

proc summary data=reg

var DTaP OPV;

output out=CI MEAN= LCLM= UCLM= /AUTONAME:

run;

 

data reg2;

if _n_=1 then set CI(drop= _TYPE_ _FREQ_);

set reg;

run;

 

proc summary data=state

var DTaP OPV;

output out=CI MEAN= LCLM= UCLM= /AUTONAME:

run;

 

data state2;

if _n_=1 then set CI(drop= _TYPE_ _FREQ_);

set state;

run;

 

data confidenceintervals;

set reg2 state2;

run;

 

this creates a final dataset that looks like this: 

record_id  region DTaP_Mean DTaP_UCLM DTaP_LCLM OPV_Mean OPV_UCLM OPV_LCLM
1163 REG  0.76521739 0.84385961 0.68657518 0.86956522 0.93205056 0.8070799
1198 REG  0.76521739 0.84385961 0.68657518 0.86956522 0.93205056 0.8070799
1560 REG  0.76521739 0.84385961 0.68657518 0.86956522 0.93205056 0.8070799
1660 REG  0.76521739 0.84385961 0.68657518 0.86956522 0.93205056 0.8070799
1678 REG  0.76521739 0.84385961 0.68657518 0.86956522 0.93205056 0.8070799
1163 STATE 0.80108992 0.82153367 0.78064617 0.91008174 0.92473235 0.8954311
1198 STATE 0.80108992 0.82153367 0.78064617 0.91008174 0.92473235 0.8954311
1560 STATE 0.80108992 0.82153367 0.78064617 0.91008174 0.92473235 0.8954311
1660 STATE 0.80108992 0.82153367 0.78064617 0.91008174 0.92473235 0.8954311
2202 STATE 0.80108992 0.82153367 0.78064617 0.91008174 0.92473235

0.8954311

 

through a series of transpositions and a few other manipulations, I end up with a dataset that is formatted for the application I ultimately have to upload the information in to. this is the dataset I was referring to earlier. I was thinking I would want to wait until this step to look at statistical significance because I am only curious about results at the regional level, not if a record_id was different than state results or not

 

Name Value Lower_ Limit Upper_Limit
REG_DTP_COMPLETE_Mean 0.87826087 0.81759321 0.938928527
REG_OPV_COMPLETE_Mean 0.869565217 0.80707988 0.932050559
STATE_DTP_COMPLETE_Mean 0.801089918 0.78064617 0.821533667
STATE_OPV_COMPLETE_Mean 0.910081744 0.89543114 0.924732351
PaigeMiller
Diamond | Level 26

So you are actually trying to compare proportions (which can only vary from zero to 1) rather than means (which can vary from negative infinity to positive infinity).

 

It's always good to state the problem you are trying to solve, rather than the SAS output that you have achieved so far.

 

A solution (actually several solutions) is given here: https://blogs.sas.com/content/iml/2017/07/05/test-equality-two-proportions-sas.html

I would recommend that top section entitled "A chi-square test for association in SAS"

--
Paige Miller
TPayne
Fluorite | Level 6

Hi Paige,

 

Thanks for sharing this resource- I am unable to perform this test with the way my data is currently formatted but can manipulate it. Additionally, Is there a way to print the output of the results as a variable in my final dataset? I ultimately will need to perform this many times.

PaigeMiller
Diamond | Level 26

You can perform this test as many times as is needed using PROC FREQ, for example if you had 22 different measures, or you could also just type all the variable names of the different measures:`

 

tables Group1*(Response1-Response22) / chisq; 

 In order to save the results in a SAS data set, you could use as the line of code before PROC FREQ

 

ods output chisq=chisq;

 

 

--
Paige Miller
ballardw
Super User

@TPayne wrote:

Hi Paige,

 

Thanks for sharing this resource- I am unable to perform this test with the way my data is currently formatted but can manipulate it. Additionally, Is there a way to print the output of the results as a variable in my final dataset? I ultimately will need to perform this many times.


Your "Standard" data set is set up to do Chisq testing if you are comparing region to state for each of your antigens:

data have;
input record_id $ region $ DTaP OPV ;
datalines;
1163 REG  1 1 
1198 REG  1 1 
1560 REG  0 0 
1660 REG  1 1 
1678 REG  0 0 
1163 STATE 1 1 
1198 STATE 1 1 
1560 STATE 0 1 
1660 STATE 1 1 
1678 STATE 1 1 
;

proc freq data=have;
   tables region *(dtap opv)/chisq;
run;

The example you provided is too small but the code is generally what you want.

If you need the statistics in data sets then look into the ODS Output options.

Or for not very pretty output;

proc freq data=have noprint;
   tables region *(dtap opv)/chisq;
   output out=work.statistics chisq;
run;
bstarr
Quartz | Level 8

Will you always be comparing FLU to FLU and HIB to HIB (e.g. REG_FLU to STATE_FLU as opposed to, say REG_FLU to REG_HIB or STATE_HIB)? If so, if I'm understanding what you want to do, one way might be to merge on the substring of Name [by doing something like: substr(name,index(name,"_")+1,length(name))], renaming Lower_limit and upper_limit for at least one of them, then compare the confidence bands with a single if/then: if Lower_limit_reg > upper_limit_state or lower_limit_state > upper limit_reg then SIGNIFICANCE = 1; Assuming your upper and lower limits are the 95% confidence intervals, of course.

PaigeMiller
Diamond | Level 26

I don't think seeing if the confidence intervals overlap is the proper way to compare means. The proper way to compare means uses the pooled variance, and the sample sizes in each group, as well as the means.

--
Paige Miller

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1509 views
  • 0 likes
  • 4 in conversation