SAS Programming

TPayne · Posted 12-17-2019 10:24 AM

Hello,

i have a dataset where regions are listed under the "Name" column, their mean under the "Value" column and their lower limit and upper limit of their confidence interval. I want to take the lower and upper limits of a section of the "Name" column and compare it to another section in the same column to determine statistical significance. The only way I can think of involves way more if/then statements than i'd like to do- I am sure there is a more efficient way. Below is my dataset, where I would like to create a variable "SIGNIFICANCE" that is either 1 if the value of REG_FLU_COMPLETE_MEAN is statistically significantly different than the value of STATE_FLU_COMPLETE_MEAN etc.....

Name	Value	Lower_ Limit	Upper_Limit
REG_FLU_COMPLETE_Mean	0.834782609	0.76587853	0.903686689
REG_HIB_COMPLETE_Mean	0.765217391	0.68657518	0.843859607
REG_RTV_COMPLETE_Mean	0.765217391	0.68657518	0.843859607
REG_DTP_COMPLETE_Mean	0.87826087	0.81759321	0.938928527
REG_OPV_COMPLETE_Mean	0.869565217	0.80707988	0.932050559
REG_HBV_COMPLETE_Mean	0.886956522	0.82820701	0.945706036
REG_VAX_COMPLETE_Mean	0.826086957	0.75576202	0.896411889
STATE_FLU_COMPLETE_Mean	0.848092643	0.82971023	0.866475057
STATE_HIB_COMPLETE_Mean	0.801089918	0.78064617	0.821533667
STATE_RTV_COMPLETE_Mean	0.773160763	0.75171283	0.794608698
STATE_DTP_COMPLETE_Mean	0.801089918	0.78064617	0.821533667
STATE_OPV_COMPLETE_Mean	0.910081744	0.89543114	0.924732351
STATE_HBV_COMPLETE_Mean	0.914168937	0.89982306	0.92851481
STATE_VAX_COMPLETE_Mean	0.873297003	0.85626107	0.890332936

Thank you!!!

PaigeMiller · Posted 12-17-2019 12:40 PM

So you are actually trying to compare proportions (which can only vary from zero to 1) rather than means (which can vary from negative infinity to positive infinity).

It's always good to state the problem you are trying to solve, rather than the SAS output that you have achieved so far.

A solution (actually several solutions) is given here: https://blogs.sas.com/content/iml/2017/07/05/test-equality-two-proportions-sas.html

I would recommend that top section entitled "A chi-square test for association in SAS"

--
Paige Miller

View solution in original post

PaigeMiller · Posted 12-17-2019 10:29 AM

@TPayne wrote:

Hello,

i have a dataset where regions are listed under the "Name" column, their mean under the "Value" column and their lower limit and upper limit of their confidence interval. I want to take the lower and upper limits of a section of the "Name" column and compare it to another section in the same column to determine statistical significance. The only way I can think of involves way more if/then statements than i'd like to do- I am sure there is a more efficient way. Below is my dataset, where I would like to create a variable "SIGNIFICANCE" that is either 1 if the value of REG_FLU_COMPLETE_MEAN is statistically significantly different than the value of STATE_FLU_COMPLETE_MEAN etc.....

I don't think you can actually do this statistical test based upon the results you show. To see if the mean of REG_FLU_COMPLETE is statistically different than STATE_FLU_COMPLETE mean, we would need the value of N for each group (number of observations), and the standard deviation of the data in each group, plus the means (of course).

Maybe if you went back to the beginning and showed us a portion of the original data, rather than these summary statistics, we could figure out how to get what you want.

--
Paige Miller

TPayne · Posted 12-17-2019 12:03 PM

Hi Paige!

Thank you for pointing this out - I get these numbers by taking a dataset of boolean variables where each line is a record_id, and 1 is if that record is considered complete for a vaccine or not. I then break that dataset apart by region and run a proc summary for each region, then add that output as a variable to eachbroken apart dataset. I merge datasets back together, transpose, and end up with my originally posted dataset. I will do my best to explain it below:

I start with dataset STANDARD:

record_id	region	DTaP	OPV
1163	REG	1	1
1198	REG	1	1
1560	REG	0	0
1660	REG	1	1
1678	REG	0	0
1163	STATE	1	1
1198	STATE	1	1
1560	STATE	0	1
1660	STATE	1	1
1678	STATE	1	1

and I break it apart to be two datasets: REG and STATE, respectively (based off of region)

I then run a proc summary on each of these datasets to that looks like this:

proc summary data=reg

var DTaP OPV;

output out=CI MEAN= LCLM= UCLM= /AUTONAME:

run;

data reg2;

if _n_=1 then set CI(drop= _TYPE_ _FREQ_);

set reg;

run;

proc summary data=state

var DTaP OPV;

output out=CI MEAN= LCLM= UCLM= /AUTONAME:

run;

data state2;

if _n_=1 then set CI(drop= _TYPE_ _FREQ_);

set state;

run;

data confidenceintervals;

set reg2 state2;

run;

this creates a final dataset that looks like this:

record_id	region	DTaP_Mean	DTaP_UCLM	DTaP_LCLM	OPV_Mean	OPV_UCLM	OPV_LCLM
1163	REG	0.76521739	0.84385961	0.68657518	0.86956522	0.93205056	0.8070799
1198	REG	0.76521739	0.84385961	0.68657518	0.86956522	0.93205056	0.8070799
1560	REG	0.76521739	0.84385961	0.68657518	0.86956522	0.93205056	0.8070799
1660	REG	0.76521739	0.84385961	0.68657518	0.86956522	0.93205056	0.8070799
1678	REG	0.76521739	0.84385961	0.68657518	0.86956522	0.93205056	0.8070799
1163	STATE	0.80108992	0.82153367	0.78064617	0.91008174	0.92473235	0.8954311
1198	STATE	0.80108992	0.82153367	0.78064617	0.91008174	0.92473235	0.8954311
1560	STATE	0.80108992	0.82153367	0.78064617	0.91008174	0.92473235	0.8954311
1660	STATE	0.80108992	0.82153367	0.78064617	0.91008174	0.92473235	0.8954311
2202	STATE	0.80108992	0.82153367	0.78064617	0.91008174	0.92473235	0.8954311

through a series of transpositions and a few other manipulations, I end up with a dataset that is formatted for the application I ultimately have to upload the information in to. this is the dataset I was referring to earlier. I was thinking I would want to wait until this step to look at statistical significance because I am only curious about results at the regional level, not if a record_id was different than state results or not

Name	Value	Lower_ Limit	Upper_Limit
REG_DTP_COMPLETE_Mean	0.87826087	0.81759321	0.938928527
REG_OPV_COMPLETE_Mean	0.869565217	0.80707988	0.932050559
STATE_DTP_COMPLETE_Mean	0.801089918	0.78064617	0.821533667
STATE_OPV_COMPLETE_Mean	0.910081744	0.89543114	0.924732351

PaigeMiller · Posted 12-17-2019 12:40 PM

So you are actually trying to compare proportions (which can only vary from zero to 1) rather than means (which can vary from negative infinity to positive infinity).

It's always good to state the problem you are trying to solve, rather than the SAS output that you have achieved so far.

A solution (actually several solutions) is given here: https://blogs.sas.com/content/iml/2017/07/05/test-equality-two-proportions-sas.html

I would recommend that top section entitled "A chi-square test for association in SAS"

--
Paige Miller

TPayne · Posted 12-17-2019 01:52 PM

Hi Paige,

Thanks for sharing this resource- I am unable to perform this test with the way my data is currently formatted but can manipulate it. Additionally, Is there a way to print the output of the results as a variable in my final dataset? I ultimately will need to perform this many times.

PaigeMiller · Posted 12-17-2019 02:01 PM

You can perform this test as many times as is needed using PROC FREQ, for example if you had 22 different measures, or you could also just type all the variable names of the different measures:`

tables Group1*(Response1-Response22) / chisq;

In order to save the results in a SAS data set, you could use as the line of code before PROC FREQ

ods output chisq=chisq;

--
Paige Miller

ballardw · Posted 12-17-2019 06:10 PM

@TPayne wrote:

Hi Paige,

Thanks for sharing this resource- I am unable to perform this test with the way my data is currently formatted but can manipulate it. Additionally, Is there a way to print the output of the results as a variable in my final dataset? I ultimately will need to perform this many times.

Your "Standard" data set is set up to do Chisq testing if you are comparing region to state for each of your antigens:

data have;
input record_id $ region $ DTaP OPV ;
datalines;
1163 REG  1 1 
1198 REG  1 1 
1560 REG  0 0 
1660 REG  1 1 
1678 REG  0 0 
1163 STATE 1 1 
1198 STATE 1 1 
1560 STATE 0 1 
1660 STATE 1 1 
1678 STATE 1 1 
;

proc freq data=have;
   tables region *(dtap opv)/chisq;
run;

The example you provided is too small but the code is generally what you want.

If you need the statistics in data sets then look into the ODS Output options.

Or for not very pretty output;

proc freq data=have noprint;
   tables region *(dtap opv)/chisq;
   output out=work.statistics chisq;
run;

bstarr · Posted 12-17-2019 11:13 AM

Will you always be comparing FLU to FLU and HIB to HIB (e.g. REG_FLU to STATE_FLU as opposed to, say REG_FLU to REG_HIB or STATE_HIB)? If so, if I'm understanding what you want to do, one way might be to merge on the substring of Name [by doing something like: substr(name,index(name,"_")+1,length(name))], renaming Lower_limit and upper_limit for at least one of them, then compare the confidence bands with a single if/then: if Lower_limit_reg > upper_limit_state or lower_limit_state > upper limit_reg then SIGNIFICANCE = 1; Assuming your upper and lower limits are the 95% confidence intervals, of course.

PaigeMiller · Posted 12-17-2019 11:18 AM

I don't think seeing if the confidence intervals overlap is the proper way to compare means. The proper way to compare means uses the pooled variance, and the sample sizes in each group, as well as the means.

--
Paige Miller

bstarr · Posted 12-17-2019 11:28 AM

Ah right, good point.

SAS Programming

calculate statistical significance and turn it in to a variable

Re: calculate statistical significance and turn it in to a variable

Re: calculate statistical significance and turn it in to a variable

Re: calculate statistical significance and turn it in to a variable

Re: calculate statistical significance and turn it in to a variable

Re: calculate statistical significance and turn it in to a variable

Re: calculate statistical significance and turn it in to a variable

Re: calculate statistical significance and turn it in to a variable

Re: calculate statistical significance and turn it in to a variable

Re: calculate statistical significance and turn it in to a variable

Re: calculate statistical significance and turn it in to a variable

calculate statistics for each numeric var

Significant Figures

Comparing means for statistical significance

Conversing with Data: Turning Queries into Conversations with SAS Viya...

Significant digit

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...