Re: Adding a P-Value Column to a Dataset

Mgarret · Posted 05-05-2014 06:13 PM

Hi--

I have a dataset with Test and Control response scores for an email test. I want to generate a column which gives the P-Value for the difference between the Control and Test proportions. I normally calculate P-Values using Proc Freq with the chisq option but that only works for one observation at a time.

This is what I am looking for (with the P-Value column filled in of course):

Email_Type	Control	Test	P-Value
Spotlight Email	0.1838	0.1924
Movers	0.4559	0.3519
Event	0.1806	0.1820
Customer	0.2216	0.2118
Promo Roll	0.2813	0.3214
Contract Renewal	0.1599	0.1653
HMC Email	0.1531	0.1599
Speed	0.1805	0.1960
Acquisition	0.0984	0.0781
Rewards	0.1781	0.1885
Redbox	0.2479	0.2527
Onboarding	0.1311	0.1401

Reeza · Posted 05-05-2014 06:29 PM

You need more than that to calculate a p-value - the N or count specifically.

Once you have that the Proc T-Test has a way to calculate the test based on pre-calculated values, see the examples.

More than likely you'll have to merge the results in though.

Mgarret · Posted 05-05-2014 06:33 PM

Thanks Rezza. Do you happen to have a link to good examples of proc ttest?

Reeza · Posted 05-05-2014 06:40 PM

I generally refer to the documentation.

SAS/STAT(R) 9.2 User's Guide, Second Edition

If you have proportion data you may want a different test, ie proc freq though. But then you have the same problem of getting all results in one table. I have a macro that does it that I can link to if you're interested.

Mgarret · Posted 05-06-2014 09:28 AM

Hi Reeza. That would be great if you could send me your link to the Macro. Thanks!!!!

RW9 · Posted 05-06-2014 09:33 AM

Could you not just wrap your code up in a call execute block, e.g (very pseudo):

proc sql;
create table WANT (EMAIL_TYPE char(200),CONTROL num,TEST num,P-VALUE num);
quit;

data _null_;
do i="Spotlight Email","Movers","Event"...;
    call execute('proc ttest...; where email='||strip(i)||';...;');
    call execute('proc sql;
                    insert into WANT
                    select "'||strip(i)||'",
                            RESULT
                    from    OUTPUT_FROM_TTEST;
                  quit;');
end;
run;

Note that if you don't have a fixed list (from the do part) then you can sql distinct list and then set that dataset in the data null.

Mgarret · Posted 05-06-2014 10:18 AM

Hi RW9 thanks I actually don't have a fix list, will change every time. . I am not sure how to write the syntax to select a distinct list at place that list in the data null statement??

Reeza · Posted 05-06-2014 10:20 AM

What does your input data look like?

Mgarret · Posted 05-06-2014 11:38 AM

Hi Reeza-

It should look something like this:

Email_Type	Panel	Sample	Score
Spotlight Email	Test	222	0.192
Spotlight Email	Control	232	0.184
Movers	Test	456	0.352
Movers	Control	422	0.456
Event	Test	789	0.182
Event	Control	736	0.181
Customer	Test	383	0.212
Customer	Control	376	0.222
Promo Roll	Test	235	0.321
Promo Roll	Control	236	0.281
Contract Renewal	Test	222	0.165
Contract Renewal	Control	211	0.16
HMC Email	Test	245	0.16
HMC Email	Control	345	0.153
Speed	Test	322	0.196
Speed	Control	435	0.181
Acquisition	Test	452	0.078
Acquisition	Control	345	0.098
Rewards	Test	340	0.189
Rewards	Control	339	0.178
Redbox	Test	222	0.253
Redbox	Control	232	0.248
Onboarding	Test	456	0.14
Onboarding	Control	422	0.131

Thanks!

Mgarret · Posted 05-08-2014 11:38 AM

Hi Reeza--

I was just wondering if you were able to find that link to the macro you mentioned. I have still been trying to figure this out. Thanks!

Reeza · Posted 05-08-2014 01:49 PM

My macro won't work for your data structure, but here's a different way. If you can avoid macro's always easier anyways!

It first reformats the data to a different structure and then calculates the p-values in one shot and you can merge those in.

Hope it's helpful.

If you have questions post back

data have;

informat email_type $18.;

input Email_Type $ Panel $ Sample Score;

cards;

Spotlight Test 222 0.192

Spotlight Control 232 0.184

Movers Test 456 0.352

Movers Control 422 0.456

Event Test 789 0.182

Event Control 736 0.181

Customer Test 383 0.212

Customer Control 376 0.222

Promo Test 235 0.321

Promo Control 236 0.281

Contract Test 222 0.165

Contract Control 211 0.16

HMC Test 245 0.16

HMC Control 345 0.153

Speed Test 322 0.196

Speed Control 435 0.181

Acquisition Test 452 0.078

Acquisition Control 345 0.098

Rewards Test 340 0.189

Rewards Control 339 0.178

Redbox Test 222 0.253

Redbox Control 232 0.248

Onboarding Test 456 0.14

Onboarding Control 422 0.131

;;;;

run;

data step1;

set have;

status="Success";

count=floor(score*sample);

output;

count=Sample-count;

status="Failure";

output;

run;

*get the chisq p-values;

ods table chisq=chisq_Summary;

proc freq data=step1;

by email_type notsorted;

table panel*status/chisq;

weight count;

run;

data chisq_summary2;

set chisq_summary;

where statistic="Chi-Square";

run;

*merge in with original table;

*Left for you to do;

RW9 · Posted 05-06-2014 10:27 AM

Hi,

So something like:

proc sql;
create table WANT (EMAIL_TYPE char(200),CONTROL num,TEST num,P-VALUE num);
quit;

proc sql;

create table LOOP as

select distinct EMAIL

from HAVE;

quit;

data _null_;

set LOOP;

    call execute('proc ttest...; where email='||strip(EMAIL)||';...;');
    call execute('proc sql;
                    insert into WANT
                    select "'||strip(EMAIL)||'",
                            RESULT
                    from    OUTPUT_FROM_TTEST;
                  quit;');
end;
run;

data_null__ · Posted 05-06-2014 12:14 PM

Reeza wrote:

I generally refer to the documentation.

That's a bit "old school" don't you think. :smileygrin:

SteveDenham · Posted 05-09-2014 09:19 AM

In reply to @data_null: "Old school" defines me. I would guess 80% of the answers I have given here are because I read the f'in manual, and lo and behold, there was the answer...

Steve Denham