BookmarkSubscribeRSS Feed
Mgarret
Obsidian | Level 7

Hi--


I have a dataset with Test and Control response scores for an email test. I want to generate a column  which gives the P-Value for the difference between the Control and Test proportions.  I normally calculate P-Values using Proc Freq with the chisq option but that only works for one observation at a time.


This is what I am looking for (with the P-Value column filled in of course):

Email_Type ControlTestP-Value
Spotlight Email0.18380.1924
Movers0.45590.3519
Event 0.18060.1820
Customer 0.22160.2118
Promo Roll 0.28130.3214
Contract Renewal 0.15990.1653
HMC Email0.15310.1599
Speed 0.18050.1960
Acquisition 0.09840.0781
Rewards 0.17810.1885
Redbox 0.24790.2527
Onboarding 0.13110.1401
13 REPLIES 13
Reeza
Super User

You need more than that to calculate a p-value - the N or count specifically.

Once you have that the Proc T-Test has a way to calculate the test based on pre-calculated values, see the examples.

More than likely you'll have to merge the results in though.

Mgarret
Obsidian | Level 7

Thanks Rezza. Do you happen to have a link to good examples of proc ttest?

Reeza
Super User

I generally refer to the documentation.

SAS/STAT(R) 9.2 User's Guide, Second Edition

If you have proportion data you may want a different test, ie proc freq though. But then you have the same problem of getting all results in one table. I have a macro that does it that I can link to if you're interested.

Mgarret
Obsidian | Level 7

Hi Reeza. That would be great if you could send me your link to the Macro. Thanks!!!!

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Could you not just wrap your code up in a call execute block, e.g (very pseudo):

proc sql;
  create table WANT (EMAIL_TYPE char(200),CONTROL num,TEST num,P-VALUE num);
quit;

data _null_;
  do i="Spotlight Email","Movers","Event"...;
    call execute('proc ttest...; where email='||strip(i)||';...;');
    call execute('proc sql;
                    insert into WANT
                    select  "'||strip(i)||'",
                            RESULT
                    from    OUTPUT_FROM_TTEST;
                  quit;');
  end;
run;

Note that if you don't have a fixed list (from the do part) then you can sql distinct list and then set that dataset in the data null.

Mgarret
Obsidian | Level 7

Hi RW9 thanks I actually don't have a fix list, will change every time. . I am not sure how to write the syntax to select a distinct list at place that list in the data null statement??

Reeza
Super User

What does your input data look like?

Mgarret
Obsidian | Level 7

Hi Reeza-

It should look something like this:

Email_TypePanelSampleScore
Spotlight EmailTest 2220.192
Spotlight EmailControl2320.184
MoversTest 4560.352
MoversControl4220.456
EventTest 7890.182
EventControl7360.181
CustomerTest 3830.212
CustomerControl3760.222
Promo RollTest 2350.321
Promo RollControl2360.281
Contract RenewalTest 2220.165
Contract RenewalControl2110.16
HMC EmailTest 2450.16
HMC EmailControl3450.153
SpeedTest 3220.196
SpeedControl4350.181
AcquisitionTest 4520.078
AcquisitionControl3450.098
RewardsTest 3400.189
RewardsControl3390.178
RedboxTest 2220.253
RedboxControl2320.248
OnboardingTest 4560.14
OnboardingControl4220.131

Thanks!

Mgarret
Obsidian | Level 7

Hi Reeza--

I was just wondering if you were able to find that link to the macro you mentioned. I have still been trying to figure this out.  Thanks!

Reeza
Super User

My macro won't work for your data structure, but here's a different way. If you can avoid macro's always easier anyways!

It first reformats the data to a different structure and then calculates the p-values in one shot and you can merge those in.

Hope it's helpful.

If you have questions post back Smiley Happy

data have;

informat email_type $18.;

input Email_Type $    Panel $    Sample    Score;

cards;

Spotlight    Test     222    0.192

Spotlight    Control    232    0.184

Movers    Test     456    0.352

Movers    Control    422    0.456

Event    Test     789    0.182

Event    Control    736    0.181

Customer    Test     383    0.212

Customer    Control    376    0.222

Promo    Test     235    0.321

Promo    Control    236    0.281

Contract    Test     222    0.165

Contract    Control    211    0.16

HMC    Test     245    0.16

HMC    Control    345    0.153

Speed    Test     322    0.196

Speed    Control    435    0.181

Acquisition    Test     452    0.078

Acquisition    Control    345    0.098

Rewards    Test     340    0.189

Rewards    Control    339    0.178

Redbox    Test     222    0.253

Redbox    Control    232    0.248

Onboarding    Test     456    0.14

Onboarding    Control    422    0.131

;;;;

run;

data step1;

    set have;

    status="Success";

    count=floor(score*sample);

    output;

    count=Sample-count;

    status="Failure";

    output;

run;

*get the chisq p-values;

ods table chisq=chisq_Summary;

proc freq data=step1;

    by email_type notsorted;

    table panel*status/chisq;

    weight count;

run;

data chisq_summary2;

    set chisq_summary;

    where statistic="Chi-Square";

run;

*merge in with original table;

*Left for you to do;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

So something like:

proc sql;
  create table WANT (EMAIL_TYPE char(200),CONTROL num,TEST num,P-VALUE num);
quit;

proc sql;

     create table LOOP as

     select     distinct EMAIL

     from       HAVE;

quit;

data _null_;

     set LOOP;

    call execute('proc ttest...; where email='||strip(EMAIL)||';...;');
    call execute('proc sql;
                    insert into WANT
                    select  "'||strip(EMAIL)||'",
                            RESULT
                    from    OUTPUT_FROM_TTEST;
                  quit;');
  end;
run;

data_null__
Jade | Level 19

Reeza wrote:

I generally refer to the documentation.

That's a bit "old school" don't you think. :smileygrin:

SteveDenham
Jade | Level 19

In reply to @data_null: "Old school" defines me.  I would guess 80% of the answers I have given here are because I read the f'in manual, and lo and behold, there was the answer...

Steve Denham

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 2890 views
  • 7 likes
  • 5 in conversation