Programming the statistical procedures from SAS

Adding a P-Value Column to a Dataset

Reply
Frequent Contributor
Posts: 142

Adding a P-Value Column to a Dataset

Hi--


I have a dataset with Test and Control response scores for an email test. I want to generate a column  which gives the P-Value for the difference between the Control and Test proportions.  I normally calculate P-Values using Proc Freq with the chisq option but that only works for one observation at a time.


This is what I am looking for (with the P-Value column filled in of course):

Email_Type ControlTestP-Value
Spotlight Email0.18380.1924
Movers0.45590.3519
Event 0.18060.1820
Customer 0.22160.2118
Promo Roll 0.28130.3214
Contract Renewal 0.15990.1653
HMC Email0.15310.1599
Speed 0.18050.1960
Acquisition 0.09840.0781
Rewards 0.17810.1885
Redbox 0.24790.2527
Onboarding 0.13110.1401
Grand Advisor
Posts: 16,884

Re: Adding a P-Value Column to a Dataset

You need more than that to calculate a p-value - the N or count specifically.

Once you have that the Proc T-Test has a way to calculate the test based on pre-calculated values, see the examples.

More than likely you'll have to merge the results in though.

Frequent Contributor
Posts: 142

Re: Adding a P-Value Column to a Dataset

Thanks Rezza. Do you happen to have a link to good examples of proc ttest?

Grand Advisor
Posts: 16,884

Re: Adding a P-Value Column to a Dataset

I generally refer to the documentation.

SAS/STAT(R) 9.2 User's Guide, Second Edition

If you have proportion data you may want a different test, ie proc freq though. But then you have the same problem of getting all results in one table. I have a macro that does it that I can link to if you're interested.

Frequent Contributor
Posts: 142

Re: Adding a P-Value Column to a Dataset

Hi Reeza. That would be great if you could send me your link to the Macro. Thanks!!!!

Esteemed Advisor
Esteemed Advisor
Posts: 6,977

Re: Adding a P-Value Column to a Dataset

Could you not just wrap your code up in a call execute block, e.g (very pseudo):

proc sql;
  create table WANT (EMAIL_TYPE char(200),CONTROL num,TEST num,P-VALUE num);
quit;

data _null_;
  do i="Spotlight Email","Movers","Event"...;
    call execute('proc ttest...; where email='||strip(i)||';...;');
    call execute('proc sql;
                    insert into WANT
                    select  "'||strip(i)||'",
                            RESULT
                    from    OUTPUT_FROM_TTEST;
                  quit;');
  end;
run;

Note that if you don't have a fixed list (from the do part) then you can sql distinct list and then set that dataset in the data null.

Frequent Contributor
Posts: 142

Re: Adding a P-Value Column to a Dataset

Hi RW9 thanks I actually don't have a fix list, will change every time. . I am not sure how to write the syntax to select a distinct list at place that list in the data null statement??

Grand Advisor
Posts: 16,884

Re: Adding a P-Value Column to a Dataset

What does your input data look like?

Frequent Contributor
Posts: 142

Re: Adding a P-Value Column to a Dataset

Hi Reeza-

It should look something like this:

Email_TypePanelSampleScore
Spotlight EmailTest 2220.192
Spotlight EmailControl2320.184
MoversTest 4560.352
MoversControl4220.456
EventTest 7890.182
EventControl7360.181
CustomerTest 3830.212
CustomerControl3760.222
Promo RollTest 2350.321
Promo RollControl2360.281
Contract RenewalTest 2220.165
Contract RenewalControl2110.16
HMC EmailTest 2450.16
HMC EmailControl3450.153
SpeedTest 3220.196
SpeedControl4350.181
AcquisitionTest 4520.078
AcquisitionControl3450.098
RewardsTest 3400.189
RewardsControl3390.178
RedboxTest 2220.253
RedboxControl2320.248
OnboardingTest 4560.14
OnboardingControl4220.131

Thanks!

Frequent Contributor
Posts: 142

Re: Adding a P-Value Column to a Dataset

Hi Reeza--

I was just wondering if you were able to find that link to the macro you mentioned. I have still been trying to figure this out.  Thanks!

Grand Advisor
Posts: 16,884

Re: Adding a P-Value Column to a Dataset

My macro won't work for your data structure, but here's a different way. If you can avoid macro's always easier anyways!

It first reformats the data to a different structure and then calculates the p-values in one shot and you can merge those in.

Hope it's helpful.

If you have questions post back Smiley Happy

data have;

informat email_type $18.;

input Email_Type $    Panel $    Sample    Score;

cards;

Spotlight    Test     222    0.192

Spotlight    Control    232    0.184

Movers    Test     456    0.352

Movers    Control    422    0.456

Event    Test     789    0.182

Event    Control    736    0.181

Customer    Test     383    0.212

Customer    Control    376    0.222

Promo    Test     235    0.321

Promo    Control    236    0.281

Contract    Test     222    0.165

Contract    Control    211    0.16

HMC    Test     245    0.16

HMC    Control    345    0.153

Speed    Test     322    0.196

Speed    Control    435    0.181

Acquisition    Test     452    0.078

Acquisition    Control    345    0.098

Rewards    Test     340    0.189

Rewards    Control    339    0.178

Redbox    Test     222    0.253

Redbox    Control    232    0.248

Onboarding    Test     456    0.14

Onboarding    Control    422    0.131

;;;;

run;

data step1;

    set have;

    status="Success";

    count=floor(score*sample);

    output;

    count=Sample-count;

    status="Failure";

    output;

run;

*get the chisq p-values;

ods table chisq=chisq_Summary;

proc freq data=step1;

    by email_type notsorted;

    table panel*status/chisq;

    weight count;

run;

data chisq_summary2;

    set chisq_summary;

    where statistic="Chi-Square";

run;

*merge in with original table;

*Left for you to do;

Esteemed Advisor
Esteemed Advisor
Posts: 6,977

Re: Adding a P-Value Column to a Dataset

Hi,

So something like:

proc sql;
  create table WANT (EMAIL_TYPE char(200),CONTROL num,TEST num,P-VALUE num);
quit;

proc sql;

     create table LOOP as

     select     distinct EMAIL

     from       HAVE;

quit;

data _null_;

     set LOOP;

    call execute('proc ttest...; where email='||strip(EMAIL)||';...;');
    call execute('proc sql;
                    insert into WANT
                    select  "'||strip(EMAIL)||'",
                            RESULT
                    from    OUTPUT_FROM_TTEST;
                  quit;');
  end;
run;

Respected Advisor
Posts: 3,773

Re: Adding a P-Value Column to a Dataset

Reeza wrote:

I generally refer to the documentation.

That's a bit "old school" don't you think. :smileygrin:

Respected Advisor
Posts: 2,655

Re: Adding a P-Value Column to a Dataset

In reply to @data_null: "Old school" defines me.  I would guess 80% of the answers I have given here are because I read the f'in manual, and lo and behold, there was the answer...

Steve Denham

Ask a Question
Discussion stats
  • 13 replies
  • 635 views
  • 7 likes
  • 5 in conversation