Hi--
I have a dataset with Test and Control response scores for an email test. I want to generate a column which gives the P-Value for the difference between the Control and Test proportions. I normally calculate P-Values using Proc Freq with the chisq option but that only works for one observation at a time.
This is what I am looking for (with the P-Value column filled in of course):
Email_Type | Control | Test | P-Value |
Spotlight Email | 0.1838 | 0.1924 | |
Movers | 0.4559 | 0.3519 | |
Event | 0.1806 | 0.1820 | |
Customer | 0.2216 | 0.2118 | |
Promo Roll | 0.2813 | 0.3214 | |
Contract Renewal | 0.1599 | 0.1653 | |
HMC Email | 0.1531 | 0.1599 | |
Speed | 0.1805 | 0.1960 | |
Acquisition | 0.0984 | 0.0781 | |
Rewards | 0.1781 | 0.1885 | |
Redbox | 0.2479 | 0.2527 | |
Onboarding | 0.1311 | 0.1401 |
You need more than that to calculate a p-value - the N or count specifically.
Once you have that the Proc T-Test has a way to calculate the test based on pre-calculated values, see the examples.
More than likely you'll have to merge the results in though.
Thanks Rezza. Do you happen to have a link to good examples of proc ttest?
I generally refer to the documentation.
SAS/STAT(R) 9.2 User's Guide, Second Edition
If you have proportion data you may want a different test, ie proc freq though. But then you have the same problem of getting all results in one table. I have a macro that does it that I can link to if you're interested.
Hi Reeza. That would be great if you could send me your link to the Macro. Thanks!!!!
Could you not just wrap your code up in a call execute block, e.g (very pseudo):
proc sql;
create table WANT (EMAIL_TYPE char(200),CONTROL num,TEST num,P-VALUE num);
quit;
data _null_;
do i="Spotlight Email","Movers","Event"...;
call execute('proc ttest...; where email='||strip(i)||';...;');
call execute('proc sql;
insert into WANT
select "'||strip(i)||'",
RESULT
from OUTPUT_FROM_TTEST;
quit;');
end;
run;
Note that if you don't have a fixed list (from the do part) then you can sql distinct list and then set that dataset in the data null.
Hi RW9 thanks I actually don't have a fix list, will change every time. . I am not sure how to write the syntax to select a distinct list at place that list in the data null statement??
What does your input data look like?
Hi Reeza-
It should look something like this:
Email_Type | Panel | Sample | Score |
Spotlight Email | Test | 222 | 0.192 |
Spotlight Email | Control | 232 | 0.184 |
Movers | Test | 456 | 0.352 |
Movers | Control | 422 | 0.456 |
Event | Test | 789 | 0.182 |
Event | Control | 736 | 0.181 |
Customer | Test | 383 | 0.212 |
Customer | Control | 376 | 0.222 |
Promo Roll | Test | 235 | 0.321 |
Promo Roll | Control | 236 | 0.281 |
Contract Renewal | Test | 222 | 0.165 |
Contract Renewal | Control | 211 | 0.16 |
HMC Email | Test | 245 | 0.16 |
HMC Email | Control | 345 | 0.153 |
Speed | Test | 322 | 0.196 |
Speed | Control | 435 | 0.181 |
Acquisition | Test | 452 | 0.078 |
Acquisition | Control | 345 | 0.098 |
Rewards | Test | 340 | 0.189 |
Rewards | Control | 339 | 0.178 |
Redbox | Test | 222 | 0.253 |
Redbox | Control | 232 | 0.248 |
Onboarding | Test | 456 | 0.14 |
Onboarding | Control | 422 | 0.131 |
Thanks!
Hi Reeza--
I was just wondering if you were able to find that link to the macro you mentioned. I have still been trying to figure this out. Thanks!
My macro won't work for your data structure, but here's a different way. If you can avoid macro's always easier anyways!
It first reformats the data to a different structure and then calculates the p-values in one shot and you can merge those in.
Hope it's helpful.
If you have questions post back
data have;
informat email_type $18.;
input Email_Type $ Panel $ Sample Score;
cards;
Spotlight Test 222 0.192
Spotlight Control 232 0.184
Movers Test 456 0.352
Movers Control 422 0.456
Event Test 789 0.182
Event Control 736 0.181
Customer Test 383 0.212
Customer Control 376 0.222
Promo Test 235 0.321
Promo Control 236 0.281
Contract Test 222 0.165
Contract Control 211 0.16
HMC Test 245 0.16
HMC Control 345 0.153
Speed Test 322 0.196
Speed Control 435 0.181
Acquisition Test 452 0.078
Acquisition Control 345 0.098
Rewards Test 340 0.189
Rewards Control 339 0.178
Redbox Test 222 0.253
Redbox Control 232 0.248
Onboarding Test 456 0.14
Onboarding Control 422 0.131
;;;;
run;
data step1;
set have;
status="Success";
count=floor(score*sample);
output;
count=Sample-count;
status="Failure";
output;
run;
*get the chisq p-values;
ods table chisq=chisq_Summary;
proc freq data=step1;
by email_type notsorted;
table panel*status/chisq;
weight count;
run;
data chisq_summary2;
set chisq_summary;
where statistic="Chi-Square";
run;
*merge in with original table;
*Left for you to do;
Hi,
So something like:
proc sql;
create table WANT (EMAIL_TYPE char(200),CONTROL num,TEST num,P-VALUE num);
quit;
proc sql;
create table LOOP as
select distinct EMAIL
from HAVE;
quit;
data _null_;
set LOOP;
call execute('proc ttest...; where email='||strip(EMAIL)||';...;');
call execute('proc sql;
insert into WANT
select "'||strip(EMAIL)||'",
RESULT
from OUTPUT_FROM_TTEST;
quit;');
end;
run;
Reeza wrote:
I generally refer to the documentation.
That's a bit "old school" don't you think. :smileygrin:
In reply to @data_null: "Old school" defines me. I would guess 80% of the answers I have given here are because I read the f'in manual, and lo and behold, there was the answer...
Steve Denham
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.