Hello,
I have some basic data below showing PatientID, Visit and the Score the patient has for some metric such as heart rate or pulse (I just made this up as a simple example of what I'm doing).
If PatientID 1 has a higher score than everyone else, it would be helpful to compare him with all the other patients. So I was hoping to create something like a TTEST that would show summary stats for Patient 1 (N, Mean, Median, etc.) and then compare them with the summary stats for everyone (the overall mean, median, etc.) to see whether his results differ from the overall group.
So it would look something like:
Variable      N    Mean   Median
Patient1     10      99       98
Everyone    100      66       55
Diff(1-2)            33       43
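A minimal sketch of how numbers in that shape could be produced with PROC MEANS, assuming a grouping format (named $pid here; the name is an assumption) that puts Patient 1 in one group and every other ID in a second group. The two group rows are then collapsed into a Diff(1-2) row in a DATA step. Note this compares Patient 1 with everyone *else*; if you drop NWAY, PROC MEANS also writes an overall row (_TYPE_=0) covering all patients, and you would need to filter on _TYPE_ before computing the difference.

```sas
/* Sample data from the question: one row per patient visit. */
data have;
  input PatientID $ Visit Score;
  datalines;
1 1 22
1 2 44
2 1 63
2 2 20
3 1 48
3 2 61
;
run;

/* Assumed format name: ID '1' is its own group, all other IDs form a second group. */
proc format;
  value $pid '1'   = 'Patient 1'
             other = 'Everyone else';
run;

/* N, mean and median per formatted group; NWAY keeps only the two group rows. */
proc means data=have noprint nway;
  class PatientID;
  format PatientID $pid.;
  var Score;
  output out=grp_stats n=N mean=Mean median=Median;
run;

/* Collapse the two group rows into a single Diff(1-2) row (Patient 1 minus everyone else). */
data diff;
  set grp_stats end=last;
  retain mean1 median1 mean2 median2;
  if put(PatientID, $pid.) = 'Patient 1' then do;
    mean1   = Mean;
    median1 = Median;
  end;
  else do;
    mean2   = Mean;
    median2 = Median;
  end;
  if last then do;
    Diff_Mean   = mean1 - mean2;
    Diff_Median = median1 - median2;
    output;
  end;
  keep Diff_Mean Diff_Median;
run;

proc print data=grp_stats noobs;
  var PatientID N Mean Median;
run;

proc print data=diff noobs;
run;
```

The DATA step identifies each row by its formatted value rather than by row order, so it does not depend on how PROC MEANS sorts the class levels.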
Here is the simple data, and I'm not sure how to compare only Patient1 with everyone else. Thanks for your help or any ideas about other ways I can do this.
data have;
  input PatientID $ Visit Score;
  datalines;
1 1 22
1 2 44
2 1 63
2 2 20
3 1 48
3 2 61
;
run;

title 'T-Test';
proc ttest data=have;
  class ?;
  var Score;
run;
What is your sample size for any given ID? If your samples are as small as the two visits you show for Patient 1, then a t-test is likely not the appropriate test.
To compare one value against all others, you could create a format for PatientID such as:
proc format;
  /* character format, so the range value '1' is quoted */
  value $pid '1'   = 'Patient 1'
             other = 'Everyone else';
run;
Then apply that format to the PatientID CLASS variable in PROC TTEST. Groups formed by formats are honored by most procedures.
proc ttest data=have;
  class PatientID;
  format PatientID $pid.;
  var Score;
run;
Caveat: a format name cannot end in a number, because SAS reads a trailing number as the display width (so $pid8. would mean the $PID format with a width of 8).
That's perfect. It correctly splits the results into scores for just Patient 1 and then shows results for everyone else (excluding Patient 1's scores).
Yes - you are correct about the smallish sample size. The data would only look at around 4 visits in total, and there would be around 10 patients. I was under the impression that t-tests do not have an exact minimum sample size, as they're often used with small samples? It's not for a publication of any sort, more just something to show managers in meetings when they want a comparison. The data is fairly normally distributed (even though my example exaggerates Patient 1's scores).
A t-test is typically used with somewhat larger samples. As the sample size approaches about 30, the sampling distribution of the mean becomes approximately normal (the central limit theorem), even when the data themselves are not. If you "know", or have tested, that your variable is fairly close to normally distributed, then you are okay. The catch is that the t-test assumes normally distributed data, and with small samples it is often hard to test whether that assumption holds.
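One way to look at the normality question raised here is PROC UNIVARIATE with the NORMAL option, grouped with the same kind of format. This is a sketch: the $pid format and the norm_tests data set name are assumptions. With only a few observations per group these tests have very little power (and may not all be computed for the tiniest groups), so a non-significant result is weak evidence of normality rather than proof.

```sas
/* Sample data from the question. */
data have;
  input PatientID $ Visit Score;
  datalines;
1 1 22
1 2 44
2 1 63
2 2 20
3 1 48
3 2 61
;
run;

/* Assumed format: Patient 1 versus everyone else. */
proc format;
  value $pid '1'   = 'Patient 1'
             other = 'Everyone else';
run;

/* Save the normality-test tables (one per group) to a data set. */
ods output TestsForNormality=norm_tests;

/* NORMAL requests goodness-of-fit tests such as Shapiro-Wilk for each formatted group. */
proc univariate data=have normal;
  class PatientID;
  format PatientID $pid.;
  var Score;
run;
```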
For extremely small samples, if the discussion needs to be rigorous about statistical significance, I would tend to reach for a non-parametric approach such as a sign test or similar. Or perhaps just use PROC MEANS, or even a reporting procedure like PROC REPORT or PROC TABULATE, to calculate the means. Use the same format approach on a GROUP variable (PROC REPORT) or a CLASS variable (PROC TABULATE or PROC MEANS).
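As a non-parametric option for two independent groups, the Wilcoxon rank-sum test is the usual counterpart to the two-sample t-test (the sign test mentioned above is normally used for paired or one-sample data). This is a swapped-in technique, not one from the original reply. The sketch assumes PROC NPAR1WAY groups by formatted CLASS values the way PROC TTEST does; the $pid format and the wil_scores data set name are assumptions, and the EXACT statement requests exact p-values, which suit very small samples.

```sas
/* Sample data from the question. */
data have;
  input PatientID $ Visit Score;
  datalines;
1 1 22
1 2 44
2 1 63
2 2 20
3 1 48
3 2 61
;
run;

/* Assumed format: Patient 1 versus everyone else. */
proc format;
  value $pid '1'   = 'Patient 1'
             other = 'Everyone else';
run;

/* Keep the Wilcoxon rank-score summary as a data set. */
ods output WilcoxonScores=wil_scores;

/* Wilcoxon rank-sum test on the two formatted groups, with exact p-values. */
proc npar1way data=have wilcoxon;
  class PatientID;
  format PatientID $pid.;
  var Score;
  exact wilcoxon;
run;
```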