Buzzy_Bee
Quartz | Level 8

Hello,

I have some basic data below showing PatientID, Visit and the Score the patient has for some metric such as heart rate or pulse (I just made this up as a simple example of what I'm doing).

If PatientID 1 has a higher score than everyone else, it would be helpful to compare him with all the other patients. I was hoping to create something like PROC TTEST output that shows summary statistics for Patient 1 (N, mean, median, etc.) and compares them with the summary statistics for everyone (the overall mean, median, etc.) to see whether his results differ from the overall group.

So it would look something like:

Variable       N   Mean   Median
Patient1      10     99       98
Everyone     100     66       55
Diff(1-2)            33       43

Here is the simple data. I'm not sure how to compare only Patient1 with everyone else. Thanks for your help, or for any ideas about other ways I can do this.

data have;                     
   input PatientID $ Visit Score;      
   datalines; 
1 1 22 
1 2 44 
2 1 63 
2 2 20 
3 1 48 
3 2 61
;
run;

title 'T-Test';
proc ttest data=have;
	class ?; 
	var Score; 
run;

 

 

Accepted Solution
ballardw
Super User

What is your sample size for any given ID? If your sample is as small as you show for Patient1, then a t-test is likely not the appropriate test.

To compare one value against all the others, you could create a format for the patient ID, such as:

proc format;
   /* Label ID 1 on its own; every other ID falls into one group */
   value $pid
      '1'   = 'Patient 1'
      other = 'Everyone else'
   ;
run;

Then apply that format to your PatientID CLASS variable in PROC TTEST. Groups formed by formats are recognized by most procedures.

proc ttest data=have;
   class patientid;
   format patientid $pid.;   /* groups are built from the formatted values */
   var Score;
run;

Caveat: a format name cannot end with a number, because SAS would treat the trailing digits as the display width.


Buzzy_Bee
Quartz | Level 8

That's perfect. It correctly splits the results into scores for just Patient 1 and then shows results for everyone else (excluding Patient 1's scores).

Yes, you are correct about the smallish sample size. The data would only look at around 4 visits in total, and there would be around 10 patients. I was under the impression that t-tests do not have a minimum sample size as such, since they're often used with small samples? It's not for a publication of any sort, more just something to show managers in meetings when they want a comparison. The data is fairly normally distributed (even though my example exaggerates Patient 1's scores).

ballardw
Super User


Typically a t-test is used with somewhat larger samples. As the sample size approaches 30, the distribution of the sample mean becomes approximately normal. If you "know", or have tested, that your variable is (fairly close to) normally distributed, then you are okay. The catch is that the t-test assumes normally distributed data, and with small samples it is often hard to check that assumption with any confidence.
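As a rough illustration of "having tested" normality, here is a minimal sketch (not from the original reply) that reuses the have data and the $pid format from this thread and asks PROC UNIVARIATE for its normality tests (Shapiro-Wilk, Kolmogorov-Smirnov and others) within each formatted group. The data set name normtests is made up for the example. With only two scores in the Patient 1 group, expect the tests for that group to be uninformative or not computed at all, which is exactly the small-sample problem described above.

```sas
/* Sample data from the question (made-up values) */
data have;
   input PatientID $ Visit Score;
   datalines;
1 1 22
1 2 44
2 1 63
2 2 20
3 1 48
3 2 61
;
run;

/* Patient 1 vs. everyone else, as in the accepted answer */
proc format;
   value $pid
      '1'   = 'Patient 1'
      other = 'Everyone else';
run;

/* NORMAL requests the tests for normality for each formatted group;   */
/* ODS OUTPUT saves that table (normtests is a hypothetical name)       */
ods output TestsForNormality=normtests;
proc univariate data=have normal;
   class PatientID;
   format PatientID $pid.;
   var Score;
run;
```

Treat the p-values with caution here: with a handful of observations per group these tests have very little power, so a non-significant result is weak evidence of normality.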

 

For extremely small samples, I would tend to reach for a non-parametric approach such as a sign test or similar if the discussion needed to be rigorous about statistical significance. Or perhaps just use PROC MEANS, or even a reporting procedure such as PROC REPORT or PROC TABULATE, which will calculate means of variables. Use the same format approach on a GROUP variable (PROC REPORT) or a CLASS variable (PROC TABULATE or PROC MEANS).
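For the table in the original question (N, mean and median for Patient 1 and for everyone else, plus a difference row), a minimal sketch along these lines might look like the following. It assumes the have data and $pid format from this thread; grp_stats and compare are made-up data set names. PROC TTEST's summary table does not include a median, which is one more reason to use PROC MEANS here. For the non-parametric side, the sketch swaps in the Wilcoxon rank-sum test from PROC NPAR1WAY rather than the sign test mentioned above: the sign test is normally a one-sample or paired test, while the rank-sum test compares two independent groups. The EXACT statement requests exact p-values, which seems sensible with so few observations.

```sas
/* Sample data from the question (made-up values) */
data have;
   input PatientID $ Visit Score;
   datalines;
1 1 22
1 2 44
2 1 63
2 2 20
3 1 48
3 2 61
;
run;

/* Patient 1 vs. everyone else, as in the accepted answer */
proc format;
   value $pid
      '1'   = 'Patient 1'
      other = 'Everyone else';
run;

/* N, mean and median per formatted group; NWAY keeps only the group rows */
proc means data=have n mean median nway;
   class PatientID;
   format PatientID $pid.;
   var Score;
   output out=grp_stats n=N mean=Mean median=Median;
run;

/* Add a Diff(1-2) row: Patient 1 minus everyone else */
data compare (keep=Group N Mean Median);
   length Group $ 13;
   set grp_stats end=last;
   retain m1 md1 m2 md2;
   /* Label each row through the format, whichever ID value was stored */
   Group = put(PatientID, $pid.);
   if Group = 'Patient 1' then do;
      m1  = Mean;
      md1 = Median;
   end;
   else do;
      m2  = Mean;
      md2 = Median;
   end;
   output;
   if last then do;
      Group  = 'Diff(1-2)';
      N      = .;
      Mean   = m1 - m2;
      Median = md1 - md2;
      output;
   end;
run;

proc print data=compare noobs;
run;

/* Non-parametric alternative: Wilcoxon rank-sum test with exact p-values */
proc npar1way data=have wilcoxon;
   class PatientID;
   format PatientID $pid.;
   var Score;
   exact wilcoxon;
run;
```

With the six made-up rows, Patient 1 has N=2, mean 33 and median 33, everyone else has N=4, mean 48 and median 54.5, so the Diff(1-2) row shows -15 for the mean and -21.5 for the median.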
