Statistical Procedures

Programming the statistical procedures from SAS
Corinthian94
Obsidian | Level 7

Hi all, just wondering how I would statistically compare variables between two datasets that have different numbers of observations. Essentially I'm trying to find out whether there are any significant differences in variables between a population at baseline and at follow-up, where the follow-up has only about half of the baseline participants. I'm looking for a way to do this statistically (i.e., to get p-values for the comparisons). Anybody have some insight? Thanks!

Accepted Solution
Reeza
Super User

You cannot do a paired t-test on all of the data in this case, but you can do a standard (two-sample) t-test.

If you match the records and only include participants who appear in both datasets, you can do a paired t-test.

I would probably do both and then compare the results.
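A minimal SAS sketch of doing both, under stated assumptions: the dataset names BASELINE and FOLLOWUP, the ID variable ID, and the measurement SCORE are hypothetical placeholders, and the data are simulated (the first 50 baseline participants return at follow-up) only so the code runs on its own. The first PROC TTEST stacks every record and compares the two visits as groups; the second keeps only participants present in both datasets and uses the PAIRED statement.

```sas
/* Simulated, hypothetical data: 100 at baseline, first 50 seen again */
data baseline followup;
  call streaminit(2024);
  do id = 1 to 100;
    score = rand('normal', 50, 10);
    output baseline;
    if id <= 50 then do;
      /* follow-up value is correlated with the same person's baseline */
      score = score + rand('normal', 2, 5);
      output followup;
    end;
  end;
run;

/* Unpaired comparison: stack all records and add a group variable */
data stacked;
  length visit $8;
  set baseline (in=inbase) followup;
  if inbase then visit = 'Baseline';
  else visit = 'Followup';
run;

proc ttest data=stacked;
  class visit;
  var score;
run;

/* Paired comparison: match on ID and keep people present in both sets */
proc sort data=baseline; by id; run;
proc sort data=followup; by id; run;

data paired;
  merge baseline (in=inb rename=(score=score_base))
        followup (in=inf rename=(score=score_follow));
  by id;
  if inb and inf;   /* matched participants only */
run;

proc ttest data=paired;
  paired score_follow*score_base;
run;
```

The PAIRED statement tests whether the mean of SCORE_FOLLOW minus SCORE_BASE differs from zero, using only the 50 matched participants.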

 

I would also compare who was included in the first and second samples to make sure they are similar on demographics; for example, check that the second sample isn't made up mostly of older individuals, or mostly of one gender. If the distributions are similar, you can reasonably assume the follow-up sample is representative and compare the two. If they're not, you may need to adjust for those differences.
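One hedged way to run that check in SAS: flag each baseline participant by whether they also appear at follow-up, then compare the two groups on each demographic variable, using a t-test for a continuous variable such as age and a chi-square test for a categorical one such as sex. The names DEMO, FU_IDS, AGE, SEX and COMPLETER are hypothetical, and the data are simulated only so the sketch runs.

```sas
/* Hypothetical baseline demographics for 100 participants */
data demo;
  call streaminit(7);
  length sex $1;
  do id = 1 to 100;
    age = round(rand('normal', 45, 12));
    if rand('uniform') < 0.5 then sex = 'F';
    else sex = 'M';
    output;
  end;
run;

/* Hypothetical list of IDs seen at follow-up */
data fu_ids;
  do id = 1 to 50;
    output;
  end;
run;

/* Both sets are already sorted by ID here; sort real data first */
data demo_flag;
  merge demo (in=inb) fu_ids (in=inf);
  by id;
  if inb;
  completer = inf;   /* 1 = also seen at follow-up, 0 = lost */
run;

/* Continuous demographic: compare mean age of completers vs. non-completers */
proc ttest data=demo_flag;
  class completer;
  var age;
run;

/* Categorical demographic: chi-square test of sex by completion status */
proc freq data=demo_flag;
  tables completer*sex / chisq;
run;
```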


2 Replies

ballardw
Super User

Maybe this can give you a start. It creates two datasets with the same variables, including a randomly generated "measurement" variable, combines them while adding a variable that identifies which set each record came from, and then uses that source variable in the CLASS statement so PROC TTEST compares the measurement between the two groups.

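/* Baseline: 100 participants with a random standard-normal measurement */
data setone;
  do id=1 to 100;
     somevar = rand('normal');
     output;
  end;
run;

/* Follow-up: 50 participants drawn from the same distribution */
data settwo;
  do id=1001 to 1050;
     somevar = rand('normal');
     output;
  end;
run;

/* Stack the two sets; IN= flags which set each record came from */
data totest;
  set setone (in=in1)
      settwo
  ;
  if in1 then Set='Baseline';
  else Set='Follow';
run;

/* Two-sample t-test of SOMEVAR between the groups defined by SET */
proc ttest data=totest;
  var somevar;
  class set;
run;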

A similar approach works for other procedures that can use the class variable as an independent variable, for example in a regression model.
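For instance, the same comparison can be written as a linear model with PROC GLM. This sketch assumes the TOTEST dataset from the code above (variables SOMEVAR and SET). With only two levels of SET, the F test for SET is equivalent to the pooled (equal-variance) t-test from PROC TTEST (F equals t squared); other covariates, such as a hypothetical AGE variable, could be added to the MODEL statement.

```sas
/* Assumes TOTEST from the code above */
proc glm data=totest;
  class set;              /* group indicator as a categorical predictor */
  model somevar = set;    /* one-way model: does the mean differ by SET? */
run;
quit;
```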

 

Because both sets are drawn from the same distribution, I would be surprised to see a significant difference in the means with these sample sizes. Change the second data step to use Rand('normal', 0.3, 1.6) or similar and you are more likely to get a significant difference.
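The suggested change might look like this. It reuses SETONE from the earlier code and only swaps the follow-up distribution to mean 0.3 and standard deviation 1.6. Because the standard deviations now differ (1 versus 1.6), the Satterthwaite (unequal variances) row of the PROC TTEST output, along with its Equality of Variances test, is likely the more relevant one to read.

```sas
/* Follow-up drawn from a shifted, wider normal: mean 0.3, SD 1.6 */
data settwo;
  do id=1001 to 1050;
     somevar = rand('normal', 0.3, 1.6);
     output;
  end;
run;

/* Rebuild the combined set (SETONE from the code above) and retest */
data totest;
  set setone (in=in1)
      settwo
  ;
  if in1 then Set='Baseline';
  else Set='Follow';
run;

proc ttest data=totest;
  var somevar;
  class set;
run;
```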
