Onizuka
Pyrite | Level 9

Hello everyone,

 

I'm trying to compare means one by one to find out if the difference is significant.

 

I have to do this by id, by period (2 periods : 1 and 2) and by a variable question.

 

This is what my input looks like :

 

Data have ;
format id $15. period 1. question $20. mean_period1 mean_period2 BEST12. ;
input id period question mean_period1 mean_period2 ;
cards;
00000_11111 1 question_1 0.64 .
00000_11111 2 question_1 . 0.72
00000_11111 1 question_2 0.58 .
00000_11111 2 question_2 . 0.64
00000_22222 1 question_1 0.81 .
00000_22222 2 question_1 . 0.75
00000_22222 1 question_2 0.68 .
00000_22222 2 question_2 . 0.62
;

So I want to know if there is a significant difference between the first period and the second period, but I don't know how to do it.

I think I have to do a Student's t-test or maybe a chi-square test.

 

Thank you all !

 

Onizuka

12 REPLIES
PaigeMiller
Diamond | Level 26

@Onizuka wrote:

Hello everyone,

 

I'm trying to compare means one by one to find out if the difference is significant.

 

I have to do this by id, by period (2 periods : 1 and 2) and by a variable question.

 


If you have only one data value for each combination of id, period and question, then there is no such thing as a statistical comparison of the means. Do you have, in real life (as opposed to this small data set) multiple observations for each combination of id, period and question?

 

So I want to know if there is a significant difference between the first period and the second period, but I don't know how to do it.

 

This is not the same question as discussed above, which said "I have to do this by id, by period (2 periods : 1 and 2) and by a variable question". So please clarify exactly what you want.

--
Paige Miller
Onizuka
Pyrite | Level 9

Hello @PaigeMiller ,

 

I calculated the means in a PROC SQL step, grouping by id, period and question, so yes, I have a previous table with multiple observations.

 

Yes, my bad. I want to know whether the difference is significant, grouping by id, period and question, so that I can say something like:

"For the id 00000_11111 and question number 1, we can see a significant difference in the mean between period 1 (example: February 2019) and period 2 (example: January 2019)."

PaigeMiller
Diamond | Level 26

PROC SQL doesn't let you compare means statistically.

 

You want to take your original data and use PROC TTEST with a BY ID PERIOD QUESTION; statement.

 

Examples are here: https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_tt...

and 

https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_tt...
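
A minimal sketch of what that could look like, assuming an observation-level table (called raw here) with one row per response and a numeric response variable (called score here); both names are hypothetical and not from the thread. Note that PERIOD is placed in a CLASS statement rather than in BY in this sketch: BY groups are analyzed separately, so a BY group containing a single period would have nothing to compare against.

```sas
/* Sketch only: RAW and SCORE are hypothetical names for the
   observation-level table and its response variable. */
proc sort data=raw;
  by id question;
run;

/* One two-sample t test per id/question combination.
   PERIOD goes in CLASS because it defines the two groups compared;
   it must have exactly two levels within each BY group. */
proc ttest data=raw;
  by id question;
  class period;
  var score;
run;
```

This only produces a test where each id/question/period combination has more than one nonmissing observation.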

--
Paige Miller
Onizuka
Pyrite | Level 9

I have already seen the documentation ^^'

 

This is what I have tried:

 

proc ttest data = p1_p2 ;
by code_niv per question ;
var restsi restsi2; 
run ;

This is what the log looks like:

1150  proc ttest data = p1_p2 ;
1151  by code_niv per question ;
1152  var restsi restsi2;
1153  run ;

WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response
         values.
NOTE: The above message was for the following BY group:
      code_niv=10907 per=1 question=cim_q10
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response
         values.
NOTE: The above message was for the following BY group:
      code_niv=10907 per=1 question=cim_q11
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response
         values.
NOTE: The above message was for the following BY group:
      code_niv=10907 per=1 question=cim_q12b

EDIT :

Then I tried this:

 

proc means data = P1_P2 noprint ;
var restsi restsi2 ;
by code_niv per question;
output out = test100000 ;
run ;

proc sort data = test100000 ; by code_niv per question ; run ;
proc ttest data = test100000 ;
by code_niv per question ;
var restsi restsi2; 
run ;

But all the results are missing for the TTEST procedure.

PaigeMiller
Diamond | Level 26

Show us a portion of your dataset p1_p2.

--
Paige Miller
Onizuka
Pyrite | Level 9

Yes, I will reply in a few minutes; my SAS session crashed after the code I ran, haha.

Onizuka
Pyrite | Level 9

Here is an extract:

 

Capture.PNG

 

And here is an extract sorted by code_niv (id), period and question:

Capture.PNG

PaigeMiller
Diamond | Level 26

As far as I can determine, you still have only one data point for each combination of Code_NIV PER and QUESTION.

 

Also please clarify: Are the data sets you showed really P1_P2 or the data set TEST100000? I specifically asked for P1_P2.

--
Paige Miller
Onizuka
Pyrite | Level 9

Yes, you are right, my mistake.

 

Do you know what my options are?

 


@PaigeMiller wrote:

As far as I can determine, you still have only one data point for each combination of Code_NIV PER and QUESTION.

 

Also please clarify: Are the data sets you showed really P1_P2 or the data set TEST100000? I specifically asked for P1_P2.


TEST100000 is the output of the PROC MEANS step; what I showed you is P1_P2.

 

The PROC MEANS step doesn't calculate the variance. I have decided (I think it is better) to have only one variable (restsi, which combines restsi and restsi2):

 

Data P1_P2_ (drop = restsi2);
set P1_P2 ;
if restsi = . then restsi = restsi2 ; 
run ;

Proc means data = P1_P2_ noprint ;
var restsi ;
by code_niv per question ;
output out = test100000 ;
run ;

Here is an extract of P1_P2_:

 

Capture.PNG

 

And here is an extract of TEST100000 (from the PROC MEANS step on P1_P2_):

 

Capture.PNG
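
On the variance point: an OUTPUT statement without statistic keywords writes only the default statistics (N, MIN, MAX, MEAN, STD), so one option is to name the wanted statistics explicitly. A minimal sketch, assuming the P1_P2_ table built above; the output table name stats_by_group and the n_restsi, mean_restsi, std_restsi and var_restsi column names are hypothetical.

```sas
/* Sketch only: STATS_BY_GROUP and the n_/mean_/std_/var_ names are hypothetical. */
proc sort data=P1_P2_;
  by code_niv per question;
run;

proc means data=P1_P2_ noprint;
  by code_niv per question;
  var restsi;
  /* Naming each statistic gives one output row per BY group,
     with the count, mean, standard deviation and variance as columns. */
  output out=stats_by_group
         n=n_restsi mean=mean_restsi std=std_restsi var=var_restsi;
run;
```

If a BY group holds a single nonmissing value, n_restsi is 1 and the standard deviation and variance come out missing.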

PaigeMiller
Diamond | Level 26

@Onizuka wrote:

Yes you are right, my mistake..

 

Do you know what my options are?


There is no way to statistically compare means if you have only a single data point in each combination of variables of interest.
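
The reason can be read off the usual two-sample (Welch) t statistic, written here as a sketch; the notation is an assumption and not from the thread, with $\bar{x}_k$, $s_k^2$ and $n_k$ the mean, sample variance and number of observations in period $k$.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
s_k^2 = \frac{1}{n_k - 1} \sum_{i=1}^{n_k} \left( x_{ki} - \bar{x}_k \right)^2
```

With $n_k = 1$ the sample variance divides by $n_k - 1 = 0$, so $s_k^2$ is undefined and no standard error for the difference can be formed.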

--
Paige Miller
Onizuka
Pyrite | Level 9

Ok.. Thank you for the answer

Onizuka
Pyrite | Level 9

Allow me to bump this topic.

Does anyone have an idea? :X
