Onizuka
Pyrite | Level 9

Hello everyone,

 

I'm trying to compare means one by one to find out if the difference is significant.

 

I have to do this by id, by period (two periods: 1 and 2) and by a variable called question.

 

This is what my input looks like :

 

Data have ;
format id $15. period 1. question $20. mean_period1 mean_period2 BEST12. ;
input id period question mean_period1 mean_period2 ;
cards;
00000_11111 1 question_1 0.64 .
00000_11111 2 question_1 . 0.72
00000_11111 1 question_2 0.58 .
00000_11111 2 question_2 . 0.64
00000_22222 1 question_1 0.81 .
00000_22222 2 question_1 . 0.75
00000_22222 1 question_2 0.68 .
00000_22222 2 question_2 . 0.62
;

So I want to know if there is a significant difference between the first period and the second period, but I don't know how to do it.

I think I have to use a Student's t-test or maybe a chi-square test.

 

Thank you all !

 

Onizuka

12 REPLIES
PaigeMiller
Diamond | Level 26

@Onizuka wrote:

I have to do this by id, by period (two periods: 1 and 2) and by a variable called question.

If you have only one data value for each combination of id, period and question, then there is no such thing as a statistical comparison of the means. Do you have, in real life (as opposed to this small data set), multiple observations for each combination of id, period and question?

@Onizuka wrote:

So I want to know if there is a significant difference between the first period and the second period, but I don't know how to do it.

This is not the same question as the one quoted first, which said "I have to do this by id, by period (2 periods : 1 and 2) and by a variable question". So please clarify exactly what you want.

--
Paige Miller
Onizuka
Pyrite | Level 9

Hello @PaigeMiller ,

 

I calculated the means in a PROC SQL step, grouping by id, period and question. So yes, I have an earlier table with multiple observations.

Yes, my bad. I want to know whether the difference is significant for each id and question, comparing the two periods, so that I can say something like:

"For the id 00000_11111 and question number 1, we can see a significant difference in the mean between period 1 (example: February 2019) and period 2 (example: January 2019)."

PaigeMiller
Diamond | Level 26

PROC SQL doesn't let you compare means statistically.

 

You want to take your original data and use PROC TTEST with a BY ID PERIOD QUESTION; statement.

 

Examples are here: https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_tt...

and 

https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_tt...
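
A minimal sketch of what such a call might look like, assuming a hypothetical row-level data set `raw_answers` with several observations per id, period and question, and a numeric answer variable `score` (neither name appears in this thread). One assumption here differs from the BY statement mentioned above: to test period 1 against period 2, the period variable is placed on the CLASS statement, which needs exactly two levels, while id and question stay on BY.

```sas
/* Hypothetical names: raw_answers (row-level data) and score are assumptions. */
proc sort data=raw_answers;
  by id question;          /* BY processing needs sorted input */
run;

proc ttest data=raw_answers;
  by id question;          /* one test per id/question combination */
  class period;            /* the two groups being compared: 1 vs 2 */
  var score;               /* numeric response, several rows per group */
run;
```

This treats the two periods as independent samples. If the same respondents answered in both periods, a paired analysis (the PAIRED statement of PROC TTEST) may be the better fit.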

--
Paige Miller
Onizuka
Pyrite | Level 9

I have already seen the documentation ^^'

This is what I have tried:

 

proc ttest data = p1_p2 ;
by code_niv per question ;
var restsi restsi2; 
run ;

This is what the log looks like:

1150  proc ttest data = p1_p2 ;
1151  by code_niv per question ;
1152  var restsi restsi2;
1153  run ;

WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response
         values.
NOTE: The above message was for the following BY group:
      code_niv=10907 per=1 question=cim_q10
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response
         values.
NOTE: The above message was for the following BY group:
      code_niv=10907 per=1 question=cim_q11
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response
         values.
NOTE: The above message was for the following BY group:
      code_niv=10907 per=1 question=cim_q12b

EDIT:

Then I tried this:

 

proc means data = P1_P2 noprint ;
var restsi restsi2 ;
by code_niv per question;
output out = test100000 ;
run ;

proc sort data = test100000 ; by code_niv per question ; run ;
proc ttest data = test100000 ;
by code_niv per question ;
var restsi restsi2; 
run ;

But all the results from the TTEST procedure are missing.
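
One way to check whether the data can support a test at all is to count the nonmissing values in each group. The sketch below reuses the variable names from the code above and assumes a hypothetical output data set `group_stats`. Naming the statistics on the OUTPUT statement gives one row per group. Without them, PROC MEANS writes one row per statistic (N, MIN, MAX, MEAN, STD) identified by `_STAT_`, which is not row-level data.

```sas
/* CLASS does not require sorted input; NWAY keeps only the full
   code_niv*per*question combinations. group_stats is a hypothetical name. */
proc means data=p1_p2 noprint nway;
  class code_niv per question;
  var restsi restsi2;
  output out=group_stats
         n=n_restsi n_restsi2
         std=std_restsi std_restsi2;
run;

/* List the groups with fewer than two nonmissing values in both variables. */
proc print data=group_stats;
  where max(n_restsi, n_restsi2) < 2;
run;
```

A group that shows up in this listing has a missing standard deviation, so no t-test can be computed for it.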

PaigeMiller
Diamond | Level 26

Show us a portion of your dataset p1_p2.

--
Paige Miller
Onizuka
Pyrite | Level 9

Yes, I will reply in a few minutes; my SAS session crashed after the code I ran, haha.

Onizuka
Pyrite | Level 9

Here is an extract:

[Screenshot: Capture.PNG]

And here is an extract sorted by code_niv (id), period and question:

[Screenshot: Capture.PNG]

PaigeMiller
Diamond | Level 26

As far as I can determine, you still have only one data point for each combination of Code_NIV PER and QUESTION.

 

Also please clarify: Are the data sets you showed really P1_P2 or the data set TEST100000? I specifically asked for P1_P2.

--
Paige Miller
Onizuka
Pyrite | Level 9

Yes, you are right, my mistake.

Do you know what my options are?

 


TEST100000 is the output of the PROC MEANS; what I showed you is P1_P2.

PROC MEANS doesn't calculate the variance. I have decided (I think it is better) to keep only one variable (restsi, which now holds the values of both restsi and restsi2):

 

Data P1_P2_ (drop = restsi2);
set P1_P2 ;
if restsi = . then restsi = restsi2 ; 
run ;

Proc means data = P1_P2_ noprint ;
var restsi ;
by code_niv per question ;
output out = test1000000;
run ;

Here is an extract of P1_P2_:

[Screenshot: Capture.PNG]

And here is an extract of TEST1000000 (from the PROC MEANS run on P1_P2_):

[Screenshot: Capture.PNG]

PaigeMiller
Diamond | Level 26

There is no way to statistically compare means if you have only a single data point in each combination of variables of interest.
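
The reason can be read off the two-sample t statistic, written here in its unequal-variance form purely as an illustration. The symbols are assumptions of this sketch: $\bar{x}_k$, $s_k^2$ and $n_k$ denote the mean, the sample variance and the number of observations in period $k$.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
s_k^2 = \frac{1}{n_k - 1} \sum_{i=1}^{n_k} \left( x_{k,i} - \bar{x}_k \right)^2
```

With $n_k = 1$ the factor $1/(n_k - 1)$ is undefined, so there is no variance estimate and therefore no standard error against which to judge the difference between the two means.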

--
Paige Miller
Onizuka
Pyrite | Level 9

Ok.. Thank you for the answer

Onizuka
Pyrite | Level 9

Allow me to bump this topic.

Does anyone have an idea? :X
