Hello everyone,
I'm trying to compare means one by one to find out if the difference is significant.
I have to do this by id, by period (2 periods: 1 and 2) and by the variable question.
This is what my input looks like:
data have;
   length id $15 question $20;                        /* character variables and their lengths */
   format period 1. mean_period1 mean_period2 best12.; /* display formats only */
   input id $ period question $ mean_period1 mean_period2;
cards;
00000_11111 1 question_1 0.64 .
00000_11111 2 question_1 . 0.72
00000_11111 1 question_2 0.58 .
00000_11111 2 question_2 . 0.64
00000_22222 1 question_1 0.81 .
00000_22222 2 question_1 . 0.75
00000_22222 1 question_2 0.68 .
00000_22222 2 question_2 . 0.62
;
So I want to know whether there is a significant difference between the first period and the second period, but I don't know how to do it.
I think I need a Student's t-test, or maybe a chi-square test.
Thank you all !
Onizuka
@Onizuka wrote:
Hello everyone,
I'm trying to compare means one by one to find out if the difference is significant.
I have to do this by id, by period (2 periods: 1 and 2) and by the variable question.
This is what my input looks like:
If you have only one data value for each combination of id, period and question, then there is no such thing as a statistical comparison of the means. Do you have, in real life (as opposed to this small data set), multiple observations for each combination of id, period and question?
So I want to know whether there is a significant difference between the first period and the second period, but I don't know how to do it.
This is not the same question as discussed above, which said "I have to do this by id, by period (2 periods : 1 and 2) and by a variable question". So please clarify exactly what you want.
Hello @PaigeMiller ,
I calculated the means with PROC SQL, grouping by id, period and question, so yes, I have a previous table with multiple observations.
Yes, my bad. I want to know whether the difference is significant, grouping by id, period and question, so that I can say something like:
"For id 00000_11111 and question 1, there is a significant difference in the mean between period 1 (e.g. February 2019) and period 2 (e.g. January 2019)."
PROC SQL doesn't let you compare means statistically.
You want to take your original data and use PROC TTEST with a BY ID PERIOD QUESTION; statement.
Examples are here: https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_tt...
and
I have already seen the documentation ^^'
This is what I have tried:
proc ttest data = p1_p2 ;
by code_niv per question ;
var restsi restsi2;
run ;
This is what the log looks like:
1150  proc ttest data = p1_p2 ;
1151  by code_niv per question ;
1152  var restsi restsi2;
1153  run ;

WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response values.
NOTE: The above message was for the following BY group:
      code_niv=10907 per=1 question=cim_q10
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response values.
NOTE: The above message was for the following BY group:
      code_niv=10907 per=1 question=cim_q11
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response values.
NOTE: The above message was for the following BY group:
      code_niv=10907 per=1 question=cim_q12b
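In the attempt above, PER is a BY variable, so each BY group holds only one period and PROC TTEST has nothing to compare it with; the warnings are likely also a sign of very few nonmissing values per group. A common way to compare two period means is to put PERIOD in a CLASS statement and keep only ID and QUESTION in the BY statement. The sketch below assumes raw rows (not per-cell means) are available, with RESTSI holding period 1 values and RESTSI2 holding period 2 values; `p1_p2_demo` is a made-up stand-in with invented values, and `p1_p2_long` and `ttests` are hypothetical names.

```sas
/* Made-up stand-in for P1_P2 (invented values): several raw rows per cell. */
/* Assumption: RESTSI holds period 1 values, RESTSI2 holds period 2 values. */
data p1_p2_demo;
   length code_niv $11 question $10;
   input code_niv $ per question $ restsi restsi2;
   datalines;
00000_11111 1 question_1 0.60 .
00000_11111 1 question_1 0.70 .
00000_11111 1 question_1 0.65 .
00000_11111 2 question_1 .    0.75
00000_11111 2 question_1 .    0.80
00000_11111 2 question_1 .    0.72
;

/* Put both periods in one response variable so PER can act as the group. */
data p1_p2_long;
   set p1_p2_demo;
   restsi = coalesce(restsi, restsi2);  /* first nonmissing of the two */
   drop restsi2;
run;

/* BY processing requires the data sorted by the BY variables. */
proc sort data=p1_p2_long;
   by code_niv question per;
run;

/* One two-sample t-test per id and question: PER is the CLASS variable, */
/* so PROC TTEST compares the period 1 mean with the period 2 mean.      */
ods output TTests=ttests;
proc ttest data=p1_p2_long;
   by code_niv question;
   class per;
   var restsi;
run;
```

With several ids and questions, `ttests` gets one Pooled row and one Satterthwaite row per BY group, each with its own p-value (Probt).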
EDIT:
Then I tried this:
proc means data = P1_P2 noprint ;
var restsi restsi2 ;
by code_niv per question;
output out = test100000 ;
run ;
proc sort data = test100000 ; by code_niv per question ; run ;
proc ttest data = test100000 ;
by code_niv per question ;
var restsi restsi2;
run ;
but all the results of PROC TTEST are missing.
Show us a portion of your dataset p1_p2.
Yes, I'll reply in a few minutes; my SAS session crashed after the code I ran, haha.
Here is an extract:
And here is an extract sorted by code_niv (id), period and question:
As far as I can determine, you still have only one data point for each combination of Code_NIV PER and QUESTION.
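One way to check this directly rather than by eye is to count the nonmissing values in each combination of CODE_NIV, PER and QUESTION: a period's standard deviation, and hence a t-test, needs at least two observations in that period. The sketch below uses a hypothetical stand-in table `cell_demo` built from the sample values in the question (one row per cell); `cell_counts` is also a hypothetical name.

```sas
/* Stand-in built from the sample in the question: one row per cell. */
data cell_demo;
   length code_niv $11 question $10;
   input code_niv $ per question $ restsi;
   datalines;
00000_11111 1 question_1 0.64
00000_11111 2 question_1 0.72
00000_11111 1 question_2 0.58
00000_11111 2 question_2 0.64
;

proc sql;
   /* Number of nonmissing RESTSI values per id/period/question cell. */
   create table cell_counts as
   select code_niv, per, question, count(restsi) as n_obs
   from cell_demo
   group by code_niv, per, question;

   /* List the cells that are too small for a variance estimate. */
   select * from cell_counts where n_obs < 2;
quit;
```

On the original P1_P2, where period 2 values sit in RESTSI2, counting `coalesce(restsi, restsi2)` instead of `restsi` avoids reporting zero for every period 2 cell.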
Also please clarify: Are the data sets you showed really P1_P2 or the data set TEST100000? I specifically asked for P1_P2.
Yes, you are right, my mistake.
Do you know what my options are?
@PaigeMiller wrote: As far as I can determine, you still have only one data point for each combination of Code_NIV PER and QUESTION.
Also please clarify: Are the data sets you showed really P1_P2 or the data set TEST100000? I specifically asked for P1_P2.
TEST1000000 is the output of PROC MEANS; what I showed you is P1_P2.
PROC MEANS doesn't calculate the variance, so I decided (I think it is better) to keep only one variable (RESTSI, which contains the values of both RESTSI and RESTSI2):
Data P1_P2_ (drop = restsi2);
set P1_P2 ;
if restsi = . then restsi = restsi2 ;
run ;
Proc means data = P1_P2_ noprint ;
var restsi ;
by code_niv per question ;
output out = test1000000;
run ;
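For reference, the PROC MEANS output above can itself be passed to PROC TTEST. When OUTPUT OUT= has no statistic keywords, the output data set gets a _STAT_ variable with rows N, MIN, MAX, MEAN and STD, and the PROC TTEST documentation describes accepting summary statistics in this _STAT_ form. As I understand it, this also explains the missing results: with one row per cell, STD is missing, so no test can be computed. The sketch below assumes several raw rows per cell and moves PER from BY to CLASS; `p1_p2_combined_demo` is a made-up stand-in with invented values, `cell_stats` and `ttests_summary` are hypothetical names, and the claim that PROC TTEST ignores the MIN and MAX rows is an assumption worth checking against the documentation.

```sas
/* Made-up stand-in for P1_P2_ (invented values): one response RESTSI, */
/* several raw rows per id/period/question cell.                       */
data p1_p2_combined_demo;
   length code_niv $11 question $10;
   input code_niv $ per question $ restsi;
   datalines;
00000_11111 1 question_1 0.60
00000_11111 1 question_1 0.70
00000_11111 1 question_1 0.65
00000_11111 2 question_1 0.75
00000_11111 2 question_1 0.80
00000_11111 2 question_1 0.72
;

proc sort data=p1_p2_combined_demo;
   by code_niv per question;
run;

/* No statistic keywords: OUT= gets _STAT_ rows N, MIN, MAX, MEAN and STD. */
proc means data=p1_p2_combined_demo noprint;
   by code_niv per question;
   var restsi;
   output out=cell_stats;
run;

/* Regroup so both periods of an id/question share one BY group. */
proc sort data=cell_stats;
   by code_niv question per;
run;

/* PROC TTEST reads the summary rows (assumed: MIN and MAX are ignored). */
ods output TTests=ttests_summary;
proc ttest data=cell_stats;
   by code_niv question;
   class per;
   var restsi;
run;
```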
Here is an extract of P1_P2_:
And here is an extract of TEST100000 (the PROC MEANS output from P1_P2_):
@Onizuka wrote:
Yes, you are right, my mistake.
Do you know what my options are?
There is no way to statistically compare means if you have only a single data point in each combination of variables of interest.
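To see why, here is the standard pooled two-sample (Student's) t statistic for comparing period 1 and period 2 within one id and question, where period $k$ has $n_k$ observations $x_{k1}, \dots, x_{kn_k}$ with mean $\bar{x}_k$ and sample variance $s_k^2$:

```latex
s_k^2 = \frac{1}{n_k - 1} \sum_{i=1}^{n_k} \left( x_{ki} - \bar{x}_k \right)^2,
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
```

With a single value per period ($n_1 = n_2 = 1$), both $n_k - 1$ and $n_1 + n_2 - 2$ are zero, so there is no variance estimate and $t$ is undefined. The unequal-variance (Satterthwaite) version has the same problem because it also needs $s_1^2$ and $s_2^2$.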
OK, thank you for the answer.
Allow me to bump this topic.
Does anyone have an idea? :X