Hello everyone,
I'm trying to compare means one by one to find out if the difference is significant.
I have to do this by id, by period (two periods: 1 and 2) and by a question variable.
This is what my input looks like:
Data have ;
format id $15. period 1. question $20. mean_period1 mean_period2 BEST12. ;
input id period question mean_period1 mean_period2 ;
cards;
00000_11111 1 question_1 0.64 .
00000_11111 2 question_1 . 0.72
00000_11111 1 question_2 0.58 .
00000_11111 2 question_2 . 0.64
00000_22222 1 question_1 0.81 .
00000_22222 2 question_1 . 0.75
00000_22222 1 question_2 0.68 .
00000_22222 2 question_2 . 0.62
;
So I want to know if there is a significant difference between the first period and the second period, but I don't know how to do it.
I think I need a Student's t-test, or maybe a chi-square test.
Thank you all !
Onizuka
If you have only one data value for each combination of id, period and question, then there is no such thing as a statistical comparison of the means. Do you have, in real life (as opposed to this small data set) multiple observations for each combination of id, period and question?
@Onizuka wrote:
So I want to know if there is a significant difference between the first period and the second period, but I don't know how to do it.
This is not the same question as discussed above, which said "I have to do this by id, by period (2 periods : 1 and 2) and by a variable question". So please clarify exactly what you want.
Hello @PaigeMiller ,
I calculated the means with PROC SQL, grouping by id, period and question, so yes, I have an earlier table with multiple observations.
Yes, my bad. I want to know whether the difference between the two periods is significant for each id and question, so I can say something like:
"For the id 00000_11111 and question number 1, we can see a significant difference in the mean between period 1 (example: february 2019) and period 2 (example: january 2019)"
PROC SQL doesn't let you compare means statistically.
You want to take your original data and use PROC TTEST with a BY ID PERIOD QUESTION; statement.
Examples are here: https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_tt...
and
I have already seen the documentation ^^'
This is what I have tried:
proc ttest data = p1_p2 ;
by code_niv per question ;
var restsi restsi2;
run ;
This is what the log looks like:
1150  proc ttest data = p1_p2 ;
1151  by code_niv per question ;
1152  var restsi restsi2;
1153  run ;
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response values.
NOTE: The above message was for the following BY group: code_niv=10907 per=1 question=cim_q10
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response values.
NOTE: The above message was for the following BY group: code_niv=10907 per=1 question=cim_q11
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: There are insufficient nonmissing observations to create a density plot.
WARNING: Some graphs could not be produced due to lack of observations with valid response values.
NOTE: The above message was for the following BY group: code_niv=10907 per=1 question=cim_q12b
EDIT :
Then I tried this:
proc means data = P1_P2 noprint ;
var restsi restsi2 ;
by code_niv per question;
output out = test100000 ;
run ;
proc sort data = test100000 ; by code_niv per question ; run ;
proc ttest data = test100000 ;
by code_niv per question ;
var restsi restsi2;
run ;
But all the results from PROC TTEST are missing.
Show us a portion of your dataset p1_p2.
Yes, I'll reply in a few minutes, my SAS session crashed after the code I ran haha
Here is an extract:
And here is an extract sorted by code_niv (id), period and question:
As far as I can determine, you still have only one data point for each combination of Code_NIV PER and QUESTION.
Also please clarify: Are the data sets you showed really P1_P2 or the data set TEST100000? I specifically asked for P1_P2.
Yes, you are right, my mistake.
Do you know what my options are?
TEST1000000 is the output of the PROC MEANS; what I showed you is P1_P2.
The PROC MEANS doesn't calculate the variance. I have decided (I think it is better) to keep only one variable (restsi, which combines restsi and restsi2):
Data P1_P2_ (drop = restsi2);
set P1_P2 ;
if restsi = . then restsi = restsi2 ;
run ;
Proc means data = P1_P2_ noprint ;
var restsi ;
by code_niv per question ;
output out = test1000000;
run ;
Here is an extract of P1_P2_:
And here is an extract of TEST100000 (from the PROC MEANS with P1_P2_):
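On the remark that PROC MEANS does not calculate the variance: an OUTPUT statement with no statistic keywords writes only the default statistics (N, MIN, MAX, MEAN, STD), one row each, identified by the _STAT_ variable. A minimal sketch of requesting the variance explicitly, assuming the same P1_P2_ data set; the output name stats_by_group and the column names n_obs, mean_restsi and var_restsi are hypothetical:

```sas
/* BY-group processing needs the data sorted by the BY variables */
proc sort data = P1_P2_ ;
  by code_niv per question ;
run ;

proc means data = P1_P2_ noprint ;
  by code_niv per question ;
  var restsi ;
  /* one row per BY group, with N, mean and variance as separate columns */
  /* stats_by_group, n_obs, mean_restsi, var_restsi are hypothetical names */
  output out = stats_by_group n = n_obs mean = mean_restsi var = var_restsi ;
run ;
```

With a single observation per BY group, n_obs is 1 and var_restsi is missing, because a variance needs at least two values.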
@Onizuka wrote:
Yes, you are right, my mistake.
Do you know what my options are?
There is no way to statistically compare means if you have only a single data point in each combination of variables of interest.
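If the individual responses behind those means are still available, the comparison would have to start from them. A minimal sketch, assuming a hypothetical data set raw_answers with one row per individual response and a numeric variable score (neither name comes from this thread); per is used as the CLASS variable rather than a BY variable, because the two periods are the groups being compared:

```sas
/* raw_answers and score are hypothetical names: one row per individual response */
proc sort data = raw_answers ;
  by code_niv question ;
run ;

/* keep the t-test table in a data set; ttest_results is a hypothetical name */
ods output TTests = ttest_results ;

proc ttest data = raw_answers plots = none ;
  by code_niv question ;   /* one test per id and question */
  class per ;              /* must have exactly two levels: period 1 and period 2 */
  var score ;
run ;
```

This treats the two periods as independent samples. If the same respondents answered in both periods, a paired comparison (the PAIRED statement, with one column per period) would likely be the more appropriate variant.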
Ok.. Thank you for the answer
Allow me to bump this topic.
Does anyone have an idea? :X