Re: Correlation analysis

robertrao

I posted this question earlier but didn't get what I needed, so I am reposting it.

The ratings from Established patients were very high for these questions (not all the data is shown).

We wanted to see for which questions there was a statistically significant difference between the two groups (Established or NEW).

Please let me know how we can solve this.

Thank you

ID  | DOMAIN                      | QUESTION                            | SCORE | APPOINTMENT_TYPE
101 | Med Practice: Care Provider | Med Practice: CP Concern            | 100   | Estblished
101 | Med Practice: Care Provider | Med Practice: CP Explanation        | 100   | Estblished
101 | Med Practice: Care Provider | Med Practice: CP Discuss treatments | 100   | Estblished
101 | Med Practice: Care Provider | Med Practice: CP Efforts            | 100   | Estblished
101 | Med Practice: Access        | Med Practice: Ease of contacting    | 100   | Estblished
101 | Med Practice: Access        | Med Practice: Ease of scheduling    | 100   | Estblished
101 | Med Practice: Access        | Med Practice: CP Efforts            | 100   | Estblished
102 | Med Practice: Care Provider | Med Practice: CP Concern            | 100   | New
102 | Med Practice: Care Provider | Med Practice: CP Explanation        | 100   | New
102 | Med Practice: Care Provider | Med Practice: CP Discuss treatments | 100   | New
102 | Med Practice: Care Provider | Med Practice: CP Efforts            | 100   | New
102 | Med Practice: Access        | Med Practice: Ease of contacting    | 100   | New
102 | Med Practice: Access        | Med Practice: Ease of scheduling    | 100   | New
102 | Med Practice: Access        | Med Practice: CP Efforts            | 100   | New
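To make the sample above easy to run, here is a minimal sketch of a DATA step that reads the posted rows into a data set named YOUR_DATA, the placeholder name used in the code replies. The '|' delimiter and the variable lengths are assumptions chosen to fit these rows, and the APPOINTMENT_TYPE values are copied exactly as posted, including the spelling "Estblished".

```sas
/* Sketch: read the sample rows from the post into YOUR_DATA.            */
/* Assumption: '|' is the delimiter because values contain embedded      */
/* blanks; the lengths below are chosen to fit the posted values.        */
data your_data;
  infile datalines dlm='|' truncover;
  length domain $40 question $60 appointment_type $12;
  input id domain $ question $ score appointment_type $;
  datalines;
101|Med Practice: Care Provider|Med Practice: CP Concern|100|Estblished
101|Med Practice: Care Provider|Med Practice: CP Explanation|100|Estblished
101|Med Practice: Care Provider|Med Practice: CP Discuss treatments|100|Estblished
101|Med Practice: Care Provider|Med Practice: CP Efforts|100|Estblished
101|Med Practice: Access|Med Practice: Ease of contacting|100|Estblished
101|Med Practice: Access|Med Practice: Ease of scheduling|100|Estblished
101|Med Practice: Access|Med Practice: CP Efforts|100|Estblished
102|Med Practice: Care Provider|Med Practice: CP Concern|100|New
102|Med Practice: Care Provider|Med Practice: CP Explanation|100|New
102|Med Practice: Care Provider|Med Practice: CP Discuss treatments|100|New
102|Med Practice: Care Provider|Med Practice: CP Efforts|100|New
102|Med Practice: Access|Med Practice: Ease of contacting|100|New
102|Med Practice: Access|Med Practice: Ease of scheduling|100|New
102|Med Practice: Access|Med Practice: CP Efforts|100|New
;
run;
```

Posting data this way lets anyone replying run the suggested procedures on exactly the same rows.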

PaigeMiller

Which two groups do you mean?

Statistically different based on what statistical test?

--
Paige Miller

robertrao

The two groups are the levels of the APPOINTMENT_TYPE variable:
Established and NEW.

I am guessing it will be a correlation procedure. I might be wrong!

Thank you

PaigeMiller

Correlation is not usually used to show that groups are "statistically different".

Can you describe in words what comparison of the data you are thinking about? Please provide enough detail so we can follow along.


Ksharp

Since your question is about statistics, it is better to post it in the statistics forum:
https://communities.sas.com/t5/Statistical-Procedures/bd-p/statistical_procedures

That way more statistical experts will see it.
@StatDave @lvm @SteveDenham @jiltao

SteveDenham

There are simple ways and more complete ways. For the simple way, try:

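/* BY processing requires the data to be sorted by QUESTION */
proc sort data=your_data;
by question;
run;

proc ttest data=your_data;
by question;
class appointment_type;
var score;
run;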
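For reference, within each BY group (question) this form of PROC TTEST compares the mean SCORE of the two appointment types. The pooled-variance statistic it reports is shown below (it also reports the Satterthwaite unequal-variance version); the subscripts E and N for Established and New are notation introduced here. This test treats every row as an independent observation.

```latex
t = \frac{\bar{y}_E - \bar{y}_N}{s_p\sqrt{\dfrac{1}{n_E} + \dfrac{1}{n_N}}},
\qquad
s_p^2 = \frac{(n_E - 1)\,s_E^2 + (n_N - 1)\,s_N^2}{n_E + n_N - 2},
\qquad
\text{df} = n_E + n_N - 2
```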

However, it appears that your data may be clustered by ID, with several questions for each ID. If the responses within an ID may be correlated, a generalized estimating equation (GEE) approach could be useful. Here is some example code:

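proc gee data=your_data;
class appointment_type question ID;
model score = appointment_type question;
/* exchangeable working correlation among each patient's responses; */
/* COVB and CORRB print the estimated covariance and correlation    */
/* matrices of the parameter estimates                              */
repeated subject=ID/ type=exch covb corrb;
run;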
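TYPE=EXCH requests an exchangeable working correlation: any two responses from the same patient are assumed to share one common correlation α, while responses from different patients are treated as independent. For a patient i who answers m questions, the working correlation matrix is shown below. Adding the CORRW option to the REPEATED statement should print the estimated working correlation; CORRB, by contrast, prints the correlation matrix of the parameter estimates.

```latex
R_i(\alpha) =
\begin{pmatrix}
1      & \alpha & \cdots & \alpha \\
\alpha & 1      & \cdots & \alpha \\
\vdots & \vdots & \ddots & \vdots \\
\alpha & \alpha & \cdots & 1
\end{pmatrix}_{m \times m},
\qquad
\operatorname{Corr}(y_{ij}, y_{ik}) = \alpha \quad (j \neq k)
```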

If you want to compare the two groups' mean scores for each question, include an interaction term. That would lead to this code:

proc gee data=your_data;
class appointment_type question ID;
model score = appointment_type question appointment_type*question;
repeated subject=ID/ type=exch covb corrb;
/* LS-means for every appointment type x question cell */
lsmeans appointment_type*question;
/* within each question, compare Established with New */
slice appointment_type*question / diff sliceby=question;
run;

Hope some of this helps.

SteveDenham

robertrao

Hello Steve, thank you for the code and explanation.

We noticed that the questions received higher rating scores in the Established category than in the NEW category. We ran a t test of the average score per ID between the two groups, and the difference was statistically significant.
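A per-patient comparison like the one described could be sketched as follows. This is an illustration, not necessarily the exact code that was used: it assumes the data set is named YOUR_DATA and that each ID has a single APPOINTMENT_TYPE; ID_MEANS and MEAN_SCORE are hypothetical names.

```sas
/* Step 1: average each patient's scores over all questions.  */
/* NWAY keeps only the ID x APPOINTMENT_TYPE combinations, so  */
/* the output has one row per patient.                         */
proc means data=your_data noprint nway;
  class id appointment_type;
  var score;
  output out=id_means mean=mean_score;
run;

/* Step 2: two-sample t test of the per-patient averages,      */
/* Established versus NEW.                                     */
proc ttest data=id_means;
  class appointment_type;
  var mean_score;
run;
```

Averaging first leaves one observation per patient, which sidesteps the within-patient correlation, but it answers only the overall question, not which individual questions differ.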

Now we are trying to find whether the Established and NEW groups differed on the individual questions they answered.

For example: whether the score for the "Med Practice: CP Discuss treatments" question was significantly different between the two groups, whether other questions were, and so on.

Would PROC GEE still be a good procedure for this kind of analysis?

Thank you again

SteveDenham

Yes, it would. It would enable you to account for the correlation within each ID as well as the correlation/covariance across IDs.


robertrao

ID is the patient ID; each patient takes the survey and gives a score for each question in the domain(s).

When we look for an effect like the one in our study, trying to find which questions the Established group favored more (since we know the Established group had higher mean scores), do we need to use ID in PROC GEE?

Also, could you share a document on interpreting the results?

This is the result after using PROC GEE. Does it say anything about whether the two groups differed on any of the questions they answered?

Thank you

SteveDenham

I see the solution vector (the parameter estimates), but the sliced least squares mean differences are what would address your research question.
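One way to pull those sliced differences into a data set that can be sorted or filtered is ODS OUTPUT. This sketch reuses the interaction model posted earlier, dropping COVB and CORRB only to shorten the printed output. The table name SliceDiffs is an assumption based on the name SAS uses for SLICE differences in other procedures, so check the names that ODS TRACE writes to the log and adjust if PROC GEE uses a different one. QUESTION_DIFFS is a hypothetical output name.

```sas
ods trace on;                          /* list ODS table names in the log      */
ods output SliceDiffs=question_diffs;  /* assumed table name; verify in the log */

proc gee data=your_data;
  class appointment_type question ID;
  model score = appointment_type question appointment_type*question;
  repeated subject=ID / type=exch;
  /* within each question, compare Established with New */
  slice appointment_type*question / diff sliceby=question;
run;

ods trace off;

/* one row per question: the Established vs New difference and its test */
proc print data=question_diffs;
run;
```

Because APPOINTMENT_TYPE has only two levels, each question slice yields a single difference, so the printed table lists each question once.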

