Multilevel Factor Analysis in SAS?


08-24-2010 09:50 PM

Hi - I am developing a psychological measurement scale and want to run a factor analysis on the data I have collected with it. However, the data were collected by asking people to complete the scale repeatedly over several days (it's a drug withdrawal measurement scale, and people fill it in every day for 3 weeks as they go through a withdrawal episode).

I have found that multilevel factor analysis is appropriate for such data, but I don't see any obvious way to implement it in SAS. It can be done in Stata and in Mplus, but I can't find an example in SAS.

Can anybody help me with this?

Thanks

Dave



Posted in reply to deleted_user

08-25-2010 05:41 PM

You might be able to use the NLMIXED procedure to fit such a model. However, I would want to know a bit more about the data and assumptions that you would employ in the factor analysis. Among the things that one might want to know would be the following:

1) How many variables are measured?

2) How many factors do you believe there are?

3) What is the within-person model that you wish to assume for the measured variables over time?

A simple factor analysis in which there are 10 measured items and two factors can be modeled with code like the following:

proc nlmixed;
  parms L1_2-L1_10 1 L2_1-L2_9 1;
  L1_1 = 1; /* Constraint establishes identifiability */
  L2_10 = 1; /* Constraint establishes identifiability */
  /* Conditional mean of each item given the latent factors */
  mu1 = L1_1*F1 + L2_1*F2;
  mu2 = L1_2*F1 + L2_2*F2;
  mu3 = L1_3*F1 + L2_3*F2;
  mu4 = L1_4*F1 + L2_4*F2;
  mu5 = L1_5*F1 + L2_5*F2;
  mu6 = L1_6*F1 + L2_6*F2;
  mu7 = L1_7*F1 + L2_7*F2;
  mu8 = L1_8*F1 + L2_8*F2;
  mu9 = L1_9*F1 + L2_9*F2;
  mu10 = L1_10*F1 + L2_10*F2;
  /* Normal log-likelihood given F1 and F2; the constant -0.5*log(2*pi) per item is omitted */
  ll = -0.5*(log(Ve1) + ((x1-mu1)**2)/Ve1 +
             log(Ve2) + ((x2-mu2)**2)/Ve2 +
             log(Ve3) + ((x3-mu3)**2)/Ve3 +
             ...
             log(Ve10) + ((x10-mu10)**2)/Ve10 );
  /* Inverse Fisher z transform keeps rho between -1 and 1 */
  rho = (exp(2*Z) - 1) / (exp(2*Z) + 1);
  model ll ~ general(ll);
  random F1 F2 ~ normal([0,0], [VF1, rho*sqrt(VF1*VF2), VF2]) subject=person;
run;

In the above model, F1 and F2 are latent factors, parameters Li_j (i=1,2, j=1,2,...,10) are the loadings of variable j on latent factor i, Vej (j=1,2,...,10) are the variances of the stochastic component of variable Xj, VF1 and VF2 are the variances of factors F1 and F2, and rho is the correlation between the two factors (their covariance is rho*sqrt(VF1*VF2)).
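For reference, the model those statements are meant to express can be written out as follows. This is a sketch in my own notation, not taken from the post: \lambda_{1j} and \lambda_{2j} stand for L1_j and L2_j, V_{ej} for Vej, and i indexes one record. The code contains no item intercepts, so it implicitly treats each x as centered at zero. The \log(2\pi) term is a constant that can be dropped without changing the estimates.

```latex
% Measurement model for item j = 1, ..., 10 on record i
x_{ij} = \lambda_{1j} F_{1i} + \lambda_{2j} F_{2i} + e_{ij},
\qquad e_{ij} \sim N(0, V_{ej})

% Log-likelihood of one record, conditional on the factors
\ell_i = -\tfrac{1}{2} \sum_{j=1}^{10}
  \left[ \log(2\pi) + \log V_{ej} + \frac{(x_{ij} - \mu_{ij})^2}{V_{ej}} \right],
\qquad \mu_{ij} = \lambda_{1j} F_{1i} + \lambda_{2j} F_{2i}

% Factor correlation, kept in (-1, 1) by the inverse Fisher z transform of Z
\rho = \frac{e^{2Z} - 1}{e^{2Z} + 1} = \tanh(Z),
\qquad \operatorname{Cov}(F_1, F_2) = \rho \sqrt{V_{F1} V_{F2}}
```

Note that both the log-variance term and the squared-residual term enter the bracket with a plus sign, so the whole expression is penalized by larger residuals.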

One could constrain some of these parameters for a more parsimonious model. For instance, we might believe that the factor loadings for F1 on variables X7 through X10 should be 0. Similarly, we might believe that the factor loadings for F2 on variables X1 through X4 are 0. (So, our model would state that only the two observed variables X5 and X6 were functions of both factors.) We would then have:

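proc nlmixed;
  /* Starting values for every free loading, including the cross-loadings L2_5 and L2_6 */
  parms L1_2-L1_6 1 L2_5-L2_9 1;
  L1_1 = 1; /* Constraint establishes identifiability */
  L2_10 = 1; /* Constraint establishes identifiability */
  /* X1-X4 load on F1 only, X5-X6 on both factors, X7-X10 on F2 only */
  mu1 = L1_1*F1;
  mu2 = L1_2*F1;
  mu3 = L1_3*F1;
  mu4 = L1_4*F1;
  mu5 = L1_5*F1 + L2_5*F2;
  mu6 = L1_6*F1 + L2_6*F2;
  mu7 = L2_7*F2;
  mu8 = L2_8*F2;
  mu9 = L2_9*F2;
  mu10 = L2_10*F2;
  /* Normal log-likelihood given F1 and F2; the constant -0.5*log(2*pi) per item is omitted */
  ll = -0.5*(log(Ve1) + ((x1-mu1)**2)/Ve1 +
             log(Ve2) + ((x2-mu2)**2)/Ve2 +
             log(Ve3) + ((x3-mu3)**2)/Ve3 +
             ...
             log(Ve10) + ((x10-mu10)**2)/Ve10 );
  /* Inverse Fisher z transform keeps rho between -1 and 1 */
  rho = (exp(2*Z) - 1) / (exp(2*Z) + 1);
  model ll ~ general(ll);
  random F1 F2 ~ normal([0,0], [VF1, rho*sqrt(VF1*VF2), VF2]) subject=person;
run;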

The above code is not intended to account for multiple responses per subject, but it could be extended to do so. Without changing any code, simply adding further records with the same person ID value would fit a model in which each person's latent factor scores are constant across that person's repeated measurements.
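One practical point when stacking repeated records this way: PROC NLMIXED treats a change in the SUBJECT= variable from one observation to the next as the start of a new subject, so the records for a person need to be contiguous. A minimal sketch, assuming a hypothetical long-format data set named withdrawal with one record per person per day and variables person, day, and x1-x10 (all of these names are illustrative, not from the original post):

```sas
/* Hypothetical input: one record per person per day (names are assumptions) */
proc sort data=withdrawal out=withdrawal_sorted;
  by person day;   /* keeps each person's daily records together */
run;

/* Optional check: count how many daily records each person contributes */
proc freq data=withdrawal_sorted;
  tables person / noprint out=records_per_person;
run;
```

The sorted data set would then be named on the procedure statement, for example proc nlmixed data=withdrawal_sorted;, with subject=person on the RANDOM statement as before.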

There might be some question as to whether you want to model a time effect and also whether there should be some correlation structure for residuals over time. To get into the latter would require quite a bit of additional code as well as restructuring the data. I don't have time to go into how the code should be modified to account for a correlation structure among the residuals.

It should also be noted that the above code does not perform any rotation of the latent factors.



Posted in reply to Dale

08-25-2010 07:54 PM

Just FYI, I got the basic idea for how to fit a factor analysis using the NLMIXED procedure from a paper published in Biostatistics in 2006. That paper performed a latent class regression on latent factors.

As I thought about the code, it seemed that the method which was employed to establish identifiability of the latent factors could be managed differently. Identifiability was established by restricting the factor loading of a factor on one variable to be 1. Instead of restricting the factor loading for an arbitrary variable, it would probably be preferable to set the variance of the factor to 1 and estimate a factor loading for each variable. Revised code for the first model previously presented would be:

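proc nlmixed;
  parms L1_1-L1_10 1 L2_1-L2_10 1;
  /* Conditional mean of each item given the latent factors */
  mu1 = L1_1*F1 + L2_1*F2;
  mu2 = L1_2*F1 + L2_2*F2;
  mu3 = L1_3*F1 + L2_3*F2;
  mu4 = L1_4*F1 + L2_4*F2;
  mu5 = L1_5*F1 + L2_5*F2;
  mu6 = L1_6*F1 + L2_6*F2;
  mu7 = L1_7*F1 + L2_7*F2;
  mu8 = L1_8*F1 + L2_8*F2;
  mu9 = L1_9*F1 + L2_9*F2;
  mu10 = L1_10*F1 + L2_10*F2;
  /* Normal log-likelihood given F1 and F2; the constant -0.5*log(2*pi) per item is omitted */
  ll = -0.5*(log(Ve1) + ((x1-mu1)**2)/Ve1 +
             log(Ve2) + ((x2-mu2)**2)/Ve2 +
             log(Ve3) + ((x3-mu3)**2)/Ve3 +
             ...
             log(Ve10) + ((x10-mu10)**2)/Ve10 );
  /* Inverse Fisher z transform keeps rho between -1 and 1 */
  rho = (exp(2*Z) - 1) / (exp(2*Z) + 1);
  model ll ~ general(ll);
  /* Factor variances fixed at 1, so rho is both the covariance and the correlation */
  random F1 F2 ~ normal([0,0], [1, rho, 1]) subject=person;
run;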
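The reason some constraint is needed, and why either choice fixes the scale of a factor, can be sketched as follows. The notation is mine, not the post's: \lambda_{1j} stands for L1_j, V_{F1} for VF1, and c is an arbitrary nonzero constant.

```latex
% Scale indeterminacy: rescaling a factor and its loadings leaves the mean unchanged
\lambda_{1j} F_1 = (c\,\lambda_{1j}) \left( F_1 / c \right),
\qquad \operatorname{Var}(F_1 / c) = V_{F1} / c^2

% Option A (marker variable): fix one loading per factor, leave the variances free
\lambda_{1,1} = 1, \quad \lambda_{2,10} = 1, \quad V_{F1},\, V_{F2} \text{ free}

% Option B (standardized factors): fix the factor variances, leave all loadings free
V_{F1} = V_{F2} = 1 \;\Rightarrow\;
\operatorname{Cov}(F_1, F_2) = \rho \sqrt{1 \cdot 1} = \rho
```

Under Option B the off-diagonal entry of the factor covariance matrix is simply rho, which is why the RANDOM statement can use [1, rho, 1].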
