MikeTurner
Calcite | Level 5

I have a few dummy variables, as follows, and I need to run a factor analysis on them. Can anyone help me out with this?

x1  x2  x3  x4
1   1   0   1
1   0   1   1
0   0   1   0

ACCEPTED SOLUTION
sounpra
Fluorite | Level 6

I agree with PaigeMiller regarding ‘not knowing’ the true intentions of the original poster. My original response was just pointing out that one can run a factor analysis on binary or ordinal variables; the variables do not all have to be continuous. As pointed out by AA1973, SAS does have some limitations in this area. Muthen (1978) and Muthen et al. (1997) provide a discussion of factor analysis models with binary data using a robust weighted least squares estimator.

10 REPLIES
PaigeMiller
Diamond | Level 26

I think it is rather meaningless to use a technique like Factor Analysis (which was designed for continuous variables) on dummy variables. I can't imagine what the interpretation of the results would be.

Depending on what you are trying to do, something like Correspondence Analysis (designed for categorical variables) might be a better choice.

--
Paige Miller
PGStats
Opal | Level 21

I agree with PaigeMiller. What is the purpose of your analysis? To find relationships among the variables, or to identify patterns or groups among your observations?

PG

sounpra
Fluorite | Level 6

If you believe the underneath indicators are continuous latent variables, then you can create a matrix of tetrachoric correlations and use that matrix for your factor analysis.  See the following SAS thread:  https://communities.sas.com/thread/34748
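As a rough sketch of that route (not taken from the linked thread): assuming a SAS release in which PROC CORR supports the POLYCHORIC and OUTPLC= options, the tetrachoric matrix can be written to a data set and handed to PROC FACTOR. The data set names binary and tetra, the simulated items x1-x6, and the 0.7 loading are all made up for illustration.

```sas
/* Hypothetical data: six binary items driven by one continuous latent factor */
data binary(drop=_: f);
   array x{6};
   call streaminit(2024);
   do id = 1 to 500;
      f = rand("NORMAL");
      do _i = 1 to dim(x);
         /* dichotomize a continuous latent response at 0 */
         x{_i} = (0.7*f + rand("NORMAL") > 0);
      end;
      output;
   end;
run;

/* For 0/1 variables the polychoric correlation is the tetrachoric correlation */
proc corr data=binary polychoric outplc=tetra noprint;
   var x1-x6;
run;

/* Factor the tetrachoric matrix with an extraction other than maximum likelihood.
   TYPE=CORR is stated explicitly so the input is read as a correlation matrix. */
proc factor data=tetra(type=corr) method=prinit priors=smc nfactors=1;
   var x1-x6;
run;
```

Because the tetrachoric correlations are estimated pair by pair, the resulting matrix is not guaranteed to be positive definite, so it is worth checking the eigenvalues that PROC FACTOR reports before interpreting the loadings.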

PaigeMiller
Diamond | Level 26

It is hard for me to see how dummy variables can represent a situation where "the underneath indicators are continuous latent variables".

Even if such a situation exists, we don't know if the original poster's dummy variables qualify to represent continuous latent variables.

--
Paige Miller
saira86
Calcite | Level 5

Hi, I need your help.

One of my variables is a dummy. The variable is latent and has five dimensions, and each dimension has 5 to 6 items. I want to run a factor analysis on that variable to check whether the items under each dimension can be kept as they are or should be removed from the list, so that the instrument measures the variable reliably. The variable is corporate social responsibility.

The other variable is competition, which is measured through the HH index.

Kindly suggest how I should proceed.

Thanks

AA1973
Calcite | Level 5

Oh jeez, I might get some heat for this, but I think that as long as the variables or items have some direction that lets you come up with a reasonable interpretation of the solution, it's fine to run a principal component analysis or even a factor analysis (with an extraction method other than maximum likelihood) with the usual proc factor, because the analysis is exploratory and descriptive. You are not testing anything; all you want to know is whether there are some natural groupings among the items or variables. The factor solution should give you an indication of that.
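A minimal sketch of such an exploratory run, using the ordinary Pearson correlations of the 0/1 items. The data set items and the variables x1-x6 are simulated purely for illustration; METHOD=PRINCIPAL with the default priors amounts to a principal component analysis.

```sas
/* Hypothetical data: two groups of three binary items each */
data items(drop=_: f1 f2);
   array x{6};
   call streaminit(77);
   do id = 1 to 300;
      f1 = rand("NORMAL");
      f2 = rand("NORMAL");
      do _i = 1 to 3;
         x{_i}   = (f1 + rand("NORMAL") > 0);   /* items x1-x3 share f1 */
         x{_i+3} = (f2 + rand("NORMAL") > 0);   /* items x4-x6 share f2 */
      end;
      output;
   end;
run;

/* Principal components, two retained, varimax rotation to expose groupings */
proc factor data=items method=principal nfactors=2 rotate=varimax;
   var x1-x6;
run;
```

If x1-x3 and x4-x6 end up loading on separate rotated factors, that is the kind of natural grouping this approach is meant to reveal.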

However, the situation is more complicated if you want to do a confirmatory factor analysis. In that case a specified model for the variables is tested and unfortunately I do not think that proc calis offers the most up-to-date methodology for "easily" testing those models with variables that are not continuous (and normally distributed), unless you have a huge sample size. You might have to use Mplus for that, yikes.        

PGStats
Opal | Level 21

Here is a simple technique (mostly graphical) for exploring the similarity between binary variables, assuming you have more than 4 variables and not too many observations. It is illustrated with purely random Bernoulli trials.

/* Simulate 50 observations of 8 independent Bernoulli(0.25) variables. */
data test(drop=_:);
   array x{8};
   call streaminit(1233);
   do id = 1 to 50;
      do _i = 1 to dim(x);
         x{_i} = rand("BERNOULLI", 0.25);
      end;
      output;
   end;
run;

/* Transpose so that each variable becomes a row. */
proc transpose data=test out=ttest prefix=id;
   var x:;
   id id;
run;

/* Bray-Curtis dissimilarity between variables; 0 is treated as "absent". */
proc distance data=ttest method=braycurtis out=braytest;
   var anominal(id:/absent=0);
   id _NAME_;
run;

/* Cluster analysis of the variables. Similarity is illustrated on a dendrogram. */
proc cluster data=braytest method=average outtree=trtest print=0;
   id _NAME_;
run;

proc tree horizontal data=trtest;
run;

/* 2-dimension representation of the similarity between variables. */
proc distance data=ttest method=dice out=dicetest;
   var anominal(id:/absent=0);
   id _NAME_;
run;

ods graphics on;
proc mds data=dicetest out=mdstest fit=distance dim=2;
   id _NAME_;
run;

The same approach, without the transposition, can be applied to explore the similarity between observations.

PG

PaigeMiller
Diamond | Level 26

Lots of good ideas in this thread, if only we knew what the original poster really wanted to do with his dummy variables.

--
Paige Miller
