Dear all
I'd like to delete variables from a dataset based on their values in an outstat set created by PROC FACTOR.
proc factor
OUTSTAT=fout
data=normed
method=principal scree
mineigen=0
priors=smc
var
say -- confirm ;
run;
The code above creates an outstat data set called fout. This dataset stores the communality values (among other values) in a row like this:
_TYPE_ | _NAME_ | var1 | var2 | var_n |
COMMUNAL | | 0.3995 | 0.1133 | 0.3744 |
The full outstat dataset (fout) is attached in CSV format. The row in question is #205.
I'd like to delete the variables whose COMMUNAL value is less than .15. In this case, variable var2 would be dropped from the dataset normed.
I can do this by listing the variables in a DATA step:
DATA normed (DROP = var2);
SET normed;
RUN;
But since this involves checking hundreds of variables, it'd be nice if it could be automated.
Could this process be automated in a program?
I'm using SAS University Edition.
Thank you all in advance for your time!
Tony
I am going to assume that you really want to delete any column that has a value less than 0.15 and greater than -0.15 even though you didn't say that.
Something like this (UNTESTED CODE)
data fout2;
set fout;
array v var1-varn;
do 1 = 1 to dim(v);
v(i)=abs(v(i));
end;
run;
proc summary data=fout2;
var var1-varn;
output out=minimums min=;
run;
proc transpose data=minimums out=minimums_t;
var var1-varn;
run;
proc sql;
select _name_ into :names separated by ' ' from minimums_t
where col1<0.15;
quit;
data want;
set fout;
drop &names;
run;
Thank you for your reply.
I just want to remove the variables with values less than .15, including any negative values.
I oversimplified the data table for fout in my original post. The variable names are not numeric. The table looks more like this:
_TYPE_ | _NAME_ | say | coronavirus | covid | people | time | take | make | health |
MEAN | | 6.83115306 | 2.98523827 | 3.30916916 | 3.42976631 | 2.12912599 | 1.78743404 | 1.91479591 | 2.58683648 |
STD | | 5.93316505 | 3.26815184 | 3.48430046 | 3.6285918 | 2.05970761 | 1.75296901 | 1.85799116 | 3.5991221 |
N | | 211455 | 211455 | 211455 | 211455 | 211455 | 211455 | 211455 | 211455 |
CORR | say | 1 | 0.15361343 | 0.01273492 | 0.16232542 | -0.1062952 | 0.02024781 | -0.030496 | 0.14823108 |
CORR | coronavirus | 0.15361343 | 1 | -0.0494022 | 0.13147976 | -0.1120803 | 0.01712126 | -0.1051971 | 0.16906057 |
COMMUNAL | | 0.3857993 | 0.42459764 | 0.42993311 | 0.35747462 | 0.20179172 | 0.13523689 | 0.22256621 | 0.54063112 |
PRIORS | | 0.2803741 | 0.33664296 | 0.34151796 | 0.26550787 | 0.14132677 | 0.07544468 | 0.1414966 | |
EIGENVAL | | 6.09010089 | 4.14327681 | 3.2221195 | 2.32712338 | 2.29719713 | 1.82682613 | 1.54001848 | |
The line with the values is 'COMMUNAL'. Based on the cut-off of .15, the variable 'take' would be dropped.
The array is throwing an error:
73 data fout2;
74 set fout;
75 array v var1-varn;
ERROR: Missing numeric suffix on a numbered variable list (var1-varn).
WARNING: Defining an array with zero elements.
76 do 1 = 1 to dim(v);
_
80
200
ERROR 80-322: Expecting a variable name.
ERROR 200-322: The symbol is not recognized and will be ignored.
77 v(i)=abs(v(i));
78 end;
79 run;
Based on @PaigeMiller's code, this worked:
data fout2;
set fout (where=(_TYPE_="COMMUNAL"));
run;
proc transpose data=fout2 out=communal; id _TYPE_; run;
proc sql;
select _name_ into :names separated by ' ' from communal
where communal <.15;
quit;
/* drop variables with low communalities from data set */
data normed_clean ;
set normed ;
drop &names;
run;
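One case the code above does not cover: if no communality falls below the cut-off, PROC SQL selects no rows, &names is never created, and the DROP statement fails on the unresolved macro variable. A minimal sketch of one way to guard against that, assuming the communal and normed data sets from the steps above; the macro name drop_low_communal and its cutoff= parameter are made up for illustration, and the code is untested:

```sas
/* Hypothetical wrapper macro; assumes the data sets communal and normed exist */
%macro drop_low_communal(cutoff=0.15);
  %local names;   /* starts out empty and stays empty if no row is selected */

  proc sql noprint;
    select _name_ into :names separated by ' '
    from communal
    where communal < &cutoff;
  quit;

  data normed_clean;
    set normed;
    /* emit the DROP statement only when at least one variable qualified */
    %if %length(&names) > 0 %then %do;
      drop &names;
    %end;
  run;
%mend drop_low_communal;

%drop_low_communal(cutoff=0.15)
```

Wrapping the steps in a macro keeps the %IF logic valid on older SAS releases, and makes it easy to rerun with a different cut-off.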