NKormanik
Barite | Level 11

My dataset has 10 variables and 2000 cases. All variables are continuous. I would like to "standardize" each variable column. Then average those 10 columns, and compare the summary averages for each case, say, sorting from high to low.
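The workflow described above (standardize each column, average across columns for each case, sort high to low) can be sketched as follows. This is a minimal illustration in Python rather than SAS. It assumes a plain Z-score as a placeholder standardization, and the names `standardize`, `composite_ranking`, and the toy `rows` are hypothetical, not taken from the original data.

```python
import statistics

def standardize(column):
    """Placeholder standardization (assumption): classical Z-score."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]

def composite_ranking(rows):
    """rows is a list of cases, each a list of variable values.

    Returns (case_index, composite_score) pairs sorted from high to low.
    """
    columns = list(zip(*rows))                      # one tuple per variable
    std_columns = [standardize(col) for col in columns]
    # Average the standardized values across variables for each case.
    scores = [statistics.fmean(case) for case in zip(*std_columns)]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Toy data: 3 cases, 2 variables (made up for illustration).
rows = [[1.0, 30.0], [2.0, 10.0], [3.0, 20.0]]
ranking = composite_ranking(rows)
for case_index, score in ranking:
    print(case_index, round(score, 3))
```

Swapping in a different standardization only requires changing `standardize`; the averaging and sorting steps stay the same.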

 

I know that the data for several of the variables are bimodal rather than centered, with more observations occurring at the extremes.

 

I'm wondering what the best standardization method might be. SAS offers several: STD, MAD, IQR, ABW, and others.

 

STD is the common choice. It converts each value to a Z-score: (X1 - mean of X1) / (standard deviation of X1). Some of the other methods are apparently more 'robust' with respect to outliers and, I suppose, certain other data anomalies.
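To make the difference concrete, here is a small sketch comparing the Z-score with an IQR-based scaling, (X - median) / interquartile range. It is written in Python for illustration, not SAS. The function names and the toy data are hypothetical, and the quartile rule used here (`method="inclusive"`) is an assumption that may not match the percentile definition SAS uses by default.

```python
import statistics

def zscore(values):
    """Classical standardization: (x - mean) / sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

def iqr_scale(values):
    """Robust standardization: (x - median) / interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    median = statistics.median(values)
    return [(x - median) / (q3 - q1) for x in values]

# Toy column with one extreme value (made up for illustration).
data = [1.0, 2.0, 3.0, 4.0, 100.0]
z = zscore(data)
r = iqr_scale(data)
print([round(v, 2) for v in z])
print([round(v, 2) for v in r])
```

In this toy column the extreme value inflates the mean and standard deviation, so its Z-score is only about 1.79 and the other four values are squeezed together. Under the IQR scaling the same value lands at 48.5 while the rest keep an even spacing of 0.5. Which behavior is preferable depends on whether extreme values should dominate the averaged score.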

 

I'm tentatively thinking of using one of the more esoteric 'robust' ones, such as IQR, based on an example given in SAS documentation.

 

http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_stdize_getti...

 

I'd greatly appreciate hearing your thoughts or suggestions on how best to proceed.

 

Nicholas Kormanik

 

 

4 REPLIES
Ksharp
Super User
If you want to score each observation, check Prime Component Analysis.
Proc Prim
But Z-scores are what is usually used to standardize.


NKormanik
Barite | Level 11

Hi Ksharp.  I'm not finding Prime Components Analysis or Proc Prim anywhere.

 

NKormanik
Barite | Level 11

With today's computing power and SAS algorithms, I suspect that an all-around better method of standardization now exists than traditional STD.

 

True or not?  And which one is the new top method?

 

Thanks for comments.

 

mkeintz
PROC Star

I suspect @Ksharp meant principal components analysis, which can be performed in PROC PRINCOMP.

 

By default, the principal components are based on the correlation matrix of the original variables, which, as he said, means you are effectively using Z-scores. But the great thing about PCA is that it produces the linear combination that accounts for the greatest possible amount of variation among the original variables. That is Principal Component 1. Principal Component 2, a second linear combination of the original variables, accounts for the largest amount of variation left over after PC1, and so on.
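The idea can be sketched outside SAS as follows. This is a minimal Python illustration, not what PROC PRINCOMP does internally. It assumes power iteration to find the first component, which only works when the largest eigenvalue is unique and the starting vector is not orthogonal to its eigenvector. All function names and the toy data are hypothetical.

```python
import math
import statistics

def zscore_columns(rows):
    """Return the variables as standardized (Z-score) columns."""
    out = []
    for col in zip(*rows):
        mean = statistics.fmean(col)
        sd = statistics.stdev(col)
        out.append([(x - mean) / sd for x in col])
    return out

def correlation_matrix(std_cols):
    """Correlation matrix computed from already standardized columns."""
    n = len(std_cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in std_cols]
            for ci in std_cols]

def first_component(corr, iterations=500):
    """Power iteration: dominant eigenvector (PC1 weights) and eigenvalue."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    rv = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, rv))
    return v, eigenvalue

# Toy data: 5 cases, 3 variables (made up for illustration).
rows = [[1.0, 2.0, 9.0], [2.0, 4.1, 7.0], [3.0, 5.9, 8.0],
        [4.0, 8.2, 3.0], [5.0, 9.8, 4.0]]
std_cols = zscore_columns(rows)
corr = correlation_matrix(std_cols)
weights, eigenvalue = first_component(corr)
# PC1 score for each case: weighted sum of that case's Z-scores.
pc1_scores = [sum(w * col[k] for w, col in zip(weights, std_cols))
              for k in range(len(rows))]
share = eigenvalue / len(corr)  # fraction of total variance carried by PC1
print([round(w, 3) for w in weights], round(share, 3))
```

Compared with a plain average of Z-scores, which weights every variable equally, the PC1 score weights each variable by how strongly it moves with the others, and `share` indicates how much of the total variation that single score captures.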

 

Mark

