Golumn
Calcite | Level 5

Howdy, 

 

Been getting a crash course in using SAS EG to conduct initial Principle Component Analysis. 

On the question of nulls: I understand that each null can mean something different, and that the omission of a response may itself carry information, depending on the variable. I'm also aware of the options of assigning 0, substituting the mean or median, or adjusting in some other way. I'm currently checking whether the null values are heavily concentrated in a few variables, which would make for an easy fix, but I suspect they will be widespread.
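
One way to check how concentrated the nulls are is a per-variable tally of missing values. The following is a minimal sketch in plain Python rather than SAS, with made-up rows and variable names (`x1` to `x3`) and `None` standing in for a missing value. In EG, PROC MEANS with the N and NMISS statistics should give a comparable tally for numeric variables.

```python
# Hypothetical rows; None stands in for a SAS missing value.
rows = [
    {"x1": 1.0, "x2": None, "x3": 4.0},
    {"x1": None, "x2": None, "x3": 5.0},
    {"x1": 3.0, "x2": 2.0, "x3": None},
    {"x1": 4.0, "x2": None, "x3": 6.0},
]


def missing_share(rows):
    """Return {variable: fraction of rows in which it is missing}."""
    names = rows[0].keys()
    n = len(rows)
    return {v: sum(r[v] is None for r in rows) / n for v in names}


shares = missing_share(rows)

# List variables from most to least missing.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.0%} missing")
```

If a handful of variables dominate the top of that list, they can be handled individually, and the rest in bulk.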

 

Given that I have a lot of variables, and for each one I'm referencing a data dictionary to understand the scope and meaning of its nulls as well as its values, is there a procedure in EG that anyone has fallen back on which can address widespread nulls? 

 

I'm trying to avoid going through each variable one by one, referencing the data dictionary, and adjusting each null. I've already done this for the transformation of character variables into numeric ones. I understand there might not be a good method other than plug and chug; I just want to ensure I'm using my time wisely. 

1 REPLY
PaigeMiller
Diamond | Level 26

There's no really easy way forward here. And there's no generally agreed upon "best approach". It all depends on your data, how much you know about each variable, and how much time and effort you are willing to put into handling the missing values.

 

If you understand ALL of the variables, you can do intelligent things like impute the missing values in each variable. The imputation for X1 could very well be performed differently than the imputation for X2.

 

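On the other hand, if you need to just handle all variables in bulk, you could replace the missing values in each variable with that variable's mean, but this has its own drawbacks (for example, it shrinks each variable's variance and tends to weaken its correlations with other variables, both of which matter for PCA).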
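
A small illustration of bulk mean replacement and of one of its drawbacks, written as a plain Python sketch rather than SAS. The data and the helper name `impute_mean` are made up for illustration. In SAS, PROC STDIZE with the REPONLY option and METHOD=MEAN is a common way to do the replacement across many numeric variables at once.

```python
from statistics import mean, variance


def impute_mean(values):
    """Replace None with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


x1 = [2.0, None, 4.0, None, 6.0, 8.0]  # hypothetical data, None = missing
filled = impute_mean(x1)

# Sample variance before (observed values only) and after imputation.
observed_var = variance([v for v in x1 if v is not None])
filled_var = variance(filled)

print(filled)
print(observed_var, filled_var)
```

Every imputed value sits exactly at the mean, so the imputed column has a smaller sample variance than the observed values alone. PCA works from variances and covariances, so this shrinkage feeds directly into the components.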

 

Another approach is to create a dummy (indicator) variable for each continuous variable: if X1 is missing, you assign it the mean of X1 and set DUMMY1 to 1 to flag the missing value; DUMMY1 is 0 everywhere else.
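
The mean-plus-indicator idea can be sketched as follows. This is plain Python, not SAS, and the data and the helper name `impute_with_flag` are hypothetical. It produces the filled X1 and the matching DUMMY1 in one pass.

```python
from statistics import mean


def impute_with_flag(values):
    """Return (filled, flags): mean-filled values plus a 0/1 missing flag."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    filled = [m if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


x1 = [10.0, None, 14.0, 12.0, None]  # hypothetical X1, None = missing
x1_filled, dummy1 = impute_with_flag(x1)

print(x1_filled)
print(dummy1)
```

The flag keeps the information that a value was originally missing, which a plain mean replacement throws away.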

 

Yet other people bin each variable into 8–10 bins, with missing being an additional and separate bin.
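
A rough sketch of binning with a separate bin for missing values, again in plain Python rather than SAS. Equal-width bins are an assumption made here for simplicity (quantile-based bins are another option), and the function name `bin_with_missing` and the data are made up.

```python
def bin_with_missing(values, n_bins=8):
    """Equal-width bin index (0..n_bins-1) per value; None maps to 'MISSING'."""
    observed = [v for v in values if v is not None]
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all values are equal
    labels = []
    for v in values:
        if v is None:
            labels.append("MISSING")
        else:
            # The maximum value would land in bin n_bins, so clamp it.
            labels.append(min(int((v - lo) / width), n_bins - 1))
    return labels


x1 = [0.0, 2.5, None, 5.0, 7.5, 10.0]  # hypothetical data, None = missing
binned = bin_with_missing(x1, n_bins=4)
print(binned)
```

The result is a categorical variable, so the missing rows are kept as their own level instead of being given an invented numeric value.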

 

And since I have appointed myself the PCA spelling police, I point out that it is "Principal Components", not "Principle Components".

--
Paige Miller
