Hello!
I am preparing data for survival analysis. I have noticed that my predictor variables have many missing values (for some variables, 15% of the observations are missing).
Do I need to delete missing observations?
Thanks,
Best regards.
Most procedures ignore missing values by default, and the regression-type procedures generally exclude any record in which a "required" variable, such as a predictor, is missing. You will usually get a diagnostic along the lines of "n records used". That is why SAS has the special value of "missing".
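Before modeling, it can help to see how many records would actually be dropped. Here is a minimal sketch, assuming a dataset called work.survdata with predictors age, bmi and chol (all of these names are hypothetical placeholders, substitute your own):

```sas
/* Hypothetical names: work.survdata, age, bmi, chol */
data work.misscheck;
    set work.survdata;
    /* CMISS counts how many of its arguments are missing (numeric or character) */
    n_missing = cmiss(age, bmi, chol);
    /* 1 if every predictor is present, 0 otherwise */
    complete_case = (n_missing = 0);
run;

/* How many records are complete, and how many predictors are missing per record */
proc freq data=work.misscheck;
    tables complete_case n_missing;
run;
```

The count of records with complete_case = 1 should match the number of observations the modeling procedure reports as used, provided the same predictors go into the model.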
If you want to test this behavior, try running the procedure twice: once with all of the data, and again selecting only the records without the problem:
data=yourdatasetname (where=(not missing(variablename)));
The where dataset option can, when the selection is not too complex, filter the data for tests like this to see if the results change without modifying the dataset. Some procedures also have a separate where statement that can be a bit more flexible than the dataset option as it may allow use of functions involving one or more variables instead of just a list of values.
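As a rough illustration of that two-run check, here is a sketch using PROC PHREG. The dataset work.survdata, the variables time, status, age, bmi and chol, and the censoring value 0 are all assumptions made for the example:

```sas
/* Run 1: all records; PHREG itself excludes records with missing model variables */
proc phreg data=work.survdata;
    model time*status(0) = age bmi chol;
run;

/* Run 2: same model, but complete records selected up front with the WHERE= dataset option */
proc phreg data=work.survdata(where=(cmiss(age, bmi, chol) = 0));
    model time*status(0) = age bmi chol;
run;

/* Alternative to run 2: the same filter written as a WHERE statement inside the procedure */
proc phreg data=work.survdata;
    where cmiss(age, bmi, chol) = 0;
    model time*status(0) = age bmi chol;
run;
```

If the procedure is already excluding incomplete records, the number of observations used and the parameter estimates should agree across the runs, and the original dataset is never modified.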
I wouldn't delete the records for the first passes through the data.
You may want to investigate imputation if too many variable combinations are missing.
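If you do go the imputation route, one common pattern in SAS is multiple imputation with PROC MI followed by PROC MIANALYZE. This is only a sketch under assumptions: the dataset and variable names are hypothetical, the predictors are taken to be continuous, and the number of imputations and the seed are arbitrary. Which variables belong in the imputation model is something you would need to decide for your own data.

```sas
/* Step 1: create 5 imputed copies of the data, stacked and indexed by _Imputation_ */
proc mi data=work.survdata nimpute=5 seed=12345 out=work.survdata_mi;
    var age bmi chol;
run;

/* Step 2: fit the same Cox model separately within each imputed copy */
proc phreg data=work.survdata_mi;
    by _Imputation_;
    model time*status(0) = age bmi chol;
    ods output ParameterEstimates=work.parms;
run;

/* Step 3: combine the per-imputation estimates into a single set of results */
proc mianalyze parms=work.parms;
    modeleffects age bmi chol;
run;
```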