Hello,
How can I calculate percentage of missing values for each individual for specific variables in SAS?
Thanks!
Additional comment:
@emilync wrote:
data FINAL; set summary_stats; percent_missing=100*nmiss/sum(n,nmiss); run;
This is poor programming practice, creating a data set named FINAL when your original data set (the one used in PROC MEANS) was also named FINAL. You have overwritten your original data with the PROC MEANS output. I can't think of a situation where this is really a good thing to do.
Please show us the layout of the data in the SAS data set. (It can be fake data, as long as the organization of the data is clear)
What do you currently have?
What rules are involved in the calculation?
Provide some example data in the form of a working data step, the rules involved and an example of the result.
Or at least post some values as text into a text box opened on the forum with </> icon above the message box.
Otherwise the answer is "42".
I have 11 items that measured across three time points. I need to calculate missing percentages for each individual and remove individuals who have %75 of missing values. I used the following syntax but it gave me the item level missingness. I need to calculate missing percentages for these variables for each individual.
Thanks!
generate counts of N/NMISS; proc means data=FINAL stackods n nmiss NWAY; class ID; var EXP1_T1 EXP2_T1 EXP3_T1 EXP4_T1 EXP5_T1 INT1_T1 INT2_T1 INT3_T1R UTL1_T1 UTL2_T1 UTL3_T1 EXP1_T2 EXP2_T2 EXP3_T2 EXP4_T2 EXP5_T2 INT1_T2 INT2_T2 INT3_T2R UTL1_T2 UTL2_T2 UTL3_T2 EXP1_T3 EXP2_T3 EXP3_T3 EXP4_T3 EXP5_T3 INT1_T3 INT2_T3 INT3_T3R UTL1_T3 UTL2_T3 UTL3_T3; ods output summary=summary_stats; run;
Add one more piece of code where the percent missing is computed in a DATA step.
data want;
set summary_stats;
percent_missing=100*nmiss/sum(n,nmiss);
run;
Thank you. This code gave me whether the participant responded each question. However, I need to calculate how much percent missing values each participant have across three time points. And I need to exclude the participants who have %75 or above missing values across three waves. Is there something that I am missing?
data FINAL; set summary_stats; percent_missing=100*nmiss/sum(n,nmiss); run;
The issue of "across 3 waves" has not been mentioned before and requires explanation. As far as the concept of eliminating people who have more than 75% of the data, once you compute the percents, you simply delete the rows where the percent is > 75%.
I repeat my earlier request that you show us the arrangement of the data (even if it is fake data, it must represent the actual arrangement of the data). Please note: last time I asked for the arrangement of the data, you provided SAS code, which is not what I was asking for. I want to see the (fake) data in its actual arrangement in your data set, so that it is obvious what you are working with and so that I can help you write code that works on your data.
Sorry. It worked! I checked the results viewer section instead of the library. The percentage of missing values are available in the library.
Thank you so much!
Additional comment:
@emilync wrote:
data FINAL; set summary_stats; percent_missing=100*nmiss/sum(n,nmiss); run;
This is poor programming practice, creating a data set named FINAL when your original data set (the one used in PROC MEANS) was also named FINAL. You have overwritten your original data with the PROC MEANS output. I can't think of a situation where this is really a good thing to do.
Thank you! I forgot to ask the second part of the question. This is a longitudinal dataset that the same participants assessed across three time points. I attached the arrangement of the data that I used (SAV file). I really appreciate if you could provide a suggestion to remove individuals from the data set who have %75 or above missing data across three time points?
Is -99 coded as missing? If so, proc means won't get you what you need (and neither will this though it can be easily modified).
Assuming this is a SAS data set (since it's a SAS forum) here's how I'd do it in SAS.
data exclusion_list;
set have;
by id;
retain tot_miss;
array _vars(*) ex: int: ut:;
nmissing = nmiss(of _vars(*));
if first.id then tot_miss=0;
tot_miss + nmissing;
if last.id then pct_missing = tot_miss / dim(_vars)*3;
if pct_missing >= 0.75;
keep id;
run;
proc sql;
create table included_records as
select *
from have
where id not in (select id from exclusion_list);
quit;
Just a suggestion, variable lists are helpful.
Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.