BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
emilync
Fluorite | Level 6

Hello,

How can I calculate percentage of missing values for each individual for specific variables in SAS?

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Additional comment:

 


@emilync wrote:

 

data FINAL;
    set summary_stats;
    percent_missing=100*nmiss/sum(n,nmiss);
run;

 


 

This is poor programming practice, creating a data set named FINAL when your original data set (the one used in PROC MEANS) was also named FINAL. You have overwritten your original data with the PROC MEANS output. I can't think of a situation where this is really a good thing to do.

--
Paige Miller

View solution in original post

11 REPLIES 11
PaigeMiller
Diamond | Level 26

Please show us the layout of the data in the SAS data set. (It can be fake data, as long as the organization of the data is clear)

--
Paige Miller
ballardw
Super User

What do you currently have?

What rules are involved in the calculation?

 

Provide some example data in the form of a working data step, the rules involved and an example of the result.

Or at least post some values as text into a text box opened on the forum with </> icon above the message box.

 

Otherwise the answer is "42".

emilync
Fluorite | Level 6

I have 11 items that measured across three time points. I need to calculate missing percentages for each individual and remove individuals who have %75 of missing values. I used the following syntax but it gave me the item level missingness. I need to calculate missing percentages for these variables for each individual.

Thanks!

generate counts of N/NMISS;
proc means data=FINAL stackods n nmiss NWAY;
class ID;
var 
EXP1_T1
EXP2_T1
EXP3_T1
EXP4_T1
EXP5_T1
INT1_T1
INT2_T1
INT3_T1R
UTL1_T1
UTL2_T1
UTL3_T1
EXP1_T2
EXP2_T2
EXP3_T2
EXP4_T2
EXP5_T2
INT1_T2
INT2_T2
INT3_T2R
UTL1_T2
UTL2_T2
UTL3_T2
EXP1_T3
EXP2_T3
EXP3_T3
EXP4_T3
EXP5_T3
INT1_T3
INT2_T3
INT3_T3R
UTL1_T3
UTL2_T3
UTL3_T3;
ods output summary=summary_stats;
run;
PaigeMiller
Diamond | Level 26

Add one more piece of code where the percent missing is computed in a DATA step.

 

data want;
    set summary_stats;
    percent_missing=100*nmiss/sum(n,nmiss);
run;

 

--
Paige Miller
emilync
Fluorite | Level 6

Thank you. This code gave me whether the participant responded each question. However, I need to calculate how much percent missing values each participant have across three time points. And I need to exclude the participants who have %75 or above missing values across three waves. Is there something that I am missing?

 

data FINAL;
    set summary_stats;
    percent_missing=100*nmiss/sum(n,nmiss);
run;

 emilync_0-1699382352642.png

PaigeMiller
Diamond | Level 26

The issue of "across 3 waves" has not been mentioned before and requires explanation. As far as the concept of eliminating people who have more than 75% of the data, once you compute the percents, you simply delete the rows where the percent is > 75%.

 

I repeat my earlier request that you show us the arrangement of the data (even if it is fake data, it must represent the actual arrangement of the data). Please note: last time I asked for the arrangement of the data, you provided SAS code, which is not what I was asking for. I want to see the (fake) data in its actual arrangement in your data set, so that it is obvious what you are working with and so that I can help you write code that works on your data.

--
Paige Miller
emilync
Fluorite | Level 6

emilync_0-1699383353777.png

Sorry. It worked!  I checked the results viewer section instead of the library. The percentage of missing values are available in the library.

Thank you so much!

 

PaigeMiller
Diamond | Level 26

Additional comment:

 


@emilync wrote:

 

data FINAL;
    set summary_stats;
    percent_missing=100*nmiss/sum(n,nmiss);
run;

 


 

This is poor programming practice, creating a data set named FINAL when your original data set (the one used in PROC MEANS) was also named FINAL. You have overwritten your original data with the PROC MEANS output. I can't think of a situation where this is really a good thing to do.

--
Paige Miller
emilync
Fluorite | Level 6

 Thank you! I forgot to ask the second part of the question. This is a longitudinal dataset that the same participants assessed across three time points. I attached the arrangement of the data that I used (SAV file). I really appreciate if you could provide a suggestion to remove individuals from the data set who have %75 or above missing data across three time points?

 

emilync_0-1699384244044.png

 

 

Reeza
Super User

Is -99 coded as missing? If so, proc means won't get you what you need (and neither will this though it can be easily modified).

 

Assuming this is a SAS data set (since it's a SAS forum) here's how I'd do it in SAS.

 

data exclusion_list;
set have;
by id;
retain tot_miss;
array _vars(*) ex: int: ut:;

nmissing = nmiss(of _vars(*));

if first.id then tot_miss=0;
tot_miss + nmissing;

if last.id then pct_missing = tot_miss / dim(_vars)*3;

if pct_missing >= 0.75;

keep id;
run;

proc sql;
create table included_records as
select * 
from have
where id not in (select id from exclusion_list);
quit;
Reeza
Super User

Just a suggestion, variable lists are helpful.

 

Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

 

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 2585 views
  • 3 likes
  • 4 in conversation