Solved: Re: calculate missing values by ID for specific variables in a data set

emilync · Posted 11-07-2023 01:08 PM

Hello,

How can I calculate the percentage of missing values for each individual for specific variables in SAS?

Thanks!

PaigeMiller · Posted 11-07-2023 01:12 PM

Please show us the layout of the data in the SAS data set. (It can be fake data, as long as the organization of the data is clear.)

--
Paige Miller

ballardw · Posted 11-07-2023 01:13 PM

What do you currently have?

What rules are involved in the calculation?

Provide some example data in the form of a working data step, the rules involved and an example of the result.

Or at least post some values as text into a text box opened on the forum with the </> icon above the message box.

Otherwise the answer is "42".

emilync · Posted 11-07-2023 01:25 PM

I have 11 items that were measured at three time points. I need to calculate the percentage of missing values for each individual and remove individuals who have 75% or more missing values. I used the following syntax, but it gave me item-level missingness. I need the missing percentage across these variables for each individual.

Thanks!

/* generate counts of N/NMISS */
proc means data=FINAL stackods n nmiss NWAY;
class ID;
var 
EXP1_T1
EXP2_T1
EXP3_T1
EXP4_T1
EXP5_T1
INT1_T1
INT2_T1
INT3_T1R
UTL1_T1
UTL2_T1
UTL3_T1
EXP1_T2
EXP2_T2
EXP3_T2
EXP4_T2
EXP5_T2
INT1_T2
INT2_T2
INT3_T2R
UTL1_T2
UTL2_T2
UTL3_T2
EXP1_T3
EXP2_T3
EXP3_T3
EXP4_T3
EXP5_T3
INT1_T3
INT2_T3
INT3_T3R
UTL1_T3
UTL2_T3
UTL3_T3;
ods output summary=summary_stats;
run;
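The VAR list above suggests a wide layout: one row per participant, with a _T1, _T2 or _T3 suffix marking each wave. If that is the actual arrangement, the per-participant percentage can be computed row by row with the NMISS function, without PROC MEANS. A minimal sketch under that assumption; the data set names WIDE_FAKE and PCT_WIDE and the values are made up, and only four items are used to keep it short:

```sas
/* Hypothetical wide data: one row per ID, items from every wave on that row */
data wide_fake;
    input ID EXP1_T1 INT1_T1 UTL1_T1 EXP1_T2;
    datalines;
1 3 . 4 2
2 . . . 5
;
run;

data pct_wide;
    set wide_fake;
    /* name-prefix lists: every variable whose name starts with EXP, INT or UTL */
    array items(*) EXP: INT: UTL:;
    /* number of missing items on this participant's row */
    n_miss = nmiss(of items(*));
    /* percentage of this participant's items that are missing */
    percent_missing = 100 * n_miss / dim(items);
run;
```

Participants could then be dropped with a subsetting IF such as `if percent_missing < 75;`, which keeps only rows below the 75% cutoff.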

PaigeMiller · Posted 11-07-2023 01:33 PM

Add one more piece of code where the percent missing is computed in a DATA step.

data want;
    set summary_stats;
    percent_missing=100*nmiss/sum(n,nmiss);
run;
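Because the PROC MEANS call in the question uses CLASS ID with STACKODS, SUMMARY_STATS holds one row per ID per variable, so the percentage above is computed per item within each ID. If a single number per participant is wanted, the N and NMISS counts can be summed over each ID's rows first. A minimal sketch, assuming that output layout; HAVE_FAKE, PCT_BY_ID and the two-item data are hypothetical:

```sas
/* Hypothetical data: one row per ID, two items */
data have_fake;
    input ID EXP1_T1 EXP1_T2;
    datalines;
1 3 .
2 . .
;
run;

/* Same pattern as the question: one output row per ID per variable */
proc means data=have_fake stackods n nmiss nway;
    class ID;
    var EXP1_T1 EXP1_T2;
    ods output summary=summary_stats;
run;

/* Add up the counts over each ID's rows, then take the percentage */
proc sql;
    create table pct_by_id as
    select ID,
           sum(NMiss) as n_missing,
           sum(N) + sum(NMiss) as n_items,
           100 * calculated n_missing / calculated n_items as percent_missing
    from summary_stats
    group by ID;
quit;
```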

--
Paige Miller

emilync · Posted 11-07-2023 01:44 PM

Thank you. This code told me whether the participant responded to each question. However, I need to calculate what percentage of values each participant is missing across the three time points, and I need to exclude participants who have 75% or more missing values across the three waves. Is there something that I am missing?

data FINAL;
    set summary_stats;
    percent_missing=100*nmiss/sum(n,nmiss);
run;

PaigeMiller · Posted 11-07-2023 01:50 PM

The issue of "across 3 waves" has not been mentioned before and requires explanation. As for eliminating people who are missing more than 75% of their data: once you compute the percents, you simply delete the rows where the percent is > 75%.

I repeat my earlier request that you show us the arrangement of the data (even if it is fake data, it must represent the actual arrangement of the data). Please note: last time I asked for the arrangement of the data, you provided SAS code, which is not what I was asking for. I want to see the (fake) data in its actual arrangement in your data set, so that it is obvious what you are working with and so that I can help you write code that works on your data.

--
Paige Miller

emilync · Posted 11-07-2023 01:57 PM

Sorry. It worked! I was looking in the Results Viewer instead of the library. The missing-value percentages are available in the data set in the library.

Thank you so much!

PaigeMiller · Posted 11-07-2023 01:57 PM

Additional comment:

@emilync wrote:

data FINAL;
    set summary_stats;
    percent_missing=100*nmiss/sum(n,nmiss);
run;

This is poor programming practice, creating a data set named FINAL when your original data set (the one used in PROC MEANS) was also named FINAL. You have overwritten your original data with the PROC MEANS output. I can't think of a situation where this is really a good thing to do.

--
Paige Miller

emilync · Posted 11-07-2023 02:14 PM

Thank you! I forgot to ask the second part of the question. This is a longitudinal data set in which the same participants were assessed at three time points. I attached the arrangement of the data that I used (SAV file). I would really appreciate it if you could suggest how to remove individuals from the data set who have 75% or more missing data across the three time points.

Reeza · Posted 11-07-2023 03:51 PM

Is -99 coded as missing? If so, proc means won't get you what you need (and neither will this, though it can be easily modified).

Assuming this is a SAS data set (since it's a SAS forum) here's how I'd do it in SAS.

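data exclusion_list;
    /* long layout: one row per ID per time point, sorted by ID */
    set have;
    by id;
    retain tot_miss;
    array _vars(*) ex: int: ut:;

    /* number of missing items on this row */
    nmissing = nmiss(of _vars(*));

    /* running total of missing items for the current ID */
    if first.id then tot_miss=0;
    tot_miss + nmissing;

    /* proportion missing over all items at all 3 time points;
       the parentheses matter: tot_miss / dim(_vars)*3 would multiply by 3 */
    if last.id then pct_missing = tot_miss / (dim(_vars)*3);

    /* keep only the IDs with 75% or more missing (the ones to exclude) */
    if pct_missing >= 0.75;

    keep id;
run;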

proc sql;
create table included_records as
select * 
from have
where id not in (select id from exclusion_list);
quit;
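On the -99 question: NMISS (like PROC MEANS) counts only SAS missing values, so a code such as -99 is treated as a valid answer. If -99 does mean "no response", one possible modification is to recode it to missing before counting. A sketch under that assumption; HAVE_FAKE and HAVE_CLEAN are hypothetical names, and the prefix lists EXP: INT: stand in for the full item list:

```sas
/* Hypothetical data in which -99 marks an unanswered item */
data have_fake;
    input ID EXP1_T1 EXP2_T1 INT1_T1;
    datalines;
1 3 -99 4
2 -99 -99 5
;
run;

data have_clean;
    set have_fake;
    array items(*) EXP: INT:;
    do i = 1 to dim(items);
        /* turn the -99 code into a SAS missing value */
        if items(i) = -99 then items(i) = .;
    end;
    /* NMISS now counts the former -99 values */
    n_miss = nmiss(of items(*));
    drop i;
run;
```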

Reeza · Posted 11-07-2023 01:43 PM

Just a suggestion: variable lists are helpful.

Here is a reference that illustrates how to refer to variables and data sets in a shortcut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
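As an illustration, the 33-name VAR list in the question could likely be shortened to name-prefix lists (EXP: INT: UTL:), assuming no other variables in the data set start with those letters. The same lists also work inside functions with the OF keyword, and a name range list (two names joined by a double dash) picks up every variable stored between them. A sketch on hypothetical data with four items; HAVE_FAKE and LIST_CHECK are made-up names:

```sas
/* Hypothetical data using a few of the item names from the thread */
data have_fake;
    input ID EXP1_T1 EXP2_T1 INT1_T1 UTL1_T1;
    datalines;
1 3 . 4 2
2 . . 5 1
;
run;

/* Name-prefix lists replace the long list of individual names */
proc means data=have_fake n nmiss;
    var EXP: INT: UTL:;
run;

data list_check;
    set have_fake;
    /* prefix lists inside a function need the OF keyword */
    n_prefix = nmiss(of EXP: INT: UTL:);
    /* name range list: every variable from EXP1_T1 to UTL1_T1
       in the order they are stored in the data set */
    n_range = nmiss(of EXP1_T1--UTL1_T1);
run;
```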
