BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Valeidoscope
Fluorite | Level 6

I have a dataset that looks like this 

idABCD
1Missing111
2Missing Missing 1Missing
3111Missing
41Missing 11
51Missing Missing 1

 

I want to have a column named "Missing" which show which variables equal "Missing" for each ID. The goal is something similar to this:

idABCDMissing 
1Missing111A
2Missing Missing 1MissingA,B,D
3111MissingD
41Missing 11B
51Missing Missing 1B,C

 

Any ideas for how to achieve this would be helpful. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@Valeidoscope wrote:

Hi @Reeza, the goal is not quite identifying patterns of missing data; I just need to know which variables are missing for each unique ID. The actual data set has 32 variables so it's difficult to tell manually.

 

Variables are a mix of Numeric and Character but could be converted to Numeric if necessary. I don't care about the order of the variables in the list.

Thanks


Names of the variables? Since your desired output is apparently the names of the variables concatenated together with a comma delimiter then we need to be able to know the maximum number of characters that could get stuck into that variable or do you want to use the 32*32 (maximum number of characters for 32 variable names)+31 (commas needed to separate?

 

One way:

data want;
   set have;
   array c (*) <list names of character variables here>;
   array n (*) <list names of numeric variables here>;
   length missvars $ 1055;
   do i=1 to dim(c);
      if missing(c[i]) then missvars=catx(',',missvars,vname(c[i]));
   end;
   do i=1 to dim(n);
      if missing(n[i]) then missvars=catx(',',missvars,vname(n[i]));
   end;
   drop i;
run;

However if you want the variables in a specific order then there is going to be a lot of extra work involved.

The Vname function will return the name of variable and will accept an array reference as valid source.

The CATX function, if you aren't familiar will concatenate strings placing the first parameter, the comma in this case, between values when there are two or more. The loops process the character and numeric variables separately because arrays must be of the same type variables, all character or all numeric.

Do not place the < > characters in the array statements, just the names of the variables you are interested in determining "missingness".

The Missing function returns 1 (true) if a variable is missing or 0 (false) and is one of the functions that works with both numeric and character variables.

View solution in original post

4 REPLIES 4
Reeza
Super User

Are all your variables numeric or character? Specifically are they all the same types or do you have a mix of character and numeric values that need to be tested? If so, do you care about the order of the variables in the list? And what are you trying to do with this, how do you plan to use that column because I forsee a lot of issues with this data structure.

Here are two posts that may be useful, if I'm right about what I think you're trying to do, which is identify patterns of missing data.
https://blogs.sas.com/content/iml/2017/11/29/visualize-patterns-missing-values.html
https://blogs.sas.com/content/iml/2016/04/18/patterns-of-missing-data-in-sas.html

 


@Valeidoscope wrote:

I have a dataset that looks like this 

id A B C D
1 Missing 1 1 1
2 Missing  Missing  1 Missing
3 1 1 1 Missing
4 1 Missing  1 1
5 1 Missing  Missing  1

 

I want to have a column named "Missing" which show which variables equal "Missing" for each ID. The goal is something similar to this:

id A B C D Missing 
1 Missing 1 1 1 A
2 Missing  Missing  1 Missing A,B,D
3 1 1 1 Missing D
4 1 Missing  1 1 B
5 1 Missing  Missing  1 B,C

 

Any ideas for how to achieve this would be helpful. Thanks!


 

Valeidoscope
Fluorite | Level 6

Hi @Reeza, the goal is not quite identifying patterns of missing data; I just need to know which variables are missing for each unique ID. The actual data set has 32 variables so it's difficult to tell manually.

 

Variables are a mix of Numeric and Character but could be converted to Numeric if necessary. I don't care about the order of the variables in the list.

Thanks

Reeza
Super User

Do your variables have the actual value of MISSING then or is that way of indicate SAS missing values.

Missing for character would be blanks and numeric would be a period.  

If you example data is not reflective of your actual data structure a solution provided here will not work on your actual data. 

 

So now we know it needs to handle character and numeric variables. 

This is rough idea of the full solution. 

data want;
set have;

array _num_miss(*) numVariable1 numVariable2 .... ;
array _char_miss(*) charVariable1 charVariable2 ...;

*may need to be bigger;
length missing_vars $600.;

do i=1 to dim(_num_miss);
if missing(_num_miss(i)) then catx(", ", trim(missing_var), vname(_num_miss(i)));
end;
do i=1 to dim(_char_miss);
if missing(_char_miss(i)) then catx(", ", trim(missing_var), vname(_char_miss(i)));
end;

run;



@Valeidoscope wrote:

Hi @Reeza, the goal is not quite identifying patterns of missing data; I just need to know which variables are missing for each unique ID. The actual data set has 32 variables so it's difficult to tell manually.

 

Variables are a mix of Numeric and Character but could be converted to Numeric if necessary. I don't care about the order of the variables in the list.

Thanks


 

ballardw
Super User

@Valeidoscope wrote:

Hi @Reeza, the goal is not quite identifying patterns of missing data; I just need to know which variables are missing for each unique ID. The actual data set has 32 variables so it's difficult to tell manually.

 

Variables are a mix of Numeric and Character but could be converted to Numeric if necessary. I don't care about the order of the variables in the list.

Thanks


Names of the variables? Since your desired output is apparently the names of the variables concatenated together with a comma delimiter then we need to be able to know the maximum number of characters that could get stuck into that variable or do you want to use the 32*32 (maximum number of characters for 32 variable names)+31 (commas needed to separate?

 

One way:

data want;
   set have;
   array c (*) <list names of character variables here>;
   array n (*) <list names of numeric variables here>;
   length missvars $ 1055;
   do i=1 to dim(c);
      if missing(c[i]) then missvars=catx(',',missvars,vname(c[i]));
   end;
   do i=1 to dim(n);
      if missing(n[i]) then missvars=catx(',',missvars,vname(n[i]));
   end;
   drop i;
run;

However if you want the variables in a specific order then there is going to be a lot of extra work involved.

The Vname function will return the name of variable and will accept an array reference as valid source.

The CATX function, if you aren't familiar will concatenate strings placing the first parameter, the comma in this case, between values when there are two or more. The loops process the character and numeric variables separately because arrays must be of the same type variables, all character or all numeric.

Do not place the < > characters in the array statements, just the names of the variables you are interested in determining "missingness".

The Missing function returns 1 (true) if a variable is missing or 0 (false) and is one of the functions that works with both numeric and character variables.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 3316 views
  • 2 likes
  • 3 in conversation