Hello,
it's 12hours since i'm trying to get over this problem. I need to write a report about credit scorring.
There are values assigned as 'M' that represent missing data(some comedian did that). I need to get rid of both usual missing data which is "." and "M" so i can start calculating correlations and regression based on regular data. I tried to reasign those "M" values into regular ' ' in a range of variables that contain the "M" value but i'm constantly failing i have egsams soon and have to write this report till saturday but since 3 days i have about 8 lines.
This is my recent ceration but it fails:
data wanted;
set have( keep=act_age -- app_char_cars);
if (act_age -- app_char_cars)='M' then (act_age -- app_char_cars)=' ';
run;
@Antoni wrote:
data wanted;
set sklejone( keep=act_age -- default_cus12);
array t act_age -- default_cus12;
do over t;
if t= 'M' then t=' ';
end;
run;thats this one, i checked and had previously included character variables, i have included the right range of ony numerical variables and it did nothing, no error, no expected output
You didn't answer the question and you don't seem to understand what we've been saying about missing values.
You need to read the page below about SAS special missing values.
You can have missing values as just missing, or you can code them to identify different types of missing values. For example, in survey data, I will often code N/A questions as .N and skipped questions/no answer as .M so they can be differentiated in the results.
So modifying my code according to the comment from @Astounding earlier in the thread, give the following macro definition. This should work now, I tested it.
%macro drop_missing_pct(input_dsn = , output_dsn=, pct = , id_vars=);
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt ._ - .Z ="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN. (drop = &ID_Vars);
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
*not required for display purposes;
proc sort data=long;
by variable;
run;
*select variables more than x% missing;
proc sql noprint;
select variable into :drop_var_list separated by " "
from long where variable_value = 'Missing' and percent > &pct;
quit;
*Drop variables;
data &output_dsn;
set &input_dsn;
drop &drop_var_list;
run;
*clean up;
*uncomment after testing;
/* proc sql; */
/* drop table long; */
/* drop table temp; */
/* quit; */
%mend;
Example showing it works, with special missing values:
*create test data set with many missing values;
data class;
set sashelp.class;
if age=13 then
call missing(height, weight, sex);
if name in ('Jane', 'Carol', 'Henry', 'Alfred') then
do;
age=.A;
sex=.M;
end;
if name in ('Judy', 'Mary') then
weight=.W;
run;
%drop_missing_pct(input_dsn=class, output_dsn=want, pct=20, id_vars=Name);
*check output;
proc contents data=want;
run;
title 'Missing Report for data set';
proc print data=long;
run;
title 'Data set with columns missing more than 20% dropped';
proc print data=want (obs=5);
run;
data wanted;
set have( keep=act_age -- app_char_cars);
array t act_age -- app_char_cars;
do over t;
if t= 'M' then t=' ';
end;
run;
Hi @Antoni -
1. Group the list of variables in an array. This assumes all the variables in the list are of the same type and the arrangement is contiguous
2. Cycle through the array using a Loop. Yes, it's a linear search and replace method. Therefore, If an element aka variable in the array is "M", reset the value of the element to ' '
ERROR: All variables in array list must be the same type, i.e., all numeric or character. - well that's not new sight for me 😞
@Antoni Hence the reason I mentioned "This assumes all the variables in the list are of the same type and the arrangement is contiguous"
Either way, If you are running against time for a deliverable I suggest not to be concerned about the mechanisms. Please post a clean representative mock sample of what you HAVE and what you WANT. I'm sure somebody will provide the needed solution. You need to include the data types and other info so that you can avoid going back and forth.
Just act fast, get the report delivered and then open up a new thread to discuss & learn. Chill
Chances are your data is fine and needs no change. The "comedian" may have actually known what (s)he was doing.
If you intend to run correlations, these variables must be numeric. So what appears as "M" is actually a special form of a missing value. SAS supports 28 different forms of missing values for numeric variables, and "." is only one of those. So are .A, .B, ..., .Z, and ._
Using a special missing value lets you differentiate why the value is missing, but it is still missing. So if all this rings true, let your data alone! It's fine. Try running a correlation and see what happens.
> it's easier to me to have only numerical values
You already have numerical values. Read @Astounding 's reply again. Chances are you need to fix exactly nothing.
I'm not sure a macro actually helps that much here.
That being said, here's an example of a macro that would do it. There's a small example at the end that shows how it's used as well and it drops anything over 20% in that example. I included that as a parameter so it's actually a more generic program. If your list ends up being longer than 64k characters this may not work as well for you.
%macro drop_missing_pct(input_dsn = , output_dsn=, pct = , id_vars=);
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN. (drop = &ID_Vars);
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
*not required for display purposes;
proc sort data=long;
by variable;
run;
*select variables more than x% missing;
proc sql noprint;
select variable into :drop_var_list separated by " "
from long where variable_value = 'Missing' and percent > &pct;
quit;
*Drop variables;
data &output_dsn;
set &input_dsn;
drop &drop_var_list;
run;
*clean up;
*uncomment after testing;
/* proc sql; */
/* drop table long; */
/* drop table temp; */
/* quit; */
%mend;
And the test to show it works:
*test;
data class;
set sashelp.class;
if age=14 then
call missing(height, weight, sex);
if name='Alfred' then
call missing(sex, age, height);
label age="Fancy Age Label";
run;
%drop_missing_pct(input_dsn = class, output_dsn = want, pct = 20, id_vars = Name);
*check output;
proc contents data=want;
run;
https://gist.github.com/statgeek/feedc3fc520cb0d2018ca2a8cab241d8
Given that there might be special missing values in the data, this line in the macro should be revised:
value nmissfmt .="Missing" other="Not Missing";
It should become:
value nmissfmt ._ - .Z ="Missing" other="Not Missing";
it showed this error,
ERROR: Invalid macro parameter name 4. It should be a valid SAS identifier no longer than 32 characters.
ERROR: A dummy macro will be compiled.
on this line:
length variable $32. variable_value $50.;
i tried to change "variable value %50" to 32 but its still the same
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.