SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Antoni
Fluorite | Level 6

data wanted;
set sklejone( keep=act_age -- default_cus12);
array t act_age -- default_cus12;
do over t;
if t= 'M' then t=' ';
end;
run;

thats this one, i checked and had previously included character variables, i have included the right range of ony numerical variables and it did nothing, no error, no expected output

 

novinosrin
Tourmaline | Level 20

@Antoni "i have included the right range of ony numerical variables and it did nothing, no error, no expected output"-- if this is the case, the conditonal check if t= 'M' then t=' '; would bear no fruit as you are checking for character. 

 

Antoni
Fluorite | Level 6

so how is this possible that numeric variable has "M" in it, there's no dot before the letter so SAS doesnt recognize it as missing value, then how to get rid of them.. some of the variables consist of almost only "M" value and it is useless data form me, some people in the group chat completely deleted those variables but nobody will share their method because everyone is a potential competition on the job market. in a group escpecially created to "help eachother" with this report 🙂

ChrisNZ
Tourmaline | Level 20

> so how is this possible that numeric variable has "M" in it

 

You are not helping yourself much are you?

 

This is explained in one of the first replies, given yesterday by @Astounding, and which I also pointed you to after you asked a similar question just after that reply was given.

 

The missing value is displayed as M [no dot used], and can be tested by code such as:     if VAR = .M then ...;      [dot used]

 

Also note that you needn't remove these values for any statistical calculations: SAS handles them appropriately when deriving statistics, as you've already been told.

 

 

novinosrin
Tourmaline | Level 20

Can you check whether those variables are actually formatted for display but the underlying value remains same? If you are certain those are not special missing values, the next thing i can think of is  numeric variables formatted with "M" where missing

 

Please do a proc contents and review thoroughly.

@Antoni 

FYI- Special missing values are not displayed with a dot prefix, rather you use dot prefix to assign a special missing value like x=.M; or in an IF statement like;

if x=.M then .....

The displayed special missing value is just M

 

 

Antoni
Fluorite | Level 6

ok i did a test and turns out it was the .M, and i'm stupid for only doing it now but as i pasted a print screen it wasn't that obvios because of the lack of dot before the character in the data set... 

 

novinosrin
Tourmaline | Level 20

@Antoni  So that test proves @Astounding is right. Since it is indeed special missing values. You could close the thread by marking @Astounding 's response as the answer.

 

 

ChrisNZ
Tourmaline | Level 20

> You could close the thread

43 replies later! 

Reeza
Super User

@Antoni wrote:

data wanted;
set sklejone( keep=act_age -- default_cus12);
array t act_age -- default_cus12;
do over t;
if t= 'M' then t=' ';
end;
run;

thats this one, i checked and had previously included character variables, i have included the right range of ony numerical variables and it did nothing, no error, no expected output

 


You didn't answer the question and you don't seem to understand what we've been saying about missing values.

You need to read the page below about SAS special missing values. 

 

You can have missing values as just missing, or you can code them to identify different types of missing values. For example, in survey data, I will often code N/A questions as .N and skipped questions/no answer as .M so they can be differentiated in the results.  

 

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=lrcon&docsetTarget=p1xr9f...

 

So modifying my code according to the comment from @Astounding earlier in the thread, give the following macro definition. This should work now, I tested it.

 

 

 

%macro drop_missing_pct(input_dsn = , output_dsn=, pct = , id_vars=);

*create format for missing;
proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
     value nmissfmt  ._ - .Z ="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN. (drop = &ID_Vars);
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;


ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

*not required for display purposes;
proc sort data=long;
    by variable;
run;

*select variables more than x% missing;
proc sql noprint;
select variable into :drop_var_list separated by " "
from long where variable_value = 'Missing' and percent > &pct;
quit;

*Drop variables;
data &output_dsn;
set &input_dsn;
drop &drop_var_list;
run;

*clean up;
*uncomment after testing;
/* proc sql; */
/* drop table long; */
/* drop table temp; */
/* quit; */

%mend;

Example showing it works, with special missing values:

*create test data set with many missing values;
data class;
	set sashelp.class;

	if age=13 then
		call missing(height, weight, sex);

	if name in ('Jane', 'Carol', 'Henry', 'Alfred') then
		do;
			age=.A;
			sex=.M;
		end;

	if name in ('Judy', 'Mary') then
		weight=.W;


run;

%drop_missing_pct(input_dsn=class, output_dsn=want, pct=20, id_vars=Name);
*check output;

proc contents data=want;
run;

title 'Missing Report for data set';

proc print data=long;
run;

title 'Data set with columns missing more than 20% dropped';

proc print data=want (obs=5);
run;

 

Antoni
Fluorite | Level 6

hello again, i just have a quick question. Can this macro be modified to work on two separate data sets and give two different outputs? turns out i have to work on my db's separately instead of merging them together. How would it be done if possible, i know i could run two marcos for each data sets but  dont know if this is the only way, i would apreciate the answer.

Reeza
Super User
Not sure why calling the macro twice is problematic at all, it's just adding a second %drop_missing_pct().

However, if the datasets are handled differently you cannot necessarily expect the same output. The percent missing will likely differ from a merged dataset. If that is the case, You would need to redefine the problem and show example data and expected output, preferably as a new question/post now.
Reeza
Super User

If you want to perform the same action on multiple columns and the columns are all the same type then you can use an array. 

 

However, you want to do numeric calculations on your data but it has to be coded as character to contain an M or you're using special missing values. I'm going to assume the first is the situation. You can try and deal with the M or you can convert everything to numbers. Because you are converting the variable type you'll need two arrays, one for the old values and one for the new values.

 

Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/

 

For this to work for you, you will need to list the variables in each array. I've included a tutorial on arrays above.

 

data want;
set have;

array _old(*) act_age app_char_cars;
array _new(*) age char_cars;

do i=1 to dim(old_variables);

*if not missing or M, convert to a numeric value;
if old(i) not in ("", "M") then new(i) = input(old(i), 8.);

end;

drop act_age app_char_cars;
run;



@Antoni wrote:

Hello,
it's 12hours since i'm trying to get over this problem. I need to write a report about credit scorring.

There are values assigned as 'M' that represent missing data(some comedian did that). I need to get rid of both usual missing data which is "." and "M" so i can start calculating correlations and regression based on regular data. I tried to reasign those "M" values into regular ' ' in a range of variables that contain the "M" value but i'm constantly failing i have egsams soon and have to write this report till saturday but since 3 days i have about 8 lines. 
This is my recent ceration but it fails:

 

data wanted;
set have( keep=act_age -- app_char_cars);
if (act_age -- app_char_cars)='M' then (act_age -- app_char_cars)=' ';
run;


 

Antoni
Fluorite | Level 6
the problem is there are 2300 variables, whats why i needed it to be done through range
Reeza
Super User
There are other ways if you want to explain your problem thoroughly. We can only have the information you post and at the moment I suspect you need to back up and redesign however you created this data. If you read it in from text files or from a DB there may be methods there to ensure that these issues never occurred in the first place.
Antoni
Fluorite | Level 6
well this data was provided by my professor, i can do whatever i want with it as long as it works and provides some kind of information

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 46 replies
  • 1695 views
  • 14 likes
  • 6 in conversation