- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data wanted;
set sklejone( keep=act_age -- default_cus12);
array t act_age -- default_cus12;
do over t;
if t= 'M' then t=' ';
end;
run;
thats this one, i checked and had previously included character variables, i have included the right range of ony numerical variables and it did nothing, no error, no expected output
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Antoni "i have included the right range of ony numerical variables and it did nothing, no error, no expected output"-- if this is the case, the conditonal check if t= 'M' then t=' '; would bear no fruit as you are checking for character.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
so how is this possible that numeric variable has "M" in it, there's no dot before the letter so SAS doesnt recognize it as missing value, then how to get rid of them.. some of the variables consist of almost only "M" value and it is useless data form me, some people in the group chat completely deleted those variables but nobody will share their method because everyone is a potential competition on the job market. in a group escpecially created to "help eachother" with this report 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
> so how is this possible that numeric variable has "M" in it
You are not helping yourself much are you?
This is explained in one of the first replies, given yesterday by @Astounding, and which I also pointed you to after you asked a similar question just after that reply was given.
The missing value is displayed as M [no dot used], and can be tested by code such as: if VAR = .M then ...; [dot used]
Also note that you needn't remove these values for any statistical calculations: SAS handles them appropriately when deriving statistics, as you've already been told.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you check whether those variables are actually formatted for display but the underlying value remains same? If you are certain those are not special missing values, the next thing i can think of is numeric variables formatted with "M" where missing
Please do a proc contents and review thoroughly.
FYI- Special missing values are not displayed with a dot prefix, rather you use dot prefix to assign a special missing value like x=.M; or in an IF statement like;
if x=.M then .....
The displayed special missing value is just M
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
ok i did a test and turns out it was the .M, and i'm stupid for only doing it now but as i pasted a print screen it wasn't that obvios because of the lack of dot before the character in the data set...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Antoni So that test proves @Astounding is right. Since it is indeed special missing values. You could close the thread by marking @Astounding 's response as the answer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Antoni wrote:
data wanted;
set sklejone( keep=act_age -- default_cus12);
array t act_age -- default_cus12;
do over t;
if t= 'M' then t=' ';
end;
run;thats this one, i checked and had previously included character variables, i have included the right range of ony numerical variables and it did nothing, no error, no expected output
You didn't answer the question and you don't seem to understand what we've been saying about missing values.
You need to read the page below about SAS special missing values.
You can have missing values as just missing, or you can code them to identify different types of missing values. For example, in survey data, I will often code N/A questions as .N and skipped questions/no answer as .M so they can be differentiated in the results.
So modifying my code according to the comment from @Astounding earlier in the thread, give the following macro definition. This should work now, I tested it.
%macro drop_missing_pct(input_dsn = , output_dsn=, pct = , id_vars=);
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt ._ - .Z ="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN. (drop = &ID_Vars);
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
*not required for display purposes;
proc sort data=long;
by variable;
run;
*select variables more than x% missing;
proc sql noprint;
select variable into :drop_var_list separated by " "
from long where variable_value = 'Missing' and percent > &pct;
quit;
*Drop variables;
data &output_dsn;
set &input_dsn;
drop &drop_var_list;
run;
*clean up;
*uncomment after testing;
/* proc sql; */
/* drop table long; */
/* drop table temp; */
/* quit; */
%mend;
Example showing it works, with special missing values:
*create test data set with many missing values;
data class;
set sashelp.class;
if age=13 then
call missing(height, weight, sex);
if name in ('Jane', 'Carol', 'Henry', 'Alfred') then
do;
age=.A;
sex=.M;
end;
if name in ('Judy', 'Mary') then
weight=.W;
run;
%drop_missing_pct(input_dsn=class, output_dsn=want, pct=20, id_vars=Name);
*check output;
proc contents data=want;
run;
title 'Missing Report for data set';
proc print data=long;
run;
title 'Data set with columns missing more than 20% dropped';
proc print data=want (obs=5);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hello again, i just have a quick question. Can this macro be modified to work on two separate data sets and give two different outputs? turns out i have to work on my db's separately instead of merging them together. How would it be done if possible, i know i could run two marcos for each data sets but dont know if this is the only way, i would apreciate the answer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
However, if the datasets are handled differently you cannot necessarily expect the same output. The percent missing will likely differ from a merged dataset. If that is the case, You would need to redefine the problem and show example data and expected output, preferably as a new question/post now.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you want to perform the same action on multiple columns and the columns are all the same type then you can use an array.
However, you want to do numeric calculations on your data but it has to be coded as character to contain an M or you're using special missing values. I'm going to assume the first is the situation. You can try and deal with the M or you can convert everything to numbers. Because you are converting the variable type you'll need two arrays, one for the old values and one for the new values.
Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/
For this to work for you, you will need to list the variables in each array. I've included a tutorial on arrays above.
data want;
set have;
array _old(*) act_age app_char_cars;
array _new(*) age char_cars;
do i=1 to dim(old_variables);
*if not missing or M, convert to a numeric value;
if old(i) not in ("", "M") then new(i) = input(old(i), 8.);
end;
drop act_age app_char_cars;
run;
@Antoni wrote:
Hello,
it's 12hours since i'm trying to get over this problem. I need to write a report about credit scorring.There are values assigned as 'M' that represent missing data(some comedian did that). I need to get rid of both usual missing data which is "." and "M" so i can start calculating correlations and regression based on regular data. I tried to reasign those "M" values into regular ' ' in a range of variables that contain the "M" value but i'm constantly failing i have egsams soon and have to write this report till saturday but since 3 days i have about 8 lines.
This is my recent ceration but it fails:
data wanted;
set have( keep=act_age -- app_char_cars);
if (act_age -- app_char_cars)='M' then (act_age -- app_char_cars)=' ';
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content