Hi everyone.
I am establishing a program to check missing data points based on a set of "if" statements, and I need to generate a "Comment" column to indicate which variables are missing. But I stuck right at the beginning of the program.
Consider that I have a set of demographic characteristics (like age, day/month/year of birth, day/month/year of visit, height, weight, ...) and if a variable is missing, I need to add a string like "Missing age" (for example) into the "Comment" column. If all variables are not missing, "Comment" column is expected to be blank.
The way I think is that, I will use CAT function to add a new string into "Comment" column if a variable meet the missing condition like:
if AGE = . then Comment = cat(Comment,"Missing Age");
if BRTHDY = " " then Comment = cat(Comment,"Missing Birth date");
if BRTHMO = " " then Comment = cat(Comment,"Missing Birth month");
if BRTHYR = " " then Comment = cat(Comment,"Missing Birth year");
But after running the data step, the "Comment" column is blank. I can only think that I need a way to pre-define one value of "Comment" column before the "IF" statements.
Could anyone please give me an idea to deal with this situation ?
Thank you
The Comment column is blank for a very simple reason: It is not long enough to hold all the data that you are trying to put into it. Why? because you are using the CAT function, and not CATS or CATX, which trim the parameters before appending. And before the CAT call, the COMMENT variable is already full (of blanks).
I would rewrite your code as something like:
if AGE = . then call catx(';',Comment,"Missing Age");
if BRTHDY = " " then call catx(';',Comment,"Missing Birth date");
if BRTHMO = " " then call catx(';',Comment,"Missing Birth month");
if BRTHYR = " " then call catx(';',Comment,"Missing Birth year");
I used CATX because you may want a delimiter between the different texts (the ';'). I changed the code to CALL CATX instead of the assignment, because it is faster, CPU-wise.
The following a bit different from what you've started doing but that's how I'd implement DQ.
Below sample code meant as a starting point for you to implement what you need in case you want to take the proposed approach.
data work.class;
set sashelp.class;
if _n_=3 then call missing(name);
if _n_=3 then call missing(age);
if _n_=5 then call missing(of _all_);
if _n_=8 then call missing(height);
run;
data
work.error(drop=__:)
work.exception(keep=_row_num __vname __dq_issue)
;
set work.class;
array _char {*} _character_;
array _nums {*} _numeric_;
__err_flg=0;
_row_num=_n_;
length __vname $32;
length __dq_issue $20;
do __i=1 to dim(_char);
if missing(_char(__i)) then
do;
__err_flg=1;
__vname=vname(_char(__i));
__dq_issue='Missing';
output exception;
end;
end;
do __i=1 to dim(_nums);
if missing(_nums(__i)) then
do;
__err_flg=1;
__vname=vname(_nums(__i));
__dq_issue='Missing';
output exception;
end;
end;
if cmiss(of _all_)=sum(dim(_char),dim(_nums)) then
do;
__err_flg=1;
__vname='n/a';
__dq_issue='All vars missing';
output exception;
end;
if __err_flg=1 then output error;
run;
The Comment column is blank for a very simple reason: It is not long enough to hold all the data that you are trying to put into it. Why? because you are using the CAT function, and not CATS or CATX, which trim the parameters before appending. And before the CAT call, the COMMENT variable is already full (of blanks).
I would rewrite your code as something like:
if AGE = . then call catx(';',Comment,"Missing Age");
if BRTHDY = " " then call catx(';',Comment,"Missing Birth date");
if BRTHMO = " " then call catx(';',Comment,"Missing Birth month");
if BRTHYR = " " then call catx(';',Comment,"Missing Birth year");
I used CATX because you may want a delimiter between the different texts (the ';'). I changed the code to CALL CATX instead of the assignment, because it is faster, CPU-wise.
A note on my previous submission: When using CALL CATX, the COMMENT variable must be defined first, so if you did not define it otherwise, the code I suggested should be changed to
length comment $200;
if AGE = . then call catx(';',Comment,"Missing Age");
if BRTHDY = " " then call catx(';',Comment,"Missing Birth date");
if BRTHMO = " " then call catx(';',Comment,"Missing Birth month");
if BRTHYR = " " then call catx(';',Comment,"Missing Birth year");
assuming that you are not putting more stuff into the comment variable, 200 characters should be enough.
Thank for your help.
I really appreciate your response.
Define the comment variable first with a suitable length, and then study the log after running your step. Also post the complete log (code and messages) here.
length comment $1000;
if AGE = . then Comment = catx(',',Comment,"Missing Age");
if BRTHDY = " " then Comment = catx(',',Comment,"Missing Birth date");
if BRTHMO = " " then Comment = catx(',',Comment,"Missing Birth month");
if BRTHYR = " " then Comment = catx(',',Comment,"Missing Birth year");
Why is a day, month, year stored as character?
Also consider the use of the MISSING function to determine missing values.
CATX will insert a delimiter for better readability.
Thanks for your help.
I will define the variable first. The reason why day, month, year stored as character is that we have an option of Unknown (stored as U) with of these variable.
@James_Yu wrote:
Thanks for your help.
I will define the variable first. The reason why day, month, year stored as character is that we have an option of Unknown (stored as U) with of these variable.
In SAS, such a value would be a missing value, represented by a dot for numeric variables. You can handle this with a custom informat when you read the data into SAS.
If you want to make a difference between "nothing was entered" and "U", you can use a special missing value like
.U
Thanks for your suggestion.
Currently our system allows personnel to mark a variable as Unknown by clicking some buttons (not by entering data) and in the extracted dataset, the unknown cells are filled with "U". This is the reason why I specify my variables as text.
How do you get the data from this data entry system into SAS? This the point where you convert "U" to a missing value.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.