BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
James_Yu
Obsidian | Level 7

Hi everyone.

I am establishing a program to check missing data points based on a set of "if" statements, and I need to generate a "Comment" column to indicate which variables are missing. But I stuck right at the beginning of the program.

 

Consider that I have a set of demographic characteristics (like age, day/month/year of birth, day/month/year of visit, height, weight, ...) and if a variable is missing, I need to add a string like "Missing age" (for example) into the "Comment" column. If all variables are not missing, "Comment" column is expected to be blank. 

 

The way I think is that, I will use CAT function to add a new string into "Comment" column if a variable meet the missing condition like: 

if AGE    = .   then Comment = cat(Comment,"Missing Age");
if BRTHDY = " " then Comment = cat(Comment,"Missing Birth date");
if BRTHMO = " " then Comment = cat(Comment,"Missing Birth month");
if BRTHYR = " " then Comment = cat(Comment,"Missing Birth year");

But after running the data step, the "Comment" column is blank. I can only think that I need a way to pre-define one value of "Comment" column before the "IF" statements.

 

Could anyone please give me an idea to deal with this situation ?

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

The Comment column is blank for a very simple reason: It is not long enough to hold all the data that you are trying to put into it. Why? because you are using the CAT function, and not CATS or CATX, which trim the parameters before appending. And before the CAT call, the COMMENT variable is already full (of blanks).

 

I would rewrite your code as something like:

if AGE    = .   then call catx(';',Comment,"Missing Age");
if BRTHDY = " " then call catx(';',Comment,"Missing Birth date");
if BRTHMO = " " then call catx(';',Comment,"Missing Birth month");
if BRTHYR = " " then call catx(';',Comment,"Missing Birth year");

I used CATX because you may want a delimiter between the different texts (the ';'). I changed the code to CALL CATX instead of the assignment, because it is faster, CPU-wise.

 

 

View solution in original post

10 REPLIES 10
Patrick
Opal | Level 21

The following a bit different from what you've started doing but that's how I'd implement DQ. 

  • The Error table contains all rows from source where at least one DQ issue has been found
  • The Exception table contains a row per DQ issue found
  • Variable _row_num in the Error and Exception table is what allows you to link the tables together.

Below sample code meant as a starting point for you to implement what you need in case you want to take the proposed approach.

data work.class;
  set sashelp.class;
  if _n_=3 then call missing(name);
  if _n_=3 then call missing(age);
  if _n_=5 then call missing(of _all_);
  if _n_=8 then call missing(height);
run;

data 
  work.error(drop=__:)
  work.exception(keep=_row_num __vname __dq_issue)
  ;
  set work.class;
  array _char {*} _character_;
  array _nums {*} _numeric_;
  __err_flg=0;
  _row_num=_n_;
  length __vname $32;
  length __dq_issue $20;

  do __i=1 to dim(_char);
    if missing(_char(__i)) then 
      do;
        __err_flg=1;
        __vname=vname(_char(__i));
        __dq_issue='Missing';
        output exception;
      end;
  end;

  do __i=1 to dim(_nums);
    if missing(_nums(__i)) then 
      do;
        __err_flg=1;
        __vname=vname(_nums(__i));
        __dq_issue='Missing';
        output exception;
      end;
  end;  

  if cmiss(of _all_)=sum(dim(_char),dim(_nums)) then
    do;
        __err_flg=1;
        __vname='n/a';
        __dq_issue='All vars missing';
        output exception;
    end;

  if __err_flg=1 then output error;
run;
s_lassen
Meteorite | Level 14

The Comment column is blank for a very simple reason: It is not long enough to hold all the data that you are trying to put into it. Why? because you are using the CAT function, and not CATS or CATX, which trim the parameters before appending. And before the CAT call, the COMMENT variable is already full (of blanks).

 

I would rewrite your code as something like:

if AGE    = .   then call catx(';',Comment,"Missing Age");
if BRTHDY = " " then call catx(';',Comment,"Missing Birth date");
if BRTHMO = " " then call catx(';',Comment,"Missing Birth month");
if BRTHYR = " " then call catx(';',Comment,"Missing Birth year");

I used CATX because you may want a delimiter between the different texts (the ';'). I changed the code to CALL CATX instead of the assignment, because it is faster, CPU-wise.

 

 

s_lassen
Meteorite | Level 14

A note on my previous submission: When using CALL CATX, the COMMENT variable must be defined first, so if you did not define it otherwise, the code I suggested should be changed to

length comment $200;                                                                                                                    
if AGE    = .   then call catx(';',Comment,"Missing Age");                                                                              
if BRTHDY = " " then call catx(';',Comment,"Missing Birth date");                                                                       
if BRTHMO = " " then call catx(';',Comment,"Missing Birth month");                                                                      
if BRTHYR = " " then call catx(';',Comment,"Missing Birth year");          

assuming that you are not putting more stuff into the comment variable, 200 characters should be enough.

James_Yu
Obsidian | Level 7

Thank for your help.

I really appreciate your response.

Kurt_Bremser
Super User

Define the comment variable first with a suitable length, and then study the log after running your step. Also post the complete log (code and messages) here.

length comment $1000;
if AGE    = .   then Comment = catx(',',Comment,"Missing Age");
if BRTHDY = " " then Comment = catx(',',Comment,"Missing Birth date");
if BRTHMO = " " then Comment = catx(',',Comment,"Missing Birth month");
if BRTHYR = " " then Comment = catx(',',Comment,"Missing Birth year");

Why is a day, month, year stored as character?

Also consider the use of the MISSING function to determine missing values.

CATX will insert a delimiter for better readability.

James_Yu
Obsidian | Level 7

Thanks for your help. 

I will define the variable first. The reason why day, month, year stored as character is that we have an option of Unknown (stored as U) with of these variable.

 

Kurt_Bremser
Super User

@James_Yu wrote:

Thanks for your help. 

I will define the variable first. The reason why day, month, year stored as character is that we have an option of Unknown (stored as U) with of these variable.

 


In SAS, such a value would be a missing value, represented by a dot for numeric variables. You can handle this with a custom informat when you read the data into SAS.

If you want to make a difference between "nothing was entered" and "U", you can use a special missing value like 

.U
James_Yu
Obsidian | Level 7

Thanks for your suggestion.

Currently our system allows personnel to mark a variable as Unknown by clicking some buttons (not by entering data) and in the extracted dataset, the unknown cells are filled with "U". This is the reason why I specify my variables as text.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3411 views
  • 3 likes
  • 4 in conversation