BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sassy_seb
Quartz | Level 8

Hello, so I was working on this macro to print the results of the id and each variable in a dataset if it is empty.

%macro filter_and_print(dataset, new_dataset, var_id);
    %local dsid varnum varname vartype filter_condition;

    /* Open the dataset */
    %let dsid = %sysfunc(open(&dataset));

    /* Check if dataset is opened successfully */
    %if &dsid %then %do;
        /* Get the number of variables in the dataset */
        %let varnum = %sysfunc(attrn(&dsid, nvars));

        /* Loop through variables */
        %do i = 1 %to &varnum;
            /* Get variable name and type */
            %let varname = %qscan(%sysfunc(varname(&dsid, &i)), 1);
            %let vartype = %sysfunc(vartype(&dsid, &i));

            /* Determine filter condition based on variable type */
            %if &vartype = C %then %do;
                %let filter_condition = %sysfunc(quote(&varname)) = ' ';
            %end;
            %else %if &vartype = N %then %do;
                %let filter_condition = %sysfunc(quote(&varname)) = .;
            %end;

            /* Create filtered dataset */
            data &new_dataset;
                set &dataset;
                if &filter_condition;
            run;

            /* Print filtered dataset */
            proc print; var &var_id &varname;
            run;
        %end;

        /* Close the dataset */
        %let dsid = %sysfunc(close(&dsid));
    %end;
    %else %put ERROR: Unable to open dataset &dataset;
%mend;

Right, so it does loop through each variable, if it is a character variable it checks if there is an empty string, if it is a numerical variable it checks if it is equal to .

 

Okay, so in the log, I get messages like 

sassy_seb_0-1717776313629.png

But using proc contents on the original dataset, I do see that the variable in question is numerical.

And I see the Note that says character values have been converted to numerical, but why would it be doing that?

 

I wouldn't recommend running the program with a large dataset though as it does take a bit of time to fully execute.
I do get my printed results at the end of running the program, however it does not print out the empty/missing values as expected. It instead returns every observation but filtered by the id variable and the variable being observed.  

 

Any help would be greatly appreciated. Thank you! 😁

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

What is this doing? Sounds almost like you are writing a bad imitation of Proc Contents.

 

What are you passing as a value for Var_id?

 

If you want to find if variable has missing values then use the MISSING function. It works for character and numeric variables. Your code is generating stuff like ' IF "Nameofvariable" = . ; '  which is the invalid numeric value

made by

%else %if &vartype = N %then %do;
                %let filter_condition = %sysfunc(quote(&varname)) = .;
            %end;

Your quotes on the variable name for the character variables are also just plain wrong as it generates

if "variablename" = ' ';

which is never ever going to be true. The condition would have to be "If Varname = ' ';"

 

Your filter condition can be unconditionally written

data &new_dataset;
   set &dataset;
   if missing(&varname);
run;

 

View solution in original post

13 REPLIES 13
PaigeMiller
Diamond | Level 26

Since this is a macro, you need to turn on the macro debugging option by running this command.

 

options mprint;

 

Then run the macro again and please show us the log for this macro down to the first NOTE indicating the problem. Do not show us error messages or warnings or notes detached from the part of the code where the error is.


From now on, please provide logs as text, not screen captures, by copying the log for this macro as text and pasting it into the window that appears when you click on the </> icon.

 

If you have relevant PROC CONTENTS, show them to us!

 

 

--
Paige Miller
ballardw
Super User

What is this doing? Sounds almost like you are writing a bad imitation of Proc Contents.

 

What are you passing as a value for Var_id?

 

If you want to find if variable has missing values then use the MISSING function. It works for character and numeric variables. Your code is generating stuff like ' IF "Nameofvariable" = . ; '  which is the invalid numeric value

made by

%else %if &vartype = N %then %do;
                %let filter_condition = %sysfunc(quote(&varname)) = .;
            %end;

Your quotes on the variable name for the character variables are also just plain wrong as it generates

if "variablename" = ' ';

which is never ever going to be true. The condition would have to be "If Varname = ' ';"

 

Your filter condition can be unconditionally written

data &new_dataset;
   set &dataset;
   if missing(&varname);
run;

 

sassy_seb
Quartz | Level 8

We had some old code in our programs that had checks for variables with missing values. What the people before me were doing were writing code for each variable and filtering out whatever was missing that way. So the programs were usually around 1000+ lines of code. I wanted to see if there was a way for me to just specify what dataset to check for missing values and loop through each variable and return the ID's with the missing values for each variable.  That way I can include this macro in other programs, just to see what is missing from each variable before moving on with working with the data manually.  

 

Thanks @ballardw, I was thinking that something with the quote was turning . into "." or something along those lines. I understand what it does now.

 

I ran the macro with the if missing function and it worked perfectly! Thanks a lot.

 

ballardw
Super User

You might be interested in the PROC FREQ NLEVELS option as a prior step. That can analyze a data set, return a data set with the variable names and if there are any missing values at all. Which could be used to drive the select an print option if still needed.

 

A small example:

data example;
   input a $ b $ x y;
datalines;
abc def 1 2
abc .   2 3
pdq abc . 4
;

ods select nlevels;
proc freq data=example nlevels;
   ods output nlevels=mynlevelsdataset;
run;

The ODS Select means the only output is the Nlevels information which means you don't generate the by default frequency listing of all the variables. You can look at the MYnlevelsdata set and the variables that have 1 (or more) in the Nmisslevels variable are the only ones with missing data.

So you could skip the whole open the data set loop over all the variables and go directly to the variables that have missing values somewhere.

sassy_seb
Quartz | Level 8

Sorry about the super late reply, was in a meeting. What do the levels mean exactly if you don't mind me asking.

sassy_seb_0-1717796556875.png

I thought variable a would result in 3 levels? 

 

sassy_seb
Quartz | Level 8

Never mind, I understand! Thank you! It is 2 because there are two instances of 'abc' for variable a?

Quentin
Super User

Glad you got it working.  If you're just counting missings, PROC FREQ might be more straight forward.

 

If you want to do something more custom in a DATA step you might consider using arrays for looping over the variables.  That way you don't have to execute the DATA step once for each variable. For example, below will give you a dataset that has a list of the variables that have missing values, with the ID value for the record.

 

 

data have ; 
  set sashelp.class (rename=(name=ID));
  if      _N_=2 then call missing(height) ;
  else if _N_=4 then call missing(sex) ;
run ;


data missing (keep=id var);
  set have ; 
  array nums{*} _numeric_ ;
  array chars{*} _character_ ;
  length var $32 ;
  do i=1 to dim(nums) ;
    if missing(nums{i}) then do ;
      var=vname(nums{i}) ;
      output ;
    end ;
  end ;
  do i=1 to dim(chars) ;
    if missing(chars{i}) then do ;
      var=vname(chars{i}) ;
      output ;
    end ;
  end ;
run ;

proc print data=missing ;
run ;

returns:

  ID      var

 Alice    Height
 Carol    Sex

You could take that and turn it into a utility macro for outputting a list of missing values.

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
sassy_seb
Quartz | Level 8

Yes actually, that is something I wanted to look into. I showed the team I work with and they suggested that they would like to modify the macro so that it prints out only a specific set of variables that they would like to observe. As I am new to SAS, I was wondering what the best way to do this was.

 

Is there something like a list that I can include in the macro parameter like in Python?

 

Something like variables = var1, var2, var3

And then the macro would only do the work for IF MISSING(var1) and the rest of the variables we specified.

Would an array help me do this? 

sassy_seb
Quartz | Level 8

OR can I just do %macro filter_and_print(dataset, new_dataset, var_id, var_list);

and then later call it using:
%filter_and_print(dataset=oldData, new_dataset=newData, var_id=ID, var_list=Name Age Salary);

Quentin
Super User

The SAS macro language is just a text processor / generator.  So there isn't a list object like in Python, because there are no objects.  But you can type a bunch of text that you know is a list, and then you can use it like a list, but it's still just text.

 

So you could do:

%macro printit(data=,var=) ;
  proc print data=&data ;
    var &var ;
  run ;
%mend ;

%printit(data=sashelp.class,var=name sex)
%printit(data=sashelp.shoes,var=region sales inventory)

So the macro language doesn't know that the parameter VAR is a list of variables, but you do. You can iterate over the text list, if you know a delimiter.  So you could have a parameter which is a list of dataset names, and the macro could iterate over it to print the first 5 records from each dataset:

%macro printlots(data=,obs=5) ;
  %local i datai ;
  %do i=1 %to %sysfunc(countw(&data,%str( ))) ;
    %let datai=%scan(&data,&i,%str( )) ;  %*get ith item of the list;

    proc print data=&datai(obs=&obs) ;
    run ;

  %end ;
%mend ;

%printlots(data=sashelp.class sashelp.shoes sashelp.iris)


For creating an output dataset that has one record for each missing value found, a utility macro might look like:

 

data have ; 
  set sashelp.class;
  if      _N_=2 then call missing(height) ;
  else if _N_=4 then call missing(sex) ;
run ;

%macro getmissing(data=,out=,var=,id=) ;

data &out (keep=&id var);
  set &data (keep=&id &var); 
  retain __dummynum 1 __dummychar "Q" ; *hack to make sure there is at least one num and one char var ;
  array nums{*} _numeric_ ;
  array chars{*} _character_ ;
  length var $32 ;
  do i=1 to dim(nums) ;
    if missing(nums{i}) then do ;
      var=vname(nums{i}) ;
      output ;
    end ;
  end ;
  do i=1 to dim(chars) ;
    if missing(chars{i}) then do ;
      var=vname(chars{i}) ;
      output ;
    end ;
  end ;
run ;

%mend getmissing ;

%getmissing(data=have,id=name,out=want1,var=height sex)
proc print data=want1 ;
run ;

%getmissing(data=have,id=name,out=want2,var=height weight)
proc print data=want2 ;
run ;

A key to understanding the macro language is that it's just a text generator.  And all of the values are just text.  The text could be a number, or the name of a dataset, or a pipe-delimited list of  variables, or ...  The macro language does not know about datasets or data set variables, it just processes text.  (The %sysfunc(open(&dataset))  is the exception that proves the rule. : )

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
ballardw
Super User

@sassy_seb wrote:

Yes actually, that is something I wanted to look into. I showed the team I work with and they suggested that they would like to modify the macro so that it prints out only a specific set of variables that they would like to observe. As I am new to SAS, I was wondering what the best way to do this was.

 

Is there something like a list that I can include in the macro parameter like in Python?

 

Something like variables = var1, var2, var3

And then the macro would only do the work for IF MISSING(var1) and the rest of the variables we specified.

Would an array help me do this? 


Show an example of the complete Proc Print code you might want to generate.

Yes you can provide a list of values in a macro parameter. DO NOT USE COMMAS IN THE LIST.

The comma is the macro language parameter delimiter and as such you don't want to try to use it. Spaces work just fine.

The generic loop for space delimited list looks like

%do i = 1 %to %sysfunc(countw(&listvar.));
   /* grab one of the parameters */
    %let oneparm = %scan(&listvar. , &i.);
   /* depending on what you want to do with that parm sets any other text in the loop*/
%end;

HOW to use that depends on the code to generate.

 

 

 

 

 

sassy_seb
Quartz | Level 8
%macro filter_and_print(dataset, new_dataset, var_id);
    %local dsid varnum varname;

    /* Open the dataset */
    %let dsid = %sysfunc(open(&dataset));

    /* Check if dataset is opened successfully */
    %if &dsid %then %do;
        /* Get the number of variables in the dataset */
        %let varnum = %sysfunc(attrn(&dsid, nvars));

        /* Loop through variables */
        %do i = 1 %to &varnum;
            /* Get variable name and type */
            %let varname = %qscan(%sysfunc(varname(&dsid, &i)), 1);

            /* Create filtered dataset */
            data &new_dataset;
                set &dataset;
                if missing(&varname);
            run;

            /* Print filtered dataset */
            proc print; var &var_id &varname;
            run;
        %end;

        /* Close the dataset */
        %let dsid = %sysfunc(close(&dsid));
    %end;
    %else %put ERROR: Unable to open dataset &dataset;
%mend;

So for this code, I would like to add the listvar parameter you mentioned.

So ideal scenario:
Do the PROC FREQ and NLEVELS to see what variables have missing values like was mentioned before.

Run the macro I am working on and specify the list of variables that we are going to be filtering by and printing the results for each variable.

I know I can just make a new dataset while using KEEP=var_id listvar and then run the macro with the new dataset. I am really interested though in just making something that can do that last part for ease of use in the initial macro. 

Tom
Super User Tom
Super User

What is the goal of this report? 

 

Sounds like you want a report of missing values.  But if the value is missing on every observation in the report then what is value of including that variable in the report?

 

And do you really want to make a totally separate listing for each variable in your list?  Won't that be both take a lot of pages and also be hard to interpret.

 

Wouldn't it be better to some type of summary report that would both take less space and make more sense? 

 

Is KEY a unique key (each observation has a separate value)?   If not do want the same KEY value to print over and over for every observation that has a missing value of VARNAME?  Why not just print it once?  And could some of the observation for a given value of KEY have a non-missing value of VARNAME and others have a missing value?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 679 views
  • 7 likes
  • 5 in conversation