DATA Step, Macro, Functions and more

Need to drop null columns starts with or prefixed with a certain word

Accepted Solution Solved
Reply
Contributor
Posts: 20
Accepted Solution

Need to drop null columns starts with or prefixed with a certain word


Hi Folks,

Hope you are doing great. Suppose I have few columns named with Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 and so on. In future the same named columns can be added or dropped, in other words we can consider as dynamic as of now.

I am trying to drop those columns which is having null or 0 or 1 values.

 



data have;
input category :$10. Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4;
datalines;
#ofpeople     20       30  0               
#ofproviders  10       5               
#ofclaims     40       25                     
AmountBilled  50       100  1       
AmountPaid    11       35                          
AmountDed     5        6                          
;
options symbolgen mlogic mprint;
data _null_;
   set have;
   array Attrib_vars Attrib_Itr:;
   call symputx('Attrib_qty', dim(Attrib_vars));
   run;
   %put &Attrib_qty;
   data _null_;
   set have end=done;
   array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
   length list $ 500;
   if done then do;
      do i= 1 to &Attrib_qty;

if Attrib_vars[i] = ' ' or Attrib_vars[i] in (0,1) then list=catx(' ',trim(list),vname(Attrib_vars(i)));  
  
      end;
   call symputx('mlist',list);
 end;
run;
%put &mlist;

                                                                                                  
data D5;
   set have;
   drop &mlist;
run;


I am trying the above code in my scenario, but it's storing all the columns starts with Attrib_Itr into the mlist macro variable. Like mlist resolves to Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4.

But ideally I think mlist should have only Attrib_Itr3 and Attrib_Itr4 columns as these columns only contains 0,1 and null values.

What I am doing the mistake can anybody please let me know?

Please help if possible.

Thanks
Rajdeep.


Accepted Solutions
Solution
‎10-01-2017 05:16 PM
Frequent Contributor
Posts: 149

Re: Need to drop null columns starts with or prefixed with a certain word

Try this data-step:

data _null_;
   set have end=jobDone;
   length dropList $ 300; 
   retain dropList " ";
   
   array check Attrib_:;
   
   if _n_ = 1 then do;
      do i = 1 to dim(check);
         dropList = catx(' ', dropList, vname(check{i}));
      end;
   end;
   
   /* to test: iterating over dropList could be 
    * faster if the number of variables high */
   do i = 1 to dim(check);
      if check{i} not in (., 0, 1) then do;
         dropList = tranwrd(dropList, vname(check{i}), ' ');
      end;
   end;
   
   if jobDone then do;
      call symputx('mList', dropList);
   end;
run;

%put &=mlist;

 

 

View solution in original post


All Replies
Super User
Posts: 7,809

Re: Need to drop null columns starts with or prefixed with a certain word

First of all, please review your initial data step to create dataset have; when I copy/paste and run it, I only get 3 observations, which I guess is not what's intended. Be precise in setting missing values, so that the whole input list is satisfied. Or use the truncover option.

 

I then ran a slightly changed variation of your code:

data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4;
datalines;
#ofpeople     20       30  0               
#ofproviders  10       5               
#ofclaims     40       25                     
AmountBilled  50       100  1       
AmountPaid    11       35                          
AmountDed     5        6                          
;
run;

data _null_;
set have end=done;
array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
length list $ 500;
if done then do;
  do i= 1 to dim(attrib_vars);
    if Attrib_vars[i] = ' ' or Attrib_vars[i] in (0,1) then list=catx(' ',trim(list),vname(Attrib_vars(i)));  
  end;
  call symputx('mlist',list);
end;
run;

%put &mlist;

and got this log (excerpt):

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      43:25   
NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
      

48         
49         %put &mlist;
Attrib_Itr3 Attrib_Itr4

Note that this only checks the last observation of the dataset. If you want to check across all observations, the code needs to be rewritten.

 

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 20

Re: Need to drop null columns starts with or prefixed with a certain word

Posted in reply to KurtBremser

Hi Kurt,

 

Thanks for replying, but if the dataset is having the missing values apart from the first column then your modified code is not working.

 

Please have look at this type of datalines.

data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 $ Attrib_Itr2 $ Attrib_Itr3 $ Attrib_Itr4 $ Attrib_Itr5 $ ;
datalines;
                              
#ofpeople     20       30  0 1
ewrewr                    
;
run;

data _null_;
set have end=done;
array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
length list $ 500;
if done then do;
  do i= 1 to dim(attrib_vars);
    if Attrib_vars[i] = ' ' or Attrib_vars[i] in ('0','1') then list=catx(' ',trim(list),vname(Attrib_vars(i)));  
  end;
  call symputx('mlist',list);
end;
run;

%put &mlist;

 

Like in the above datalines, I have only one value in the second row and other values are missing. Also all the columns are now  character type then the above code is not working.

 

I had actually encountered the similar type of scenario, but I am getting the log like this.

 

 SYMBOLGEN:  Some characters in the above value which were subject to macro quoting have been unquoted for printing.
 SYMBOLGEN:  Macro variable GRAPHINIT resolves to GOPTIONS RESET=ALL GSFNAME=_GSFNAME;
 59         
 60         data _null_;
 61         set have end=done;
 62         array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
 63         length list $ 500;
 64         if done then do;
 65           do i= 1 to dim(attrib_vars);
 66             if Attrib_vars[i] = ' ' or Attrib_vars[i] in ('0','1') then list=catx(' ',trim(list),vname(Attrib_vars(i)));
 67           end;
 68           call symputx('mlist',list);
 69         end;
 70         run;
 
 NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
       66:32   
 NOTE: There were 3 observations read from the data set WORK.HAVE.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              731.62k
       OS Memory           31208.00k
       Timestamp           10/01/2017 11:39:33 AM
       Step Count                        513  Switch Count  0
       Page Faults                       0
       Page Reclaims                     117
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 
 71         
 72         %put &mlist;
 SYMBOLGEN:  Macro variable MLIST resolves to Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4 Attrib_Itr5
 Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4 Attrib_Itr5
 73         
 74         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 SYMBOLGEN:  Macro variable GRAPHTERM resolves to GOPTIONS NOACCESSIBLE;
 86         

Now mlist macro is resolving to all the columns rather it should resolve to only Attrib_Itr2, Attrib_Itr4 and Attrib_Itr5 as it contains only nmissing, 0 and 1 values.

 

Please suggest if this type of scenario will occur then what modification need to be done in the above code or do we have to write any proc sql method for this type of scenario.

 

Please help.

 

Thanks in advance.

Super User
Posts: 11,343

Re: Need to drop null columns starts with or prefixed with a certain word

[ Edited ]

Using @KurtBremser's data set, or yours with the appropriate name consider:

proc freq data=have nlevels;
 ods output Nlevels=Mylevelset;
run;

Use a tables statement with the variables, or a variable list, to restrict to specific variables of interest.

Which will create  a data set that looks like:

 

 

TableVar NLevels NMissLevels NNonMissLevels
category 6 0 6
Attrib_Itr1 6 0 6
Attrib_Itr2 6 0 6
Attrib_Itr3 3 1 2
Attrib_Itr4 1 1 0

A variable that has NLevels=1 and NMissLevels=1 indicates a variable with all missing values.

So this will populate your mlist macro variable:

proc sql noprint;
   select tablevar into : mlist separated by ' '
   from MyLevelset
   where Nlevels=1 and NMissLevels=1
   ;
quit;

Though I will make no claims as to efficiency for large data sets. This approach doesn't care about the variable type luckily.

 

Contributor
Posts: 20

Re: Need to drop null columns starts with or prefixed with a certain word

Hi Ballard,

 

Thanks for the reply. It is working perfectly for the missing columns, but those columns which are conatining 0 and 1, the code given by you might not work.

 

For Example

 

data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 $ Attrib_Itr2 $ Attrib_Itr3 $ Attrib_Itr4 $ Attrib_Itr5 $ ;
datalines;
                              
#ofpeople     20       30  0 1
ewrewr                    
;
run;

 

 

 

From the above code ideally  Attrib_Itr2, Attrib_Itr4 and Attrib_Itr5 should be deleted as it's having the values missing, 0 and 1.

 

Please help.

 

Thanks in advance.

 

 

 

 

 

 

 

 

Super User
Posts: 10,035

Re: Need to drop null columns starts with or prefixed with a certain word

If I understood what you mean.

 


data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4;
datalines;
#ofpeople     20       30  0               
#ofproviders  10       5               
#ofclaims     40       25                     
AmountBilled  50       100  1       
AmountPaid    11       35                          
AmountDed     5        6                          
;

data _null_;
 set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE' and upcase(name) like 'ATTRIB%')) end=last;
 if _n_=1 then call execute('proc sql;create table temp as select ');
 call execute(cat('sum(',name,' in (. 0 1)) as ',name));
 if last then call execute('from have ;quit;');
  else call execute(',');
run;
%let dsid=%sysfunc(open(have));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));
proc transpose data=temp out=want;
var _all_;
run;
proc sql;
select _name_ into : drop separated by ' '
 from want
  where col1=&nobs;
run;
data final_want;
 set have;
 drop &drop;
run;
Contributor
Posts: 20

Re: Need to drop null columns starts with or prefixed with a certain word

Hi Ksharp,

 

Thanks for replying.

 

If the datalines is like this then it is not working.

 

data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 $ Attrib_Itr2 $ Attrib_Itr3 $ Attrib_Itr4 $ Attrib_Itr5 $ ;
datalines;
                              
#ofpeople     20       30  0 1
ewrewr                    
;
run;

 

The drop varibles list what you had created in your code it's resolving to all the Attrib_Itr1 to Attrib_Itr5 columns. Please try to run your code as per the above datalines example you can understand the issue.

 

Please help if possible.

 

Thanks in advance.

Super User
Posts: 10,035

Re: Need to drop null columns starts with or prefixed with a certain word

That is because your all Attrib_ variables are character type ,not numeric type.

 

data _null_;
 set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE' and upcase(name) like 'ATTRIB%')) end=last;
 if _n_=1 then call execute('proc sql;create table temp as select ');
 call execute(cat('sum(',name,' in (" " "0" "1")) as ',name));
 if last then call execute('from have ;quit;');
  else call execute(',');
run;
Solution
‎10-01-2017 05:16 PM
Frequent Contributor
Posts: 149

Re: Need to drop null columns starts with or prefixed with a certain word

Try this data-step:

data _null_;
   set have end=jobDone;
   length dropList $ 300; 
   retain dropList " ";
   
   array check Attrib_:;
   
   if _n_ = 1 then do;
      do i = 1 to dim(check);
         dropList = catx(' ', dropList, vname(check{i}));
      end;
   end;
   
   /* to test: iterating over dropList could be 
    * faster if the number of variables high */
   do i = 1 to dim(check);
      if check{i} not in (., 0, 1) then do;
         dropList = tranwrd(dropList, vname(check{i}), ' ');
      end;
   end;
   
   if jobDone then do;
      call symputx('mList', dropList);
   end;
run;

%put &=mlist;

 

 

Contributor
Posts: 20

Re: Need to drop null columns starts with or prefixed with a certain word

Posted in reply to error_prone

Thanks Prone.......Thanks a lot it's working now......I checked.....

 

 

Cheers......

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 209 views
  • 0 likes
  • 5 in conversation