BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rajdeep
Pyrite | Level 9


Hi Folks,

Hope you are doing great. Suppose I have few columns named with Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 and so on. In future the same named columns can be added or dropped, in other words we can consider as dynamic as of now.

I am trying to drop those columns which is having null or 0 or 1 values.

 



data have;
input category :$10. Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4;
datalines;
#ofpeople     20       30  0               
#ofproviders  10       5               
#ofclaims     40       25                     
AmountBilled  50       100  1       
AmountPaid    11       35                          
AmountDed     5        6                          
;
options symbolgen mlogic mprint;
data _null_;
   set have;
   array Attrib_vars Attrib_Itr:;
   call symputx('Attrib_qty', dim(Attrib_vars));
   run;
   %put &Attrib_qty;
   data _null_;
   set have end=done;
   array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
   length list $ 500;
   if done then do;
      do i= 1 to &Attrib_qty;

if Attrib_vars[i] = ' ' or Attrib_vars[i] in (0,1) then list=catx(' ',trim(list),vname(Attrib_vars(i)));  
  
      end;
   call symputx('mlist',list);
 end;
run;
%put &mlist;

                                                                                                  
data D5;
   set have;
   drop &mlist;
run;


I am trying the above code in my scenario, but it's storing all the columns starts with Attrib_Itr into the mlist macro variable. Like mlist resolves to Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4.

But ideally I think mlist should have only Attrib_Itr3 and Attrib_Itr4 columns as these columns only contains 0,1 and null values.

What I am doing the mistake can anybody please let me know?

Please help if possible.

Thanks
Rajdeep.

1 ACCEPTED SOLUTION

Accepted Solutions
error_prone
Barite | Level 11

Try this data-step:

data _null_;
   set have end=jobDone;
   length dropList $ 300; 
   retain dropList " ";
   
   array check Attrib_:;
   
   if _n_ = 1 then do;
      do i = 1 to dim(check);
         dropList = catx(' ', dropList, vname(check{i}));
      end;
   end;
   
   /* to test: iterating over dropList could be 
    * faster if the number of variables high */
   do i = 1 to dim(check);
      if check{i} not in (., 0, 1) then do;
         dropList = tranwrd(dropList, vname(check{i}), ' ');
      end;
   end;
   
   if jobDone then do;
      call symputx('mList', dropList);
   end;
run;

%put &=mlist;

 

 

View solution in original post

9 REPLIES 9
Kurt_Bremser
Super User

First of all, please review your initial data step to create dataset have; when I copy/paste and run it, I only get 3 observations, which I guess is not what's intended. Be precise in setting missing values, so that the whole input list is satisfied. Or use the truncover option.

 

I then ran a slightly changed variation of your code:

data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4;
datalines;
#ofpeople     20       30  0               
#ofproviders  10       5               
#ofclaims     40       25                     
AmountBilled  50       100  1       
AmountPaid    11       35                          
AmountDed     5        6                          
;
run;

data _null_;
set have end=done;
array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
length list $ 500;
if done then do;
  do i= 1 to dim(attrib_vars);
    if Attrib_vars[i] = ' ' or Attrib_vars[i] in (0,1) then list=catx(' ',trim(list),vname(Attrib_vars(i)));  
  end;
  call symputx('mlist',list);
end;
run;

%put &mlist;

and got this log (excerpt):

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      43:25   
NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
      

48         
49         %put &mlist;
Attrib_Itr3 Attrib_Itr4

Note that this only checks the last observation of the dataset. If you want to check across all observations, the code needs to be rewritten.

 

 

rajdeep
Pyrite | Level 9

Hi Kurt,

 

Thanks for replying, but if the dataset is having the missing values apart from the first column then your modified code is not working.

 

Please have look at this type of datalines.

data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 $ Attrib_Itr2 $ Attrib_Itr3 $ Attrib_Itr4 $ Attrib_Itr5 $ ;
datalines;
                              
#ofpeople     20       30  0 1
ewrewr                    
;
run;

data _null_;
set have end=done;
array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
length list $ 500;
if done then do;
  do i= 1 to dim(attrib_vars);
    if Attrib_vars[i] = ' ' or Attrib_vars[i] in ('0','1') then list=catx(' ',trim(list),vname(Attrib_vars(i)));  
  end;
  call symputx('mlist',list);
end;
run;

%put &mlist;

 

Like in the above datalines, I have only one value in the second row and other values are missing. Also all the columns are now  character type then the above code is not working.

 

I had actually encountered the similar type of scenario, but I am getting the log like this.

 

 SYMBOLGEN:  Some characters in the above value which were subject to macro quoting have been unquoted for printing.
 SYMBOLGEN:  Macro variable GRAPHINIT resolves to GOPTIONS RESET=ALL GSFNAME=_GSFNAME;
 59         
 60         data _null_;
 61         set have end=done;
 62         array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
 63         length list $ 500;
 64         if done then do;
 65           do i= 1 to dim(attrib_vars);
 66             if Attrib_vars[i] = ' ' or Attrib_vars[i] in ('0','1') then list=catx(' ',trim(list),vname(Attrib_vars(i)));
 67           end;
 68           call symputx('mlist',list);
 69         end;
 70         run;
 
 NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
       66:32   
 NOTE: There were 3 observations read from the data set WORK.HAVE.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              731.62k
       OS Memory           31208.00k
       Timestamp           10/01/2017 11:39:33 AM
       Step Count                        513  Switch Count  0
       Page Faults                       0
       Page Reclaims                     117
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 
 71         
 72         %put &mlist;
 SYMBOLGEN:  Macro variable MLIST resolves to Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4 Attrib_Itr5
 Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4 Attrib_Itr5
 73         
 74         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 SYMBOLGEN:  Macro variable GRAPHTERM resolves to GOPTIONS NOACCESSIBLE;
 86         

Now mlist macro is resolving to all the columns rather it should resolve to only Attrib_Itr2, Attrib_Itr4 and Attrib_Itr5 as it contains only nmissing, 0 and 1 values.

 

Please suggest if this type of scenario will occur then what modification need to be done in the above code or do we have to write any proc sql method for this type of scenario.

 

Please help.

 

Thanks in advance.

ballardw
Super User

Using @Kurt_Bremser's data set, or yours with the appropriate name consider:

proc freq data=have nlevels;
 ods output Nlevels=Mylevelset;
run;

Use a tables statement with the variables, or a variable list, to restrict to specific variables of interest.

Which will create  a data set that looks like:

 

 

TableVar NLevels NMissLevels NNonMissLevels
category 6 0 6
Attrib_Itr1 6 0 6
Attrib_Itr2 6 0 6
Attrib_Itr3 3 1 2
Attrib_Itr4 1 1 0

A variable that has NLevels=1 and NMissLevels=1 indicates a variable with all missing values.

So this will populate your mlist macro variable:

proc sql noprint;
   select tablevar into : mlist separated by ' '
   from MyLevelset
   where Nlevels=1 and NMissLevels=1
   ;
quit;

Though I will make no claims as to efficiency for large data sets. This approach doesn't care about the variable type luckily.

 

rajdeep
Pyrite | Level 9

Hi Ballard,

 

Thanks for the reply. It is working perfectly for the missing columns, but those columns which are conatining 0 and 1, the code given by you might not work.

 

For Example

 

data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 $ Attrib_Itr2 $ Attrib_Itr3 $ Attrib_Itr4 $ Attrib_Itr5 $ ;
datalines;
                              
#ofpeople     20       30  0 1
ewrewr                    
;
run;

 

 

 

From the above code ideally  Attrib_Itr2, Attrib_Itr4 and Attrib_Itr5 should be deleted as it's having the values missing, 0 and 1.

 

Please help.

 

Thanks in advance.

 

 

 

 

 

 

 

 

Ksharp
Super User

If I understood what you mean.

 


data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4;
datalines;
#ofpeople     20       30  0               
#ofproviders  10       5               
#ofclaims     40       25                     
AmountBilled  50       100  1       
AmountPaid    11       35                          
AmountDed     5        6                          
;

data _null_;
 set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE' and upcase(name) like 'ATTRIB%')) end=last;
 if _n_=1 then call execute('proc sql;create table temp as select ');
 call execute(cat('sum(',name,' in (. 0 1)) as ',name));
 if last then call execute('from have ;quit;');
  else call execute(',');
run;
%let dsid=%sysfunc(open(have));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));
proc transpose data=temp out=want;
var _all_;
run;
proc sql;
select _name_ into : drop separated by ' '
 from want
  where col1=&nobs;
run;
data final_want;
 set have;
 drop &drop;
run;
rajdeep
Pyrite | Level 9

Hi Ksharp,

 

Thanks for replying.

 

If the datalines is like this then it is not working.

 

data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 $ Attrib_Itr2 $ Attrib_Itr3 $ Attrib_Itr4 $ Attrib_Itr5 $ ;
datalines;
                              
#ofpeople     20       30  0 1
ewrewr                    
;
run;

 

The drop varibles list what you had created in your code it's resolving to all the Attrib_Itr1 to Attrib_Itr5 columns. Please try to run your code as per the above datalines example you can understand the issue.

 

Please help if possible.

 

Thanks in advance.

Ksharp
Super User

That is because your all Attrib_ variables are character type ,not numeric type.

 

data _null_;
 set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE' and upcase(name) like 'ATTRIB%')) end=last;
 if _n_=1 then call execute('proc sql;create table temp as select ');
 call execute(cat('sum(',name,' in (" " "0" "1")) as ',name));
 if last then call execute('from have ;quit;');
  else call execute(',');
run;
error_prone
Barite | Level 11

Try this data-step:

data _null_;
   set have end=jobDone;
   length dropList $ 300; 
   retain dropList " ";
   
   array check Attrib_:;
   
   if _n_ = 1 then do;
      do i = 1 to dim(check);
         dropList = catx(' ', dropList, vname(check{i}));
      end;
   end;
   
   /* to test: iterating over dropList could be 
    * faster if the number of variables high */
   do i = 1 to dim(check);
      if check{i} not in (., 0, 1) then do;
         dropList = tranwrd(dropList, vname(check{i}), ' ');
      end;
   end;
   
   if jobDone then do;
      call symputx('mList', dropList);
   end;
run;

%put &=mlist;

 

 

rajdeep
Pyrite | Level 9

Thanks Prone.......Thanks a lot it's working now......I checked.....

 

 

Cheers......

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1255 views
  • 0 likes
  • 5 in conversation