Hi Folks,
Hope you are doing great. Suppose I have few columns named with Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 and so on. In future the same named columns can be added or dropped, in other words we can consider as dynamic as of now.
I am trying to drop those columns which is having null or 0 or 1 values.
data have;
input category :$10. Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4;
datalines;
#ofpeople 20 30 0
#ofproviders 10 5
#ofclaims 40 25
AmountBilled 50 100 1
AmountPaid 11 35
AmountDed 5 6
;
options symbolgen mlogic mprint;
data _null_;
set have;
array Attrib_vars Attrib_Itr:;
call symputx('Attrib_qty', dim(Attrib_vars));
run;
%put &Attrib_qty;
data _null_;
set have end=done;
array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
length list $ 500;
if done then do;
do i= 1 to &Attrib_qty;
if Attrib_vars[i] = ' ' or Attrib_vars[i] in (0,1) then list=catx(' ',trim(list),vname(Attrib_vars(i)));
end;
call symputx('mlist',list);
end;
run;
%put &mlist;
data D5;
set have;
drop &mlist;
run;
I am trying the above code in my scenario, but it's storing all the columns starts with Attrib_Itr into the mlist macro variable. Like mlist resolves to Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4.
But ideally I think mlist should have only Attrib_Itr3 and Attrib_Itr4 columns as these columns only contains 0,1 and null values.
What I am doing the mistake can anybody please let me know?
Please help if possible.
Thanks
Rajdeep.
Try this data-step:
data _null_;
set have end=jobDone;
length dropList $ 300;
retain dropList " ";
array check Attrib_:;
if _n_ = 1 then do;
do i = 1 to dim(check);
dropList = catx(' ', dropList, vname(check{i}));
end;
end;
/* to test: iterating over dropList could be
* faster if the number of variables high */
do i = 1 to dim(check);
if check{i} not in (., 0, 1) then do;
dropList = tranwrd(dropList, vname(check{i}), ' ');
end;
end;
if jobDone then do;
call symputx('mList', dropList);
end;
run;
%put &=mlist;
First of all, please review your initial data step to create dataset have; when I copy/paste and run it, I only get 3 observations, which I guess is not what's intended. Be precise in setting missing values, so that the whole input list is satisfied. Or use the truncover option.
I then ran a slightly changed variation of your code:
data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4;
datalines;
#ofpeople 20 30 0
#ofproviders 10 5
#ofclaims 40 25
AmountBilled 50 100 1
AmountPaid 11 35
AmountDed 5 6
;
run;
data _null_;
set have end=done;
array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
length list $ 500;
if done then do;
do i= 1 to dim(attrib_vars);
if Attrib_vars[i] = ' ' or Attrib_vars[i] in (0,1) then list=catx(' ',trim(list),vname(Attrib_vars(i)));
end;
call symputx('mlist',list);
end;
run;
%put &mlist;
and got this log (excerpt):
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column). 43:25 NOTE: There were 6 observations read from the data set WORK.HAVE. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds 48 49 %put &mlist; Attrib_Itr3 Attrib_Itr4
Note that this only checks the last observation of the dataset. If you want to check across all observations, the code needs to be rewritten.
Hi Kurt,
Thanks for replying, but if the dataset is having the missing values apart from the first column then your modified code is not working.
Please have look at this type of datalines.
data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 $ Attrib_Itr2 $ Attrib_Itr3 $ Attrib_Itr4 $ Attrib_Itr5 $ ;
datalines;
#ofpeople 20 30 0 1
ewrewr
;
run;
data _null_;
set have end=done;
array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
length list $ 500;
if done then do;
do i= 1 to dim(attrib_vars);
if Attrib_vars[i] = ' ' or Attrib_vars[i] in ('0','1') then list=catx(' ',trim(list),vname(Attrib_vars(i)));
end;
call symputx('mlist',list);
end;
run;
%put &mlist;
Like in the above datalines, I have only one value in the second row and other values are missing. Also all the columns are now character type then the above code is not working.
I had actually encountered the similar type of scenario, but I am getting the log like this.
SYMBOLGEN: Some characters in the above value which were subject to macro quoting have been unquoted for printing.
SYMBOLGEN: Macro variable GRAPHINIT resolves to GOPTIONS RESET=ALL GSFNAME=_GSFNAME;
59
60 data _null_;
61 set have end=done;
62 array Attrib_vars Attrib_Itr:;/* Wild card for Attrib_Itr variables*/
63 length list $ 500;
64 if done then do;
65 do i= 1 to dim(attrib_vars);
66 if Attrib_vars[i] = ' ' or Attrib_vars[i] in ('0','1') then list=catx(' ',trim(list),vname(Attrib_vars(i)));
67 end;
68 call symputx('mlist',list);
69 end;
70 run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
66:32
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 731.62k
OS Memory 31208.00k
Timestamp 10/01/2017 11:39:33 AM
Step Count 513 Switch Count 0
Page Faults 0
Page Reclaims 117
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
71
72 %put &mlist;
SYMBOLGEN: Macro variable MLIST resolves to Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4 Attrib_Itr5
Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4 Attrib_Itr5
73
74 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
SYMBOLGEN: Macro variable GRAPHTERM resolves to GOPTIONS NOACCESSIBLE;
86
Now mlist macro is resolving to all the columns rather it should resolve to only Attrib_Itr2, Attrib_Itr4 and Attrib_Itr5 as it contains only nmissing, 0 and 1 values.
Please suggest if this type of scenario will occur then what modification need to be done in the above code or do we have to write any proc sql method for this type of scenario.
Please help.
Thanks in advance.
Using @Kurt_Bremser's data set, or yours with the appropriate name consider:
proc freq data=have nlevels; ods output Nlevels=Mylevelset; run;
Use a tables statement with the variables, or a variable list, to restrict to specific variables of interest.
Which will create a data set that looks like:
TableVar | NLevels | NMissLevels | NNonMissLevels |
---|---|---|---|
category | 6 | 0 | 6 |
Attrib_Itr1 | 6 | 0 | 6 |
Attrib_Itr2 | 6 | 0 | 6 |
Attrib_Itr3 | 3 | 1 | 2 |
Attrib_Itr4 | 1 | 1 | 0 |
A variable that has NLevels=1 and NMissLevels=1 indicates a variable with all missing values.
So this will populate your mlist macro variable:
proc sql noprint; select tablevar into : mlist separated by ' ' from MyLevelset where Nlevels=1 and NMissLevels=1 ; quit;
Though I will make no claims as to efficiency for large data sets. This approach doesn't care about the variable type luckily.
Hi Ballard,
Thanks for the reply. It is working perfectly for the missing columns, but those columns which are conatining 0 and 1, the code given by you might not work.
For Example
data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 $ Attrib_Itr2 $ Attrib_Itr3 $ Attrib_Itr4 $ Attrib_Itr5 $ ;
datalines;
#ofpeople 20 30 0 1
ewrewr
;
run;
From the above code ideally Attrib_Itr2, Attrib_Itr4 and Attrib_Itr5 should be deleted as it's having the values missing, 0 and 1.
Please help.
Thanks in advance.
If I understood what you mean.
data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 Attrib_Itr2 Attrib_Itr3 Attrib_Itr4;
datalines;
#ofpeople 20 30 0
#ofproviders 10 5
#ofclaims 40 25
AmountBilled 50 100 1
AmountPaid 11 35
AmountDed 5 6
;
data _null_;
set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE' and upcase(name) like 'ATTRIB%')) end=last;
if _n_=1 then call execute('proc sql;create table temp as select ');
call execute(cat('sum(',name,' in (. 0 1)) as ',name));
if last then call execute('from have ;quit;');
else call execute(',');
run;
%let dsid=%sysfunc(open(have));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));
proc transpose data=temp out=want;
var _all_;
run;
proc sql;
select _name_ into : drop separated by ' '
from want
where col1=&nobs;
run;
data final_want;
set have;
drop &drop;
run;
Hi Ksharp,
Thanks for replying.
If the datalines is like this then it is not working.
data have;
infile datalines truncover;
input category :$10. Attrib_Itr1 $ Attrib_Itr2 $ Attrib_Itr3 $ Attrib_Itr4 $ Attrib_Itr5 $ ;
datalines;
#ofpeople 20 30 0 1
ewrewr
;
run;
The drop varibles list what you had created in your code it's resolving to all the Attrib_Itr1 to Attrib_Itr5 columns. Please try to run your code as per the above datalines example you can understand the issue.
Please help if possible.
Thanks in advance.
That is because your all Attrib_ variables are character type ,not numeric type.
data _null_;
set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE' and upcase(name) like 'ATTRIB%')) end=last;
if _n_=1 then call execute('proc sql;create table temp as select ');
call execute(cat('sum(',name,' in (" " "0" "1")) as ',name));
if last then call execute('from have ;quit;');
else call execute(',');
run;
Try this data-step:
data _null_;
set have end=jobDone;
length dropList $ 300;
retain dropList " ";
array check Attrib_:;
if _n_ = 1 then do;
do i = 1 to dim(check);
dropList = catx(' ', dropList, vname(check{i}));
end;
end;
/* to test: iterating over dropList could be
* faster if the number of variables high */
do i = 1 to dim(check);
if check{i} not in (., 0, 1) then do;
dropList = tranwrd(dropList, vname(check{i}), ' ');
end;
end;
if jobDone then do;
call symputx('mList', dropList);
end;
run;
%put &=mlist;
Thanks Prone.......Thanks a lot it's working now......I checked.....
Cheers......
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.