DATA Step, Macro, Functions and more

Finding a value in all datasets in all variables

Accepted Solution Solved
Reply
Contributor
Posts: 54
Accepted Solution

Finding a value in all datasets in all variables

Base SAS 9.4--

I have an annoying compliance task that I'm trying to automate.

I need to find an account number (number stored as text or as a number) anywhere it appears in any dataset and in any variable in a library.

 

For example, I will need to find the value 1234567891 anywhere it appears in an entire library. It could be in one dataset in a text varible in the middle of a paragraph such as "This account, 1234567891, needs to be closed" or it could be stored in a different or same dataset in a variable that stores account numbers as numbers.

 

So far, I have generated a list of the datasets and could do something similar with each variable in each dataset using the following code. I was then going to create a loop to look for the account number in each variable in each dataset. When I found a match, it would write the "dataset, variable and obs number" to a table each time and move on to the next value to search.

I am not skilled enough to even get close. Is anyone able to assist with the looping logic?

Proc SQL noprint; /*** Create list of all the datasets to look through. ***/

Select 'work.' || memname as ViewName 

into :TableNames separated by ' '

From sashelp.vmember

Where UPCASE(LIBNAME)='WORK' 

and UPCASE(MEMTYPE)='DATA' 

;quit;

%put &TableNames; 

 

 

Thank you so much!


Accepted Solutions
Solution
‎12-07-2016 02:24 PM
Super User
Posts: 10,044

Re: Finding a value in all datasets in all variables

Oh. You have low version SAS.

Try ANY() function .

 



data a;
input a : $40. b c;
cards;
Thisaccount,1234567891 1234567891 1234567891
wwrew 232442 343
;
run;
data b;
input aa : $40. bb xxxx : $20.;
cards;
qweqwe 233 2323
Thisaccount,1234567891 1234567891 1234567891
;
run;
data c;
input c : $40. bbbbbb;
cards;
Thisaccount 1234567891
Thisaccount,1234567891 2324
;
run;

proc iml;
dsn='work.'+datasets('work');
do i=1 to nrow(dsn);
use (dsn[i]);
 read all var _char_ into x[c=vnames];
 flag=(find(x,'1234567891')^=0);
 do j=1 to nrow(flag);
  if any(flag[j,]=1) then do;
    loc=loc(flag[j,]);
    temp=vnames[loc];
    vname=vname//temp;
    obs=obs//repeat(j,nrow(temp));
    table=table//repeat(dsn[i],nrow(temp));
  end;
 end;
 
 read all var _num_ into x[c=vnames];
 flag=(x=1234567891);
 do j=1 to nrow(flag);
  if any(flag[j,]=1) then do; 
    loc=loc(flag[j,]);
    temp=vnames[loc]; 
    vname=vname//temp;
    obs=obs//repeat(j,nrow(temp));
    table=table//repeat(dsn[i],nrow(temp));
  end;
 end;
close (dsn[i]);
end;

create want var {table vname obs};
append;
close;
quit;

proc print noobs;run;

View solution in original post


All Replies
Super Contributor
Posts: 253

Re: Finding a value in all datasets in all variables

The approach you need is probably to have nested macros.

 

First, write a macro that takes 'variable' and 'value' parameters, checks that variable for that value, and outputs a row with the needed information if it maches.

 

Second, write a macro that looks at the dictionary.columns table (or sashelp.vcolumn) and makes calls to that macro, taking parameters of libname and memname, and then contains a data step that creates your compliance dataset and includes the SET statement for that dataset and the macro calls created for that dataset earlier in this sentence.

 

Third, write a macro that creates macro calls to that second macro, one for each dataset/libname combination (you have largely done that above - just put select cats('%your_macro_here(libname=',libname,',memname=',memnane,',')') or similar in it.

 

No looping required, just create lots and lots of macro calls.

Super Contributor
Posts: 253

Re: Finding a value in all datasets in all variables

Posted in reply to snoopy369

Sort of vague example:

 


%macro find_val(val=,var=);
  *test it and output;
%mend find_val;

%macro find_in_Dataset(data=,lib=,val=);
  proc sql;
    select cats('%find_val(var=',name,",val=&val.)") 
     into :findvallist separated by ' '
     from dictionary.columns
     where libname="&lib." and memname="&data."
  ;
  quit;
  data something;
    set &lib..&data.;
    *probalby some code to save data/lib here;
    &findvallist.
  run;
%mend find_in_dataset;

proc sql;
  select cats('%find_in_dataset(data=',memname,',lib=',libname,",val=&val.)")
    into :datasetlist separated by ' '
    from dictionary.tables;
quit;

&datasetlist.
  

 

Trusted Advisor
Posts: 1,584

Re: Finding a value in all datasets in all variables

Posted in reply to snoopy369

I would make just a slight change to @snoopy369's find_val macro.

If the variable is numeric you can check for equal value otherwise check using index function:

   

%macro find_val(val=,var=, type=);
   %if &type = NUM %then %do;
if &var = &val then output;
%end; %else %do;
if index(&var,"&vak) > then output;
%end; %mend find_val;

 

Contributor
Posts: 54

Re: Finding a value in all datasets in all variables

Thank you both! I'll spend some time trying to make it work.

Super User
Posts: 11,343

Re: Finding a value in all datasets in all variables

To add to @snoopy369 you can get the name and library of the data set by adding the option INDSNAMe to the set statement.

You would want to assign a length and name to a permanent variable though. A stub of code:

 

%let searchval = 4;

data temp;
   length datasetname $41. variable $32. value $100.;
   set sashelp.cars indsname=dname;
   datasetname=dname;
   array c _character_;
   array n _numeric_;
   RecordNumber = _n_;
   SearchValue = "&searchval";
   do i = 1 to dim(c);
      if find(c[i],"&searchval","i")>0 then do;
      /* found the string somewhere*/
         variable = vname(c[i]);
         value = c[i];
         output;
      end;
   end;/* character search*/
   do i = 1 to dim(n);
      if n[i] = &searchval then do;
      /* found the number*/
         variable = vname(n[i]);
         value = put(n[i],best32.);
         output;
      end;
   end;/* numeric search*/
   keep datasetname searchvalue variable RecordNumber value;
run;

Note that this approach doesn't work well for a character value or a "number" like 123-4 that isn't a valid numeric du to the way I'm looking for equality. An exercise for the interested reader (and not difficult) to either add a check for type of searchval or value.

 

Trusted Advisor
Posts: 1,022

Re: Finding a value in all datasets in all variables

One tiny tweak

 

if dim(c)>0 then do I=1 to dim(c);

    and

if dim(n)>0 then do I=1 to dim(n);

Super User
Posts: 10,044

Re: Finding a value in all datasets in all variables

It would be very easy, if you are using IML code.



data a;
input a : $40. b c;
cards;
Thisaccount,1234567891 1234567891 1234567891
wwrew 232442 343
;
run;
data b;
input aa : $40. bb xxxx : $20.;
cards;
qweqwe 233 2323
Thisaccount,1234567891 1234567891 1234567891
;
run;
data c;
input c : $40. bbbbbb;
cards;
Thisaccount 1234567891
Thisaccount,1234567891 2324
;
run;

proc iml;
dsn='work.'+datasets('work');
do i=1 to nrow(dsn);
use (dsn[i]);
 read all var _char_ into x[c=vnames];
 flag=(find(x,'1234567891')^=0);
 do j=1 to nrow(flag);
  loc=loc(flag[j,]);
  if ^isempty(loc) then do;
   temp=vnames[loc];
   vname=vname//temp;
   obs=obs//repeat(j,nrow(temp));
   table=table//repeat(dsn[i],nrow(temp));
  end;
 end;
 
 read all var _num_ into x[c=vnames];
 flag=(x=1234567891);
 do j=1 to nrow(flag);
  loc=loc(flag[j,]);
  if ^isempty(loc) then do;
   temp=vnames[loc]; 
   vname=vname//temp;
   obs=obs//repeat(j,nrow(temp));
   table=table//repeat(dsn[i],nrow(temp));
  end;
 end;
close (dsn[i]);
end;

create want var {table vname obs};
append;
close;
quit;

proc print noobs;run;

Contributor
Posts: 54

Re: Finding a value in all datasets in all variables

This is very interesting. This is the first time I'm exploring IML and I'm excited about the possibilities. The 3 tables created with data correctly but is there a configuration I need to change in my 9.4 to make it work? It created dataset 'Want' but without any data. See log:

proc iml;
NOTE: IML Ready
280  dsn='work.'+datasets('work');
281  do i=1 to nrow(dsn);
282  use (dsn[i]);
283   read all var _char_ into x[c=vnames];
284   flag=(find(x,'1234567891')^=0);
285   do j=1 to nrow(flag);
286    loc=loc(flag[j,]);
287    if ^isempty(loc) then do;
288     temp=vnames[loc];
289     vname=vname//temp;
290     obs=obs//repeat(j,nrow(temp));
291     table=table//repeat(dsn[i],nrow(temp));
292    end;
293   end;
294
295   read all var _num_ into x[c=vnames];
296   flag=(x=1234567891);
297   do j=1 to nrow(flag);
298    loc=loc(flag[j,]);
299    if ^isempty(loc) then do;
300     temp=vnames[loc];
301     vname=vname//temp;
302     obs=obs//repeat(j,nrow(temp));
303     table=table//repeat(dsn[i],nrow(temp));
304    end;
305   end;
306  close (dsn[i]);
307  end;
ERROR: File WORK.WANT.DATA does not exist.

 statement : USE at line 282 column 1
308
309  create want var {table vname obs};
310  append;
WARNING: All data set variables are unvalued. No APPEND done.

 statement : APPEND at line 310 column 1
311  close;
NOTE: Closing WORK.WANT
NOTE: The data set WORK.WANT has 0 observations and 3 variables.
312  quit;
NOTE: Exiting IML.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IML used (Total process time):
      real time           0.06 seconds
      cpu time            0.09 seconds

 

Thanks again

Super User
Posts: 10,044

Re: Finding a value in all datasets in all variables


Oh. That means No tables and no variables contain 1234567891.

Can you post some testing data to let me test and debug it ?
Contributor
Posts: 54

Re: Finding a value in all datasets in all variables

I tried just running your sample code with the 3 tables of created data (A,B and C) that did include the value we are trying to find.

I closed all SAS instances and tried running it again. When I did. It threw this error--

 

ERROR: Invocation of unresolved module ISEMPTY.

 

25   proc iml;
NOTE: Writing HTML Body file: sashtml.htm
NOTE: IML Ready
26   dsn='work.'+datasets('work');
26 !                               /*Change to your libname*/
27   do i=1 to nrow(dsn);
28   use (dsn[i]);
29    read all var _char_ into x[c=vnames];
29 !                                       /*Seach chars*/
30    flag=(find(x,'123')^=0);
30 !                           /*Change to the value you want to find*/
31    do j=1 to nrow(flag);
32     loc=loc(flag[j,]);
33     if ^isempty(loc) then do;
34      temp=vnames[loc];
35      vname=vname//temp;
36      obs=obs//repeat(j,nrow(temp));
37      table=table//repeat(dsn[i],nrow(temp));
38     end;
39    end;
40
41    read all var _num_ into x[c=vnames];
41 !                                      /*Seach numbers*/
42    flag=(x=123);
42 !               /*Change to the value you want to find*/
43    do j=1 to nrow(flag);
44     loc=loc(flag[j,]);
45     if ^isempty(loc) then do;
46      temp=vnames[loc];
47      vname=vname//temp;
48      obs=obs//repeat(j,nrow(temp));
49      table=table//repeat(dsn[i],nrow(temp));
50     end;
51    end;
52   close (dsn[i]);
53   end;
ERROR: Invocation of unresolved module ISEMPTY.

 

 

If I delete the empty dataset 'Want' and run it again, I get a different error:

135  proc iml;
NOTE: IML Ready
136  dsn='work.'+datasets('work');
136!                               /*Change to your libname*/
137  do i=1 to nrow(dsn);
138  use (dsn[i]);
139   read all var _char_ into x[c=vnames];
139!                                       /*Seach chars*/
140   flag=(find(x,'123')^=0);
140!                           /*Change to the value you want to find*/
141   do j=1 to nrow(flag);
142    loc=loc(flag[j,]);
143    if ^isempty(loc) then do;
144     temp=vnames[loc];
145     vname=vname//temp;
146     obs=obs//repeat(j,nrow(temp));
147     table=table//repeat(dsn[i],nrow(temp));
148    end;
149   end;
150
151   read all var _num_ into x[c=vnames];
151!                                      /*Seach numbers*/
152   flag=(x=123);
152!               /*Change to the value you want to find*/
153   do j=1 to nrow(flag);
154    loc=loc(flag[j,]);
155    if ^isempty(loc) then do;
156     temp=vnames[loc];
157     vname=vname//temp;
158     obs=obs//repeat(j,nrow(temp));
159     table=table//repeat(dsn[i],nrow(temp));
160    end;
161   end;
162  close (dsn[i]);
163  end;
ERROR: File WORK.WANT.DATA does not exist.

 statement : USE at line 138 column 1

And if I leave the empty dataset 'Want' and try running it again, I get this error:

 

171
172  proc iml;
NOTE: IML Ready
173  dsn='work.'+datasets('work');
173!                               /*Change to your libname*/
174  do i=1 to nrow(dsn);
175  use (dsn[i]);
176   read all var _char_ into x[c=vnames];
176!                                       /*Seach chars*/
177   flag=(find(x,'123')^=0);
177!                           /*Change to the value you want to find*/
178   do j=1 to nrow(flag);
179    loc=loc(flag[j,]);
180    if ^isempty(loc) then do;
181     temp=vnames[loc];
182     vname=vname//temp;
183     obs=obs//repeat(j,nrow(temp));
184     table=table//repeat(dsn[i],nrow(temp));
185    end;
186   end;
187
188   read all var _num_ into x[c=vnames];
188!                                      /*Seach numbers*/
189   flag=(x=123);
189!               /*Change to the value you want to find*/
190   do j=1 to nrow(flag);
191    loc=loc(flag[j,]);
192    if ^isempty(loc) then do;
193     temp=vnames[loc];
194     vname=vname//temp;
195     obs=obs//repeat(j,nrow(temp));
196     table=table//repeat(dsn[i],nrow(temp));
197    end;
198   end;
199  close (dsn[i]);
200  end;
WARNING: Data set WORK.WANT is empty.

 statement : USE at line 175 column 1
WARNING: End of File reached.

 statement : READ at line 176 column 2
ERROR: (execution) Module not loaded, operation not available.

 operation : FIND at line 177 column 12
 operands  : x, *LIT1004

x      0 row       0 col     (type ?, size 0)


*LIT1004      1 row       1 col     (character, size 3)

 123

 statement : ASSIGN at line 177 column 2
Solution
‎12-07-2016 02:24 PM
Super User
Posts: 10,044

Re: Finding a value in all datasets in all variables

Oh. You have low version SAS.

Try ANY() function .

 



data a;
input a : $40. b c;
cards;
Thisaccount,1234567891 1234567891 1234567891
wwrew 232442 343
;
run;
data b;
input aa : $40. bb xxxx : $20.;
cards;
qweqwe 233 2323
Thisaccount,1234567891 1234567891 1234567891
;
run;
data c;
input c : $40. bbbbbb;
cards;
Thisaccount 1234567891
Thisaccount,1234567891 2324
;
run;

proc iml;
dsn='work.'+datasets('work');
do i=1 to nrow(dsn);
use (dsn[i]);
 read all var _char_ into x[c=vnames];
 flag=(find(x,'1234567891')^=0);
 do j=1 to nrow(flag);
  if any(flag[j,]=1) then do;
    loc=loc(flag[j,]);
    temp=vnames[loc];
    vname=vname//temp;
    obs=obs//repeat(j,nrow(temp));
    table=table//repeat(dsn[i],nrow(temp));
  end;
 end;
 
 read all var _num_ into x[c=vnames];
 flag=(x=1234567891);
 do j=1 to nrow(flag);
  if any(flag[j,]=1) then do; 
    loc=loc(flag[j,]);
    temp=vnames[loc]; 
    vname=vname//temp;
    obs=obs//repeat(j,nrow(temp));
    table=table//repeat(dsn[i],nrow(temp));
  end;
 end;
close (dsn[i]);
end;

create want var {table vname obs};
append;
close;
quit;

proc print noobs;run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 745 views
  • 2 likes
  • 6 in conversation