BookmarkSubscribeRSS Feed
CurtisER
Obsidian | Level 7

Hello,

 

I am referencing to a 2016 article "Finding a value in all datasets in all variables."

 

I do not use PROC IML but we are licensed for this product.

 

My goal is to find a specific value '024683579' in variable TABNC in library QPPCT.  The library QPPCT has over 9K datasets with nearly all of them having the same variable TABNC.  I need the name of the dataset(s) found and the creation date.  The purpose is to find a specific (or multiple) datasets with this value.

 

Can the 2016 solution be adapted for this use or is there a better way to do this?

 

Thank you,

Curtis

6 REPLIES 6
ballardw
Super User

The code would work.

 

"Better way" might depend on the size of those 9K data sets.

One way would be at use a data set with SET statement to stack all of the data sets with the INDSNAME option to add the data set name as a variable. But this may only be viable if your data sets have been very consistent about setting variable types the same for same-named variables.

 

By "creation date" do you mean of the data set as would be stored in dictionary.tables or something else?

If the creation date wanted is the one in dictionary.tables (the view Sashelp.vtable ) then match and add using the data set name kept in the first.

 

Something along the lines of:

proc sql;
   create table <somelib>.tabnc as
   select libname, memname, catx('.',libname,memname) as dsn
   from dictionary.columns
   where libname='QPPCT' and upcase(name)='TABCN';
quit;

/* write code to build the data set with values*/

data _null_;
   set <somelib>.tabnc end=lastone;;
   file "<path>\get_tabnc.sas";
   if _n_=1 then do;
      put "data <somelib>.tabnc_values;";
      put "set ";
   end;
   /* specify the data set names and keep only the TABCN variable */
   put dsn +1 "(keep=tabnc) " ;
   if lastone then do;
      /* add the Indsname option and end the SET statement*/
      put "indsname=dsn ;"
      put "Where tabnc =  '024683579' ;";
      /* keep the temporary variable DSN from the INDSNAME into
         another variable
      */
      put "source=dsn ;";
      /* add the run for the data step*/
      put "Run ;";
      
   end; 
run;

/* execute the data step created above */
%include "<path>\get_tabnc.sas";

/* get createion dates */
Proc sql;
    create table <somelib>.tabcn_created as
    select a.dsn,b.crdate
    from (select distinct dsn from  <somelib>.tabnc_values as a
         left join
         (select * from dictionary.tables 
          where libname='QPPCT' )as b
          on a.dsn = catx('.',b.libname,b.memname)
    ;
quit;

/* the crdate should be easy to merge back to the <somelib>.tabnc_values
  on the value of dsn
*/

This has a minor advantage of the data step that selects the values is a separate program file for review.

Pick your values for the <somelib> library to store the data set and the <path> to store the generated data step code.

 

Obviously untested as I have none of  your libraries or data sets.

CurtisER
Obsidian | Level 7

By "creation date" do you mean of the data set as would be stored in dictionary.tables or something else?

If the creation date wanted is the one in dictionary.tables (the view Sashelp.vtable ) then match and add using the data set name kept in the first.

This is merely for reporting to folks who wants to know the age (but that really isn't important in this case).  My apologies.

 

The variable TABNC can be thought of as a key ID variable.

 

I will look into your suggestions, thank you!

 

CurtisER
Obsidian | Level 7

I hope you all a nice weekend!  Thank you everyone!  I'm testing against some of your sample code and I'll update back with the solution I've found.

 

Curtis

Tom
Super User Tom
Super User

Can you set some limits on this problem?

Do all of the datasets have the same variables (defined the same way)  so that you could use a single data step to read them all? Do all of the datasets of interest use a common prefix (and only the dataset of interest using that prefix)?  If so then perhaps something like:

data want;
  set QPPCT.prefix: indsname=dsname;
  where TABNC in ('123');
  if dsname ne lag(dsname);
  dataset = scan(dsname,-1,'.');
  keep dataset;
run;

Otherwise get a list of the dataset that have the TABNC variable and use that to write the list of dataset names.

proc contents data=QPPCT._all_ noprint out=contents; run;
proc sql ;
  select catx('.',libname,memname) into :dslist separated by ' '
  from contents
  where upcase(name)='TABNK')
;
quit;
data want;
  set &dslist indsname=dsname;
....

 

Ksharp
Super User


data a;
input a : $40. b c;
cards;
Thisaccount,024683579 1234567891 1234567891
wwrew 232442 343
;
run;
data b;
input TABNC  : $40. bb xxxx : $20.;
cards;
qweqwe 233 2323
Thisaccount,024683579 1234567891 1234567891
;
run;
data c;
input c : $40. bbbbbb;
cards;
Thisaccount 1234567891
Thisaccount,1234567891 2324
;
run;


%let libname=work;  *the name of library you want to search;

proc iml;
dsn="&libname.."+datasets("&libname.");
do i=1 to nrow(dsn);
use (dsn[i]);
 read all var _char_ into x[c=vnames];
 flag=(find(x,'024683579')^=0);
 do j=1 to nrow(flag);
  if any(flag[j,]=1) then do;
    loc=loc(flag[j,]);
    temp=vnames[loc];
    vname=vname//temp;
    obs=obs//repeat(j,nrow(temp));
    table=table//repeat(dsn[i],nrow(temp));
  end;
 end;
/* 
 read all var _num_ into x[c=vnames];
 flag=(x=1234567891);
 do j=1 to nrow(flag);
  if any(flag[j,]=1) then do; 
    loc=loc(flag[j,]);
    temp=vnames[loc]; 
    vname=vname//temp;
    obs=obs//repeat(j,nrow(temp));
    table=table//repeat(dsn[i],nrow(temp));
  end;
 end;
*/
close (dsn[i]);
end;

create want var {table vname obs};
append;
close;
quit;

proc contents data=&libname.._all_ out=out noprint;
run;
proc sql;
create table want3 as
select distinct a.*,CRDATE,MODATE
 from (select * from want where upcase(vname)='TABNC') as a left join out as b
  on b.MEMNAME = upcase(scan(a.TABLE,-1,'.'));
quit;
mkeintz
PROC Star

If you really need only the names of datasets with at least one instance of the value, you can build a SET statement with all the candidate datasets, but use OBS=1 to tell SAS to stop processing a dataset once an instance has been encountered.  Looking for variable X=2001 in the code below:

 

data xxx;  /*Four instances of x=2001 */
  do x=1111 to 2222;
    output;

  end;
  x=2001;  output; output; output;
run;
data yyy;  /*One instance of x=2001 */
  do x=2222 to 1111 by -1;
    output;
  end;
run;
data zzz;  /*No variable X */
  do y= 1111 to 2222;
    output;
  end;
run;
proc sql noprint;
  create table need as
    select libname, memname, catx('.',libname,memname) as dsn
    from dictionary.columns
   where libname='WORK' and upcase(name)='X'
   ;
  create table candidates as 
    select n.*,     t.crdate
    from need as N left join dictionary.tables as T
    on n.libname=t.libname and n.memname=t.memname;
  drop table need;
  select distinct  catx(' ',dsn,"(keep=x where=(x=2001) obs=1)")
    into :dsn_list separated by ' ' 
     from candidates
  ;
quit;

%put &=dsn_list ;
data want;
  set &dsn_list
      indsname=indsn ;
  if _n_=1 then do;
    if 0 then set candidates;
    declare hash h (dataset:'candidates');
      h.definekey('dsn');
      h.definedata(all:'Y');
      h.definedone();
  end;
  h.find(key:indsn);
  put (_all_) (=);
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 571 views
  • 0 likes
  • 5 in conversation