Hello,
I am referencing to a 2016 article "Finding a value in all datasets in all variables."
I do not use PROC IML but we are licensed for this product.
My goal is to find a specific value '024683579' in variable TABNC in library QPPCT. The library QPPCT has over 9K datasets with nearly all of them having the same variable TABNC. I need the name of the dataset(s) found and the creation date. The purpose is to find a specific (or multiple) datasets with this value.
Can the 2016 solution be adapted for this use or is there a better way to do this?
Thank you,
Curtis
The code would work.
"Better way" might depend on the size of those 9K data sets.
One way would be at use a data set with SET statement to stack all of the data sets with the INDSNAME option to add the data set name as a variable. But this may only be viable if your data sets have been very consistent about setting variable types the same for same-named variables.
By "creation date" do you mean of the data set as would be stored in dictionary.tables or something else?
If the creation date wanted is the one in dictionary.tables (the view Sashelp.vtable ) then match and add using the data set name kept in the first.
Something along the lines of:
proc sql; create table <somelib>.tabnc as select libname, memname, catx('.',libname,memname) as dsn from dictionary.columns where libname='QPPCT' and upcase(name)='TABCN'; quit; /* write code to build the data set with values*/ data _null_; set <somelib>.tabnc end=lastone;; file "<path>\get_tabnc.sas"; if _n_=1 then do; put "data <somelib>.tabnc_values;"; put "set "; end; /* specify the data set names and keep only the TABCN variable */ put dsn +1 "(keep=tabnc) " ; if lastone then do; /* add the Indsname option and end the SET statement*/ put "indsname=dsn ;" put "Where tabnc = '024683579' ;"; /* keep the temporary variable DSN from the INDSNAME into another variable */ put "source=dsn ;"; /* add the run for the data step*/ put "Run ;"; end; run; /* execute the data step created above */ %include "<path>\get_tabnc.sas"; /* get createion dates */ Proc sql; create table <somelib>.tabcn_created as select a.dsn,b.crdate from (select distinct dsn from <somelib>.tabnc_values as a left join (select * from dictionary.tables where libname='QPPCT' )as b on a.dsn = catx('.',b.libname,b.memname) ; quit; /* the crdate should be easy to merge back to the <somelib>.tabnc_values on the value of dsn */
This has a minor advantage of the data step that selects the values is a separate program file for review.
Pick your values for the <somelib> library to store the data set and the <path> to store the generated data step code.
Obviously untested as I have none of your libraries or data sets.
By "creation date" do you mean of the data set as would be stored in dictionary.tables or something else?
If the creation date wanted is the one in dictionary.tables (the view Sashelp.vtable ) then match and add using the data set name kept in the first.
This is merely for reporting to folks who wants to know the age (but that really isn't important in this case). My apologies.
The variable TABNC can be thought of as a key ID variable.
I will look into your suggestions, thank you!
I hope you all a nice weekend! Thank you everyone! I'm testing against some of your sample code and I'll update back with the solution I've found.
Curtis
Can you set some limits on this problem?
Do all of the datasets have the same variables (defined the same way) so that you could use a single data step to read them all? Do all of the datasets of interest use a common prefix (and only the dataset of interest using that prefix)? If so then perhaps something like:
data want;
set QPPCT.prefix: indsname=dsname;
where TABNC in ('123');
if dsname ne lag(dsname);
dataset = scan(dsname,-1,'.');
keep dataset;
run;
Otherwise get a list of the dataset that have the TABNC variable and use that to write the list of dataset names.
proc contents data=QPPCT._all_ noprint out=contents; run;
proc sql ;
select catx('.',libname,memname) into :dslist separated by ' '
from contents
where upcase(name)='TABNK')
;
quit;
data want;
set &dslist indsname=dsname;
....
data a;
input a : $40. b c;
cards;
Thisaccount,024683579 1234567891 1234567891
wwrew 232442 343
;
run;
data b;
input TABNC : $40. bb xxxx : $20.;
cards;
qweqwe 233 2323
Thisaccount,024683579 1234567891 1234567891
;
run;
data c;
input c : $40. bbbbbb;
cards;
Thisaccount 1234567891
Thisaccount,1234567891 2324
;
run;
%let libname=work; *the name of library you want to search;
proc iml;
dsn="&libname.."+datasets("&libname.");
do i=1 to nrow(dsn);
use (dsn[i]);
read all var _char_ into x[c=vnames];
flag=(find(x,'024683579')^=0);
do j=1 to nrow(flag);
if any(flag[j,]=1) then do;
loc=loc(flag[j,]);
temp=vnames[loc];
vname=vname//temp;
obs=obs//repeat(j,nrow(temp));
table=table//repeat(dsn[i],nrow(temp));
end;
end;
/*
read all var _num_ into x[c=vnames];
flag=(x=1234567891);
do j=1 to nrow(flag);
if any(flag[j,]=1) then do;
loc=loc(flag[j,]);
temp=vnames[loc];
vname=vname//temp;
obs=obs//repeat(j,nrow(temp));
table=table//repeat(dsn[i],nrow(temp));
end;
end;
*/
close (dsn[i]);
end;
create want var {table vname obs};
append;
close;
quit;
proc contents data=&libname.._all_ out=out noprint;
run;
proc sql;
create table want3 as
select distinct a.*,CRDATE,MODATE
from (select * from want where upcase(vname)='TABNC') as a left join out as b
on b.MEMNAME = upcase(scan(a.TABLE,-1,'.'));
quit;
If you really need only the names of datasets with at least one instance of the value, you can build a SET statement with all the candidate datasets, but use OBS=1 to tell SAS to stop processing a dataset once an instance has been encountered. Looking for variable X=2001 in the code below:
data xxx; /*Four instances of x=2001 */
do x=1111 to 2222;
output;
end;
x=2001; output; output; output;
run;
data yyy; /*One instance of x=2001 */
do x=2222 to 1111 by -1;
output;
end;
run;
data zzz; /*No variable X */
do y= 1111 to 2222;
output;
end;
run;
proc sql noprint;
create table need as
select libname, memname, catx('.',libname,memname) as dsn
from dictionary.columns
where libname='WORK' and upcase(name)='X'
;
create table candidates as
select n.*, t.crdate
from need as N left join dictionary.tables as T
on n.libname=t.libname and n.memname=t.memname;
drop table need;
select distinct catx(' ',dsn,"(keep=x where=(x=2001) obs=1)")
into :dsn_list separated by ' '
from candidates
;
quit;
%put &=dsn_list ;
data want;
set &dsn_list
indsname=indsn ;
if _n_=1 then do;
if 0 then set candidates;
declare hash h (dataset:'candidates');
h.definekey('dsn');
h.definedata(all:'Y');
h.definedone();
end;
h.find(key:indsn);
put (_all_) (=);
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.