I have a situation that requires knowing what a data set list expansion will be, regardless of common variable types being mixed, or of differing lengths.
Consider the complicated scenario that has a combination of single data sets, name prefix lists and numbered range lists.
data x_a ; x=1 ; a='abc' ; do i=1 to 4 ; output ; end ;
data x_b ; x=1 ; a='defg' ; b='xyzzy' ; do i=1 to 4 ; output ; end ;
data x_d ; y=1 ; b='xyz' ; a='pqrstuv' ; i=1; stop ;
data x_f ; x='aaa' ; f='pqr' ; do i=1 to 4 ; output ; end ;
data y1 y2 y3 y4 y9 y10 y11; qqq='rrr' ; do i=1 to 4 ; output ; end ;
The following code captures the data set names, except for the one that has zero rows.
* no multiple length warning, because drop=_all_ ==> no variables; * one record per input data set (that contains at least one row) ; * output does not contain dataset='X_D' ; options obs=1; data datasets; set x_:(drop=_all_) y1-y4(drop=_all_) y9(drop=_all_) indsname=libmem ; dataset = libmem ; run ; options obs=max;
NOTE: There were 1 observations read from the data set WORK.X_A. NOTE: There were 1 observations read from the data set WORK.X_B. NOTE: There were 0 observations read from the data set WORK.X_D. NOTE: There were 1 observations read from the data set WORK.X_F. NOTE: There were 1 observations read from the data set WORK.Y1. NOTE: There were 1 observations read from the data set WORK.Y2. NOTE: There were 1 observations read from the data set WORK.Y3. NOTE: There were 1 observations read from the data set WORK.Y4. NOTE: There were 1 observations read from the data set WORK.Y9. NOTE: The data set WORK.DATASETS has 8 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
Obs dataset
1 WORK.X_A
2 WORK.X_B
3 WORK.X_F
4 WORK.Y1
5 WORK.Y2
6 WORK.Y3
7 WORK.Y4
8 WORK.Y9
Short of writing my own name list expansion routine, is there a way to capture the no rows table?
Instead of using the INDSNAME parameter, you could parse the log notes, which include the datasets with zero observations. All those notes have the text 'observations read from the data set', which can be used as the parameter of the INPUT @'character string' technique:
data x_a ; x=1 ; a='abc' ; do i=1 to 4 ; output ; end ;
data x_b ; x=1 ; a='defg' ; b='xyzzy' ; do i=1 to 4 ; output ; end ;
data x_d ; y=1 ; b='xyz' ; a='pqrstuv' ; i=1; stop ;
data x_f ; x='aaa' ; f='pqr' ; do i=1 to 4 ; output ; end ;
data y1 y2 y3 y4 y9 y10 y11; qqq='rrr' ; do i=1 to 4 ; output ; end ;
data x_1 x_empty x_3;
x=1; output x_1;
x=3; output x_3; output x_3;
run;
options obs=1;
filename tmplog temp ;
proc printto log=tmplog new;
run;
data _null_;
set x_:(drop=_all_) y1-y4(drop=_all_) y9(drop=_all_) ;
run ;
proc printto; run;
options obs=max;
data datasets;
infile tmplog dlm=' .';
input @'observations read from the data set' dslib :$8. dsname :$32. ;
run;
Do you want to list only datasets with some observations after knowing what datasets have no observations?
If so, I would use DICTIONARY.TABLES to identify datasets with/without observations.
Eg:
data x_a ; x=1 ; a='abc' ; do i=1 to 4 ; output ; end ;
data x_b ; x=1 ; a='defg' ; b='xyzzy' ; do i=1 to 4 ; output ; end ;
data x_d ; y=1 ; b='xyz' ; a='pqrstuv' ; i=1; stop ;
data x_f ; x='aaa' ; f='pqr' ; do i=1 to 4 ; output ; end ;
data y1 y2 y3 y4 y9 y10 y11; qqq='rrr' ; do i=1 to 4 ; output ; end ;
proc sql noprint;
select distinct memname
into: dsn separated by ' '
from dictionary.tables
where libname= 'WORK' and nobs eq 0 /*or NE 0 to capture datasets with some obs*/;
quit;
%put &=dsn;
Unfortnately no. I have to deal with data sets specified by a macro argument, such as
%macro zzz(datasets=) ; ... %mend zzz; %zzz(datasets=x_: y1-y4 y9)
It looks like I'll end up using Proc PRINTTO to capture a log snippet and then parse that.
It is still possible to use &DSN macro variable as your macro parameter.
It passes all desired datasets (with observations) into macro parameter 'datasets='.
data x_a ; x=1 ; a='abc' ; do i=1 to 4 ; output ; end ;
data x_b ; x=1 ; a='defg' ; b='xyzzy' ; do i=1 to 4 ; output ; end ;
data x_d ; y=1 ; b='xyz' ; a='pqrstuv' ; i=1; stop ;
data x_f ; x='aaa' ; f='pqr' ; do i=1 to 4 ; output ; end ;
data y1 y2 y3 y4 y9 y10 y11; qqq='rrr' ; do i=1 to 4 ; output ; end ;
proc sql noprint;
select distinct memname
into: dsn separated by ' '
from dictionary.tables
where libname= 'WORK' and nobs NE 0;
quit;
%put &=dsn;
%macro zzz(datasets=) ;
...
%mend zzz;
%zzz(datasets=&dsn);
Does something like this do the job?
data x_a ; x=1 ; a='abc' ; do i=1 to 4 ; output ; end ;
data x_b ; x=1 ; a='defg' ; b='xyzzy' ; do i=1 to 4 ; output ; end ;
data x_d ; y=1 ; b='xyz' ; a='pqrstuv' ; i=1; stop ;
data x_f ; x='aaa' ; f='pqr' ; do i=1 to 4 ; output ; end ;
data y1 y2 y3 y4 y9 y10 y11; qqq='rrr' ; do i=1 to 4 ; output ; end ;
run;
%macro expandDataSetsList(lib=work, datasets=) ;
%LOCAL rc DataSetsList prefix;
%let prefix=_%sysfunc(datetime(),b8601dt15.)_;
%let rc = %sysfunc(doSubL(%str(
options nonotes nosource nostimer nofullstimer nomprint nomlogic nosymbolgen;
proc sql noprint;
create table work.&prefix.tmp as
select memname
from dictionary.tables
where libname=%upcase("&lib.")
;
quit;
proc transpose data=work.&prefix.tmp out=work.&prefix.tmp2;
var memname;
id memname;
run;
proc transpose data=work.&prefix.tmp2(obs=0) out=work.&prefix.tmp3(keep=_name_);
var &datasets.;
run;
proc sql noprint;
select _name_
into :DataSetsList separated by " "
from &prefix.tmp3
;
drop table work.&prefix.tmp;
drop table work.&prefix.tmp2;
drop table work.&prefix.tmp3;
quit;
)));
&DataSetsList.
%mend expandDataSetsList;
%PUT %expandDataSetsList(lib=work,datasets=x_: y1-y4 y9);
I assumed all datasets are in one library.
Bart
@RichardAD wrote:
Hi Bart:
Very interesting perspective shift, switching from data set list to variable list. I'll chew on this a bit thinking about the datasets= specifying tables in more than one library.
I think @yabwon nailed it (which only makes it even more likely that I'll ping him the next time I see an interesting macro problem that I'm not creative enough to solve. : )
Even if you want to extend it to allow:
%PUT %expandDataSetsList(datasets=foo.x_: bar.y1-bar.y4 foo.y9);
I think parsing the librefs out of the list should be straight forward, then you could essentially run the logic in chunks, once for each libref.
@RichardAD wrote:
Unfortnately no. I have to deal with data sets specified by a macro argument, such as
%macro zzz(datasets=) ; ... %mend zzz; %zzz(datasets=x_: y1-y4 y9)It looks like I'll end up using Proc PRINTTO to capture a log snippet and then parse that.
Hi, that seems feasible, but would be an annoying way to have to do it. It certainly feels like there should be a better way for SAS to do it for you.
I'm trying to think, what are the different places that data set lists are supported?
PROC DATASETS allows them and I tried:
proc datasets library=work memtype=data ;
select x_: y1-y4 y9 ;
quit ;
But it doesn't work, because SELECT can only be used with the COPY statement and a few other statements.
Hmm, glancing at the docs, it looks like dataset lists are only used for SET statement MERGE statement and PROC DATASETS... https://documentation.sas.com/doc/en/lrcon/9.4/n0oru3akf51pi5n1wco2onzr39oz.htm
I still have hope that someone will come up with a neat way to expand a dataset list. Maybe @yabwon has a thought?
I do NOT understand what you want.
I THINK you are asking how to convert something like:
x_: y1-y4 y9
into the list of dataset names that a SET statement would find.
I do NOT understand why that would be useful.
But it does not seem too hard. For example a query like this:
select memname
from dictionary.members
where libname='WORK'
and (memname like 'X^_%' escape '^'
or (memname like 'Y%' and input(substr(memname,2),?32.) between 1 and 4)
or memname = 'Y9'
)
;
It should not be hard to parse the initial list into pieces that then translate into the WHERE condition.
Instead of using the INDSNAME parameter, you could parse the log notes, which include the datasets with zero observations. All those notes have the text 'observations read from the data set', which can be used as the parameter of the INPUT @'character string' technique:
data x_a ; x=1 ; a='abc' ; do i=1 to 4 ; output ; end ;
data x_b ; x=1 ; a='defg' ; b='xyzzy' ; do i=1 to 4 ; output ; end ;
data x_d ; y=1 ; b='xyz' ; a='pqrstuv' ; i=1; stop ;
data x_f ; x='aaa' ; f='pqr' ; do i=1 to 4 ; output ; end ;
data y1 y2 y3 y4 y9 y10 y11; qqq='rrr' ; do i=1 to 4 ; output ; end ;
data x_1 x_empty x_3;
x=1; output x_1;
x=3; output x_3; output x_3;
run;
options obs=1;
filename tmplog temp ;
proc printto log=tmplog new;
run;
data _null_;
set x_:(drop=_all_) y1-y4(drop=_all_) y9(drop=_all_) ;
run ;
proc printto; run;
options obs=max;
data datasets;
infile tmplog dlm=' .';
input @'observations read from the data set' dslib :$8. dsname :$32. ;
run;
I found mkeintz approach being the solution most tightly coupled to (i.e the actual) process done by SET or MERGE. Input @'string' is a nice touch.
Fleshed out to a robust macro I ended up with
%macro memnames_of(datasets=, out=_datasets, outsym_qlibnamelist=libnames) ;
%* Note: Does NOT handle data set specifiers that are direct filenames such as '<path>/xyzzy.sas7bdat' ;
%if not %symexist (&outsym_qlibnamelist) %then %do ;
%put ERROR: Symbol [&outsym_qlibnamelist] does not exist in the callers scope ;
%return ;
%end ;
%let datasets = %sysfunc(prxchange(s/(\S+)/$1(drop=_all_)/,-1,&datasets));
%* capture the expansion of &datasets ;
filename tmplog temp ;
proc printto log=tmplog ;
options obs=1 ;
data _null_ ;
set &datasets ;
run ;
options obs=max ;
proc printto log=log ;
%* parse the captured log to get a list of data sets ;
%let &outsym_qlibnamelist = '' ;
data &out (keep=dataset memnum) ;
infile tmplog end=end;
input ;
rx = prxparse ('m/^NOTE: There were \d+ observations read from the data set (.*)\./') ;
length memnum 8 dataset $41 libnames $32767 libname $8 ;
retain libnames ;
if prxmatch (rx, _infile_) then do ;
dataset = prxposn(rx, 1, _infile_) ;
memnum + 1 ;
output ;
libname = scan(dataset,1,'.') ;
if indexw(libnames, quote(trim(libname),"'")) = 0 then libnames = catx(' ', libnames, quote(trim(libname),"'")) ;
end ;
else if _infile_ =: 'ERROR:' then
put _infile_ ;
%* libnames when used in subsequent join will reduce dictionary search space;
if end then call symputx("&outsym_qlibnamelist", libnames) ;
run ;
filename tmplog ;
%mend memnames_of ;
and test code
%let workpath = %sysfunc(pathname(WORK)) ;
options dlcreatedir ;
libname one "&workpath/one" ;
libname two "&workpath/two" ;
options nodlcreatedir ;
data a b c d e ; set sashelp.class (obs=1); output a b d ; run ;
options user=one ;
data x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13; set sashelp.class (obs=1); run ;
options user=two ;
data y1 y2 y3 y4 y17 y18 y19 y20 ; set sashelp.class (obs=1); output y1 y2 y3 y4 ; run ;
options user=work ;
/* NOTE:
Numbered data set lists with reverse order numbering (such as y20-y17)
only works when endpoints have same number of digits and no leading zeroes
*/
%global libname_coverage ;
%memnames_of ( datasets=a b c d e one.x: two.y1-two.y4 two.y20-two.y17, outsym_qlibnamelist=libname_coverage )
title "Data set list expanded from" ;
title2 "[a b c d e one.x: two.y1-two.y4 two.y20-two.y17]" ;
title3 "libname coverage qlist: " "&libname_coverage" ;
proc print data=_datasets noobs;
run ;
producing output
Data set list expanded from
[a b c d e one.x: two.y1-two.y4 two.y20-two.y17]
libname coverage qlist: 'WORK' 'ONE' 'TWO'
memnum dataset
1 WORK.A
2 WORK.B
3 WORK.C
4 WORK.D
5 WORK.E
6 ONE.X1
7 ONE.X10
8 ONE.X11
9 ONE.X12
10 ONE.X13
11 ONE.X2
12 ONE.X3
13 ONE.X4
14 ONE.X5
15 ONE.X6
16 ONE.X7
17 ONE.X8
18 ONE.X9
19 TWO.Y1
20 TWO.Y2
21 TWO.Y3
22 TWO.Y4
23 TWO.Y20
24 TWO.Y19
25 TWO.Y18
26 TWO.Y17
I like scanning the log for errors/warning/notes, but in general scanning the log for data scares me a bit. So I'd probably opt for @yabwon 's approach. That said, thanks for posting your finished macro.
Just in case SAS ever fixes the grammatical errors in their note:
NOTE: There *were* 1 observation*s* read from the data set WORK.X_F.
You might want to update the regex slightly.
I did get burned by my log checker once when a note changed from "data set" to "dataset" (or the converse) during a version update. SAS is pretty fanatical about backwards compatibility (in a good way), but they are free to change the text in log messages.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.