Hi All,
I have many datasets with similar suffix such as:
dataset_01
dataset_02
dataset_03
dataset_03_v2
dataset_04
dataset_05_v2
I'm trying to stack all of them together excluding the ones with _v2, which cannot be achieved by just using
data want;
set dataset_: ;
run;
Is there an efficient way to exclude by conditions?
Thank you!
Hi @lydiawawa My recommendation would be is to read dictionary tables and filter into a macro variable-
data
dataset_01
dataset_02
dataset_03
dataset_03_v2
dataset_04
dataset_05_v2;
x=1;
y=2;
run;
proc sql;
select memname into :dsn separated by ' '
from dictionary.tables
where libname='WORK' and upcase(memname) like ('DATASET_%')
AND countw(memname,'_')=2;
quit;
data want;
set &dsn;
run;
One caveat though is there is a limitation of 64K bytes that a macro variable can hold as length. In that case switching to CALL EXECUTE would be idea or there are other boring ways to generate SAS statements and CALL using %INCLUDE
If you want to use lists you could use an explicit list like
set dataset_01 - dataset_05;
BUT you can't have gaps in the number sequence. If you do have gaps you code around with
set dataset_01 - dataset_04 dataset_06-dataset_12;
or similar.
Another approach might be to use the simple list and drop the sets whose names don't match the pattern
possibly based on the length of names or other pattern. The INDSNAME set option creates an automatic variable with the library.dataset name the current observation is from:
set dataset_: indsname=source;
if length(scan(source,2,'.')) ne 10;
This could be pretty time intensive if lots of records are involved.
Hi @lydiawawa My recommendation would be is to read dictionary tables and filter into a macro variable-
data
dataset_01
dataset_02
dataset_03
dataset_03_v2
dataset_04
dataset_05_v2;
x=1;
y=2;
run;
proc sql;
select memname into :dsn separated by ' '
from dictionary.tables
where libname='WORK' and upcase(memname) like ('DATASET_%')
AND countw(memname,'_')=2;
quit;
data want;
set &dsn;
run;
One caveat though is there is a limitation of 64K bytes that a macro variable can hold as length. In that case switching to CALL EXECUTE would be idea or there are other boring ways to generate SAS statements and CALL using %INCLUDE
@lydiawawa wrote:
the tables are not in work library, how do I set the datasets in dsn with a library name?
Put your library name in the place of "WORK". If in the dictionary table use uppercase spelling as that is how the libname is stored.
@lydiawawa I see your point. You need to concatenate libname and memname like-
select catx('.',libname,memname) into :dsn separated by ' '
data
dataset_01
dataset_02
dataset_03
dataset_03_v2
dataset_04
dataset_05_v2;
x=1;
y=2;
run;
proc sql;
select catx('.',libname,memname) into :dsn separated by ' '
from dictionary.tables
where libname='WORK' and upcase(memname) like ('DATASET_%')
AND countw(memname,'_')=2;
quit;
data want;
set &dsn;
run;
Hi @lydiawawa Please point to the name of the library where your datasets reside instead of WORK. Mine was just a mere example of logic/approach
You could use the indsname= option on the SET statement:
data want;
set dataset_: indsname=dsn;
if scan(upcase(dsn),3,'_')='V2' then delete;
run;
But that requires doing the if test on each and every incoming observation.
It might be better to customize the dataset list prior to the data step using proc sql dictionary.tables:
proc sql noprint;
select distinct memname into :dsnlist separated by ' '
from dictionary.tables where upcase(memname) like "DATASET_%"
and scan(upcase(memname),3,'_') NE 'V2';
quit;
data want;
set &dsnlist;
run;
This option doesn't have to do a record-by-record check on the incoming dataset name.
A bit off-topic but it is good practice to name your datasets based on the contents and not just generic names like dataset. It will likely simplify dataset management as well.
Create the data set name list as follows
proc sql noprint;
select catx('.', libname, memname) into :dsnlist separated by ' '
from dictionary.tables
where
libname = "your-libname-in-UPPERCASE" and
upcase(memname) like "DATASET_%" and
scan(upcase(memname),3,'_') NE 'V2'
;
quit;
And use the list as follows
SET &dsnlist;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.