Thank you for the reply.
I am using SAS 9.1 so this is why the transpose portion of the code was not working.
First of all, thank you very much for this code. Two questions :
table | Name | Missing | Not_Applicable | Not_Missing | Total |
TABLE1_ | Name | 1 | 1 | 2 | 4 |
TABLE1_ | Age | 0 | 0 | 4 | 4 |
TABLE2_ | Color | 1 | 1 | 2 | 4 |
TABLE2_ | Height | 1 | 0 | 3 | 4 |
Thank you for your help and time.
hi ... not sure why you get that message (Ksharp's code is pretty neat and normally very reliable)
here's an alternative that I think is a bit faster and generic enough that all you have to do is list the data sets with a '/' separator ...
data table1;
input name $ age @@ ;
cards;
John 45 N/A 30 . 15 Carl 25
;
data table2;
input color $ height;
cards;
Blue 110 N/A 120 . 100 Red .
;
proc format;
value nn low-high='1' other='3';
value $c 'N/A'='2' ' '='3' other='1';
run;
%macro tables(stuff);
proc datasets lib=work nolist;
delete tables;
quit;
%let j=1;
%do %while(%scan(&stuff,&j,/) ne);
%let dset=%scan(&stuff,&j,/);
ods output onewayfreqs=temp (keep=table f_: freq:);
proc freq data=&dset;
tables _all_ / missing;
format _numeric_ nn. _character_ $c.;
run;
ods output close;
data temp (keep=table name not_: missing total);
length table name $32;
do until(last.table);
set temp;
by table notsorted;
array t(*) f_:;
select(t(_n_));
when('1') not_missing = frequency;
when('2') not_applicable = frequency;
when('3') missing = frequency;
end;
end;
total = sum(of not_:, missing);
name = scan(table,2);
table = "&dset";
run;
proc append base=tables data=temp;
run;
%let j=%eval(&j+1);
%end;
proc datasets lib=work nolist;
delete temp;
quit;
%mend;
ods listing close;
%tables(table1/table2/sashelp.class/sashelp.heart);
ods listing;
partial output of data set TABLES ...
not_
table name missing not_applicable missing total
table1 name 2 1 1 4
table1 age 4 . . 4
z.table2 color 1 . . 1
z.table2 height 1 . . 1
sashelp.class Name 19 . . 19
sashelp.class Sex 19 . . 19
sashelp.class Age 19 . . 19
sashelp.class Height 19 . . 19
sashelp.class Weight 19 . . 19
sashelp.heart Status 5209 . . 5209
sashelp.heart DeathCause 1991 . 3218 5209
sashelp.heart AgeCHDdiag 1449 . 3760 5209
Thank you both for your reply. Really appreciated.
@ Ksharp : Your code works great. Thank you for your help and time.
@ MikeZdeb : I don't get the same output as you. I would really like to get the same output as the one you provided in your post.
If we take my two example tables, here is the result I get :
Bitmap | ||||
The FREQ Procedure | ||||
name | Frequency | Percent | Cumulative | Cumulative |
Frequency | Percent | |||
3 | 1 | 25.00 | 1 | 25.00 |
1 | 2 | 50.00 | 3 | 75.00 |
2 | 1 | 25.00 | 4 | 100.00 |
age | Frequency | Percent | Cumulative | Cumulative |
Frequency | Percent | |||
1 | 4 | 100.00 | 4 | 100.00 |
The FREQ Procedure | ||||
color | Frequency | Percent | Cumulative | Cumulative |
Frequency | Percent | |||
3 | 1 | 25.00 | 1 | 25.00 |
1 | 2 | 50.00 | 3 | 75.00 |
2 | 1 | 25.00 | 4 | 100.00 |
height | Frequency | Percent | Cumulative | Cumulative |
Frequency | Percent | |||
3 | 1 | 25.00 | 1 | 25.00 |
1 | 3 | 75.00 | 4 | 100.00 |
I am running SAS Enterprise Guide 4.1 with SAS 9.1
Can you please help me get the same output as you?
Thank you for your help
Hi ... did you check to see if there was also a data set named TABLES ...
proc print data=tables;
run;
when the macro finished?
My bad sorry.
I see the table now.
One last thing I would like to have with your code.
Would you be able to a feature to your code where I would be able to to say create the summary table named "tables" with (work.table1/work.table2) and create another summary table named "summary_table" with (work.table3/work.table4)?
Basicly when writing the macro code colde I have the choice to :
1. Enter a specific output table name
2. Create multiple output tables with specific table name
I would really like to be able to write this myself, but I am a real beginner. I am learning a lot using this forum and I hope I will be able to at least modify existing sas code soon.
Thank you for your help and time!
Oh. I understand that ERROR message.
SQL's UNION clause only can support 250 select clause , so maybe you have too many tables and too many variables needed to process.
data table1; input Name $ Age ; cards; John 45 N/A 30 . 15 Carl 25 ; run; data table2; input Color $ Height ; cards; Blue 110 N/A 120 . 100 Red . ; run; options missing=' '; data _null_; set sashelp.vcolumn(keep=name memname libname where=(libname='WORK')) end=last; if _n_ eq 1 then call execute("proc sql;"); call execute("create table temp"||strip(_n_)||" as select '"||strip(memname)||"' length=40 as table,'"||strip(name)||"' length=40 as Name,nmiss("||strip(name)||") as missing,sum(case when cats("||strip(name)||")='N/A' then 1 else 0 end) as Not_Applicable,sum(case when cat("||strip(name)||") not in ('N/A' ' ') then 1 else 0 end) as Not_Missing,count(*) as Total from "||strip(memname)||";" ); if last then call execute(";quit;"); run; data want; set temp:; run;
Ksharp
Hi All,
I thought this was an interesting problem, and a chance for me to "play" with some techniques in SAS I don't normally use. Apologies in advance for the length of this post.
General Concepts:
Use of sashelp.vcolumns:
In my environment, we pre-allocate about 30 SAS libraries with hundreds of tables and tens of thousands of variables. Some of those libraries are to Oracle databases. After the incredibly painful process (~ 10 minutes) of retrieving 4 variables, the code generation finally began! However, even when I cleared all libnames (libname _all_ clear), sashelp.vcolumn was still really slow.
I recommend using proc contents instead of sashelp.vcolumns. Compare the three approaches below. I also switched to other datasets (sashelp), because we also want to compare performance across different approaches, and the example data is just too small to evaluate performance.
* using dictionary.columns ;
proc sql;
create table vars as
select libname, memname, name
from dictionary.columns
where catx(".",libname,memname) in ("SASHELP.CLASS","SASHELP.CARS","WORK.ZIPCODE")
order by libname, memname, varnum
;
quit;
* using sashelp.vcolumn ;
data vars;
set sashelp.vcolumn (keep=libname memname name);
where catx(".",libname,memname) in ("SASHELP.CLASS","SASHELP.CARS","WORK.ZIPCODE");
run;
* using proc contents ;
%macro get_variables(data);
proc contents data=&data out=temp (keep=libname memname name) noprint;
run;
proc append base=vars data=temp;
run;
%mend;
proc datasets lib=work nowarn nolist;
delete vars;
quit;
%get_variables(sashelp.class);
%get_variables(sashelp.cars);
%get_variables(work.zipcode);
The actual performance will vary based on your environment, but in general I find the best performance is 1) proc contents, 2) dictionary.columns, and 3) sashelp.vcolumn, in that order, especially if you just need the columns from a single dataset. Try each approach in your environment and see which works best for you.
call execute vs. dynamic code generation:
There is nothing intrinsically wrong with call execute. However, I generally prefer dynamic code generation to a temporary file, then %including that temporary file. This way, I can easily debug my generated code using "fslist". This assumes code generation via DMS, otherwise use a data _null_ step to echo the code to the log. I also get more control over the code formatting. This has nothing to do with code execution, but can be helpful with debugging if your generated code block is large.
Here is an example:
* dynamically create code ;
filename code temp;
data _null_;
set vars end=eof;
file code;
if _n_ eq 1 then do;
call symputx("firstvar",name,"G");
put @1 "proc sql;";
put @4 "create table temp as";
put @7 "select";
end;
put @10 memname $quote. @80 "as Table,";
put @10 name $quote. @80 "as Name,";
put @10 "nmiss(" name +(-1) ")" @80 "as Missing,";
put @10 "sum(case when cats(" name +(-1) ")='N/A' then 1 else 0 end)" @80 "as Not_Applicable,";
put @10 "sum(case when cat(" name +(-1) ") not in ('N/A', ' ') then 1 else 0 end)" @80 "as Not_Missing,";
if eof then do;
put @10 "0" @80 "as dummy";
put @7 "from";
put @10 libname +(-1) "." memname;
put @4 ";";
put @1 "quit;";
end;
run;
* check out the generated code ;
dm "fslist code";
* execute the code ;
%include code;
Note: You don’t need the “firstname” macro variable for the proc transpose (see later code below), but the above would be a way to set it to the first variable in the source data.
Missing option:
The character representation of a missing numeric value is controlled by the missing option, which by default is ".". If you want to make absolutely sure that your code works correctly when checking for missing numeric values, explicitly set the missing option. You can also simplify the code slightly if you explicitly set it to blank:
* reset character used for missing numeric data ;
%let missing=%sysfunc(getoption(missing));
options missing=" ";
/* Your code. Both character and numeric missing are now both “ “ */
* restore character used for missing numeric data ;
options missing="&missing";
Increasing disk I/O performance:
This is more of an aside. I did test this approach, but it did not perform better than using views.
When I need high performance disk I/O, I’ve been using this approach lately:
libname spdework spde "%sysfunc(pathname(work))" temp=yes;
Then, use a two-level name of spdework.<your work dataset> instead of <work>. This can yield really good performance, especially for a large work dataset that is used repeatedly in downstream code.
You can also set the user= option to use a single level name writing to spdework:
options user=spdework;
data foo;x=1;run;
Use views when appropriate:
Use a data step or SQL view when you can to reduce disk I/O when appropriate. Determining “when appropriate” is beyond the scope of this post, but see code using a data step view below.
Problem Analysis:
If we review the problem, what we want to do is group the data into "buckets" (missing, not applicable, not missing), for each variable, then get frequency counts for those "buckets". Further analysis of the previous answers shows that we're generating our results one column at a time, over the entire dataset – each proc sql/select/union all code block is processing the entire dataset, one variable at a time. The original poster indicated that his/her "real" data is 2M + records. It may also have many more variables than the 4 in the sample datasets. So, the total data processed is # of records * # of variables. So, for example, 2M records * 10 variables = processing 20M records. In the "real" problem, performance does matter.
I also considered "Is there a way to do this without pre-processing the data (sashelp.vcolumns or proc contents) and dynamic code generation"? Also “Is there a way to process the data in one pass”?
I created a number of code versions - I’ll only include a few of the approaches (I won’t post the disasters lol). Some comments:
Here are my code examples. Hopefully they are somewhat self explanatory, otherwise post a follow up question.
Code Prologue (create example data used for all programs):
options mprint nomlogic;
data table1;
input Name $ Age;
cards;
John 45
N/A 30
. 15
Carl 25
;
run;
data table2;
input Color $ Height;
cards;
Blue 110
N/A 120
. 100
Red .
;
run;
* create a large dataset to test performance ;
data work.zipcode;
set
sashelp.zipcode
sashelp.zipcode
sashelp.zipcode
sashelp.zipcode
sashelp.zipcode
;
run;
%bench macro (macro I use when I’m benchmarking performance):
Note: comment out call to %parmv
/*=====================================================================
Program Name : bench.sas
Purpose : Measures elapsed time between successive
invocations.
SAS Version : SAS 8.2
Input Data : N/A
Output Data : N/A
Macros Called : parmv
Originally Written by : Scott Bass
Date : 24APR2006
Program Version # : 1.0
=======================================================================
Modification History : Original version
=====================================================================*/
/*---------------------------------------------------------------------
Usage:
* Start benchmarking.
* Both invocations are identical as long as start ;
* has not been previously invoked ;
%bench;
%bench(start);
data _null_;
rc=sleep(3);
run;
* Get elapsed time, should be approx. 3 seconds elapsed, 3 seconds total ;
%bench(elapsed);
data _null_;
rc=sleep(7);
run;
* Get another elapsed time, should be approx. 7 seconds elapsed, 10 seconds total ;
%bench; * elapsed parm not required since start was already called ;
data _null_;
rc=sleep(2);
run;
* End benchmarking, should be approx. 2 seconds elapsed, 12 seconds total ;
* Must be called after start. Resets benchmarking. ;
%bench(end);
-----------------------------------------------------------------------
Notes:
If %bench has never been invoked, calling %bench without parameters
starts benchmarking. You may also explicity specify the start
benchmarking, although normally the end parameter would be used.
If %bench has been previously invoked with the start parameter, calling
%bench without parameters prints the elapsed time. You may also
explicity specify the elapsed parameter.
To end benchmarking and reset the start time, specify the end
Only the elapsed or end parameters (or equivalent processing) print
time measurements to the log. The start parameter does not print
anything to the log.
The only parameter that needs to be explicitly specified is end.
Otherwise the macro should do the right thing, either starting
benchmarking or printing elapsed times.
Benchmarking a time period greater than 24 hours is "unpredictable".
---------------------------------------------------------------------*/
%macro bench
/*---------------------------------------------------------------------
Measures elapsed time between successive invocations.
---------------------------------------------------------------------*/
(PARM /* Benchmarking parameter (Opt). */
/* If not specified: */
/* If first invocation, start benchmarking. */
/* If subsequent invocation, print elapsed time. */
/* Valid values are START ELAPSED END. */
);
%local macro parmerr time_elapsed time_total time_elapsed_str time_total_str h m s;
%let macro = &sysmacroname;
%* check input parameters ;
%parmv(PARM, _req=0,_words=0,_case=U,_val=START ELAPSED END)
%if (&parmerr) %then %goto quit;
%* nested macro for printing ;
%macro print(_parm);
%let time_elapsed = %sysevalf(%sysfunc(datetime()) - &_elapsed);
%let time_total = %sysevalf(%sysfunc(datetime()) - &_start);
%let h = %sysfunc(hour(&time_elapsed),z2.);
%let m = %sysfunc(minute(&time_elapsed),z2.);
%let s = %sysfunc(second(&time_elapsed),z2.);
%let time_elapsed_str = &h hours, &m minutes, &s seconds;
%let h = %sysfunc(hour(&time_total),z2.);
%let m = %sysfunc(minute(&time_total),z2.);
%let s = %sysfunc(second(&time_total),z2.);
%let time_total_str = &h hours, &m minutes, &s seconds;
%put;
%put Benchmark &_parm:;
%put;
%put Elapsed seconds = &time_elapsed_str &time_elapsed;
%put Total seconds = &time_total_str &time_total;
%put;
%mend;
%* declare global variables ;
%global _start _elapsed;
%if (&parm eq START) %then %do;
%let _start = %sysfunc(datetime());
%let _elapsed = &_start;
%end;
%else
%if (&parm eq ELAPSED) %then %do;
%if (&_start eq ) %then %do;
%put ERROR: Benchmarking must be started before elapsed time can be printed.;
%goto quit;
%end;
%else %do;
%print(ELAPSED)
%let _elapsed = %sysfunc(datetime());
%end;
%end;
%else
%if (&parm eq END) %then %do;
%if (&_start eq ) %then %do;
%put ERROR: Benchmarking must be started before elapsed time can be printed.;
%goto quit;
%end;
%else %do;
%print(END)
%* reset benchmarking ;
%symdel _start _elapsed / nowarn;
%end;
%end;
%else
%if (&parm eq ) %then %do;
%* derive proper parm then recursively call this macro ;
%if (&_start eq ) %then %do;
%bench(start)
%end;
%else %do;
%bench(elapsed)
%end;
%end;
%quit:
%* if (&parmerr) %then %abort;
%mend;
/******* END OF FILE *******/
Data Step View and PROC SUMMARY approach:
%macro get_counts(data);
* reset character used for missing numeric data ;
%let missing=%sysfunc(getoption(missing));
options missing=" ";
* create additional grouping variables ;
data vgrouped / view=vgrouped;
set &data indsname=dsn;
* set a dummy variable as an end of variable list marker ;
retain dummy "";
drop dummy;
* define additional variables ;
* varname must be long enough to contain memname_varname ;
length libname $8 memname $32 varname $65 cvalue $200 measure $15;
* we only need to get the libname and memname once ;
if (_n_=1) then do;
libname=scan(dsn,1,".");
memname=scan(dsn,2,".");
retain libname memname;
end;
* spin through all the variables in the dataset, building grouping variables ;
* since our checks are mutually exclusive, this will not cause bloating of the dataset ;
do while (1);
call vnext(varname);
if (varname in ("dsn","eof")) then continue;
if (varname="dummy") then leave;
* get the variable value (character, formatted value) ;
cvalue=vvaluex(varname);
* build the grouping variable ;
select;
when (missing(cvalue))
measure="Missing";
when (strip(cvalue)="N/A")
measure="Not_Applicable";
when (not missing(cvalue))
measure="Not_Missing";
* this covers all possibilities, so I purposely left out an otherwise statement ;
end;
* build the new variable name ;
varname=catx("_",memname,varname);
* output the observation ;
output;
end;
run;
* now summarize over each measure to get the frequency counts ;
proc summary data=vgrouped nway;
class libname memname varname measure;
output out=summary (drop=_type_);
run;
* transpose data ;
proc transpose data=summary out=transposed (drop=_name_);
by libname memname varname;
id measure;
var _freq_;
run;
* set desired PDV order, ensure all variables are present, and replace missing values with zero ;
data missing2zero;
format libname memname varname;
length Missing Not_Applicable Not_Missing 8;
set transposed;
array miss{*} Missing -- Not_Missing;
do i=1 to dim(miss);
if miss{i}=. then miss{i}=0;
end;
drop i;
run;
* append data ;
proc append base=final1 data=missing2zero;
run;
* restore character used for missing numeric data ;
options missing="&missing";
%mend;
proc datasets lib=work nolist nowarn;
delete final1:;
quit;
%bench(start)
%get_counts(work.table1)
%get_counts(work.table2)
%get_counts(sashelp.class)
%get_counts(sashelp.cars)
%get_counts(work.zipcode)
%bench(elapsed)
* this sort isn't required, but makes it easier to compare the proc print outputs ;
proc sort data=final1;
by libname memname varname;
run;
* one last transpose. use whichever dataset you prefer, final or final2 ;
proc transpose data=final1 out=final1a (rename=(_name_=Num_Obs));
id varname;
run;
title;
proc print data=final1;
run;
proc print data=final1a;
run;
%bench(end)
Hash Object, setting and retrieving frequency counts:
This approach only processes the data once, but has the overhead of finding and setting the frequency counts for every data value in the source data.
The hash object will never get that big: up to three rows (Missing, Not Applicable, Not Missing) per variable.
%macro get_counts(data);
* reset character used for missing numeric data ;
%let missing=%sysfunc(getoption(missing));
options missing=" ";
data _null_;
set &data indsname=dsn end=eof;
* set a dummy variable as an end of variable list marker ;
retain dummy "";
* define additional variables ;
* varname must be long enough to contain memname_varname ;
length libname $8 memname $32 varname $65 cvalue $200 measure $15 count 8;
* use a hash object to store summary data ;
if (_n_=1) then do;
dcl hash sums(hashexp: 16);
sums.defineKey( "libname","memname","varname","measure");
sums.defineData("libname","memname","varname","measure","count");
sums.defineDone();
* get libname and memname ;
libname=scan(dsn,1,".");
memname=scan(dsn,2,".");
retain libname memname;
end;
* spin through all the variables in the dataset, building summarization variables ;
do while (1);
call vnext(varname);
if (varname in ("dsn","eof")) then continue;
if (varname in ("dummy")) then leave;
* get the variable value (character, formatted value) ;
cvalue=vvaluex(varname);
* set the hash object keys (libname and memname are already set) ;
* derived varname ;
varname=catx("_",memname,varname);
* measure ;
select;
when(missing(cvalue))
measure="Missing";
when(strip(cvalue)="N/A")
measure="Not_Applicable";
when(not missing(cvalue))
measure="Not_Missing";
end;
* initialize counter back to 0 ;
count=0;
* retrieve the current key and increment it ;
* if find fails (first time through) count is still 0;
rc=sums.find();
* increment the count ;
count=count+1;
* save the incremented count ;
sums.replace();
end;
* now output the hash object as a dataset ;
if eof then sums.output(dataset: "counts");
run;
* append data ;
proc append base=final2 data=counts;
run;
* restore character used for missing numeric data ;
options missing="&missing";
%mend;
proc datasets lib=work nolist nowarn;
delete final2:;
quit;
%bench(start)
%get_counts(work.table1)
%get_counts(work.table2)
%get_counts(sashelp.class)
%get_counts(sashelp.cars)
%get_counts(work.zipcode)
%bench(elapsed)
* this sort isn't required, but makes it easier to compare the proc print outputs ;
proc sort data=final2;
by libname memname varname;
run;
* one last transpose. use whichever dataset you prefer, final or final2 ;
proc transpose data=final2 out=temp (drop=_name_);
by libname memname varname notsorted;
id measure;
run;
* set PDV order, which sets final observation order in final2 ;
* could use a view here but it is such a tiny dataset ;
data temp;
format libname memname varname Missing Not_Applicable Not_Missing;
set temp;
run;
proc transpose data=temp out=temp2 (rename=(_name_=Num_Obs));
id varname;
run;
* replace missing values with zero ;
data final2a;
set temp2;
array miss{*} _numeric_;
do i=1 to dim(miss);
if miss{i}=. then miss{i}=0;
end;
drop i;
run;
title;
proc print data=final2;
run;
proc print data=final2a;
run;
%bench(end)
Hash Object, using suminc incrementation variable:
This approach also only processes the data once.
%macro get_counts(data);
data _null_;
set &data indsname=dsn end=eof;
retain dummy "";
length libname $8 memname vname $32 varname $65 measure $20 cvalue $200;
if (_n_=1) then do;
dcl hash sums(suminc:"count");
sums.defineKey("libname","memname","varname","measure");
sums.defineData("libname","memname","varname","measure","count");
sums.defineDone();
dcl hiter iter("sums");
libname=scan(dsn,1,".");
memname=scan(dsn,2,".");
retain libname memname;
end;
do while (1);
call vnext(vname);
if (vname in ("dsn","eof")) then continue;
if (vname in ("dummy")) then leave;
cvalue=vvaluex(vname);
select;
when(missing(cvalue))
measure="Missing";
when(strip(cvalue)="N/A")
measure="Not_Applicable";
when(not missing(cvalue))
measure="Not_Missing";
end;
varname=catx("_",memname,vname);
count=1;
if (sums.find() ne 0) then sums.add();
end;
if eof then do;
rc=iter.first();
do while (rc=0);
sums.sum(sum: count);
sums.replace();
rc=iter.next();
end;
sums.output(dataset: "counts");
end;
run;
proc append base=final3 data=counts;
run;
%mend;
proc datasets lib=work nolist nowarn;
delete final3:;
quit;
%bench(start)
%get_counts(work.table1)
%get_counts(work.table2)
%get_counts(sashelp.class)
%get_counts(sashelp.cars)
%get_counts(work.zipcode)
%bench(elapsed)
* this sort IS required, to group the varnames in the next transpose ;
proc sort data=final3;
by libname memname varname measure;
run;
* one last transpose. use whichever dataset you prefer, final or final2 ;
proc transpose data=final3 out=temp (drop=_name_);
by libname memname varname;
id measure;
run;
* set PDV order, which sets final observation order in final2 ;
* could use a view here but it is such a tiny dataset ;
data temp;
format libname memname varname Missing Not_Applicable Not_Missing;
set temp;
run;
proc transpose data=temp out=temp2 (rename=(_name_=Num_Obs));
id varname;
run;
* replace missing values with zero ;
data final3a;
set temp2;
array miss{*} _numeric_;
do i=1 to dim(miss);
if miss{i}=. then miss{i}=0;
end;
drop i;
run;
title;
proc print data=final3;
run;
proc print data=final3a;
run;
%bench(end)
After running all three approaches, you can compare the “A” datasets. The “not A” datasets are not normalized the same way, so they don’t compare.
proc compare base=final1a compare=final2a;
run;
proc compare base=final1a compare=final3a;
run;
To print totals, there are a number of ways to do that. I’ll use PROC REPORT:
options nocenter;
proc report data=final1 nowd;
columns libname memname varname missing not_applicable not_missing total;
define libname / order;
define memname / order width=12;
define varname / order width=25;
compute total;
total=sum(missing.sum,not_applicable.sum,not_missing.sum);
endcomp;
break after memname / ol summarize skip;
quit;
proc report data=final1a nowd;
rbreak after / ol summarize skip;
label num_obs=" ";
quit;
Finally, after running all three approaches, there wasn’t a big difference in performance between any of the approaches. Based on this, I’d recommend the proc summary approach, since it’s the simplest (of the three approaches I listed). Also, I did not compare the performance with the original sql/union all approach, since 1) the previous posted code wasn’t generic enough to support datasets in multiple libraries and I didn’t feel like fiddling with the code, 2) with enough datasets and variables you’d likely run into the limits of sql, and 3) the original poster said he’d like to split the output into multiple output datasets, which would be easier with the macro/proc append approach.
If someone wants to compare the performance with the original sql approach, please post the results.
Hope this helps and sorry again for the length...
Scott
Scott - If you use the CATX function in your where clause then SAS cannot use the indexes (or whatever the equivalent is for dictionary tables) in the metadata table dictionary.columns. That is why it is hitting your Oracle tables and slowing down the process.
103 create table vars as
104 select libname, memname, name
105 from dictionary.columns
106 where libname = 'SASHELP' and memname in ("CLASS","CARS","ZIPCODE")
107 order by libname, memname, varnum
108 ;
NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT clause.
NOTE: Table WORK.VARS created, with 38 rows and 3 columns.
NOTE: SQL Statement used (Total process time):
real time 0.09 seconds
cpu time 0.03 seconds
109
110
111 create table vars as
112 select libname, memname, name
113 from dictionary.columns
114 where catx(".",libname,memname) in ("SASHELP.CLASS","SASHELP.CARS","WORK.ZIPCODE")
115 order by libname, memname, varnum
116 ;
NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT clause.
NOTE: Table WORK.VARS created, with 20 rows and 3 columns.
NOTE: SQL Statement used (Total process time):
real time 0.46 seconds
cpu time 0.32 seconds
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.