Hello,
I have a list of 17,888 unique ids (IDVAR) from a very large dataset (HAVE1) that I would like to split into several macro variables. I'd like to query based on these unique ids in another very large dataset (HAVE2). I define "large dataset" as having over 200 millions rows. Creating a sub-query would take hours to run which is why I'm creating the macro. I've used the code below when I have fewer unique IDs, which runs beautifully. But now I run into the issue of exceed the maximum macro limit.
PROC SQL NOPRINT; SELECT DISTINCT QUOTE(TRIM(idvar)) INTO :LIST1 SEPARATED BY "," FROM have1; QUIT; %PUT LIST1= &LIST1; PROC SQL; CREATE TABLE want AS SELECT DISTINCT *, FROM have2 WHERE idvar IN (&LIST1); QUIT;
I've attempted to use the below code, but I run into issues where the unique IDs get cut-off and additional wording gets input to the list.
%let n_per_list=4000 ; data _null_; length idlist $32000; length macrolist $1000; retain macrolist; do i=1 to &n_per_list until (eof); set HAVE1 end=eof; idlist=catx(',',idlist,QUOTE(TRIM(IDVAR))); end; listno+1; call symputx(cats('paralist',listno),idlist); macrolist=catx(',',macrolist,cats('&','paralist',listno)); call symputx('paralist',macrolist); run; %put Paralist=%superq(ParaList); Paralist=¶list1,¶list2,¶list3,¶list4,¶list5 %put &=Paralist;
Any assistance that explains how I can properly split up a long list of IDs into several macro variables, will be much appreciated.
Just add a test of the length in the DO loop.
Here is example using names from SASHELP.CLASS and stopping with each individual list is more than 20 characters long.
data have1 ;
set sashelp.class ;
rename name=idvar;
run;
data _null_;
length idlist $32000;
length macrolist $32000;
retain macrolist;
do i=1 by 1 until (eof or length(idlist)>20);
set have1 end=eof;
idlist=catx(',',idlist,QUOTE(TRIM(IDVAR)));
end;
listno+1;
call symputx(cats('paralist',listno),idlist);
macrolist=catx(',',macrolist,cats('&','paralist',listno));
if eof then call symputx('paralist',macrolist);
run;
%put paralist = %superq(paralist);
%put &=paralist;
39 %put paralist = %superq(paralist); paralist = ¶list1,¶list2,¶list3,¶list4,¶list5,¶list6,¶list7 40 %put &=paralist; PARALIST="Alfred","Alice","Barbara","Carol","Henry","James","Jane","Janet","Jeffrey","John","Joyce","Judy", "Louise","Mary","Philip","Robert","Ronald","Thomas","William"
Hi @glcoolj12
A join would normally be much faster. You could try this instead:
proc sql;
create table want as
select have2.*
from have2 inner join have1
on have2.idvar = have1.idvar;
quit;
If it is still too slow, it might be a good idea to create an index on have2, but the outcome is difficult to predict.
In general, macro is orders of magnitude slower than data step, SQL, or RDBMS joins. Especially if you're writing huge amounts of data to the macro symbol table.
Since your data is already in a dataset or table, just use it from there.
Following on from @ErikLund_Jensen 's post, I ran the below code as a simple test to see which would run faster:
data have;
do x=1 to 1E7;
random=ranuni(0);
output;
end;
run;
data list;
do y=1 to 1E8 by 3;
random=ranuni(0);
output;
end;
run;
proc sort data=have;
by random;
run;
proc sort data=list;
by random;
run;
%bench(start)
proc sql _method;
create table want1 as
select x
from have
where x in (
select y from list
)
order by x;
quit;
%bench(end)
%bench(start)
proc sql _method;
create table want2 as
select x
from have
join list
on have.x=list.y
order by x;
quit;
%bench(end)
want1: Total time: 00 hours, 02 minutes, 08 seconds 127.50099992752
want2: Total time: 00 hours, 00 minutes, 41 seconds 41.3870000839233
SQL execution methods:
want1:
sqxcrta
sqxsort
sqxfil
sqxsrc( WORK.HAVE )
NOTE: SQL subquery execution methods chosen are:
sqxsubq
sqxsrc( WORK.LIST )
want2:
Because the data was so narrow and would fit in memory, SQL used a hash join:
NOTE: SQL execution methods chosen are:
sqxcrta
sqxsort
sqxjhsh
sqxsrc( WORK.LIST )
sqxsrc( WORK.HAVE )
Hope this helps!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.