Hello, I have a list of 17,888 unique ids (IDVAR) from a very large dataset (HAVE1) that I would like to split into several macro variables. I'd like to query based on these unique ids in another very large dataset (HAVE2). I define "large dataset" as having over 200 millions rows. Creating a sub-query would take hours to run which is why I'm creating the macro. I've used the code below when I have fewer unique IDs, which runs beautifully. But now I run into the issue of exceed the maximum macro limit. PROC SQL NOPRINT;
SELECT DISTINCT QUOTE(TRIM(idvar))
INTO :LIST1 SEPARATED BY ","
FROM have1;
QUIT;
%PUT LIST1= &LIST1;
PROC SQL;
CREATE TABLE want AS
SELECT DISTINCT *,
FROM have2
WHERE idvar IN (&LIST1);
QUIT; I've attempted to use the below code, but I run into issues where the unique IDs get cut-off and additional wording gets input to the list. %let n_per_list=4000 ;
data _null_;
length idlist $32000;
length macrolist $1000;
retain macrolist;
do i=1 to &n_per_list until (eof);
set HAVE1 end=eof;
idlist=catx(',',idlist,QUOTE(TRIM(IDVAR)));
end;
listno+1;
call symputx(cats('paralist',listno),idlist);
macrolist=catx(',',macrolist,cats('&','paralist',listno));
call symputx('paralist',macrolist);
run;
%put Paralist=%superq(ParaList);
Paralist=¶list1,¶list2,¶list3,¶list4,¶list5
%put &=Paralist;
Any assistance that explains how I can properly split up a long list of IDs into several macro variables, will be much appreciated.
... View more