Re: Macro length of the value of the macro variable exceeds maximum le...

glcoolj12 · Posted 02-27-2019 11:38 AM

Hello,

I have a list of 17,888 unique ids (IDVAR) from a very large dataset (HAVE1) that I would like to split into several macro variables. I'd like to query based on these unique ids in another very large dataset (HAVE2). I define "large dataset" as having over 200 millions rows. Creating a sub-query would take hours to run which is why I'm creating the macro. I've used the code below when I have fewer unique IDs, which runs beautifully. But now I run into the issue of exceed the maximum macro limit.

PROC SQL NOPRINT;
	SELECT DISTINCT QUOTE(TRIM(idvar))
		INTO :LIST1 SEPARATED BY ","
			FROM have1;
QUIT;

%PUT LIST1= &LIST1;


PROC SQL;
	CREATE TABLE want AS 
		SELECT DISTINCT *,
				FROM have2
					WHERE  idvar IN (&LIST1);
QUIT;

I've attempted to use the below code, but I run into issues where the unique IDs get cut-off and additional wording gets input to the list.

%let n_per_list=4000 ;

data _null_;
  length idlist $32000;
  length macrolist $1000;
  retain macrolist;
  do i=1 to &n_per_list until (eof);
    set HAVE1 end=eof;
    idlist=catx(',',idlist,QUOTE(TRIM(IDVAR)));
  end;
  listno+1;
  call symputx(cats('paralist',listno),idlist);
  macrolist=catx(',',macrolist,cats('&','paralist',listno));
  call symputx('paralist',macrolist);
run;

%put Paralist=%superq(ParaList);
Paralist=&paralist1,&paralist2,&paralist3,&paralist4,&paralist5
%put &=Paralist;

Any assistance that explains how I can properly split up a long list of IDs into several macro variables, will be much appreciated.

Kurt_Bremser · Posted 02-27-2019 11:53 AM

Data belongs in datasets, not in macro variables.
You can create a format for your ~20000 values, or read them into a hash object, if you want to avoid a simple sub-select.
Mind that there may be other reasons for the bad performance of your sub-select attempt.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Tom · Posted 02-27-2019 11:54 AM

Just add a test of the length in the DO loop.

Here is example using names from SASHELP.CLASS and stopping with each individual list is more than 20 characters long.

data have1 ;
 set sashelp.class ;
 rename name=idvar;
run;
data _null_;
  length idlist $32000;
  length macrolist $32000;
  retain macrolist;
  do i=1 by 1 until (eof or length(idlist)>20);
    set have1 end=eof;
    idlist=catx(',',idlist,QUOTE(TRIM(IDVAR)));
  end;
  listno+1;
  call symputx(cats('paralist',listno),idlist);
  macrolist=catx(',',macrolist,cats('&','paralist',listno));
  if eof then call symputx('paralist',macrolist);
run;

%put paralist = %superq(paralist);
%put &=paralist;

39    %put paralist = %superq(paralist);
paralist = &paralist1,&paralist2,&paralist3,&paralist4,&paralist5,&paralist6,&paralist7
40    %put &=paralist;
PARALIST="Alfred","Alice","Barbara","Carol","Henry","James","Jane","Janet","Jeffrey","John","Joyce","Judy",
"Louise","Mary","Philip","Robert","Ronald","Thomas","William"

ErikLund_Jensen · Posted 02-27-2019 02:21 PM

Hi @glcoolj12

A join would normally be much faster. You could try this instead:

proc sql; 
	create table want as
		select have2.* 
		from have2 inner join have1
		on have2.idvar = have1.idvar;
quit;

If it is still too slow, it might be a good idea to create an index on have2, but the outcome is difficult to predict.

ScottBass · Posted 02-28-2019 01:17 AM

In general, macro is orders of magnitude slower than data step, SQL, or RDBMS joins. Especially if you're writing huge amounts of data to the macro symbol table.

Since your data is already in a dataset or table, just use it from there.

Following on from @ErikLund_Jensen 's post, I ran the below code as a simple test to see which would run faster:

data have;
   do x=1 to 1E7;
      random=ranuni(0);
      output;
   end;
run;

data list;
   do y=1 to 1E8 by 3;
      random=ranuni(0);
      output;
   end;
run;

proc sort data=have;
   by random;
run;
proc sort data=list;
   by random;
run;

%bench(start) 
proc sql _method;
   create table want1 as
   select x
   from   have
   where  x in (
      select y from list
   )
   order by x;
quit;
%bench(end)

%bench(start) 
proc sql _method;
   create table want2 as
   select x
   from   have
   join   list
   on     have.x=list.y
   order by x;
quit;
%bench(end)

want1: Total time: 00 hours, 02 minutes, 08 seconds 127.50099992752

want2: Total time: 00 hours, 00 minutes, 41 seconds 41.3870000839233

SQL execution methods:

want1:

      sqxcrta
          sqxsort
              sqxfil
                  sqxsrc( WORK.HAVE )

NOTE: SQL subquery execution methods chosen are:

          sqxsubq
              sqxsrc( WORK.LIST )

want2:

Because the data was so narrow and would fit in memory, SQL used a hash join:

NOTE: SQL execution methods chosen are:

      sqxcrta
          sqxsort
              sqxjhsh
                  sqxsrc( WORK.LIST )
                  sqxsrc( WORK.HAVE )

Hope this helps!

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

Macro length of the value of the macro variable exceeds maximum length

Re: Macro length of the value of the macro variable exceeds maximum length

Re: Macro length of the value of the macro variable exceeds maximum length

Re: Macro length of the value of the macro variable exceeds maximum length

Re: Macro length of the value of the macro variable exceeds maximum length

Macro length of the value of the macro variable exceeds maximum length

Re: Macro length of the value of the macro variable exceeds maximum length

Re: Macro length of the value of the macro variable exceeds maximum length

Re: Macro length of the value of the macro variable exceeds maximum length

Re: Macro length of the value of the macro variable exceeds maximum length

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!