BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SASAlex101
Quartz | Level 8

I was experimenting with some techniques on how to lookup values in a data step without having to use costly SQL joins or memory limited Hash tables. I am attempting to use call execute to assign a macro variable with each iteration and then use that value in the next lines. But I'm not able to get 100% there. 


data table1;
length ID1 $1. ;
input ID1 $;
datalines;
A
B
C
D
E
F
G
H
I
;
run;
data table2;
length ID2 $1. LookupValue1 $10.;
infile datalines delimiter='#';
input ID2 $ LookupValue1 $ LookupValue2;
datalines;
A#Comm#10
B#Delt#20
C#Zebr#40
D#Rays#100
G#Dongle#500
Z#Ebra#90
E#Quint#88
;
run;

%GLOBAL sqlvar;

Data CallExecuteLookup;
SET WORK.Table1;
call symputx('rval',ID1,'G');
call execute('proc SQL; Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 = "&rval."; quit;');
newvar = &sqlvar.;
run;

From what I can tell, the symbolic put and the execute statements are working as per the log... but How do I use the value generated by the Proc SQL statement? 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

@SASAlex101 wrote:
can you demo how you can use DOSUBL ?

I think you could do it like:

data CallExecuteLookup;
  SET WORK.Table1;
  rc=dosubl('proc SQL noprint; Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 = "'||ID1||'"; quit;');
  newvar = resolve('&sqlvar');
  put id1= newvar= ;
run;

%symdel sqlvar ;

The log is:

32   data CallExecuteLookup;
33     SET WORK.Table1;
34     rc=dosubl('proc SQL noprint; Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 =
34 !  "'||ID1||'"; quit;');
35     newvar = resolve('&sqlvar');
36     put id1= newvar= ;
37   run;


ID1=A newvar=10

ID1=B newvar=20

ID1=C newvar=40

ID1=D newvar=100

ID1=E newvar=88

ID1=F newvar=.

ID1=G newvar=500

ID1=H newvar=.

ID1=I newvar=.
NOTE: There were 9 observations read from the data set WORK.TABLE1.
NOTE: The data set WORK.CALLEXECUTELOOKUP has 9 observations and 3 variables.

Again, I wouldn't use DOSUBL.  I would much prefer using a format, join, or hash table.  But DOSUBL is interesting in what it can do.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

11 REPLIES 11
Tom
Super User Tom
Super User

You have a very serious TIMING issue with that code.

Just look at the SAS log to see why. Let's first assign SQLVAR something that will allow the data step to compile. Also let's limit it to just 3 observations to make the log shorter.

3316  %let sqlvar=.;
3317  Data CallExecuteLookup;
3318  SET WORK.Table1 (obs=3);
3319  call symputx('rval',ID1,'G');
3320  call execute('proc SQL noprint; Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 = "&rval."; quit;');
3321  newvar = &sqlvar.;
3322  run;

NOTE: The data set WORK.CALLEXECUTELOOKUP has 3 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


NOTE: CALL EXECUTE generated line.
1    + proc SQL noprint;
1    +                   Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 = "A";
1    +                                                                                           quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


2    + proc SQL noprint;
2    +                   Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 = "B";
2    +                                                                                           quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


3    + proc SQL noprint;
3    +                   Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 = "C";
3    +                                                                                           quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

As you can clearly see you cannot reference the value of SQLVAR in the data step since the PROC SQL code that generates it has not run until after the data step has completed.

SASAlex101
Quartz | Level 8

Thanks for the insight. I guess from what your saying that a loop may be favorable? This seems to work.. but is this faster than an SQL join if the tables are large? 


data table1;
length ID1 $1. ;
input ID1 $ lookupval;
datalines;
A
B
C
D
E
F
G
H
I
;
run;
data table2;
length ID2 $1. LookupValue1 $10.;
infile datalines delimiter='#';
input ID2 $ LookupValue1 $ LookupValue2;
datalines;
A#Comm#10
B#Delt#20
C#Zebr#40
D#Rays#100
G#Dongle#500
Z#Ebra#90
E#Quint#88
;
run;

%GLOBAL sqlvar;
%macro loop;

%local num i cntry;
%let dsid = %sysfunc(open(WORK.Table1)); %let num = %sysfunc(attrn(&dsid,nlobs)); %let rc = %sysfunc(close(&dsid)); %put "&dsid"; %put "&num"; %put "&rc"; %do i=1 %to &num.; data _null_; p = &i; SET WORK.Table1 point=p; call symputx('rval',ID1,'G'); stop; run; proc SQL; Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 ="&rval." ; quit; Proc SQl; Update WORK.Table1 SET lookupval = &sqlvar. where ID1 = "&rval." ; quit; %end; %mend loop; %loop;
  
PaigeMiller
Diamond | Level 26

I'm guessing this is what you want (although as discussed below, I don't think this is at all necessary)

 

Data _null_;
SET WORK.Table1;
call symputx('rval',ID1,'G');
call execute('proc SQL; Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 = "&rval."; quit; DATA SOMETHING; newvar = &sqlvar.; run;');
run;

You can't do it the way you programmed it because &sqlvar isn't given a value until well after the data step creating table CallExecuteLookup has completed. CALL EXECUTE doesn't run until the data step finishes.

 

However, @SASAlex101 you have made the problem much too difficult.

 

If you are just trying to find max values of LookupValue2 for different levels of &rval, why not just use PROC SUMMARY and now the whole issue of CALL EXECUTE or costly SQL joins or hash tables or macro variables goes away, and is replaced with something a lot simpler. THIS IS A BETTER SOLUTION.

 

proc summary data=table2 nway;
    class id2;  
    var lookupvalue2;
    output out=maxx max=max_lookupvalue2;
run;

 

--
Paige Miller
SASAlex101
Quartz | Level 8

Not necessarily limited to Max(), But I wanted to devise a way to look up things that is a bit more intuitive than other SAS methods. 

 

PaigeMiller
Diamond | Level 26

@SASAlex101 wrote:

Not necessarily limited to Max(), But I wanted to devise a way to look up things that is a bit more intuitive than other SAS methods. 

 


 

PROC SUMMARY is not limited to MAX, you can get dozens of statistics from it, as many as you want. So please explain in much more detail the goals of this project, rather than the very un-detailed and impossible to program "look up things".


I have no idea what you mean by "a bit more intuitive" but to most beginners, macro variables and CALL EXECUTE isn't at all intuitive, while PROC SUMMARY is.

--
Paige Miller
svh
Lapis Lazuli | Level 10 svh
Lapis Lazuli | Level 10
Are you trying to get just one value? Because this statement defines &sqlvar several times based on the values of &rval from the input data set. After execution, &sqlvar is null because the last time PROC SQL it runs, there is no max date to produce.
Quentin
Super User

There are tricky timing issues for CALL EXECUTE that will make this not work.

 

CALL EXECUTE doesn't really execute SAS code (despite its name).  It generates code and writes it to the input stack, where it waits to be compiled and executed until after the step with the CALL EXECUTE has completed.  Basically with CALL EXECUTE, you can't have two different steps executing at the same time.

 

When your DATA step runs, it will generate 7 PROC SQL steps and write them to the input stack.  But the PROC SQL steps will be executed after your DATA CallExecuteLookup step has completed.  

 

This is a problem, because your assignment runs as part of the DATA CallExecuteLookup step:

newvar = &sqlvar.;

but the macro variable sqlvar still has a blank value at that point, because the PROC SQL code hasn't executed.

 

If you start a fresh SAS session and run your code, you should get an error:

34   Data CallExecuteLookup;
35   SET WORK.Table1;
36   call symputx('rval',ID1,'G');
37   call execute('proc SQL; Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 =
37 ! "&rval."; quit;');
38   newvar = &sqlvar.;
                      -
                      22
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string,
              a numeric constant, a datetime constant, a missing value, INPUT, PUT.

39   run;

That is because SQLVAR does not have a value.  If you remove the %global statement, you will see a clearer message:

194  newvar = &sqlvar.;
              -
              386
              200
WARNING: Apparent symbolic reference SQLVAR not resolved.
ERROR 386-185: Expecting an arithmetic expression.

It's possible to do this sort of thing with DOSUBL, because DOSUBL can actually execute the generated PROC SQL step (in a SAS side-session) while your main DATA step is executing.  But if you've got big data (and I assume you do, if you're worried about running out of memory for a hash table lookup), DOSUBL would be painfully slow.

 

Have you tried simply building a format to use as a lookup table?

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
SASAlex101
Quartz | Level 8
can you demo how you can use DOSUBL ?
Quentin
Super User

@SASAlex101 wrote:
can you demo how you can use DOSUBL ?

I think you could do it like:

data CallExecuteLookup;
  SET WORK.Table1;
  rc=dosubl('proc SQL noprint; Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 = "'||ID1||'"; quit;');
  newvar = resolve('&sqlvar');
  put id1= newvar= ;
run;

%symdel sqlvar ;

The log is:

32   data CallExecuteLookup;
33     SET WORK.Table1;
34     rc=dosubl('proc SQL noprint; Select MAX(LookupValue2) into: sqlvar FROM WORK.table2 where ID2 =
34 !  "'||ID1||'"; quit;');
35     newvar = resolve('&sqlvar');
36     put id1= newvar= ;
37   run;


ID1=A newvar=10

ID1=B newvar=20

ID1=C newvar=40

ID1=D newvar=100

ID1=E newvar=88

ID1=F newvar=.

ID1=G newvar=500

ID1=H newvar=.

ID1=I newvar=.
NOTE: There were 9 observations read from the data set WORK.TABLE1.
NOTE: The data set WORK.CALLEXECUTELOOKUP has 9 observations and 3 variables.

Again, I wouldn't use DOSUBL.  I would much prefer using a format, join, or hash table.  But DOSUBL is interesting in what it can do.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Quentin
Super User

@Quentin wrote:

 

Again, I wouldn't use DOSUBL.  I would much prefer using a format, join, or hash table.  But DOSUBL is interesting in what it can do.

@SASAlex101  thanks for taking the time to mark a correct answer.  Just want to be clear that my DOSUBL answer is only "correct" in that it shows how DOSUBL timing differs from CALL EXECUTE timing, and can make this approach work (for a tiny dataset).

 

But if you're worried about the SQL join being slow, or a hash table running out of memory, that would suggest you've got big data.  

 

DOSUBL is a useful tool to have in the toolbox for certain scenarios, but it's also painfully slow (because it does a LOT of work to create and destroy the "side-session").  I increased your input dataset Table1 to have 100,000 records and hit run, but got tired of waiting.   Then I tried it with 10,000 records and got tired.  Finally I tried 900 records, and it completed in 150 seconds.  So for even small data, calling DOSUBL for every record might give you an answer, but it will be painfully slow, and while I'm not usually one to worry about efficiency, this approach is so slow that it's hard to call it "correct."

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Tom
Super User Tom
Super User

Normally you do not want to think that way.  SAS (and SQL) operate on SETS of data all at once.  There is no need to craft step by step operations to work on each data point individually as if you were writing assembly code (or using Excel).

proc sql ;
 create table CallExecuteLookup as
   select a.*,max(b.lookupvalue2) as newvar
   from table1 a 
     left join table2 b
     on a.id1=b.id2
   group by a.id1
 ;
quit;

Result:

OBS    ID1    newvar

 1      A        10
 2      B        20
 3      C        40
 4      D       100
 5      E        88
 6      F         .
 7      G       500
 8      H         .
 9      I         .

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1684 views
  • 5 likes
  • 5 in conversation