BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
steinarv
Fluorite | Level 6

I struggle to understand the behavior of the following code (SAS 9.3). 

(Although the objective is to count distinct values for the columns in "Sashelp.Cars", i am primarily interested in help with understanding why it is not working as expected.)

 

 

data testing (keep = libname memname name dist_count colname1 colname2); 

  set sashelp.vcolumn(obs = 3); 
  where libname = 'SASHELP' and memname = 'CARS'; 

  call symput('colname1', name); 
  rc = dosubl("
    %let colname2 = &colname1;
    proc sql noprint;
      select 
        count(distinct &colname1) into :dist_count 
      from 
        SASHELP.CARS quit; 
    ");
 
  dist_count = symget('dist_count');
  colname1 = symget('colname1');
  colname2 = symget('colname2'); 
run;

%put &colname1; 
%put &colname2;

The first time the code is submitted it gives the following warning: 

 

WARNING: Apparent symbolic reference COLNAME1 not resolved.

The last two lines both outputs "Type" which is what I expected. 

 

In the resulting dataset, the column "dist_count" and "colnam1" is as expected. "colname2" on the other hand takes the value "&colname1".

testing_1_run.PNG

 

When the code is submited one more time, no warnings are generated but in the dataset the column "dist_count" are no longer as expected. Also all values in "colname2" are now "Type".

testing_2_run.PNG

Again, the last two lines generates "Type".

 

The most obvious problem is that "dist_count" no longer is assigned the right values the second time the code is submited.

 

Im interessted in knowing why is this is, and how is it is to be avoided in other scenarios?

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

Agree with Andreas's solution to change to single quotes to avoid the macro variables attempting to resolve before the CALL SYMPUTX has executed.  

 

Part of your confusion in not being able to replicate the results of your first run is the common problem related to global macro variables.  On the second submission, the macro variables exist from the first submission.  It's one of the reasons that global macro variables often prove problematic.  For this sort of playing around, if you are going to create global macro variables in the code, I would start the code by deleting them:

%symdel colname1 colname2 dist_count / nowarn;

Another side point is that you don't need to create macro variables colname1 or colname2 in order to pass values to the DOSUBL side session.  Since you already have the value in the PDV variable NAME, you can use that variable to build the string passed to DOSUBL, e.g.:

 

%symdel dist_count / nowarn;

data testing (keep = libname memname name dist_count); 

  set sashelp.vcolumn(obs = 3); 
  where libname = 'SASHELP' and memname = 'CARS'; 

  rc = dosubl('
    proc sql noprint;
      select 
        count(distinct ' || name || ') into :dist_count 
      from 
        SASHELP.CARS quit; 
    ');
 
  dist_count = symget('dist_count');

  put (_ALL_)(=) ; 
run;

%symdel dist_count ;

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

8 REPLIES 8
andreas_lds
Jade | Level 19

Seems to be a timing-issue: because you are using double-quotes in dosubl, the macro-statements are resolved before call symputx is executed. Switching to single-quotes seems to solve the problem.

Quentin
Super User

Agree with Andreas's solution to change to single quotes to avoid the macro variables attempting to resolve before the CALL SYMPUTX has executed.  

 

Part of your confusion in not being able to replicate the results of your first run is the common problem related to global macro variables.  On the second submission, the macro variables exist from the first submission.  It's one of the reasons that global macro variables often prove problematic.  For this sort of playing around, if you are going to create global macro variables in the code, I would start the code by deleting them:

%symdel colname1 colname2 dist_count / nowarn;

Another side point is that you don't need to create macro variables colname1 or colname2 in order to pass values to the DOSUBL side session.  Since you already have the value in the PDV variable NAME, you can use that variable to build the string passed to DOSUBL, e.g.:

 

%symdel dist_count / nowarn;

data testing (keep = libname memname name dist_count); 

  set sashelp.vcolumn(obs = 3); 
  where libname = 'SASHELP' and memname = 'CARS'; 

  rc = dosubl('
    proc sql noprint;
      select 
        count(distinct ' || name || ') into :dist_count 
      from 
        SASHELP.CARS quit; 
    ');
 
  dist_count = symget('dist_count');

  put (_ALL_)(=) ; 
run;

%symdel dist_count ;

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Quentin
Super User

Another possible DOSUBL approach would be to write a function-style macro that calculates the cardinality of a variable, e.g.:

%macro cardinality(data=,var=);
  %local rc emit; 

  %let rc=%sysfunc(dosubl(%nrstr(
     proc sql noprint;
       select 
         count(distinct &var) into :emit
       from 
         &data
     quit; 

  )));

  &emit 
%mend ;

Then you can read the driver data and use the RESOLVE function to build macro calls, it ends up looking like CALL EXECUTE:

 

data testing (keep = libname memname name dist_count); 

  set sashelp.vcolumn; 
  where libname = 'SASHELP' and memname = 'CARS'; 

  dist_count=resolve('%cardinality('
                                || ' data= ' || cats(libname,'.',memname)
                                || ',var= '  || name
                                || ')'
                      ) ;
  put (libname memname name dist_count)(=) ; 
run;

There are more efficient ways to calculate cardinality (see e.g. https://blogs.sas.com/content/sasdummy/2013/10/16/about-cardinality/ and the papers mentioned in the comment section), but if you're playing around with DOSUBL, the ability to write function-style macros is handy.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
steinarv
Fluorite | Level 6

Thank you for the explanation. Since you write "the macro variables attempting to resolve before the CALL SYMPUTX has executed", I assume this does not happen the first time I submit the code. And this is why it is working as expected the first time..

Quentin
Super User

 

 

@steinarv wrote:

Thank you for the explanation. Since you write "the macro variables attempting to resolve before the CALL SYMPUTX has executed", I assume this does not happen the first time I submit the code. And this is why it is working as expected the first time..


Not quite.  It happens every time you submit the code.  Here's a simpler example without DOSUBL.  Because for your original question, the key point is that macro references inside of double quotes are resolved when the step compiles (before it executes):

 

If you code (in a fresh SAS session):

 

data testing; 

  set sashelp.vcolumn(obs = 3); 
  where libname = 'SASHELP' and memname = 'CARS'; 

  call symput('colname1', name); 

  str="The value of colname1 is: &colname1" ;
  put str= ;
 
  colname1 = symget('colname1');
  put colname1= ;
run;

When the step compiles, the compiler gets to the assignment statement for STR, and attempts to resolve a macro variable named colname1.  There is no existing macro variable named colname1, so it throws the cannot resolve warning, and the failed macro resolution attempt returns &colname1.  The PUT str= statement will print the same line three times.  

 

Note that the SYMGET() function succeeds, because it does its work while the step is executing (not compiling).  When SYMGET executes, there is a macro variable named colname1, and it has a different value every time symget executes.  The log is:

 

1    data testing;
2
3      set sashelp.vcolumn(obs = 3);
4      where libname = 'SASHELP' and memname = 'CARS';
5
6      call symput('colname1', name);
7
8      str="The value of colname1 is: &colname1" ;
WARNING: Apparent symbolic reference COLNAME1 not resolved.
9      put str= ;
10
11     colname1 = symget('colname1');
12     put colname1= ;
13   run;

str=The value of colname1 is: &colname1
colname1=Make
str=The value of colname1 is: &colname1
colname1=Model
str=The value of colname1 is: &colname1
colname1=Type
NOTE: There were 3 observations read from the data set SASHELP.VCOLUMN.
      WHERE (libname='SASHELP') and (memname='CARS');
NOTE: The data set WORK.TESTING has 3 observations and 20 variables.

Now run the same code a second time, and the results are different:

14   data testing;
15
16     set sashelp.vcolumn(obs = 3);
17     where libname = 'SASHELP' and memname = 'CARS';
18
19     call symput('colname1', name);
20
21     str="The value of colname1 is: &colname1" ;
22     put str= ;
23
24     colname1 = symget('colname1');
25     put colname1= ;
26   run;

str=The value of colname1 is: Type
colname1=Make
str=The value of colname1 is: Type
colname1=Model
str=The value of colname1 is: Type
colname1=Type
NOTE: There were 3 observations read from the data set SASHELP.VCOLUMN.
      WHERE (libname='SASHELP') and (memname='CARS');
NOTE: The data set WORK.TESTING has 3 observations and 20 variables.

In the first submission, you created a global macro variable colname1 and ultimately assigned it the value Type.  During the second submission, when the step compiles, it can resolve the reference to &colname1.

 

It's the same issue with your original code.  The first time the reference cannot resolve, after that, it does resolve (to a "legacy" value which happened to be sitting in the global symbol table). 

 

If your first submission gave you the desired result (despite the resolution warnings), it's likely a side effect of the fact that even though the reference to &colname1 could not be resolved during data step compilation, when the code is executed in the DOSUBL environment it will be able to resolve.  Because by the time the DOSUBL() function executes, the CALL SYMPUT has already executed.

 

And just to add, your instinct to focus on timing is absolutely correct.  Thinking through the timing of DATA step compilation vs execution is critical to understanding SAS.  And the timing issues only become more complex (and more critical) when you introduce automated code generation, whether it's done via the macro language, DOSUBL, CALL EXECUTE, or even PUT statements and %INCLUDEs.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
jimbarbour
Meteorite | Level 14

Any code in double quotes using percent or ampersand notation will be resolved before execution begins, so DOSUBL will execute after %LET Colname2 = Colname1 has already been resolved.  Likewise, the "colname1" in COUNT(distinct &colname1) will have been resolved before the DOSUBL begins execution.

 

As earlier replies have suggested, a single quote delays resolution of the percent and ampersand notated code.

 

Another option is to bypass the use of DOSUBL (which is expensive in terms of CPU time) and the use of SYMGET and code the following:

 

%LET	Obs	=	3;

DATA	_NULL_;
	i	+	1;
	set sashelp.vcolumn(obs = &Obs); 
		where libname = 'SASHELP' and memname = 'CARS'; 

	call symput(CATS('colname', PUT(i, 5.)), name); 
RUN;

&Null	%MACRO	Get_Counts(Obs);
	%DO	i	=	1	%TO	&Obs;
		OPTIONS	NONOTES;
		proc sql;
			select	count(distinct &&colname&i)	AS	Distinct_Count
				into	:dist_count 
				from	SASHELP.CARS 
				;
		QUIT; 
		OPTIONS	NOTES;

		%PUT	&Nte2  ;
		%PUT	&Nte1  Colname&i=&&colname&i; 
		%PUT	&Nte2  Distinct Count=%SYSFUNC(STRIP(&dist_count));
	%END;
%MEND	Get_Counts;

%Get_Counts(&Obs);

The log from the above looks like:

 

 

NOTE:  Colname1=Make
       Distinct Count=38
     
NOTE:  Colname2=Model
       Distinct Count=425
     
NOTE:  Colname3=Type
       Distinct Count=6

The data step creates a little macro array, and then an SQL step is driven by the macro array.  To me, this is easier to follow the logic of, but each to their own.

 

 

Jim

steinarv
Fluorite | Level 6
Thank you for your explanation and suggestion. The order of execution of dosubl was my main concern, rather than cardinality. It is still very useful and instructive to see alternative approaches to these kinds of problems.

When I try to run your code I get some warnings about NULL and $nte: “Apparent symbolic reference not resolved”.
jimbarbour
Meteorite | Level 14

&Null is just a trick to preserve coloration in the SAS editor.  The definition is:

%let Null = ;

&Null therefore resolves to, well, nothing.  By putting &Null before the %MACRO, the colorization of the DATA step and other non-macro code is preserved.

 

&Nte1 resolves to NOTE:

&Nte2 resolves to NOTE-

I also use:

&Warn1 which resolves to WARNING:

&Warn2 which resolves to WARNING-

&Err1 which resolves to ERROR:

&Err2 which resolves to ERROR-

 

I use these to make text searches in the SAS log easier.  If I search on, for example, "Error: " I only find places where an actual error is issued by SAS, not occurances in the source code which may or may not have actually executed.

 

Jim

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1035 views
  • 8 likes
  • 4 in conversation