BookmarkSubscribeRSS Feed
JuSAS
Calcite | Level 5

Hi folks,

 

Not sure if that's the right place to ask for this, but I just identified a weird behavior of the proc sql.

Let's review this piece of code:

 

proc format;
    value yn 
       0='No' 
       1='Yes'
   ;
run;

data ds1;
    attrib pt format=$2.;
    attrib question format=yn.;
    
    pt='01'; question=0;  output;
    pt='02'; question=1;  output;
run;

proc sql noprint;
    create table test as
        select      pt 'Patient',
                    question label='Yes/no'
        from        ds1
    ;
quit;

The output dataset test is as expected, in particular the variable question has format yn. and label "Yes/no"

 

Now, let's update the code to add an union:

proc format;
    value yn 
       0='No' 
       1='Yes'
   ;
run;

data ds1;
    attrib pt format=$2.;
    attrib question format=yn.;
    
    pt='01'; question=0;  output;
    pt='02'; question=1;  output;
run;

data ds2;
    attrib pt format=$2.;
    attrib question format=yn.;
    
    pt='02'; question=0;  output;
    pt='03'; question=1;  output;
run;

proc sql noprint;
    create table test as
        select      pt 'Patient',
                    question label='Yes/no' /* la label fait sauter le format */ 
        from        ds1
        
        union corr
        
        select      pt,
                    question 
        from        ds2
    ;
quit;
    

The output dataset test is NOT as expected, in particular the variable question has no format anymore, but the expected "Yes/no" label.

 

If we remove the label modifier, the format is kept

 

Is this behavior expected?

 

 

5 REPLIES 5
SASJedi
SAS Super FREQ

It's not what I would expect. I suspect you've found a bug. Using this code, the problem is reproducible in SAS9.4 M8 and Viya 2024.04.

proc format;
	value yn
	0='No'
	1='Yes';
run;

data ds1;
	format pt $2. question yn.;
	label pt ='Patient' 
		  question ='Yes or No';
	pt='01';
	question=0;
	output;
	pt='02';
	question=1;
	output;
run;

data ds2;
	format pt $2. question yn.;
	label pt ='Patient' 
		  question ='Yes or No';
	pt='03';
	question=1;
	output;
	pt='04';
	question=2;
	output;
run;

proc sql ;
title "DS1";
select * from ds1;

title "DS2";
select * from ds2;

title "DS1 union DS2";
select pt
	 , question 
	from ds1
	union corr
	select *
	from ds2;

title "DS1 union DS2 ";
title2 "Changing the label of the second column removes formatting";
select pt
	 , question label='Y/N'
	from ds1
	union corr
	select *
	from ds2;
quit;
Check out my Jedi SAS Tricks for SAS Users
Tom
Super User Tom
Super User

PROC SQL uses different logic to consolidate the formats and labels attached to the same variables from different datasets than a DATA step would use. 

 

In a data step the first place where a FORMAT or LABEL exists is the one that "wins" even it is not the first dataset referenced in the DATA step.  

 

In a SQL step with UNION CORR then FORMAT and LABEL on the first dataset "wins", even when those they are empty. 

 

But you seem to have discovered a wrinkle on how this is done when the first sub-query of the UNION is not a simply a reference to an existing variable. 

 

I suspect that it caused by an order or operations issue in the logic. 

JuSAS
Calcite | Level 5

Well,

 

Despite I agree that data step and proc sql are probably different in the way they process the data, my concern is that for given example, the result is not intuive at all.

 

Both data tables used in the proc sql have the exact same columns (and format)

Thus, performing a join should not remove the format.

It could be acceptable if both columns question had the same type but different formats.

That's not the case here: both variable shave the same type (numeric) and same format (yn.)

Thus, it is expected that column questino in the output data table is numeric with yn. format.

 

There is no clear explanation why this is happening, and that's not documented.

Tom
Super User Tom
Super User

I was just thinking of reasons how/why they made this mistake.

 

Did you try it when the label was applied in a subquery?  Does it still have the issue?

select * from (select x label='new label' from have)
union corr
select x from have

 

 

SASJedi
SAS Super FREQ

This is definitely a bug. The easiest workaround is to re-apply the format when you change the label. 

proc sql;
select pt
	 ,question label='Y/N' format=yn.
	from ds1
	union corr
	select *
	from ds2;
quit;

If this is a concern for  you in your day-to-day SAS use, I recommend submitting it to SAS tech support for further evaluation via the Customer Service Portal - Customer Support (sas.com)

 

 

Check out my Jedi SAS Tricks for SAS Users

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 429 views
  • 6 likes
  • 3 in conversation