BookmarkSubscribeRSS Feed
inquistive
Quartz | Level 8

Hi experts,

I have two datasets as follows:

data grad;
    input club $ win loss year champions $;
    datalines;
Grad_1 5 5 2019 none
Grad_2 4 6 2019 once
Grad_3 7 3 2019 twice
Grad_4 2 8 2019 none
;
run;

data postgrad;
    input club $ win loss champions $;
    datalines;
Pstgrd_1 6 4 once
Pstgrd_2 8 2 once
Pstgrd_3 2 8 none
Pstgrd_4 7 3 twice
;
run;

I used proc compare:

proc compare base= grad 
             compare= postgrad
			 out= comparison
             outnoequal outbase outcomp outdif;
run;

It gave me:

                                        The COMPARE Procedure                                         
                              Comparison of WORK.GRAD with WORK.POSTGRAD                              
                                            (Method=EXACT)                                            
                                                                                                      
                                          Data Set Summary                                            
                                                                                                      
                   Dataset                 Created          Modified  NVar    NObs                    
                                                                                                      
                   WORK.GRAD      17JUN22:11:49:43  17JUN22:11:49:43     5       4                    
                   WORK.POSTGRAD  17JUN22:11:49:43  17JUN22:11:49:43     4       4                    
                                                                                                      
                                                                                                      
                                          Variables Summary                                           
                                                                                                      
                    Number of Variables in Common: 4.                                                 
                    Number of Variables in WORK.GRAD but not in WORK.POSTGRAD: 1.                     




                                                                                                      
                                                                                                      
                                         Observation Summary                                          
                                                                                                      
                                    Observation      Base  Compare                                    
                                                                                                      
                                    First Obs           1        1                                    
                                    First Unequal       1        1                                    
                                    Last  Unequal       4        4                                    
                                    Last  Obs           4        4                                    
                                                                                                      
                   Number of Observations in Common: 4.                                               
                   Total Number of Observations Read from WORK.GRAD: 4.                               
                   Total Number of Observations Read from WORK.POSTGRAD: 4.                           
                                                                                                      
                   Number of Observations with Some Compared Variables Unequal: 4.                    
                   Number of Observations with All Compared Variables Equal: 0.                       
                                                                                                      
                                                                                                      
                                      Values Comparison Summary                                       
                                                                                                      
                   Number of Variables Compared with All Observations Equal: 0.                       
                   Number of Variables Compared with Some Observations Unequal: 4.                    
                   Total Number of Values which Compare Unequal: 15.                                  
                   Maximum Difference: 5.                                                             
                                                                                                      






      The SAS System


                                        The COMPARE Procedure                                         
                              Comparison of WORK.GRAD with WORK.POSTGRAD                              
                                            (Method=EXACT)                                            
                                                                                                      
                              All Variables Compared have Unequal Values                              
                                                                                                      
                                Variable   Type  Len  Ndif   MaxDif                                   
                                                                                                      
                                club       CHAR    8     4                                            
                                win        NUM     8     4    5.000                                   
                                loss       NUM     8     4    5.000                                   
                                champions  CHAR    8     3                                            
                                                                                                      




                                                                                                      
                                                                                                      
                                Value Comparison Results for Variables                                
                                                                                                      
                      __________________________________________________________                      
                                 ||  Base Value           Compare Value                               
                             Obs ||  club                  club                                       
                       ________  ||  ________              ________                                   
                                 ||                                                                   
                              1  ||  Grad_1                Pstgrd_1                                   
                              2  ||  Grad_2                Pstgrd_2                                   
                              3  ||  Grad_3                Pstgrd_3                                   
                              4  ||  Grad_4                Pstgrd_4                                   
                      __________________________________________________________                      
                                                                                                      
                                                                                                      
                      __________________________________________________________                      
                                 ||       Base    Compare                                             
                             Obs ||        win        win      Diff.     % Diff                       
                       ________  ||  _________  _________  _________  _________                       
                                 ||                                                                   
                              1  ||     5.0000     6.0000     1.0000    20.0000                       
                              2  ||     4.0000     8.0000     4.0000   100.0000                       
                              3  ||     7.0000     2.0000    -5.0000   -71.4286                       
                              4  ||     2.0000     7.0000     5.0000   250.0000                       
                      __________________________________________________________                      
                                                                                                      
                                                                                                      
                      __________________________________________________________                      
                                 ||       Base    Compare                                             
                             Obs ||       loss       loss      Diff.     % Diff                       
                       ________  ||  _________  _________  _________  _________                       
                                 ||                                                                   
                              1  ||     5.0000     4.0000    -1.0000   -20.0000                       
                              2  ||     6.0000     2.0000    -4.0000   -66.6667                       
                              3  ||     3.0000     8.0000     5.0000   166.6667                       
                              4  ||     8.0000     3.0000    -5.0000   -62.5000                       
                      __________________________________________________________                      






      The SAS System


                                        The COMPARE Procedure                                         
                              Comparison of WORK.GRAD with WORK.POSTGRAD                              
                                            (Method=EXACT)                                            
                                                                                                      
                                Value Comparison Results for Variables                                
                                                                                                      
                      __________________________________________________________                      
                                 ||  Base Value           Compare Value                               
                             Obs ||  champions             champions                                  
                       ________  ||  ________              ________                                   
                                 ||                                                                   
                              1  ||  none                  once                                       
                              3  ||  twice                 none                                       
                              4  ||  none                  twice                                      
                      __________________________________________________________                      

Output dataset:

inquistive_0-1655482225421.png

 

Now, I have some questions:

1. Base        Grad_1 

   Compare  Pstgrd_1

  Diff:           XXXXXXXX

  By XXXXXXXX, what 8 difference is it showing? Is it showing Compare value's (under the variable 'club') length? X does not mean anything.  Why is it not  saying what exactly is different? For example: Grad_1 is 6 characters long and Pstgrd_1 is 8 characters long. Where is it showing what is different in the values?

 

How can I change the code so that it tells me what and how are there any differences?

I need it to say(Let's limit to character variable only here)-

Grad_1 and Pstgrd_1 has a difference:

Pstgrd...

Grad...

(still 'grd' are common letters.)

It also has similarities:

..._1

..._1

 

2. How to interpret and output the differences(turning them something legible like  aA/bB/Cc or 1/2/3 ...etc. instead of a cryptic XXXX... ) through SAS code ?

3. Suppose I capture the exact differences (through your expert tips and trick, of course😊), I now need to consolidate them into a single table. The problem is:

  There are widespread differences between the compared values in almost all the cells/rows, I need to tell it to give me the value differences ( such as 'once' & 'none', 'twice' & 'none') whenever there is any difference. It needs  to tell me what the base(Grad_1) and compare values(Pstgrd_1) are and what differences exist between them .

_obs_ and _type_ are  variables that are created by SAS, not my own program. Dif is attached to _Type_. 

How can I tell the program to:

1.First see the base and compare variable values.

2.Fnd the differences between them, if any

3. Give me those differences.

proc sql;
 create table comp as
   select CP.* 
    from comparison CP
    where CLUB like ('%X%') and and champions like ('X%')
 ;
 quit;

It gave me:

inquistive_1-1655486542604.png

It did not tell me what the base and compare values were under the 'club' variable/column. I need to explain this table to  a layperson replacing the X and/or X... with letters /words/ numbers/signs etc., whatever applicable. 

 

data test;
    set comp(drop= _obs_ _type_);
 run;

inquistive_2-1655486916965.png

 

Could anyone tell me how to crack/solve this? I appreciate your help in advance. 

 

Log-

102 data grad;
103 input club $ win loss year champions $;
104 datalines;

NOTE: The data set WORK.GRAD has 4 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


109 ;
110 run;
111
112 data postgrad;
113 input club $ win loss champions $;
114 datalines;

NOTE: The data set WORK.POSTGRAD has 4 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds


119 ;
120 run;
121
122 proc compare base= grad
123 compare= postgrad
124 out= comparison
125 outnoequal outbase outcomp outdif;
126 run;

NOTE: There were 4 observations read from the data set WORK.GRAD.
NOTE: There were 4 observations read from the data set WORK.POSTGRAD.
NOTE: The data set WORK.COMPARISON has 12 observations and 6 variables.
NOTE: PROCEDURE COMPARE used (Total process time):
real time 0.03 seconds
cpu time 0.04 seconds


127
128 proc sql;
129 create table comp as
130 select CP.*
131 from comparison CP
132 where CLUB like ('%X%')and champions like ('X%')
133 ;
NOTE: Table WORK.COMP created, with 3 rows and 6 columns.

134 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


135
136 data test;
137 set comp(drop= _obs_ _type_);
138 run;

NOTE: There were 3 observations read from the data set WORK.COMP.
NOTE: The data set WORK.TEST has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

 

3 REPLIES 3
Tom
Super User Tom
Super User

The X mark the locations where there are differences.  The periods mark locations where they are the same.

Look at the output with FIXED width font, otherwise it is hard for humans to align the three strings and tell what they mean.

 

The one thing I would do is truncate DIF value to eliminate the extra dots. So truncate to the length of the longer of the two values being compared.  Your example variables are short ($8) so you don't see it. But if they were length $15 instead you would see something like:

BASE    Grad_1                
COMPARE Pstgrd_1
DIF     XXXXXXXX.......          

If you have some values where some of the character match it is also more obvious what the meaning of the X and . characters are in the DIF string.

BASE    Grad_1                
COMPARE Pragrd_1
DIF     X..XXXXX.......     

 

JOL
SAS Employee JOL
SAS Employee

Partial explanation:

The differences for character variables are noted with an X or a period (.). An X shows that the characters do not match. A period shows that the characters do match. For numeric variables, an E means that there is no difference.

 

You can explore various Proc Compare options to customize result, but for the details you require may need custom data step or proc sql code.

 

For more Proc Compare details and examples see documentation:

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n1f136xdchlfrtn1663tmzklctb6.htm

 

Ksharp
Super User
Try option BRIEF TRANSPOSE.


proc compare base= grad brief transpose
compare= postgrad
out= comparison
outnoequal outbase outcomp outdif;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 587 views
  • 3 likes
  • 4 in conversation