- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi experts,
I have two datasets as follows:
data grad;
input club $ win loss year champions $;
datalines;
Grad_1 5 5 2019 none
Grad_2 4 6 2019 once
Grad_3 7 3 2019 twice
Grad_4 2 8 2019 none
;
run;
data postgrad;
input club $ win loss champions $;
datalines;
Pstgrd_1 6 4 once
Pstgrd_2 8 2 once
Pstgrd_3 2 8 none
Pstgrd_4 7 3 twice
;
run;
I used proc compare:
proc compare base= grad
compare= postgrad
out= comparison
outnoequal outbase outcomp outdif;
run;
It gave me:
The COMPARE Procedure Comparison of WORK.GRAD with WORK.POSTGRAD (Method=EXACT) Data Set Summary Dataset Created Modified NVar NObs WORK.GRAD 17JUN22:11:49:43 17JUN22:11:49:43 5 4 WORK.POSTGRAD 17JUN22:11:49:43 17JUN22:11:49:43 4 4 Variables Summary Number of Variables in Common: 4. Number of Variables in WORK.GRAD but not in WORK.POSTGRAD: 1. Observation Summary Observation Base Compare First Obs 1 1 First Unequal 1 1 Last Unequal 4 4 Last Obs 4 4 Number of Observations in Common: 4. Total Number of Observations Read from WORK.GRAD: 4. Total Number of Observations Read from WORK.POSTGRAD: 4. Number of Observations with Some Compared Variables Unequal: 4. Number of Observations with All Compared Variables Equal: 0. Values Comparison Summary Number of Variables Compared with All Observations Equal: 0. Number of Variables Compared with Some Observations Unequal: 4. Total Number of Values which Compare Unequal: 15. Maximum Difference: 5. The SAS System The COMPARE Procedure Comparison of WORK.GRAD with WORK.POSTGRAD (Method=EXACT) All Variables Compared have Unequal Values Variable Type Len Ndif MaxDif club CHAR 8 4 win NUM 8 4 5.000 loss NUM 8 4 5.000 champions CHAR 8 3 Value Comparison Results for Variables __________________________________________________________ || Base Value Compare Value Obs || club club ________ || ________ ________ || 1 || Grad_1 Pstgrd_1 2 || Grad_2 Pstgrd_2 3 || Grad_3 Pstgrd_3 4 || Grad_4 Pstgrd_4 __________________________________________________________ __________________________________________________________ || Base Compare Obs || win win Diff. % Diff ________ || _________ _________ _________ _________ || 1 || 5.0000 6.0000 1.0000 20.0000 2 || 4.0000 8.0000 4.0000 100.0000 3 || 7.0000 2.0000 -5.0000 -71.4286 4 || 2.0000 7.0000 5.0000 250.0000 __________________________________________________________ __________________________________________________________ || Base Compare Obs || loss loss Diff. % Diff ________ || _________ _________ _________ _________ || 1 || 5.0000 4.0000 -1.0000 -20.0000 2 || 6.0000 2.0000 -4.0000 -66.6667 3 || 3.0000 8.0000 5.0000 166.6667 4 || 8.0000 3.0000 -5.0000 -62.5000 __________________________________________________________ The SAS System The COMPARE Procedure Comparison of WORK.GRAD with WORK.POSTGRAD (Method=EXACT) Value Comparison Results for Variables __________________________________________________________ || Base Value Compare Value Obs || champions champions ________ || ________ ________ || 1 || none once 3 || twice none 4 || none twice __________________________________________________________
Output dataset:
Now, I have some questions:
1. Base Grad_1
Compare Pstgrd_1
Diff: XXXXXXXX
By XXXXXXXX, what 8 difference is it showing? Is it showing Compare value's (under the variable 'club') length? X does not mean anything. Why is it not saying what exactly is different? For example: Grad_1 is 6 characters long and Pstgrd_1 is 8 characters long. Where is it showing what is different in the values?
How can I change the code so that it tells me what and how are there any differences?
I need it to say(Let's limit to character variable only here)-
Grad_1 and Pstgrd_1 has a difference:
Pstgrd...
Grad...
(still 'grd' are common letters.)
It also has similarities:
..._1
..._1
2. How to interpret and output the differences(turning them something legible like aA/bB/Cc or 1/2/3 ...etc. instead of a cryptic XXXX... ) through SAS code ?
3. Suppose I capture the exact differences (through your expert tips and trick, of course😊), I now need to consolidate them into a single table. The problem is:
There are widespread differences between the compared values in almost all the cells/rows, I need to tell it to give me the value differences ( such as 'once' & 'none', 'twice' & 'none') whenever there is any difference. It needs to tell me what the base(Grad_1) and compare values(Pstgrd_1) are and what differences exist between them .
_obs_ and _type_ are variables that are created by SAS, not my own program. Dif is attached to _Type_.
How can I tell the program to:
1.First see the base and compare variable values.
2.Fnd the differences between them, if any
3. Give me those differences.
proc sql;
create table comp as
select CP.*
from comparison CP
where CLUB like ('%X%') and and champions like ('X%')
;
quit;
It gave me:
It did not tell me what the base and compare values were under the 'club' variable/column. I need to explain this table to a layperson replacing the X and/or X... with letters /words/ numbers/signs etc., whatever applicable.
data test;
set comp(drop= _obs_ _type_);
run;
Could anyone tell me how to crack/solve this? I appreciate your help in advance.
Log-
102 data grad;
103 input club $ win loss year champions $;
104 datalines;
NOTE: The data set WORK.GRAD has 4 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
109 ;
110 run;
111
112 data postgrad;
113 input club $ win loss champions $;
114 datalines;
NOTE: The data set WORK.POSTGRAD has 4 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
119 ;
120 run;
121
122 proc compare base= grad
123 compare= postgrad
124 out= comparison
125 outnoequal outbase outcomp outdif;
126 run;
NOTE: There were 4 observations read from the data set WORK.GRAD.
NOTE: There were 4 observations read from the data set WORK.POSTGRAD.
NOTE: The data set WORK.COMPARISON has 12 observations and 6 variables.
NOTE: PROCEDURE COMPARE used (Total process time):
real time 0.03 seconds
cpu time 0.04 seconds
127
128 proc sql;
129 create table comp as
130 select CP.*
131 from comparison CP
132 where CLUB like ('%X%')and champions like ('X%')
133 ;
NOTE: Table WORK.COMP created, with 3 rows and 6 columns.
134 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
135
136 data test;
137 set comp(drop= _obs_ _type_);
138 run;
NOTE: There were 3 observations read from the data set WORK.COMP.
NOTE: The data set WORK.TEST has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The X mark the locations where there are differences. The periods mark locations where they are the same.
Look at the output with FIXED width font, otherwise it is hard for humans to align the three strings and tell what they mean.
The one thing I would do is truncate DIF value to eliminate the extra dots. So truncate to the length of the longer of the two values being compared. Your example variables are short ($8) so you don't see it. But if they were length $15 instead you would see something like:
BASE Grad_1 COMPARE Pstgrd_1 DIF XXXXXXXX.......
If you have some values where some of the character match it is also more obvious what the meaning of the X and . characters are in the DIF string.
BASE Grad_1 COMPARE Pragrd_1 DIF X..XXXXX.......
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Partial explanation:
The differences for character variables are noted with an X or a period (.). An X shows that the characters do not match. A period shows that the characters do match. For numeric variables, an E means that there is no difference.
You can explore various Proc Compare options to customize result, but for the details you require may need custom data step or proc sql code.
For more Proc Compare details and examples see documentation:
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n1f136xdchlfrtn1663tmzklctb6.htm
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc compare base= grad brief transpose
compare= postgrad
out= comparison
outnoequal outbase outcomp outdif;
run;