Hi experts,
I have two datasets as follows:
data grad;
input club $ win loss year champions $;
datalines;
Grad_1 5 5 2019 none
Grad_2 4 6 2019 once
Grad_3 7 3 2019 twice
Grad_4 2 8 2019 none
;
run;
data postgrad;
input club $ win loss champions $;
datalines;
Pstgrd_1 6 4 once
Pstgrd_2 8 2 once
Pstgrd_3 2 8 none
Pstgrd_4 7 3 twice
;
run;
I used proc compare:
proc compare base= grad
compare= postgrad
out= comparison
outnoequal outbase outcomp outdif;
run;
It gave me:
The COMPARE Procedure
Comparison of WORK.GRAD with WORK.POSTGRAD
(Method=EXACT)
Data Set Summary
Dataset Created Modified NVar NObs
WORK.GRAD 17JUN22:11:49:43 17JUN22:11:49:43 5 4
WORK.POSTGRAD 17JUN22:11:49:43 17JUN22:11:49:43 4 4
Variables Summary
Number of Variables in Common: 4.
Number of Variables in WORK.GRAD but not in WORK.POSTGRAD: 1.
Observation Summary
Observation Base Compare
First Obs 1 1
First Unequal 1 1
Last Unequal 4 4
Last Obs 4 4
Number of Observations in Common: 4.
Total Number of Observations Read from WORK.GRAD: 4.
Total Number of Observations Read from WORK.POSTGRAD: 4.
Number of Observations with Some Compared Variables Unequal: 4.
Number of Observations with All Compared Variables Equal: 0.
Values Comparison Summary
Number of Variables Compared with All Observations Equal: 0.
Number of Variables Compared with Some Observations Unequal: 4.
Total Number of Values which Compare Unequal: 15.
Maximum Difference: 5.
The SAS System
The COMPARE Procedure
Comparison of WORK.GRAD with WORK.POSTGRAD
(Method=EXACT)
All Variables Compared have Unequal Values
Variable Type Len Ndif MaxDif
club CHAR 8 4
win NUM 8 4 5.000
loss NUM 8 4 5.000
champions CHAR 8 3
Value Comparison Results for Variables
__________________________________________________________
|| Base Value Compare Value
Obs || club club
________ || ________ ________
||
1 || Grad_1 Pstgrd_1
2 || Grad_2 Pstgrd_2
3 || Grad_3 Pstgrd_3
4 || Grad_4 Pstgrd_4
__________________________________________________________
__________________________________________________________
|| Base Compare
Obs || win win Diff. % Diff
________ || _________ _________ _________ _________
||
1 || 5.0000 6.0000 1.0000 20.0000
2 || 4.0000 8.0000 4.0000 100.0000
3 || 7.0000 2.0000 -5.0000 -71.4286
4 || 2.0000 7.0000 5.0000 250.0000
__________________________________________________________
__________________________________________________________
|| Base Compare
Obs || loss loss Diff. % Diff
________ || _________ _________ _________ _________
||
1 || 5.0000 4.0000 -1.0000 -20.0000
2 || 6.0000 2.0000 -4.0000 -66.6667
3 || 3.0000 8.0000 5.0000 166.6667
4 || 8.0000 3.0000 -5.0000 -62.5000
__________________________________________________________
The SAS System
The COMPARE Procedure
Comparison of WORK.GRAD with WORK.POSTGRAD
(Method=EXACT)
Value Comparison Results for Variables
__________________________________________________________
|| Base Value Compare Value
Obs || champions champions
________ || ________ ________
||
1 || none once
3 || twice none
4 || none twice
__________________________________________________________
Output dataset:
Now, I have some questions:
1. Base Grad_1
Compare Pstgrd_1
Diff: XXXXXXXX
By XXXXXXXX, what 8 difference is it showing? Is it showing Compare value's (under the variable 'club') length? X does not mean anything. Why is it not saying what exactly is different? For example: Grad_1 is 6 characters long and Pstgrd_1 is 8 characters long. Where is it showing what is different in the values?
How can I change the code so that it tells me what and how are there any differences?
I need it to say(Let's limit to character variable only here)-
Grad_1 and Pstgrd_1 has a difference:
Pstgrd...
Grad...
(still 'grd' are common letters.)
It also has similarities:
..._1
..._1
2. How to interpret and output the differences(turning them something legible like aA/bB/Cc or 1/2/3 ...etc. instead of a cryptic XXXX... ) through SAS code ?
3. Suppose I capture the exact differences (through your expert tips and trick, of course😊), I now need to consolidate them into a single table. The problem is:
There are widespread differences between the compared values in almost all the cells/rows, I need to tell it to give me the value differences ( such as 'once' & 'none', 'twice' & 'none') whenever there is any difference. It needs to tell me what the base(Grad_1) and compare values(Pstgrd_1) are and what differences exist between them .
_obs_ and _type_ are variables that are created by SAS, not my own program. Dif is attached to _Type_.
How can I tell the program to:
1.First see the base and compare variable values.
2.Fnd the differences between them, if any
3. Give me those differences.
proc sql;
create table comp as
select CP.*
from comparison CP
where CLUB like ('%X%') and and champions like ('X%')
;
quit;
It gave me:
It did not tell me what the base and compare values were under the 'club' variable/column. I need to explain this table to a layperson replacing the X and/or X... with letters /words/ numbers/signs etc., whatever applicable.
data test;
set comp(drop= _obs_ _type_);
run;
Could anyone tell me how to crack/solve this? I appreciate your help in advance.
Log-
102 data grad; 103 input club $ win loss year champions $; 104 datalines;
NOTE: The data set WORK.GRAD has 4 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
109 ; 110 run; 111 112 data postgrad; 113 input club $ win loss champions $; 114 datalines;
NOTE: The data set WORK.POSTGRAD has 4 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds
119 ; 120 run; 121 122 proc compare base= grad 123 compare= postgrad 124 out= comparison 125 outnoequal outbase outcomp outdif; 126 run;
NOTE: There were 4 observations read from the data set WORK.GRAD. NOTE: There were 4 observations read from the data set WORK.POSTGRAD. NOTE: The data set WORK.COMPARISON has 12 observations and 6 variables. NOTE: PROCEDURE COMPARE used (Total process time): real time 0.03 seconds cpu time 0.04 seconds
127 128 proc sql; 129 create table comp as 130 select CP.* 131 from comparison CP 132 where CLUB like ('%X%')and champions like ('X%') 133 ; NOTE: Table WORK.COMP created, with 3 rows and 6 columns.
134 quit; NOTE: PROCEDURE SQL used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
135 136 data test; 137 set comp(drop= _obs_ _type_); 138 run;
NOTE: There were 3 observations read from the data set WORK.COMP. NOTE: The data set WORK.TEST has 3 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
... View more