BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
eawh100
Obsidian | Level 7

Hello, I have generated two summary tables in SAS from my data for two different years. I want to identify changes and differences in cell counts. At present, I have merged the two tables using PROC SQL and planned to perform a column-by-column comparison for all grades and totals. While this approach works, I am curious if there is a more efficient method for comparing the counts of two tables and showing the differences. So, I am wondering whether PROC COMPARE could facilitate such a comparison, particularly for two significantly larger count tables?

 

 

YEAR 1:

CATEGORYGrade_1Grade_2Grade_3
A330
B210
C020
D001
E422
total983

YEAR 2:

CATEGORYGrade_1Grade_2Grade_3
A330
C020
D001
E922
F200
total1473

table from join: I'll just do a flag if grade_1 differs grade1a for each row etc. 

CATEGORYGrade_1Grade_2Grade_3Category_year2Grade_1aGrade_2aGrade_3a
A330A330
B210    
C020C020
D001D001
E922E922
    F200
total1483total1473
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

As @ballardw  pointed out, using option OUT= or other option to output the difference ,and you will see them.

proc compare base=a compare=b brief transpose out=want;
id category;
run;

Ksharp_0-1747097603980.png

 

View solution in original post

10 REPLIES 10
sasgorilla
Pyrite | Level 9

Can you explain more clearly what you are trying to compare? When you say "identify changes and differences in cell counts" do you mean you want to compare Category A, Grade 1 from Year 1 with Category A, Grade 1 from Year 2? Or, are you just interested in row or column totals? 

eawh100
Obsidian | Level 7

Yes, I want to compare Category A, Grade 1 from Year 1 with Category A, Grade 1 from Year 2. 

sasgorilla
Pyrite | Level 9

I am not familiar with proc compare, and I personally would probably write some manual code if I knew exactly what I wanted. 

 

That said, it looks like there are some options in proc compare that might get you what you want. I think the "Value comparison results for variables" pages might be of interest to you. 

 

I started with the below, but there are several other options here

 

data a;
input category $ grade_1 grade_2 grade_3;
datalines;
A 3 3 0
B 2 1 0
C 0 2 0
D 0 0 1
E 4 2 2
Total 9 8 3
;
RUN;

data b;
input category $ grade_1 grade_2 grade_3;
datalines;
A 3 3 0
C 0 2 0
D 0 0 1
E 9 2 2
F 2 0 0
Total 14 7 3
;
RUN;


proc compare base=a compare=b printall;
run;

 

 

quickbluefish
Barite | Level 11

Or just for variety, assuming your datasets are of the same dimensions and are named test1 and test2:

proc iml;
use test1;
read all into A; close test1;
use test2;
read all into B; close test2;
title 'differences';
print(A-B);
quit;

quickbluefish_0-1746843648998.png

...or if you really want to get crazy:


proc iml;
use test1;
read all into A [colname=names]; 
read all var {category} into rnames;
mattrib A rowname=rnames; 	close test1;
use test2;
read all into B [colname=names];  
mattrib B rowname=rnames;   close test2;
A=A-B;
title 'matrix A';
print(A);

names={'category'} || names;
names=names`;

create diffs from rnames A [colname=names];
append from rnames A;
close diffs;
quit;

title 'dataset DIFFS';
proc print data=diffs noobs; run;

quickbluefish_1-1746843767179.png

 

 

eawh100
Obsidian | Level 7

I'll look into this because I'm not too familiar with proc iml. The datasets aren't guaranteed to be the same dimensions, at least row wise. They will have same columns

Ksharp
Super User

Yes. PROC COMPARE is your best company .

data a;
input category $ grade_1 grade_2 grade_3;
datalines;
A 3 3 0
B 2 1 0
C 0 2 0
D 0 0 1
E 4 2 2
Total 9 8 3
;
RUN;

data b;
input category $ grade_1 grade_2 grade_3;
datalines;
A 3 3 0
C 0 2 0
D 0 0 1
E 9 2 2
F 2 0 0
Total 14 7 3
;
RUN;


proc compare base=a compare=b brief transpose;
id category;
run;
                   COMPARE 过程                                                          
                                                     比较 WORK.A 与 WORK.B                                                     
                                                        (METHOD=EXACT)                                                         
                                                                                                                               
                                                        观测的比较结果                                                         
                                                                                                                               
                                category=E:                                                                                    
                                    变量        基准值          比较          差异       差异(%)                               
                                 grade_1      4.000000      9.000000      5.000000    125.000000                               
                                                                                                                               
                                category=Total:                                                                                
                                    变量        基准值          比较          差异       差异(%)                               
                                 grade_1      9.000000     14.000000      5.000000     55.555556                               
                                 grade_2      8.000000      7.000000     -1.000000    -12.500000                               

 



                                                                                                                               
NOTE: 数据集 WORK.A 包含 1 个不在“WORK.B”中的观测。                                                                          
NOTE: 数据集 WORK.B 包含 1 个不在“WORK.A”中的观测。                                                                          
NOTE: 下列 2 个变量的值经比较不相等: grade_1 grade_2     
 
eawh100
Obsidian | Level 7

I like this. And is there any way to output it into a table for easier reading?

ballardw
Super User

There are many options for output data set creation in Proc Compare. The option OUT=libname.datasetname creates a data set, use the options OUTALL, OUTBASE, OUTCOMP, OUTDIF, OUTNOEQUAL and OUTPERCENT to control output. Note that some of these data set formats are harder to read than the results until you figure them out. Look in the Details section of the online help for descriptions of some of the variables that appear in the data set.

 

ODS OUTPUT can produce data sets that more closely resemble the output in the results window. Reference the online help for the name s of the objects to request in one or more ODS OUTPUT statements.

Ksharp
Super User

As @ballardw  pointed out, using option OUT= or other option to output the difference ,and you will see them.

proc compare base=a compare=b brief transpose out=want;
id category;
run;

Ksharp_0-1747097603980.png

 

eawh100
Obsidian | Level 7

This appears to be the solution I needed, as it seems to offer a faster method compared to my manual approach of merging and calculating differences.

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 624 views
  • 8 likes
  • 5 in conversation