Given: Two sets of box plots. Suppose one represents Time 0. The second one represents Time 1.
(As a for-instance, perhaps the data are from population surveys, something like Likert scales, prior to a large marketing campaign, and given again following the marketing campaign.)
Through SAS Boxplot we are able to create a "history" dataset, which holds the key information essential to the box plots.
Thus, here is the partial results of "history" Data Set 0:
Combo | v50501L | v505011 | v50501X | v50501M | v505013 | v50501H | v50501S | v50501N |
20201 | 3.07941 | 3.16402 | 3.43257 | 3.28578 | 3.56120 | 4.57473 | 0.41611 | 16 |
20202 | 3.08643 | 3.21101 | 3.85322 | 3.88925 | 4.47217 | 4.66491 | 0.56595 | 29 |
20203 | 3.09275 | 3.93011 | 5.04295 | 4.88278 | 6.10475 | 8.10604 | 1.35891 | 51 |
20204 | 3.09588 | 3.59688 | 4.41027 | 4.26929 | 5.07548 | 6.18251 | 0.89046 | 50 |
20205 | 3.07901 | 3.63219 | 4.43230 | 4.26111 | 5.11600 | 6.70591 | 1.00551 | 46 |
And here is the partial results of "history" Data Set 1:
Combo | v50508L | v505081 | v50508X | v50508M | v505083 | v50508H | v50508S | v50508N |
20201 | 9.7643 | 10.2730 | 10.6535 | 10.6346 | 11.1293 | 11.4314 | 0.5046217883 | 34 |
20202 | 9.8003 | 10.7847 | 11.3163 | 11.5175 | 11.8450 | 12.4824 | 0.7071127263 | 52 |
20203 | 9.7950 | 11.0608 | 12.3718 | 12.4882 | 13.4592 | 15.1603 | 1.4240605479 | 59 |
20204 | 10.1470 | 11.3915 | 12.8449 | 13.0140 | 14.4063 | 15.5220 | 1.7020733552 | 57 |
20205 | 9.7520 | 9.9268 | 10.1919 | 10.1796 | 10.3164 | 10.8573 | 0.3095559921 | 25 |
What I'd like to do is get the difference between Data Set 0 and Data Set 1. Plain and simple. Cell for cell.
(Never mind that some cells may be meaningless, i.e., the last column, difference in N.)
For instance....
Combo 20201, v50508L = 9.7643
and
Combo 20201, v50501L = 3.07941
Difference = 6.68489
And on, and on.
Can anyone think of a way to do this directly in SAS? If not in SAS, then in any other program?
Thanks much!
Nicholas Kormanik
.
Thanks for the input, folks. Introduced me to some new ideas.
The way I handled the problem was to use "Merge". I merged the different data sets into one data set, using "By" the first column. Then subtracted one variable (column) from another, as needed. Perhaps the most basic of approaches. I'll look into, though, your more sophisticated ways.
Thanks again.
have you looked at PROC COMPARE?
sample code:
data time0;
input id $ var1-var5;
cards;
aa 5 4 3 2 1
bb 6 7 8 9 10
;
data time1;
input id $ var1-var5;
cards;
aa 5 9 3 2 1
bb 8 7 8 9 11
;
proc sort data=time0;by id;run;
proc sort data=time1;by id;run;
%let lib=work;
%let dsn=time0;
/* put your variables in a macro variable */
proc sql noprint;
select catx('=',name,cats(name,'_t1')) into : renames separated by ' '
from dictionary.columns
where libname="%upcase(&lib)" and memname="%upcase(&dsn)" and type='num' and lowcase(name) not in ('id')/* put
the variables you don't want to be calculated here */;
select cats('diff_',name,'=',name,cats('-',name,'_t1')) into : diff separated by '; '
from dictionary.columns
where libname="%upcase(&lib)" and memname="%upcase(&dsn)" and type='num' and lowcase(name) not in ('id');
quit;
/* rename your vareiables in dataset time1 */
proc datasets lib=work nolist;
modify time1;
rename &renames;
quit;
data want;
merge time0(in=a) time1(in=b);
by id;
if a and b;
&diff;
run;
proc print;run;
d d d d d
i i i i i
v v v v v f f f f f
a a a a a f f f f f
r r r r r _ _ _ _ _
v v v v v 1 2 3 4 5 v v v v v
O a a a a a _ _ _ _ _ a a a a a
b i r r r r r t t t t t r r r r r
s d 1 2 3 4 5 1 1 1 1 1 1 2 3 4 5
1 aa 5 4 3 2 1 5 9 3 2 1 0 -5 0 0 0
2 bb 6 7 8 9 10 8 7 8 9 11 -2 0 0 0 -1
LinLin
with your datasets try :
proc compare base= time0 compare= time1 outall outnoequal
out= compares ;
id id ;
run ;
it looks like your ID values don't match, so no DIF rows will apear.
When IDs match, extra obs are created with _TYPE_ = DIF and PERCENT
I used this code
data h0 ;
input id var1-var8 ; *
Combo v50501L v505011 v50501X v50501M v505013 v50501H v50501S v50501N
;cards;
20201 3.07941 3.16402 3.43257 3.28578 3.5612 4.57473 0.41611 16
20202 3.08643 3.21101 3.85322 3.88925 4.47217 4.66491 0.56595 29
20203 3.09275 3.93011 5.04295 4.88278 6.10475 8.10604 1.35891 51
20204 3.09588 3.59688 4.41027 4.26929 5.07548 6.18251 0.89046 50
20205 3.07901 3.63219 4.4323 4.26111 5.116 6.70591 1.00551 46
;
*And here is the partial results of "history" Data Set 1:
;
data h1 ;
input id var1-var8 ;*
Combo v50508L v505081 v50508X v50508M v505083 v50508H v50508S v50508N
;cards;
20201 9.7643 10.273 10.6535 10.6346 11.1293 11.4314 0.504621788 34
20202 9.8003 10.7847 11.3163 11.5175 11.845 12.4824 0.707112726 52
20203 9.795 11.0608 12.3718 12.4882 13.4592 15.1603 1.424060548 59
20204 10.147 11.3915 12.8449 13.014 14.4063 15.522 1.702073355 57
20205 9.752 9.9268 10.1919 10.1796 10.3164 10.8573 0.309555992 25
;
proc compare base= h0 compare= h1 outall outnoequal
out= compares noprint ;
id id ;
run;
proc print label ;
title compare history ;
id id _obs_ _type_ ;
var var: ;
run ;
I've attached the "results viewer" window just for the proc print
Do you want OUTNOEQUAL? Seems like you want all rows even if the diff for all variables is zero.
of course
thank you for setting that right
I'm just so used to using that proc for differences that I forget others have other uses!
(not much of an excuse)
Hi Peter,
below is what I got by running your code. It is very nice. Thank you!
data time0;
input id $ var1-var5;
cards;
aa 5 4 3 2 1
bb 6 7 8 9 10
;
data time1;
input id $ var1-var5;
cards;
aa 5 9 3 2 1
bb 8 7 8 9 11
;
proc sort data=time0;by id;run;
proc sort data=time1;by id;run;
proc compare base= time0 compare= time1 outall
out= compares noprint;
id id ;
run ;
proc print data=compares label ;
title time0 time1 ;
id id _obs_ _type_ ;
var var: ;
run ;
Observation Type of
id Number Observation var1 var2 var3 var4 var5
aa 1 BASE 5.0000 4 3 2 1
aa 1 COMPARE 5.0000 9 3 2 1
aa 1 DIF 0.0000 5 0 0 0
aa 1 PERCENT 0.0000 125 0 0 0
bb 2 BASE 6.0000 7 8 9 10
bb 2 COMPARE 8.0000 7 8 9 11
bb 2 DIF 2.0000 0 0 0 1
bb 2 PERCENT 33.3333 0 0 0 10
Thanks for the input, folks. Introduced me to some new ideas.
The way I handled the problem was to use "Merge". I merged the different data sets into one data set, using "By" the first column. Then subtracted one variable (column) from another, as needed. Perhaps the most basic of approaches. I'll look into, though, your more sophisticated ways.
Thanks again.
I had a similar problem in the previous decade and wrote this General Solution:
http://www.sascommunity.org/wiki/Macro_Extract
this routine uses a merge to find the adds and deletes
and proc compare to find the changes
Ron Fehd add/change/delete maven
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.