BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NKormanik
Barite | Level 11

Given:  Two sets of box plots.  Suppose one represents Time 0.  The second one represents Time 1.

(As a for-instance, perhaps the data are from population surveys, something like Likert scales, prior to a large marketing campaign, and given again following the marketing campaign.)

Through SAS Boxplot we are able to create a "history" dataset, which holds the key information essential to the box plots.

Thus, here is the partial results of "history" Data Set 0:

Combov50501Lv505011v50501Xv50501Mv505013v50501Hv50501Sv50501N
202013.079413.164023.432573.285783.561204.574730.4161116
202023.086433.211013.853223.889254.472174.664910.5659529
202033.092753.930115.042954.882786.104758.106041.3589151
202043.095883.596884.410274.269295.075486.182510.8904650
202053.079013.632194.432304.261115.116006.705911.0055146

And here is the partial results of "history" Data Set 1:

Combov50508Lv505081v50508Xv50508Mv505083v50508Hv50508Sv50508N
202019.764310.273010.653510.634611.129311.43140.504621788334
202029.800310.784711.316311.517511.845012.48240.707112726352
202039.795011.060812.371812.488213.459215.16031.424060547959
2020410.147011.391512.844913.014014.406315.52201.702073355257
202059.75209.926810.191910.179610.316410.85730.309555992125

What I'd like to do is get the difference between Data Set 0 and Data Set 1.  Plain and simple.  Cell for cell.

(Never mind that some cells may be meaningless, i.e., the last column, difference in N.)

For instance....

Combo 20201, v50508L = 9.7643

and

Combo 20201, v50501L = 3.07941

Difference = 6.68489

And on, and on.

Can anyone think of a way to do this directly in SAS?  If not in SAS, then in any other program?

Thanks much!

Nicholas Kormanik

.

1 ACCEPTED SOLUTION

Accepted Solutions
NKormanik
Barite | Level 11

Thanks for the input, folks.  Introduced me to some new ideas.

The way I handled the problem was to use "Merge".  I merged the different data sets into one data set, using "By" the first column.  Then subtracted one variable (column) from another, as needed.  Perhaps the most basic of approaches.  I'll look into, though, your more sophisticated ways.

Thanks again.

View solution in original post

8 REPLIES 8
Peter_C
Rhodochrosite | Level 12

have you looked at PROC COMPARE?

Linlin
Lapis Lazuli | Level 10

sample code:

data time0;

input id $ var1-var5;

cards;

aa 5 4 3 2 1

bb 6 7 8 9 10

;

data time1;

input id $ var1-var5;

cards;

aa 5 9 3 2 1

bb 8 7 8 9 11

;

proc sort data=time0;by id;run;

proc sort data=time1;by id;run;

%let lib=work;

%let dsn=time0;

/* put your variables in a macro variable */

proc sql noprint;

  select catx('=',name,cats(name,'_t1')) into : renames separated by '  '

    from dictionary.columns

  where libname="%upcase(&lib)" and memname="%upcase(&dsn)" and type='num' and lowcase(name) not in ('id')/* put

  the variables you don't want to be calculated here */;

select cats('diff_',name,'=',name,cats('-',name,'_t1')) into : diff separated by '; '

    from dictionary.columns

  where libname="%upcase(&lib)" and memname="%upcase(&dsn)" and type='num' and lowcase(name) not in ('id');

quit;

/* rename your vareiables in dataset time1 */

proc datasets lib=work nolist;

  modify time1;

  rename &renames;

  quit;

  data want;

   merge time0(in=a) time1(in=b);

   by id;

   if a and b;

   &diff;

run;

proc print;run;

                                                 d    d   d   d    d

                                                          i    i   i   i    i

                                    v   v   v   v    v    f    f   f   f    f

                                    a   a   a   a    a    f    f   f   f    f

                                    r   r   r   r    r    _    _   _   _    _

               v   v   v   v    v   1   2   3   4    5    v    v   v   v    v

      O        a   a   a   a    a   _   _   _   _    _    a    a   a   a    a

      b   i    r   r   r   r    r   t   t   t   t    t    r    r   r   r    r

      s   d    1   2   3   4    5   1   1   1   1    1    1    2   3   4    5

      1   aa   5   4   3   2    1   5   9   3   2    1    0   -5   0   0    0

      2   bb   6   7   8   9   10   8   7   8   9   11   -2    0   0   0   -1

Peter_C
Rhodochrosite | Level 12

LinLin

with your datasets try :

proc compare base= time0 compare= time1 outall outnoequal

  out= compares ;

id id ;

run ;

it looks like your ID values don't match, so no DIF rows will apear.

When IDs match, extra obs are created with _TYPE_ = DIF and PERCENT

I used this code

data h0 ;

input id var1-var8 ; *

Combo v50501L v505011 v50501X v50501M v505013 v50501H v50501S v50501N

;cards;

20201 3.07941 3.16402 3.43257 3.28578 3.5612 4.57473 0.41611 16

20202 3.08643 3.21101 3.85322 3.88925 4.47217 4.66491 0.56595 29

20203 3.09275 3.93011 5.04295 4.88278 6.10475 8.10604 1.35891 51

20204 3.09588 3.59688 4.41027 4.26929 5.07548 6.18251 0.89046 50

20205 3.07901 3.63219 4.4323 4.26111 5.116 6.70591 1.00551 46

;

*And here is the partial results of "history" Data Set 1:       

;

data h1 ;

input id var1-var8 ;*

Combo v50508L v505081 v50508X v50508M v505083 v50508H v50508S v50508N

;cards;

20201 9.7643 10.273 10.6535 10.6346 11.1293 11.4314 0.504621788 34

20202 9.8003 10.7847 11.3163 11.5175 11.845 12.4824 0.707112726 52

20203 9.795 11.0608 12.3718 12.4882 13.4592 15.1603 1.424060548 59

20204 10.147 11.3915 12.8449 13.014 14.4063 15.522 1.702073355 57

20205 9.752 9.9268 10.1919 10.1796 10.3164 10.8573 0.309555992 25

;

proc compare base= h0 compare= h1 outall outnoequal

out= compares  noprint ;

id id ;

run;

proc print label ;

title compare history ;

id id _obs_ _type_ ;

var var: ;

run ;

I've attached the "results viewer" window just for the proc print

data_null__
Jade | Level 19

Do you want OUTNOEQUAL?  Seems like you want all rows even if the diff for all variables is zero.

Peter_C
Rhodochrosite | Level 12

of course

thank you for setting that right

I'm just so used to using that proc for differences that I forget others have other uses!

(not much of an excuse)

Linlin
Lapis Lazuli | Level 10

Hi Peter,

below is what I got by running your code. It is very nice. Thank you!

data time0;

input id $ var1-var5;

cards;

aa 5 4 3 2 1

bb 6 7 8 9 10

;

data time1;

input id $ var1-var5;

cards;

aa 5 9 3 2 1

bb 8 7 8 9 11

;

proc sort data=time0;by id;run;

proc sort data=time1;by id;run;

proc compare base= time0 compare= time1 outall

  out= compares noprint;

id id ;

run ;

proc print data=compares label ;

title time0 time1 ;

id id _obs_ _type_ ;

var var: ;

run ;

          Observation      Type of

    id       Number      Observation      var1     var2    var3    var4    var5

    aa         1           BASE              5.0000       4      3       2       1

    aa         1           COMPARE       5.0000       9      3       2       1

    aa         1           DIF                 0.0000       5      0       0       0

    aa         1           PERCENT        0.0000     125    0       0       0

    bb         2           BASE              6.0000       7      8       9      10

    bb         2           COMPARE       8.0000       7      8       9      11

    bb         2           DIF                 2.0000       0      0       0       1

    bb         2           PERCENT      33.3333       0      0       0      10

NKormanik
Barite | Level 11

Thanks for the input, folks.  Introduced me to some new ideas.

The way I handled the problem was to use "Merge".  I merged the different data sets into one data set, using "By" the first column.  Then subtracted one variable (column) from another, as needed.  Perhaps the most basic of approaches.  I'll look into, though, your more sophisticated ways.

Thanks again.

Ron_MacroMaven
Lapis Lazuli | Level 10

I had a similar problem in the previous decade and wrote this General Solution:

http://www.sascommunity.org/wiki/Macro_Extract

this routine uses a merge to find the adds and deletes

and proc compare to find the changes

Ron Fehd  add/change/delete maven

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 5230 views
  • 7 likes
  • 5 in conversation