## Comparison of groups within groups

Solved
Occasional Contributor
Posts: 15

# Comparison of groups within groups

I have the following dataset.

 id1 id2 date value 1 A 2014 0.5 1 A 2014 0.7 1 B 2014 0.8 1 C 2014 0.1 1 C 2014 1.1 1 D 2014 1.2 1 A 2015 1.3 1 A 2015 2.4 1 C 2015 5.6 1 D 2015 0.4 1 E 2015 0.7 1 F 2015 0.3 1 C 2016 1.3 1 D 2016 2.5 1 D 2016 2 1 F 2016 4.1 1 G 2016 3.1 1 H 2016 2.1 2 B 2014 5.4

I want to compare the distinct id2s by each id1 and year group. For instance, I want to compare id2s where id1=1 and year=2014 with id2s where id1=1 and year=2015. So, in the first case(id1=1 and year=2014), the distinct values of id2 are A, B, C, D and for the second case (id1=1 and year=2015) distinct values are A, C, D, E. I can co this by using first.id1, first.year etc. After this step the dataset will look like this:

 id1 id2 date value 1 A 2014 0.5 1 B 2014 0.8 1 C 2014 0.1 1 D 2014 1.2 1 A 2015 1.3 1 C 2015 5.6 1 D 2015 0.4 1 E 2015 0.7 1 F 2015 0.3 1 C 2016 1.3 1 D 2016 2.5 1 F 2016 4.1 1 G 2016 3.1 1 H 2016 2.1

Now, I want to compare these three groups and get the id2 that exist in first case but not exist in the second case (i.e. B exists in first id1xyear group but not in second id1xyear group). In the end, the measure I want will divide the total number of distinct id2s in first id1xyear with total number of nonexisting id2s in the second id1xyear (1/4 in the first comparison from 2014 to 2015).

It will repeat the same procedure with id1=2 and so on.

 id1 date measure 1 2014 . 1 2015 1/4 1 2016 2/5 2 2014

I hope I was able to convey what I need. I would appreciate any help. Thanks.

Accepted Solutions
Solution
‎11-13-2017 12:46 PM
Super User
Posts: 10,689

## Re: Comparison of groups within groups

The following code is the fixed my original code. Sorry .

``````/*******For measure1*********/
data want;
array x{9999} \$ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 in x then m+1;
end;
want=divide(k-m,k);
if id1 ne lag(id1) then call missing(want);
output;

call missing(of x{*}); /* <--*/
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;``````

``````/******For measure2***********/

data want;
array x{9999} \$ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 not in x then m+1; /*********/
end;
want=divide(m,k);  /*********/
if id1 ne lag(id1) then call missing(want);
output;

call missing(of x{*}); /*<---*/
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;``````

All Replies
Super User
Posts: 10,689

## Re: Comparison of groups within groups

I think it should be

 id1 date measure 1 2014 . 1 2015 1/5

``````data have;
infile cards expandtabs ;
input id1	id2	\$ date	value;
cards;
1	A	2014	0.5
1	B	2014	0.8
1	C	2014	0.1
1	D	2014	1.2
1	A	2015	1.3
1	C	2015	5.6
1	D	2015	0.4
1	E	2015	0.7
1	F	2015	0.3
1	C	2016	1.3
1	D	2016	2.5
1	F	2016	4.1
1	G	2016	3.1
1	H	2016	2.1
;
run;
data want;
array x{9999} \$ 32 _temporary_;
m=0;
do n=1 by 1 until(last.date);
set have;
by id1 date;
if id2 in x then m+1;
end;
want=divide(k-m,n);
if id1 ne lag(id1) then call missing(want);
output;

k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;``````
Occasional Contributor
Posts: 15

## Re: Comparison of groups within groups

Thank you very much. I appreciate your help but I think it is not giving what I am looking for. From 2014 to 2015 A,B,C,D becomes A,C,D,E,F. From 2015 to 2016 A,C,D,E,F becomes C,D,F,G,H.

So what I need is from 2014 to 2015: identify that B is missing in 2015 and divide this (1 missing) with all four (A,B,C,D) in 2014=1/4

From 2015 to 2016: identify that A and E is missing (2 missing). Divide 2 by what I have for 2015 (A,C,D,E,F) by 5= 2/5.

It should repeat this by id1. When next id1 comes the code should compare next first id1xdate group with the next date of same id1. I hope this is more clear now. Again, thank you very much.

Super User
Posts: 10,689

## Re: Comparison of groups within groups

OK. I get confused with denominator.

``````data have;
infile cards expandtabs ;
input id1	id2	\$ date	value;
cards;
1	A	2014	0.5
1	B	2014	0.8
1	C	2014	0.1
1	D	2014	1.2
1	A	2015	1.3
1	C	2015	5.6
1	D	2015	0.4
1	E	2015	0.7
1	F	2015	0.3
1	C	2016	1.3
1	D	2016	2.5
1	F	2016	4.1
1	G	2016	3.1
1	H	2016	2.1
;
run;
data want;
array x{9999} \$ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 in x then m+1;
end;
want=divide(k-m,k);
if id1 ne lag(id1) then call missing(want);
output;

k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;``````
Occasional Contributor
Posts: 15

## Re: Comparison of groups within groups

Thanks for your patience. I understand if you don't have time for this. You have already been very helpful. I appreciate that. I realized that I need to decompose the measure into two parts. Let me give you a new example of each measure. There are only subtle changes.

Measure1: Suppose we have A,B,C,D in id1xdate1 and A,B,C,E in id1xdate2 and C,D,E,F in id1xdate3. Measure1 finds that D from id1xdate1 is missing in id1xdate2 and divides D (1) by A,B,C,D (4) of id1xdate1 and returns 1/4. Similarly, from id1xdate2 to id1xdate3, this measure finds that A,B are missing in id1xdate2 and divides (2) by A,B,C,E (4) and returns 2/4. This measure makes the second comparison between id2s of id1xdate2 and id1xdate3. The change from id1xdate2 to id1xdate3 matters now.

Measure 2: Again, suppose we have A,B,C,D in id1xdate1 and A,B,C,E in id1xdate2 and C,D,E,F in id1xdate3. Measure2 finds that E is not present in id1xdate1 but is present in id1xdate2. This time measure2 divides E (1) by A,B,C,D (4) in id1xdate1 and return 1/4. From id1xdate2 to id1xdate3, the code finds that D,F are not present in id1xdate2 but are present in id1xdate3. Measure divides D,F (2) by A,B,C,E (4) and return 2/4. Similarly, this measure makes the second comparison between id2s of id1xdate2 and id1xdate3. The change from id1xdate2 to id1xdate3 matters now.

Dataset for this example is below. I tried to modify your code to separate the two measures but I couldn't get the code to make the comparison between only two id1xdates. Not from the beginning.

 id1 id2 date value 1 A 20140530 0.5 1 B 20140530 0.8 1 C 20140530 0.1 1 D 20140530 1.2 1 A 20140831 1.3 1 B 20140831 5.6 1 C 20140831 0.4 1 E 20140831 0.7 1 C 20141130 0.3 1 D 20141130 1.3 1 E 20141130 2.5 1 F 20141130 4.1 2 A 19990130 3
Super User
Posts: 10,689

## Re: Comparison of groups within groups

OK. the following code should work for your second measure.

``````data have;
infile cards expandtabs ;
input id1	id2	\$ date	value;
cards;
1	A	20140530	0.5
1	B	20140530	0.8
1	C	20140530	0.1
1	D	20140530	1.2
1	A	20140831	1.3
1	B	20140831	5.6
1	C	20140831	0.4
1	E	20140831	0.7
1	C	20141130	0.3
1	D	20141130	1.3
1	E	20141130	2.5
1	F	20141130	4.1
2	A	19990130	3
;
run;
data want;
array x{9999} \$ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 not in x then m+1; /*********/
end;
want=divide(m,k);  /*********/
if id1 ne lag(id1) then call missing(want);
output;

k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;``````

Occasional Contributor
Posts: 15

## Re: Comparison of groups within groups

In the first sheet of attached excel file I included a portion of the data. The first measure mostly works fine but sometimes returns negative values where it should return 0. The second measure does not work properly and I wasn't able to detect the problem. In the second sheet of the attached excel file I included the dateset transposed to show you how it should work. I included what the measure should return and what it returns when you use the dataset in the first sheet. The discrepancies are in red. Thank you very much.

Super User
Posts: 10,689

## Re: Comparison of groups within groups

``````call missing(of x{*}) ;
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;``````
Super User
Posts: 10,689

## Re: Comparison of groups within groups

Hi. I rewrite it by Hash Table. Hope you like it.

``````/*****For measure1 *******/
proc import datafile='/folders/myfolders/sascommunity.xlsx' out=have dbms=xlsx replace;
run;

proc sort data=have;
by id1 date;
run;

data want;
if _n_=1 then do;
if 0 then set have;
declare hash h1();
h1.definekey('id2');
h1.definedone();
end;

n=h1.num_items;
do until(last.date);
set have;
by id1 date;
rc=h1.remove();
end;
m=h1.num_items;
want=divide(m,n);
if id1 ne lag(id1) then call missing(want);
output;
h1.clear();
do until(last.date);
set have;
by id1 date;
h1.ref();
end;
keep id1 id2 date want m n;
run;``````

``````/******For measure2 ***********/
proc import datafile='/folders/myfolders/sascommunity.xlsx' out=have dbms=xlsx replace;
run;

proc sort data=have;
by id1 date;
run;

data want;
if _n_=1 then do;
if 0 then set have;
declare hash h1();
h1.definekey('id2');
h1.definedone();
end;

m=0;
n=h1.num_items;
do until(last.date);
set have;
by id1 date;
if h1.check() ne 0 then m+1;
end;
want=divide(m,n);
if id1 ne lag(id1) then call missing(want);
output;
h1.clear();
do until(last.date);
set have;
by id1 date;
h1.ref();
end;
keep id1 id2 date want m n;
run;``````
Solution
‎11-13-2017 12:46 PM
Super User
Posts: 10,689

## Re: Comparison of groups within groups

The following code is the fixed my original code. Sorry .

``````/*******For measure1*********/
data want;
array x{9999} \$ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 in x then m+1;
end;
want=divide(k-m,k);
if id1 ne lag(id1) then call missing(want);
output;

call missing(of x{*}); /* <--*/
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;``````

``````/******For measure2***********/

data want;
array x{9999} \$ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 not in x then m+1; /*********/
end;
want=divide(m,k);  /*********/
if id1 ne lag(id1) then call missing(want);
output;

call missing(of x{*}); /*<---*/
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;``````
Occasional Contributor
Posts: 15

## Re: Comparison of groups within groups

This has been very helpful. Thank you very much for your kind and quick replies.

☑ This topic is solved.