turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Comparison of groups within groups

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-09-2017 11:49 PM

I have the following dataset.

id1 | id2 | date | value |

1 | A | 2014 | 0.5 |

1 | A | 2014 | 0.7 |

1 | B | 2014 | 0.8 |

1 | C | 2014 | 0.1 |

1 | C | 2014 | 1.1 |

1 | D | 2014 | 1.2 |

1 | A | 2015 | 1.3 |

1 | A | 2015 | 2.4 |

1 | C | 2015 | 5.6 |

1 | D | 2015 | 0.4 |

1 | E | 2015 | 0.7 |

1 | F | 2015 | 0.3 |

1 | C | 2016 | 1.3 |

1 | D | 2016 | 2.5 |

1 | D | 2016 | 2 |

1 | F | 2016 | 4.1 |

1 | G | 2016 | 3.1 |

1 | H | 2016 | 2.1 |

2 | B | 2014 | 5.4 |

I want to compare the distinct id2s by each id1 and year group. For instance, I want to compare id2s where id1=1 and year=2014 with id2s where id1=1 and year=2015. So, in the first case(id1=1 and year=2014), the distinct values of id2 are A, B, C, D and for the second case (id1=1 and year=2015) distinct values are A, C, D, E. I can co this by using first.id1, first.year etc. After this step the dataset will look like this:

id1 | id2 | date | value |

1 | A | 2014 | 0.5 |

1 | B | 2014 | 0.8 |

1 | C | 2014 | 0.1 |

1 | D | 2014 | 1.2 |

1 | A | 2015 | 1.3 |

1 | C | 2015 | 5.6 |

1 | D | 2015 | 0.4 |

1 | E | 2015 | 0.7 |

1 | F | 2015 | 0.3 |

1 | C | 2016 | 1.3 |

1 | D | 2016 | 2.5 |

1 | F | 2016 | 4.1 |

1 | G | 2016 | 3.1 |

1 | H | 2016 | 2.1 |

Now, I want to compare these three groups and get the id2 that exist in first case but not exist in the second case (i.e. B exists in first id1xyear group but not in second id1xyear group). In the end, the measure I want will divide the total number of distinct id2s in first id1xyear with total number of nonexisting id2s in the second id1xyear (1/4 in the first comparison from 2014 to 2015).

It will repeat the same procedure with id1=2 and so on.

id1 | date | measure |

1 | 2014 | . |

1 | 2015 | 1/4 |

1 | 2016 | 2/5 |

2 | 2014 |

I hope I was able to convey what I need. I would appreciate any help. Thanks.

Accepted Solutions

Solution

11-13-2017
12:46 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to lezgin

11-13-2017 09:19 AM

The following code is the fixed my original code. Sorry .

```
/*******For measure1*********/
data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 in x then m+1;
end;
want=divide(k-m,k);
if id1 ne lag(id1) then call missing(want);
output;
call missing(of x{*}); /* <--*/
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;
```

```
/******For measure2***********/
data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 not in x then m+1; /*********/
end;
want=divide(m,k); /*********/
if id1 ne lag(id1) then call missing(want);
output;
call missing(of x{*}); /*<---*/
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to lezgin

11-10-2017 08:46 AM

I think it should be

id1 | date | measure |

1 | 2014 | . |

1 | 2015 | 1/5 |

Use your second dataset.

```
data have;
infile cards expandtabs ;
input id1 id2 $ date value;
cards;
1 A 2014 0.5
1 B 2014 0.8
1 C 2014 0.1
1 D 2014 1.2
1 A 2015 1.3
1 C 2015 5.6
1 D 2015 0.4
1 E 2015 0.7
1 F 2015 0.3
1 C 2016 1.3
1 D 2016 2.5
1 F 2016 4.1
1 G 2016 3.1
1 H 2016 2.1
;
run;
data want;
array x{9999} $ 32 _temporary_;
m=0;
do n=1 by 1 until(last.date);
set have;
by id1 date;
if id2 in x then m+1;
end;
want=divide(k-m,n);
if id1 ne lag(id1) then call missing(want);
output;
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

11-10-2017 01:47 PM

Thank you very much. I appreciate your help but I think it is not giving what I am looking for. From 2014 to 2015 A,B,C,D becomes A,C,D,E,F. From 2015 to 2016 A,C,D,E,F becomes C,D,F,G,H.

So what I need is from 2014 to 2015: identify that B is missing in 2015 and divide this (1 missing) with all four (A,B,C,D) in 2014=1/4

From 2015 to 2016: identify that A and E is missing (2 missing). Divide 2 by what I have for 2015 (A,C,D,E,F) by 5= 2/5.

It should repeat this by id1. When next id1 comes the code should compare next first id1xdate group with the next date of same id1. I hope this is more clear now. Again, thank you very much.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to lezgin

11-11-2017 05:27 AM

OK. I get confused with denominator.

```
data have;
infile cards expandtabs ;
input id1 id2 $ date value;
cards;
1 A 2014 0.5
1 B 2014 0.8
1 C 2014 0.1
1 D 2014 1.2
1 A 2015 1.3
1 C 2015 5.6
1 D 2015 0.4
1 E 2015 0.7
1 F 2015 0.3
1 C 2016 1.3
1 D 2016 2.5
1 F 2016 4.1
1 G 2016 3.1
1 H 2016 2.1
;
run;
data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 in x then m+1;
end;
want=divide(k-m,k);
if id1 ne lag(id1) then call missing(want);
output;
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

11-11-2017 10:01 PM

Thanks for your patience. I understand if you don't have time for this. You have already been very helpful. I appreciate that. I realized that I need to decompose the measure into two parts. Let me give you a new example of each measure. There are only subtle changes.

Measure1: Suppose we have A,B,C,D in id1xdate1 and A,B,C,E in id1xdate2 and C,D,E,F in id1xdate3. Measure1 finds that D from id1xdate1 is missing in id1xdate2 and divides D (1) by A,B,C,D (4) of id1xdate1 and returns 1/4. Similarly, from id1xdate2 to id1xdate3, this measure finds that A,B are missing in id1xdate2 and divides (2) by A,B,C,E (4) and returns 2/4. **This measure makes the second comparison between id2s of id1xdate2 and id1xdate3. The change from id1xdate2 to id1xdate3 matters now.**

Measure 2: Again, suppose we have A,B,C,D in id1xdate1 and A,B,C,E in id1xdate2 and C,D,E,F in id1xdate3. Measure2 finds that E is not present in id1xdate1 but is present in id1xdate2. This time measure2 divides E (1) by A,B,C,D (4) in id1xdate1 and return 1/4. From id1xdate2 to id1xdate3, the code finds that D,F are not present in id1xdate2 but are present in id1xdate3. Measure divides D,F (2) by A,B,C,E (4) and return 2/4. **Similarly, this measure makes the second comparison between id2s of id1xdate2 and id1xdate3. The change from id1xdate2 to id1xdate3 matters now. **

Dataset for this example is below. I tried to modify your code to separate the two measures but I couldn't get the code to make the comparison between only two id1xdates. Not from the beginning.

id1 | id2 | date | value |

1 | A | 20140530 | 0.5 |

1 | B | 20140530 | 0.8 |

1 | C | 20140530 | 0.1 |

1 | D | 20140530 | 1.2 |

1 | A | 20140831 | 1.3 |

1 | B | 20140831 | 5.6 |

1 | C | 20140831 | 0.4 |

1 | E | 20140831 | 0.7 |

1 | C | 20141130 | 0.3 |

1 | D | 20141130 | 1.3 |

1 | E | 20141130 | 2.5 |

1 | F | 20141130 | 4.1 |

2 | A | 19990130 | 3 |

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to lezgin

11-12-2017 06:00 AM

OK. the following code should work for your second measure.

the code I have already posted is for your first measure.

Note: use your second table in your first post .

```
data have;
infile cards expandtabs ;
input id1 id2 $ date value;
cards;
1 A 20140530 0.5
1 B 20140530 0.8
1 C 20140530 0.1
1 D 20140530 1.2
1 A 20140831 1.3
1 B 20140831 5.6
1 C 20140831 0.4
1 E 20140831 0.7
1 C 20141130 0.3
1 D 20141130 1.3
1 E 20141130 2.5
1 F 20141130 4.1
2 A 19990130 3
;
run;
data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 not in x then m+1; /*********/
end;
want=divide(m,k); /*********/
if id1 ne lag(id1) then call missing(want);
output;
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

11-12-2017 01:01 PM

In the first sheet of attached excel file I included a portion of the data. The first measure mostly works fine but sometimes returns negative values where it should return 0. The second measure does not work properly and I wasn't able to detect the problem. In the second sheet of the attached excel file I included the dateset transposed to show you how it should work. I included what the measure should return and what it returns when you use the dataset in the first sheet. The discrepancies are in red. Thank you very much.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to lezgin

11-13-2017 09:07 AM

I feel very Sorry. My bad. Add CALL MISSING(of x{*});

```
call missing(of x{*}) ;
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to lezgin

11-13-2017 09:13 AM

Hi. I rewrite it by Hash Table. Hope you like it.

```
/*****For measure1 *******/
proc import datafile='/folders/myfolders/sascommunity.xlsx' out=have dbms=xlsx replace;
run;
proc sort data=have;
by id1 date;
run;
data want;
if _n_=1 then do;
if 0 then set have;
declare hash h1();
h1.definekey('id2');
h1.definedone();
end;
n=h1.num_items;
do until(last.date);
set have;
by id1 date;
rc=h1.remove();
end;
m=h1.num_items;
want=divide(m,n);
if id1 ne lag(id1) then call missing(want);
output;
h1.clear();
do until(last.date);
set have;
by id1 date;
h1.ref();
end;
keep id1 id2 date want m n;
run;
```

```
/******For measure2 ***********/
proc import datafile='/folders/myfolders/sascommunity.xlsx' out=have dbms=xlsx replace;
run;
proc sort data=have;
by id1 date;
run;
data want;
if _n_=1 then do;
if 0 then set have;
declare hash h1();
h1.definekey('id2');
h1.definedone();
end;
m=0;
n=h1.num_items;
do until(last.date);
set have;
by id1 date;
if h1.check() ne 0 then m+1;
end;
want=divide(m,n);
if id1 ne lag(id1) then call missing(want);
output;
h1.clear();
do until(last.date);
set have;
by id1 date;
h1.ref();
end;
keep id1 id2 date want m n;
run;
```

Solution

11-13-2017
12:46 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to lezgin

11-13-2017 09:19 AM

The following code is the fixed my original code. Sorry .

```
/*******For measure1*********/
data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 in x then m+1;
end;
want=divide(k-m,k);
if id1 ne lag(id1) then call missing(want);
output;
call missing(of x{*}); /* <--*/
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
run;
```

```
/******For measure2***********/
data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
set have;
by id1 date;
if id2 not in x then m+1; /*********/
end;
want=divide(m,k); /*********/
if id1 ne lag(id1) then call missing(want);
output;
call missing(of x{*}); /*<---*/
k=0;
do until(last.date);
set have;
by id1 date;
k+1;x{k}=id2;
end;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

11-13-2017 12:47 PM

This has been very helpful. Thank you very much for your kind and quick replies.