10-02-2012 01:16 PM
I have the following data
ID Year Code
G7 2007 0
G8 2007 0
G9 2007 0
G10 2007 T
G10 2007 T
G6 2007 0
G19 2007 A
H1 2008 0
H6 2008 0
G7 2008 T
G7 2008 T
H5 2008 T
H5 2008 T
H2 2008 0
H7 2008 0
H3 2008 A
G9 2008 A
I wish to calculate the following:
The percentage of "A" codes based on the number of unique IDs for each year. For example, for 2007 the percentage "A" would be 1/6 or 16.7% and for 2008 the percentage "A" would be 2/8 or 25%. I only wish to count each ID once. Alternatively, I would also like to calculate the percentage of individuals that are not "A", still counting each ID only once.
Thanks.
10-02-2012 01:53 PM
Hi,
SQL seems direct, while may not be the most efficient way:
data have;
input (ID Year Code) (:$8.);
cards;
G7 2007 0
G8 2007 0
G9 2007 0
G10 2007 T
G10 2007 T
G6 2007 0
G19 2007 A
H1 2008 0
H6 2008 0
G7 2008 T
G7 2008 T
H5 2008 T
H5 2008 T
H2 2008 0
H7 2008 0
H3 2008 A
G9 2008 A
;
proc sql;
select year, sum(code='A')/count(distinct id) as A_percent, 1- calculated A_percent as NON_A_percent
from have group by year;
quit;
Haikuo
10-02-2012 01:59 PM
Evidently, the same ID can appear more than once per year. What should happen if both records have an A? (Does that count as one "A" or two?) What should happen if one has an A and the other doesn't?
The programming isn't that difficult ... it's thinking through the possibilities and deciding on the rules to follow.
Good luck.
10-02-2012 03:51 PM
Hi,
Something is not right on your figures. For example the first one: 133%? There are only 4'0's, out of distinct 6'ID's, so it should be 2/3, 66.7% if you will.
If that is kind of result OP is seeking, SQL can also deliver as:
proc sql;
select a.year, code, n/d as percent from
(select year, code, count(id) as N from have group by year, code) a,
(select year, count(distinct id) as D from have group by year) b
where a.year=b.year;
quit;
Year | Code | percent |
2007 | 0 | 0.666667 |
2007 | A | 0.166667 |
2007 | T | 0.333333 |
2008 | 0 | 0.5 |
2008 | A | 0.25 |
2008 | T | 0.5 |
Haikuo