Calculating percentages and not including duplicates?

Reply
Contributor
Posts: 21

Calculating percentages and not including duplicates?

I have the following data

ID          Year          Code

G7          2007          0

G8          2007          0

G9          2007          0

G10         2007          T

G10         2007          T

G6          2007          0

G19        2007          A

H1          2008          0

H6          2008          0

G7          2008          T

G7          2008          T

H5          2008          T

H5          2008          T

H2          2008          0

H7          2008          0

H3          2008          A

G9          2008          A 

         

I wish to calculate the following:

The percentage of "A" codes based on the number of unique IDs for each year. For example, for 2007 the percentage "A" would be 1/6 or 16.7% and for 2008 the percentage "A" would be 2/8 or 25%. I only wish to count each ID once. Alternatively, I would also like to calculate the percentage of individuals that are not "A", still counting each ID only once.

Thanks.

Respected Advisor
Posts: 3,124

Re: Calculating percentages and not including duplicates?

Hi,

SQL seems direct, while may not be the most efficient way:

data have;

input (ID          Year          Code) (:$8.);

cards;

G7          2007          0

G8          2007          0

G9          2007          0

G10         2007          T

G10         2007          T

G6          2007          0

G19        2007          A

H1          2008          0

H6          2008          0

G7          2008          T

G7          2008          T

H5          2008          T

H5          2008          T

H2          2008          0

H7          2008          0

H3          2008          A

G9          2008          A 

;

proc sql;

select year, sum(code='A')/count(distinct id) as A_percent, 1- calculated A_percent as NON_A_percent

       from have group by year;

quit;

Haikuo

Super User
Posts: 5,083

Re: Calculating percentages and not including duplicates?

Evidently, the same ID can appear more than once per year.  What should happen if both records have an A?  (Does that count as one "A" or two?)  What should happen if one has an A and the other doesn't?

The programming isn't that difficult ... it's thinking through the possibilities and deciding on the rules to follow.

Good luck.

Respected Advisor
Posts: 3,124

Re: Calculating percentages and not including duplicates?

Hi,

Something is not right on your figures. For example the first one: 133%? There are only 4'0's, out of distinct 6'ID's, so it should be 2/3, 66.7% if you will.

If that is kind of result OP is seeking, SQL can also deliver as:

proc sql;

select a.year, code, n/d as percent from

(select year, code, count(id) as N from have group by year, code) a,

(select year, count(distinct id) as D from have group by year) b

where a.year=b.year;

quit;

Year

Code

percent

2007

0

0.666667

2007

A

0.166667

2007

T

0.333333

2008

0

0.5

2008

A

0.25

2008

T

0.5

Haikuo

Ask a Question
Discussion stats
  • 3 replies
  • 223 views
  • 3 likes
  • 3 in conversation