Solved: Re: Calculate distinct observations in all population

Jedrzej · Posted 02-17-2021 04:56 AM

Hi, i have a dataset;

ID	VAR
1	A
1	B
2	B
3	A
3	C
4

and I want to calculate the percentege of distinct value in var a for all distinct ID's:

for example: population is 4 (distinct ID), occurance of var "B" is 2 so the result is 50%

thank you

example output:

so as I wrote earlier the entire population=distinct ID, no matter if there are blanks in var. The output is to show percentage of occurance of var in the entire population, like this:

VAR	Percentage of occurance var/ entire poplulation (distinct ID)
A	50%
B	50%
C	25%

Ksharp · Posted 02-17-2021 06:50 AM

data have;
input ID	VAR $;
cards;
1	A
1	B
2	B
3	A
3	C
4	 .
;
proc sql;
create table want as
select var,count(distinct id)/(select count(distinct id) from have) as per format=percent8.2
 from have 
  where var is not missing
  group by var;
quit;

View solution in original post

PeterClemmensen · Posted 02-17-2021 04:59 AM

What do you mean by 'Population is 4' ?

What does your desired result look like given this data?

The DATA to DATA Step Macro
Blog: SASnrd

Jedrzej · Posted 02-17-2021 05:03 AM

distinct ID

ballardw · Posted 02-17-2021 05:20 AM

@Jedrzej wrote:
distinct ID

Since your "example" data has exactly zero occurrences of B for Id=4 I suggest that you provide an actual example of data where the B would actually be "50%". Better would be to provide expected output for ALL of the "example" data so we know what you expect us to do with missing values of Var.

andreas_lds · Posted 02-17-2021 05:23 AM

@Jedrzej wrote:

Hi, i have a dataset;

ID VAR

1 A

1 B

2 B

3 A

3

4 A

and I want to calculate the percentege of distinct value in var a for all ID:

for example: population is 4, occurance of var "B" is 2 so the result is 50%

thank you

No, this does not make any sense at all.

You have six obs in the table, so the two B are 33% or 40% if the obs with missing VAR should be ignored.

So, if you want help, please show what you expect as result and explain the logic you want applied.

Jedrzej · Posted 02-17-2021 05:34 AM

No, I have 6 obs, but distinct ID's are 4 so the entire population is 4, one ID= one item. So there are two B, one for ID1 second for ID2. The question is: how many B are there for the entire population?

ballardw · Posted 02-17-2021 05:48 AM

As I suggested previously show the entrie expected output for your example data.

like what is the result for A?

As phrased there are exactly 3 "distinct" values of Var: A, B and blank.

And maybe a larger completely worked example as your logic isn't really clear.

Jedrzej · Posted 02-17-2021 06:02 AM

ok, so as I wrote earlier the entire population=distinct ID, no matter if there are blanks in var. The output is to show percentage of occurance of var in the entire population, like this:

VAR	Percentage of occurance var/ entire poplulation (distinct ID)
A	50%
B	50%
C	25%

Ksharp · Posted 02-17-2021 06:50 AM

data have;
input ID	VAR $;
cards;
1	A
1	B
2	B
3	A
3	C
4	 .
;
proc sql;
create table want as
select var,count(distinct id)/(select count(distinct id) from have) as per format=percent8.2
 from have 
  where var is not missing
  group by var;
quit;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away