Solved: Re: Count Id once in a group

sascode · Posted 10-04-2022 10:42 PM

Hello,

I have a dataset as below;
data a;
input ID $ Item_id $5.;
datalines;
a1 ABC10
a2 ABC11
a2 ABC12
a2 ABC13
b1 ABC11
b2 ABC11
c1 ABC10
;
run;

Uisng the link below , i can count distinct ID's for each item using below code:
proc sql;
select item_id,
count(distinct ID) as Distinct_Id from a
group by Item_id;
quit;
After seeing the output i am noticing in last two rows that

ABC12 and ABC13 have 1 disctinct id.

However, this ID is same (a2) , therefore , if it has been included for first item count than it should not be

counted in the second item group.

I would appreciate any suggestion.

ballardw · Posted 10-05-2022 10:27 AM

@sascode wrote:
yes, it is fine if it show 0 .
Thank you.

Then maybe something like:

proc sort data=a;
  by id;
run;

data need;
   set a;
   by id;
   if first.id then do;
   end;
   else call missing(id);
run;

proc sql;
select item_id,
count(distinct ID) as Distinct_Id from need
group by Item_id;
quit;

The sort groups all of the ID values together and then the data step sets all but one to missing. That way there is no "duplicate" Id to count.

Caution: this makes no attempt to control which "item_id" gets the Id. You might sort by Id and Item_id, then the first Item_id by alphabetic value would be associated with the ID. More complex rules of which Item_id should have the Id may be a bit difficult to enforce without a lot of work.

View solution in original post

ballardw · Posted 10-05-2022 12:47 AM

So should Item_id ABC12 and ABC13 even be in the output since they do not have "a distinct" id? Or should the Id be blank/missing?

sascode · Posted 10-05-2022 07:33 AM

yes, it is fine if it show 0 .
Thank you.

ballardw · Posted 10-05-2022 10:27 AM

@sascode wrote:
yes, it is fine if it show 0 .
Thank you.

Then maybe something like:

proc sort data=a;
  by id;
run;

data need;
   set a;
   by id;
   if first.id then do;
   end;
   else call missing(id);
run;

proc sql;
select item_id,
count(distinct ID) as Distinct_Id from need
group by Item_id;
quit;

The sort groups all of the ID values together and then the data step sets all but one to missing. That way there is no "duplicate" Id to count.

Caution: this makes no attempt to control which "item_id" gets the Id. You might sort by Id and Item_id, then the first Item_id by alphabetic value would be associated with the ID. More complex rules of which Item_id should have the Id may be a bit difficult to enforce without a lot of work.

Patrick · Posted 10-05-2022 12:55 AM

a2 already exists for item_id ABC11.

So what exactly is your rule for counting distinct IDs? Only count for their first appearance in any group? And if so what's the sort order for the groups (or you don't care)?

sascode · Posted 10-05-2022 07:34 AM

Hi,
Yes, if an id is counted once in a given group , it should not counted again in the next group .
Sorting is not a problem

SAS Innovate 2025: Call for Content

Classroom Training Available!