Hello,
I have a dataset as below;
data a;
input ID $ Item_id $5.;
datalines;
a1 ABC10
a2 ABC11
a2 ABC12
a2 ABC13
b1 ABC11
b2 ABC11
c1 ABC10
;
run;
Uisng the link below , i can count distinct ID's for each item using below code:
proc sql;
select item_id,
count(distinct ID) as Distinct_Id from a
group by Item_id;
quit;
After seeing the output i am noticing in last two rows that
ABC12 and ABC13 have 1 disctinct id.
However, this ID is same (a2) , therefore , if it has been included for first item count than it should not be
counted in the second item group.
I would appreciate any suggestion.
@sascode wrote:
yes, it is fine if it show 0 .
Thank you.
Then maybe something like:
proc sort data=a; by id; run; data need; set a; by id; if first.id then do; end; else call missing(id); run; proc sql; select item_id, count(distinct ID) as Distinct_Id from need group by Item_id; quit;
The sort groups all of the ID values together and then the data step sets all but one to missing. That way there is no "duplicate" Id to count.
Caution: this makes no attempt to control which "item_id" gets the Id. You might sort by Id and Item_id, then the first Item_id by alphabetic value would be associated with the ID. More complex rules of which Item_id should have the Id may be a bit difficult to enforce without a lot of work.
So should Item_id ABC12 and ABC13 even be in the output since they do not have "a distinct" id? Or should the Id be blank/missing?
@sascode wrote:
yes, it is fine if it show 0 .
Thank you.
Then maybe something like:
proc sort data=a; by id; run; data need; set a; by id; if first.id then do; end; else call missing(id); run; proc sql; select item_id, count(distinct ID) as Distinct_Id from need group by Item_id; quit;
The sort groups all of the ID values together and then the data step sets all but one to missing. That way there is no "duplicate" Id to count.
Caution: this makes no attempt to control which "item_id" gets the Id. You might sort by Id and Item_id, then the first Item_id by alphabetic value would be associated with the ID. More complex rules of which Item_id should have the Id may be a bit difficult to enforce without a lot of work.
a2 already exists for item_id ABC11.
So what exactly is your rule for counting distinct IDs? Only count for their first appearance in any group? And if so what's the sort order for the groups (or you don't care)?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.