BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sascode
Quartz | Level 8

Hello,

I have a dataset as below;
data a;
input ID $ Item_id $5.;
datalines;
a1 ABC10
a2 ABC11
a2 ABC12
a2 ABC13
b1 ABC11
b2 ABC11
c1 ABC10
;
run;

Uisng the link below , i can count distinct ID's for each item using below code:
proc sql;
select item_id,
count(distinct ID) as Distinct_Id from a
group by Item_id;
quit;
After seeing the output i am noticing in last two rows that 

ABC12 and ABC13 have 1 disctinct id. 

However, this ID is same (a2) , therefore , if it has been included for first item count than it should not be 

counted in the second item group.

I would appreciate any suggestion.

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@sascode wrote:
yes, it is fine if it show 0 .
Thank you.

Then maybe something like:

proc sort data=a;
  by id;
run;

data need;
   set a;
   by id;
   if first.id then do;
   end;
   else call missing(id);
run;

proc sql;
select item_id,
count(distinct ID) as Distinct_Id from need
group by Item_id;
quit;

The sort groups all of the ID values together and then the data step sets all but one to missing. That way there is no "duplicate" Id to count.

Caution: this makes no attempt to control which "item_id" gets the Id. You might sort by Id and Item_id, then the first Item_id by alphabetic value would be associated with the ID. More complex rules of which Item_id should have the Id may be a bit difficult to enforce without a lot of work.

View solution in original post

5 REPLIES 5
ballardw
Super User

So should Item_id ABC12 and ABC13 even be in the output since they do not have "a distinct" id? Or should the Id be blank/missing?

sascode
Quartz | Level 8
yes, it is fine if it show 0 .
Thank you.
ballardw
Super User

@sascode wrote:
yes, it is fine if it show 0 .
Thank you.

Then maybe something like:

proc sort data=a;
  by id;
run;

data need;
   set a;
   by id;
   if first.id then do;
   end;
   else call missing(id);
run;

proc sql;
select item_id,
count(distinct ID) as Distinct_Id from need
group by Item_id;
quit;

The sort groups all of the ID values together and then the data step sets all but one to missing. That way there is no "duplicate" Id to count.

Caution: this makes no attempt to control which "item_id" gets the Id. You might sort by Id and Item_id, then the first Item_id by alphabetic value would be associated with the ID. More complex rules of which Item_id should have the Id may be a bit difficult to enforce without a lot of work.

Patrick
Opal | Level 21

a2 already exists for item_id ABC11. 

Patrick_0-1664945567054.png

So what exactly is your rule for counting distinct IDs? Only count for their first appearance in any group? And if so what's the sort order for the groups (or you don't care)?

sascode
Quartz | Level 8
Hi,
Yes, if an id is counted once in a given group , it should not counted again in the next group .
Sorting is not a problem

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 740 views
  • 1 like
  • 3 in conversation