Solved: Re: How to count distinct observation?

Phil_NZ · Posted 04-25-2021 07:21 PM

Hi all,

Today I try to count the distinct observation of variable "Type" in my dataset. A part of my sample of the dataset merge_treat_con is as below:

TYPE	GEOGN	    yr
13016K	AUSTRALIA	2002
13016K	AUSTRALIA	2004
13016K	AUSTRALIA	2005
13016K	AUSTRALIA	2007
13016K	AUSTRALIA	2008
130347	AUSTRALIA	2001
130495	AUSTRALIA	2002
130495	AUSTRALIA	2004
130495	AUSTRALIA	2005
130496	AUSTRALIA	2004
130496	AUSTRALIA	2006
130496	AUSTRALIA	2007
130496	AUSTRALIA	2008
130560	AUSTRALIA	2002
130560	AUSTRALIA	2004
130560	AUSTRALIA	2005
130560	AUSTRALIA	2008
50234Q	FRANCE	    2004
50234Q	FRANCE  	2005
50234Q	FRANCE	    2006
50290W	FRANCE	    2004
50290W	FRANCE      2005
50290W	FRANCE	    2006
50343P	FRANCE	    2005
50343P	FRANCE	    2006
50367K	FRANCE	    1999

What I want is the output unique_firms is as below

GEOGN        Unique_Type   
AUSTRALIA    5
FRANCE       4

The first procedure popping up in my head is PROC SQL, my code is

 proc sql;
 create table unique_firms as
  select distinct Type as uniquetype, count(uniquetype) as  Unique_Type, geogn
   from merge_treat_con
  group by geogn;
quit;

But the result is not the desired result (this is the result for my whole sample)

uniquetype	Unique_Type	GEOGN
130042	    50089	UNITEDS
130045	    50089	UNITEDS
130056	    50089	UNITEDS
130062      50089	UNITEDS
130079  	50089	UNITEDS
130088	    50089	UNITEDS
130092	    1507	MEXICO
130104	    2380	ISRAEL
130113	    50089	UNITEDS
130117	    1361	BRAZIL
13012F	    939	    PERU
130165	    1361	BRAZIL
13016K  	7255	AUSTRALIA
13016L	    1615	FINLAND
130171	    1361	BRAZIL
130189	    1361	BRAZIL
130194	    1361	BRAZIL
13021K	    50089	UNITEDS

Can you please hint me to sort it out?

Thank you and warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

ballardw · Posted 04-25-2021 08:17 PM

Supplying results from full data is not helpful. From the subset you show as an example to start with is the way to show results.

First thing is that your problem description is incomplete. You say: "Today I try to count the distinct observation of variable "Type" in my dataset". but then very clearly show that you want the distinct observation of type within the values of country.

And sometimes it just plain easier to count after subsetting

Proc sql;
   create table want as
   select geogn, count(*) as unique_types
  from (select distinct geogn, type from merge)
  group by geogn
  ;
run;

generates something closer to what you show for want.

I don't know why you are creating a new type variable when you say you only want a count

View solution in original post

ballardw · Posted 04-25-2021 08:17 PM

Supplying results from full data is not helpful. From the subset you show as an example to start with is the way to show results.

First thing is that your problem description is incomplete. You say: "Today I try to count the distinct observation of variable "Type" in my dataset". but then very clearly show that you want the distinct observation of type within the values of country.

And sometimes it just plain easier to count after subsetting

Proc sql;
   create table want as
   select geogn, count(*) as unique_types
  from (select distinct geogn, type from merge)
  group by geogn
  ;
run;

generates something closer to what you show for want.

I don't know why you are creating a new type variable when you say you only want a count

Phil_NZ · Posted 04-25-2021 09:36 PM

Thank you very much
@ballardw

I created a new variable because I do not know we can count directly from a subsetting as your approach by using SQL.

Your approach solved my problem perfectly. I just want to confirm one last thing from my understanding

For your set of code below:

   select geogn, count(*) as unique_types
  from (select distinct geogn, type from merge)
  group by geogn

So, SAS will group the data by geogn, then SAS will select the column geogn and unique value of column type from dataset merge. Then, lastly SAS will perform the code

select geogn, count(*) as unique_types

, am I correct?

Thank you and warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

ballardw · Posted 04-25-2021 11:01 PM

You understand correctly.

Reeza · Posted 04-25-2021 09:53 PM

Distinct used in the SQL and it's common to include your grouping variables at the start of your query for clarity.

 proc sql;
 create table unique_firms as
  select geogn, count(distinct Type) as  Unique_Type
   from merge_treat_con
  group by geogn;
quit;

Phil_NZ · Posted 04-25-2021 11:58 PM

Thank you so much, @Reeza

Your code is even shorter and direct.

Many thanks and warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

Reeza · Posted 04-26-2021 11:22 AM

https://github.com/statgeek/SAS-Tutorials/blob/master/count_distinct_by_group.sas

Classroom Training Available!