Hi all,
Today I try to count the distinct observation of variable "Type" in my dataset. A part of my sample of the dataset merge_treat_con
is as below:
TYPE GEOGN yr
13016K AUSTRALIA 2002
13016K AUSTRALIA 2004
13016K AUSTRALIA 2005
13016K AUSTRALIA 2007
13016K AUSTRALIA 2008
130347 AUSTRALIA 2001
130495 AUSTRALIA 2002
130495 AUSTRALIA 2004
130495 AUSTRALIA 2005
130496 AUSTRALIA 2004
130496 AUSTRALIA 2006
130496 AUSTRALIA 2007
130496 AUSTRALIA 2008
130560 AUSTRALIA 2002
130560 AUSTRALIA 2004
130560 AUSTRALIA 2005
130560 AUSTRALIA 2008
50234Q FRANCE 2004
50234Q FRANCE 2005
50234Q FRANCE 2006
50290W FRANCE 2004
50290W FRANCE 2005
50290W FRANCE 2006
50343P FRANCE 2005
50343P FRANCE 2006
50367K FRANCE 1999
What I want is the output unique_firms
is as below
GEOGN Unique_Type
AUSTRALIA 5
FRANCE 4
The first procedure popping up in my head is PROC SQL, my code is
proc sql;
create table unique_firms as
select distinct Type as uniquetype, count(uniquetype) as Unique_Type, geogn
from merge_treat_con
group by geogn;
quit;
But the result is not the desired result (this is the result for my whole sample)
uniquetype Unique_Type GEOGN
130042 50089 UNITEDS
130045 50089 UNITEDS
130056 50089 UNITEDS
130062 50089 UNITEDS
130079 50089 UNITEDS
130088 50089 UNITEDS
130092 1507 MEXICO
130104 2380 ISRAEL
130113 50089 UNITEDS
130117 1361 BRAZIL
13012F 939 PERU
130165 1361 BRAZIL
13016K 7255 AUSTRALIA
13016L 1615 FINLAND
130171 1361 BRAZIL
130189 1361 BRAZIL
130194 1361 BRAZIL
13021K 50089 UNITEDS
Can you please hint me to sort it out?
Thank you and warm regards.
Supplying results from full data is not helpful. From the subset you show as an example to start with is the way to show results.
First thing is that your problem description is incomplete. You say: "Today I try to count the distinct observation of variable "Type" in my dataset". but then very clearly show that you want the distinct observation of type within the values of country.
And sometimes it just plain easier to count after subsetting
Proc sql; create table want as select geogn, count(*) as unique_types from (select distinct geogn, type from merge)
group by geogn ; run;
generates something closer to what you show for want.
I don't know why you are creating a new type variable when you say you only want a count
Supplying results from full data is not helpful. From the subset you show as an example to start with is the way to show results.
First thing is that your problem description is incomplete. You say: "Today I try to count the distinct observation of variable "Type" in my dataset". but then very clearly show that you want the distinct observation of type within the values of country.
And sometimes it just plain easier to count after subsetting
Proc sql; create table want as select geogn, count(*) as unique_types from (select distinct geogn, type from merge)
group by geogn ; run;
generates something closer to what you show for want.
I don't know why you are creating a new type variable when you say you only want a count
Thank you very much
@ballardw
I created a new variable because I do not know we can count directly from a subsetting as your approach by using SQL.
Your approach solved my problem perfectly. I just want to confirm one last thing from my understanding
For your set of code below:
select geogn, count(*) as unique_types
from (select distinct geogn, type from merge)
group by geogn
So, SAS will group the data by geogn, then SAS will select the column geogn and unique value of column type from dataset merge. Then, lastly SAS will perform the code
select geogn, count(*) as unique_types
, am I correct?
Thank you and warm regards.
You understand correctly.
Distinct used in the SQL and it's common to include your grouping variables at the start of your query for clarity.
proc sql;
create table unique_firms as
select geogn, count(distinct Type) as Unique_Type
from merge_treat_con
group by geogn;
quit;
Thank you so much, @Reeza
Your code is even shorter and direct.
Many thanks and warm regards.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.