BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Shayan2012
Quartz | Level 8

Hi guys,

I need to count the number of observations for "each firm" in "each year" to be able to ignore those years with less than (for example) 3 observations.

my data set is something like this (and the count column is what I need!)

year    permno        count

1               1                  3

1               1                 3

1               1                 3

2               1                  1

3               1                 1

1               2                 1

2               2                 4

2               2                 4    

2               2                 4

2               2                 4

3               2                 1

4               2                 1

can any one help me to write this with SQL this, please? (and since I am a newbie in sas, would you please describe it a little Smiley Happy?)

Thanks a lot

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

proc sql;

create table want as

select * from have natural join

     (select *, count(*) as count

      from have

      group by permno, year

/*  having count(*) >=3 */ ) ;

select * from want;

quit;

Says : Count observations within each group defined by permno and year (that's the part within parenthesis) then join these counts with the original table, joining on the variables that they have in common (natural join). 

The commented out (having clause) is where you would exclude years with less than 3 observations .

PG

PG

View solution in original post

15 REPLIES 15
PGStats
Opal | Level 21

proc sql;

create table want as

select * from have natural join

     (select *, count(*) as count

      from have

      group by permno, year

/*  having count(*) >=3 */ ) ;

select * from want;

quit;

Says : Count observations within each group defined by permno and year (that's the part within parenthesis) then join these counts with the original table, joining on the variables that they have in common (natural join). 

The commented out (having clause) is where you would exclude years with less than 3 observations .

PG

PG
Shayan2012
Quartz | Level 8

That worked like a charm. Thanks a lot PG

Linlin
Lapis Lazuli | Level 10

Hi PG,

What is natural join? I have used left, right, inner join, but have never used natural join. I am not too lazy to read. Your expanation would help me to remeber better.

Updated after Art's comments:smileysilly:.

Thank you!

PGStats
Opal | Level 21

natural join is like inner join with an implicit ON clause on all the variables that have the same name in both tables. You can also do a natural left join.

PG

PG
Linlin
Lapis Lazuli | Level 10

Thank you Art and PG!

The expanation I get from you would help me to understand betterSmiley Wink.

thdang
Calcite | Level 5

if i also want to take the average of those specific observation and excluding the one which the average less than e.g. n. how can I do that?

Thanks

Shayan2012
Quartz | Level 8

I think this might work for you:

assume you want to take an average for variable VAR1

Proc sql;

create want as

select *, avg(VAR1) as average

from have

where average gt N;



It would be great if expert guys mention whether it works or not!

thdang
Calcite | Level 5

thank you very much. But can i combine this

Proc sql;

create want as

select *, avg(VAR1) as average

from have

where average gt N;


and this  eg

proc sql noprint ;

create table want as

   select *,count(*) as count

   from have

   group by year, permno

   having calculated count > 2

;

quit;


to have 1 proc insteads of two

Tom
Super User Tom
Super User

Actually the SAS specific implementation of SQL helps with this type of query.  Try this program.

proc sql noprint ;

create table want as

   select *,count(*) as count

   from have

   group by year, permno

;

quit;

Notice the log message :

NOTE: The query requires remerging summary statistics back with the original data.

So you get all of the original data and the new COUNT variable is merged on so that it has the same value for all observations for that year/permno combination. You can add a HAVING clause to eliminate the observations from year/permno combinations with insufficient counts.

proc sql noprint ;

create table want as

   select *,count(*) as count

   from have

   group by year, permno

   having calculated count > 2

;

quit;

PGStats
Opal | Level 21

That doesn't work Tom if the dataset contains only variables year and permno. In that case you get normal summarization without remerging.

PG

PG
Tom
Super User Tom
Super User

In that case use the new COUNT variable as a FREQ or WEIGHT variable in future analysis and save the disk space of store multiple copies of the group by variables.

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

hey guys,

what does select *,do in the above example? Can someone please explain Smiley Happy

i mean i know about select var1,var2,..... from table or select*from table but i haven't seen this above before

art297
Opal | Level 21

It is just asking for all of the variables in the table.  Since it is followed by count(*) as count, between the two statements it is asking for all of the variables as well as a count of the number of records.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 2604 views
  • 8 likes
  • 7 in conversation