BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dimuly0
Fluorite | Level 6
Hello dear SAS community,

I‘m a fairly unexperienced user of SAS and would like to know how and if the following (data cleaning) procedure is possible:

I am analyzing time series panel data for multiple companies‘ earnings. The issue is that the number of available observation years differs for the companies (total number of companies > 20000). Many of the companies have only 2-3 years of observations which is insufficient to run a grouped time series analysis.

How can I exclude the companies (the rows) which hold, say, less than 10 observations (years)?

Any help will be much appreciated!

Thanks in advance and best regards
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
proc sql;
create table want as
select * from table1
where ID in (Select ID from table1 group by ID having count(ID)>10);
quit;

That will ensure that each ID is in the data set more than 10 times. But....do those years need to be consecutive? Can a symbol or company name be 'reused' later on in your data set? 

 


@dimuly0 wrote:
Hello dear SAS community,

I‘m a fairly unexperienced user of SAS and would like to know how and if the following (data cleaning) procedure is possible:

I am analyzing time series panel data for multiple companies‘ earnings. The issue is that the number of available observation years differs for the companies (total number of companies > 20000). Many of the companies have only 2-3 years of observations which is insufficient to run a grouped time series analysis.

How can I exclude the companies (the rows) which hold, say, less than 10 observations (years)?

Any help will be much appreciated!

Thanks in advance and best regards

 

View solution in original post

4 REPLIES 4
Reeza
Super User
proc sql;
create table want as
select * from table1
where ID in (Select ID from table1 group by ID having count(ID)>10);
quit;

That will ensure that each ID is in the data set more than 10 times. But....do those years need to be consecutive? Can a symbol or company name be 'reused' later on in your data set? 

 


@dimuly0 wrote:
Hello dear SAS community,

I‘m a fairly unexperienced user of SAS and would like to know how and if the following (data cleaning) procedure is possible:

I am analyzing time series panel data for multiple companies‘ earnings. The issue is that the number of available observation years differs for the companies (total number of companies > 20000). Many of the companies have only 2-3 years of observations which is insufficient to run a grouped time series analysis.

How can I exclude the companies (the rows) which hold, say, less than 10 observations (years)?

Any help will be much appreciated!

Thanks in advance and best regards

 

dimuly0
Fluorite | Level 6

Thank you very much @Reeza!

I will try this code.

I assume I have to replace the " * " as well as " ID " by the firm ID variable in my dataset, right?

 

The years Need to be consecutive and they are consecutively sorted in the dataset.

The deleted companies don't need to be reused later, I can simply delete them out of my dataset.

 

Reeza
Super User

@dimuly0 wrote:

Thank you very much @Reeza!

I will try this code.

I assume I have to replace the " * " as well as " ID " by the firm ID variable in my dataset, right?

 

 


No, you would want the full data set with just the ID's selected though I suppose you could do it in two steps.


@dimuly0 wrote:

 

The years Need to be consecutive and they are consecutively sorted in the dataset.

The deleted companies don't need to be reused later, I can simply delete them out of my dataset.

 


Sometimes a new company can use an old ticker symbol is what I was thinking. So, fictional example, Sears used SRS for the last 50 years. They've now gone bankrupt. 10 years later there's a new company called Space Rocket Services using the same ticker symbol. If that's not an issue you can ignore it. 

 

This code does not test for consecutive dates though, so if you are missing dates in the middle it will give you incorrect results. 

 

 

dimuly0
Fluorite | Level 6

The Code worked perfectly

Many thanks!

 

I see, the two accepts you are Talking About aren't issues for me though.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1893 views
  • 2 likes
  • 2 in conversation