BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dimuly0
Fluorite | Level 6
Hello dear SAS community,

I‘m a fairly unexperienced user of SAS and would like to know how and if the following (data cleaning) procedure is possible:

I am analyzing time series panel data for multiple companies‘ earnings. The issue is that the number of available observation years differs for the companies (total number of companies > 20000). Many of the companies have only 2-3 years of observations which is insufficient to run a grouped time series analysis.

How can I exclude the companies (the rows) which hold, say, less than 10 observations (years)?

Any help will be much appreciated!

Thanks in advance and best regards
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
proc sql;
create table want as
select * from table1
where ID in (Select ID from table1 group by ID having count(ID)>10);
quit;

That will ensure that each ID is in the data set more than 10 times. But....do those years need to be consecutive? Can a symbol or company name be 'reused' later on in your data set? 

 


@dimuly0 wrote:
Hello dear SAS community,

I‘m a fairly unexperienced user of SAS and would like to know how and if the following (data cleaning) procedure is possible:

I am analyzing time series panel data for multiple companies‘ earnings. The issue is that the number of available observation years differs for the companies (total number of companies > 20000). Many of the companies have only 2-3 years of observations which is insufficient to run a grouped time series analysis.

How can I exclude the companies (the rows) which hold, say, less than 10 observations (years)?

Any help will be much appreciated!

Thanks in advance and best regards

 

View solution in original post

4 REPLIES 4
Reeza
Super User
proc sql;
create table want as
select * from table1
where ID in (Select ID from table1 group by ID having count(ID)>10);
quit;

That will ensure that each ID is in the data set more than 10 times. But....do those years need to be consecutive? Can a symbol or company name be 'reused' later on in your data set? 

 


@dimuly0 wrote:
Hello dear SAS community,

I‘m a fairly unexperienced user of SAS and would like to know how and if the following (data cleaning) procedure is possible:

I am analyzing time series panel data for multiple companies‘ earnings. The issue is that the number of available observation years differs for the companies (total number of companies > 20000). Many of the companies have only 2-3 years of observations which is insufficient to run a grouped time series analysis.

How can I exclude the companies (the rows) which hold, say, less than 10 observations (years)?

Any help will be much appreciated!

Thanks in advance and best regards

 

dimuly0
Fluorite | Level 6

Thank you very much @Reeza!

I will try this code.

I assume I have to replace the " * " as well as " ID " by the firm ID variable in my dataset, right?

 

The years Need to be consecutive and they are consecutively sorted in the dataset.

The deleted companies don't need to be reused later, I can simply delete them out of my dataset.

 

Reeza
Super User

@dimuly0 wrote:

Thank you very much @Reeza!

I will try this code.

I assume I have to replace the " * " as well as " ID " by the firm ID variable in my dataset, right?

 

 


No, you would want the full data set with just the ID's selected though I suppose you could do it in two steps.


@dimuly0 wrote:

 

The years Need to be consecutive and they are consecutively sorted in the dataset.

The deleted companies don't need to be reused later, I can simply delete them out of my dataset.

 


Sometimes a new company can use an old ticker symbol is what I was thinking. So, fictional example, Sears used SRS for the last 50 years. They've now gone bankrupt. 10 years later there's a new company called Space Rocket Services using the same ticker symbol. If that's not an issue you can ignore it. 

 

This code does not test for consecutive dates though, so if you are missing dates in the middle it will give you incorrect results. 

 

 

dimuly0
Fluorite | Level 6

The Code worked perfectly

Many thanks!

 

I see, the two accepts you are Talking About aren't issues for me though.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 693 views
  • 2 likes
  • 2 in conversation