proc sql;
create table want as
select * from table1
where ID in (Select ID from table1 group by ID having count(ID)>10);
quit;
That will ensure that each ID is in the data set more than 10 times. But....do those years need to be consecutive? Can a symbol or company name be 'reused' later on in your data set?
@dimuly0 wrote:
Hello dear SAS community,
I‘m a fairly unexperienced user of SAS and would like to know how and if the following (data cleaning) procedure is possible:
I am analyzing time series panel data for multiple companies‘ earnings. The issue is that the number of available observation years differs for the companies (total number of companies > 20000). Many of the companies have only 2-3 years of observations which is insufficient to run a grouped time series analysis.
How can I exclude the companies (the rows) which hold, say, less than 10 observations (years)?
Any help will be much appreciated!
Thanks in advance and best regards
proc sql;
create table want as
select * from table1
where ID in (Select ID from table1 group by ID having count(ID)>10);
quit;
That will ensure that each ID is in the data set more than 10 times. But....do those years need to be consecutive? Can a symbol or company name be 'reused' later on in your data set?
@dimuly0 wrote:
Hello dear SAS community,
I‘m a fairly unexperienced user of SAS and would like to know how and if the following (data cleaning) procedure is possible:
I am analyzing time series panel data for multiple companies‘ earnings. The issue is that the number of available observation years differs for the companies (total number of companies > 20000). Many of the companies have only 2-3 years of observations which is insufficient to run a grouped time series analysis.
How can I exclude the companies (the rows) which hold, say, less than 10 observations (years)?
Any help will be much appreciated!
Thanks in advance and best regards
Thank you very much @Reeza!
I will try this code.
I assume I have to replace the " * " as well as " ID " by the firm ID variable in my dataset, right?
The years Need to be consecutive and they are consecutively sorted in the dataset.
The deleted companies don't need to be reused later, I can simply delete them out of my dataset.
@dimuly0 wrote:
Thank you very much @Reeza!
I will try this code.
I assume I have to replace the " * " as well as " ID " by the firm ID variable in my dataset, right?
No, you would want the full data set with just the ID's selected though I suppose you could do it in two steps.
@dimuly0 wrote:
The years Need to be consecutive and they are consecutively sorted in the dataset.
The deleted companies don't need to be reused later, I can simply delete them out of my dataset.
Sometimes a new company can use an old ticker symbol is what I was thinking. So, fictional example, Sears used SRS for the last 50 years. They've now gone bankrupt. 10 years later there's a new company called Space Rocket Services using the same ticker symbol. If that's not an issue you can ignore it.
This code does not test for consecutive dates though, so if you are missing dates in the middle it will give you incorrect results.
The Code worked perfectly
Many thanks!
I see, the two accepts you are Talking About aren't issues for me though.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.