Hi all,
I am having a dataset (specific range like Jan 1926 to Dec 1938) of diifferent companies across 12 years having time variables. There are some companies whose starting date is falling after my start date and some are having ending dates before mine. So, can anyone please help me in figuring out, how can I work only on the dataset having companies as per my selected range and all rest of the companies get deleted from the dataset with a different start and ending date in my dataset ?
Regards
You did very well to get this far. Good job! Here are a few items to note, to build upon your work to date.
PROC FREQ supports the OUT= option on the TABLE statement. That might be slightly simpler compared to using ODS to capture the frequencies in a data set.
The final result also contains a couple of additional variables (from the ODS data set). You can get rid of the extra variables, and cut out a step from the program at the same time:
data final (drop=frequency);
merge data1 (keep=permno frequency) ana;
by permno;
if frequency NE 155 then delete;
run;
Notice how the data set DATA1 is available, and does not need to be copied to SORT_FREQ. When you are processing small data sets, that makes little difference. But when the data gets larger, it makes a much larger difference.
Along similar lines, procedures can process permanent SAS data sets. You could remove the first DATA step entirely, and use the original data set (in two places):
proc freq data=sasusers.q0d27e531449c3c15;
........
1. Figure out the start and end date for each company. You can use a variety of options for this, using MIN and MAX statistics on the dates is one option.
2. Filter the company list to get a master list
3. Filter big data to only include company for Step #2.
You could also check if PROC TIMESERIES (SAS/ETS) has an option to drop series without full data, as it has many options to deal with time series data.
@amanjot_42 wrote:
Hi all,
I am having a dataset (specific range like Jan 1926 to Dec 1938) of diifferent companies across 12 years having time variables. There are some companies whose starting date is falling after my start date and some are having ending dates before mine. So, can anyone please help me in figuring out, how can I work only on the dataset having companies as per my selected range and all rest of the companies get deleted from the dataset with a different start and ending date in my dataset ?
Regards
Thank you for your prompt reply,
I am a beginner in SAS,
can you please help me to understand this more elaborately (with codes, if possible)?
Like, presently, I am using this code;
%let start_date=31jan1926;
%let end_date=31dec1938;
data want;
set work.fmdata;
do date="&start_date"d to "&end_date"d;
output;
end;
format date date9.;
I dont know this will work or not ?
run;
Regards
If you want specific answers, you need to provide specific information.
What are your starting and ending dates?
What are the names of key variables in your data set? (company identifiers, date-related variables)
For date-related variables, what values do they take on?
Any rules about which companies to delete? For example, if a company's ending date is a week before your ending date, do you want to get rid of it? What if it is two days before your ending date?
Thanks for your reply,
so I have date permno retx sprtrn as columns in my file. permno relates to companies' codes. retx is monthly stock returns of the respective companies across the year. sprtrn is monthly market returns across the sample years. I have collected the data for the range (start date=31/01/1926 and end date=31/12/1938) with yymmdd6. as date format.
I want to keep only those companies (permno) whose starting date and ending date is exactly the one mentioned above. The companies falling under different range like starting at 31/03/1926 or ending at 31/05/1936 should not be part of the proposed dataset (full deletion).
So that I may have a dataset of date permno retx and sprtrn of only those companies having sample across 31/01/1926 and 31/12/1938
Please help in this regard
Start at step 1 then. Do you know how to calculate summary statistics for a dataset? If not, that’s something you need to learn anyways.
If you do, calculate her min/max for the date field for each company. That’ll be what you need to start.
If you’re new to programming the trick is to break it into smaller steps that are each manageable.
And if if you don’t here some resources
https://github.com/statgeek/SAS-Tutorials/blob/master/proc_means_basic.sas
I'm going to point you in a mildly different direction to get started. You will still need to learn basic SAS procedures (SORT, MEANS, FREQ) if you ever want to be able to program in SAS. But for starters, I would use:
proc freq data=have;
tables permno;
where ('31jan1926'd <= date <= '31dec1938'd);
run;
That gives you a count of how many observations each company has, within your time period. That's probably a better way than taking the minimum and maximum date for each company, because the output here will let you know how many observations you have. That way, you won't decide to keep data for a company that has 3 years missing in the middle of the series.
Once you get a feel for what PROC FREQ can do, we can talk about getting the PROC FREQ results into a data set, to automate the selection process.
Hi,
I got the prelim results of FREQ (attached). So, I want companies with observations equivalent to 155 and the rest one to get dropped from the dataset.
Can we move further ?
Regards
Hello,
After going through the material (you suggested);
I am able to get the data for the companies within the specified range.
regards
You did very well to get this far. Good job! Here are a few items to note, to build upon your work to date.
PROC FREQ supports the OUT= option on the TABLE statement. That might be slightly simpler compared to using ODS to capture the frequencies in a data set.
The final result also contains a couple of additional variables (from the ODS data set). You can get rid of the extra variables, and cut out a step from the program at the same time:
data final (drop=frequency);
merge data1 (keep=permno frequency) ana;
by permno;
if frequency NE 155 then delete;
run;
Notice how the data set DATA1 is available, and does not need to be copied to SORT_FREQ. When you are processing small data sets, that makes little difference. But when the data gets larger, it makes a much larger difference.
Along similar lines, procedures can process permanent SAS data sets. You could remove the first DATA step entirely, and use the original data set (in two places):
proc freq data=sasusers.q0d27e531449c3c15;
........
thanks a ton for your response and resources,
Now moving further I want to run regression taking retx as dependent and sprtrn as an explanatory variables across the years 1926 to 1929, and on the basis of beta values generated for the individual firms, I want to create 10 portfolios (or decile groups) which I want to use further for rolling regression across the years 1930 to 1938. Like I want to use portfolio returns (of 10 portfolios respectively) data for Jan 1930 to Dec 1934 (60 months) to compute market betas (dvar: portfolio returns and Ivar: sprtrn) for the month Jan 1935, then rolling over by one month up till Dec 1938.
Can you please help!
There's very little I can tell you about regression. For the simplest cases, PROC REG will do. To gain access to more of the bells and whistles, take a look at PROC GLM.
I'd suggest you post this as a new question. Many posters won't bother to read about a question that is already solved.
Ok sure,
Thanks a lot for your help,
Regards
You want to generate CAPM (capital asset pricing model, i.e. MODEL retx=sprtrn) beta estimates using CRSP monthly data for your sample for a subperiod, then use the beta estimates to form decile portfolios.
Then you want rolling 60 month PORTFOLIO returns, meaning that after the regresssion, you also will need a data step or proc to compound monthly returns of each portfolio component. (I presume there is no reweighting within portfolios over the 60 month window).
It's time to start a new question. Don't move the goal posts within a topic.
Also, what do you want to do with stocks that have a negative BETA? A very negative beta has a high correlation with the market, unlike the low but positive beta's in the same decile.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.