Hi all, I am really new to SAS...I would like to know whether I can remove certain rows by restricting that gvkey_startdate <= fiscal_year <= gvkey_enddate. The raw database looks like this:
Person_id |
Company_id |
gvkey |
Gvkey_startdate |
Gvkey_enddate |
Fiscal year |
roa |
1 |
Abcd |
1234 |
27Nov1999 |
27Sep2010 |
2000 |
7.35 |
1 |
Abcd |
8801 |
20Jan2011 |
|
2000 |
7.35 |
1 |
Abcd |
1234 |
27Nov1999 |
27Sep2010 |
2012 |
6.42 |
1 |
Abcd |
8801 |
20Jan2011 |
|
2012 |
6.42 |
2 |
Sdfg |
2367 |
01Apr2000 |
|
2001 |
1.1 |
2 |
Sdfg |
2367 |
01Apr2000 |
|
2002 |
1.49 |
2 |
Cvbn |
5678 |
01Sep1999 |
30Jun2005 |
2000 |
0.52 |
3 |
Cvbn |
5678 |
01Sep1999 |
30Jun2005 |
2001 |
0.53 |
3 |
Cvbn |
5678 |
01Sep1999 |
30Jun2005 |
2002 |
0.91 |
The final product I want should be like this:
Person_id |
Company_id |
gvkey |
Gvkey_startdate |
Gvkey_enddate |
Fiscal_year |
roa |
1 |
Abcd |
1234 |
27Nov1999 |
27Sep2010 |
2000 |
7.35 |
1 |
abcd |
8801 |
20Jan2011 |
|
2012 |
6.42 |
2 |
Sdfg |
2367 |
01Apr2000 |
|
2001 |
1.1 |
2 |
Sdfg |
2367 |
01Apr2000 |
|
2002 |
1.49 |
2 |
Cvbn |
5678 |
01Sep1999 |
30Jun2005 |
2000 |
0.52 |
3 |
Cvbn |
5678 |
01Sep1999 |
30Jun2005 |
2001 |
0.53 |
3 |
cvbn |
5678 |
01Sep1999 |
30Jun2005 |
2002 |
0.91 |
It will definitely help me a lot if you have a solution if you can remove a row based on other columns. Thank you.
What does it mean to test if a year is between two dates? Can you explain this?
Do you mean any part of the year is between the two dates? Or do you mean the entire year is between the two dates? Or do you mean Jan 1 of that year is between the two dates? Or do you mean something else?
How does the logic work when gvkey_enddate is missing? Explain that too.
Do you mean any part of the year is between the two dates?
Answer: yes. So any date of the fiscal_year 2000 should be between gvkey_startdate and gvkey_enddate.
How does the logic work when gvkey_enddate is missing? Explain that too.
Answer: if the gvkey_enddate is missing, treat it as not ending yet. The reason why there is this gvkey_startdate and gvkey_enddate is due to the M&A. Hence, a company_id named 'Abcd' was acquired by the gvkey=8801 on 20Jan2011 and continues to be acquired until now.
Hope my reply clarifies! Thank you for your help.
Okay, I will assume your dates are actual numeric SAS date values, and that year is the 4 digit integer you show. Please let me know if those assumptions are correct.
/* UNTESTED CODE */
data want;
set have;
if not missing(gvkey_enddate) then do;
if year(gvkey_startdate)<=fiscal_year<=year(gvkey_enddate) then output;
end;
else do;
if year(gvkey_startdate)<=fiscal_year then output;
end;
run;
From now on, please provide data as working SAS data step code (examples and instructions) and not as Excel files and not as data copied and pasted from Excel and not in any other format.
You are using GVKEY, as a company identifier, which means you are using Compustat data. Because you also have a person id variable, I'm guessing you may be using Compustat's Execucomp database. It looks like you have date ranges for a given executive being associated with a given company.
All the above is to point out that fiscal year in Compustat is NOT the same as calendar year, and that may be important to your research. The fiscal year Compustat assigns to a company annual report is based on what month its fiscal year ends (I believe this follows IRS conventions). For instance, a fiscal year that ends anytime in January through May is assigned the preceding year value. So, for example, if fiscal year ends on May 31, 2020, it will be identified as fiscal year 2019 (June 1, 2019 through May 31, 2020). The rule is basically whichever calendar year has the bulk of the months in a fiscal year, that calendar year will be used as the fiscal year identifier. And ties, i.e. fiscal years ending in June, are assigned the latter year.
Compustat used to have (I suppose it still does, but I don't have access currently) a variable FY or FYM, with values 1 through 12 identifying the last month of the company's fiscal year, so you could construct the actual calendar date ranges of a fiscal year using fiscal_year (FYEAR) and FY. Compustat has also added an actual calendar date (variable DATADATE) in the Comlpustat Fundamentals database for quarterly and annual statements, but perhaps that's not in Execucomp.
The upshot is if you had a person leave a company on, say Feb 1, 2020, but the company's FY 2019 was June 1, 2019 through May 31, 2020, your test as I understand it would wrongly assume the person left the company AFTER the fiscal year was over.
Depending on the actual needs for your research, you may need to consider generating actual calendar dates for each GVKEY fiscal year, either the joining with tables containing DATADATE, or by finding and using FY to generate calendar date ranges.
Thanks, @mkeintz . I also thought about fiscal years not aligning with calendar years, and how it is can be different from company to company. I leave it up to the original poster @Jarvin99 to clarify this, as this issue regarding fiscal years is something that he has not mentioned (and really should have been mentioned in the first post).
Thank you both! I went back to rectify my mistake of the fiscal year, fiscal begin date, and fiscal end date. Below are my new codes:
data ciqt.company_matching9;
set ciqt.company_matching9;
fyrbegyear = year(fyrbegdt);
fyrbegmonth = month(fyrbegdt);
fyrbegday = day(fyrbegdt);
fyrendyear = year(fyrenddt);
fyrendmonth = month(fyrenddt);
fyrendday = day(fyrenddt);
run;
data ciqt.company_matching10;
set ciqt.company_matching9;
if not missing(startyear) and not missing(startmonh) and not missing(startday) then do;
if fyrbegyear>=startyear and fyrbegmonth>=startmonth and fyrbegday>=startday then output;
end;
else if not missing(startyear) and not missing(startmonh) and missing(startday) then do;
if fyrbegyear>=startyear and fyrbegmonth>=startmonth then output;
end;
else if not missing(startyear) and missing(startmonh) then do;
if fyrbegyear>=startyear then output;
end;
else if missing(startyear) then output;
run;
I think my logic is correct, but I do not know why, for example:
an observation of fyrbegdt=01JAN2010, startyear=2010, startmonth=10, and missing startday is still kept in my dataset in the end.
Please let me know if I should provide a subset of my raw data. Thank you.
If you have the fiscal year begin and end dates as dates then just use them in your comparisons. No need to break them into components and rebuild a date out of it.
Emmm...I need to do it because my startyear, startmonth, and startday can be missing. So in cases where my startmonth and startday are missing, I can only compare startyear with the year of the fiscal date. Same thing for the cases where startmonth and startyear are non-missing but the startday is missing.
FYI, startyear, startmonth and startday are from an employee's start job date whose form can be yyyy-mm, yyyy, or yyyy-mm-dd, depending on the information available.
I don't think many companies change FY in the middle of a month, so for the START dates that are only missing the day of the month just assume the first of the month.
And for the START dates that only include a year what is the logic you want to use? Do you assume it is the same as the FY? Do you want to assume that START was the first of the year? The first of July? The first of the month that company uses to start a new FY?
Before looking at your data, look at your log. If the code you posted is an accurate copy of your program, then this should appear in the log:
NOTE: Variable startmonh is uninitialized.
You mis-spelled startmonth at least once in your code, so the statement
else if not missing(startyear) and missing(startmonh) then do;
if fyrbegyear>=startyear then output;
end;
so even if previous IF conditions are not met your sample observation would qualify for the last one.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.