BookmarkSubscribeRSS Feed
Jarvin99
Obsidian | Level 7

Hi all, I am really new to SAS...I would like to know whether I can remove certain rows by restricting that gvkey_startdate <= fiscal_year <= gvkey_enddate. The raw database looks like this:

Person_id

Company_id

gvkey

Gvkey_startdate

Gvkey_enddate

Fiscal year

roa

1

Abcd

1234

27Nov1999

27Sep2010

2000

7.35

1

Abcd

8801

20Jan2011

 

2000

7.35

1

Abcd

1234

27Nov1999

27Sep2010

2012

6.42

1

Abcd

8801

20Jan2011

 

2012

6.42

2

Sdfg

2367

01Apr2000

 

2001

1.1

2

Sdfg

2367

01Apr2000

 

2002

1.49

2

Cvbn

5678

01Sep1999

30Jun2005

2000

0.52

3

Cvbn

5678

01Sep1999

30Jun2005

2001

0.53

3

Cvbn

5678

01Sep1999

30Jun2005

2002

0.91

 

The final product I want should be like this:

Person_id

Company_id

gvkey

Gvkey_startdate

Gvkey_enddate

Fiscal_year

roa

1

Abcd

1234

27Nov1999

27Sep2010

2000

7.35

1

abcd

8801

20Jan2011

 

2012

6.42

2

Sdfg

2367

01Apr2000

 

2001

1.1

2

Sdfg

2367

01Apr2000

 

2002

1.49

2

Cvbn

5678

01Sep1999

30Jun2005

2000

0.52

3

Cvbn

5678

01Sep1999

30Jun2005

2001

0.53

3

cvbn

5678

01Sep1999

30Jun2005

2002

0.91

 

It will definitely help me a lot if you have a solution if you can remove a row based on other columns. Thank you.

10 REPLIES 10
PaigeMiller
Diamond | Level 26

What does it mean to test if a year is between two dates? Can you explain this?

 

Do you mean any part of the year is between the two dates? Or do you mean the entire year is between the two dates? Or do you mean Jan 1 of that year is between the two dates? Or do you mean something else?

 

How does the logic work when gvkey_enddate is missing? Explain that too.

--
Paige Miller
Jarvin99
Obsidian | Level 7

Do you mean any part of the year is between the two dates?

Answer: yes. So any date of the fiscal_year 2000 should be between gvkey_startdate and gvkey_enddate.

 

 

How does the logic work when gvkey_enddate is missing? Explain that too.

Answer: if the gvkey_enddate is missing, treat it as not ending yet. The reason why there is this gvkey_startdate and gvkey_enddate is due to the M&A. Hence, a company_id named 'Abcd' was acquired by the gvkey=8801 on 20Jan2011 and continues to be acquired until now.

 

Hope my reply clarifies! Thank you for your help.

PaigeMiller
Diamond | Level 26

Okay, I will assume your dates are actual numeric SAS date values, and that year is the 4 digit integer you show. Please let me know if those assumptions are correct.

 

/* UNTESTED CODE */

data want;
    set have;
    if not missing(gvkey_enddate) then do;
        if year(gvkey_startdate)<=fiscal_year<=year(gvkey_enddate) then output;
    end;
    else do;
        if year(gvkey_startdate)<=fiscal_year then output;
    end;
run;
   

 

From now on, please provide data as working SAS data step code (examples and instructions) and not as Excel files and not as data copied and pasted from Excel and not in any other format.

--
Paige Miller
mkeintz
PROC Star

You are using GVKEY, as a company identifier, which means you are using Compustat data.  Because you also have a person id variable, I'm guessing you may be using Compustat's Execucomp database.  It looks like you have date ranges for a given executive being associated with a given company.

 

All the above is to point out that fiscal year in Compustat is NOT the same as calendar year, and that may be important to your research.  The fiscal year Compustat assigns to a company annual report is based on what month its fiscal year ends (I believe this follows IRS conventions).  For instance, a fiscal year that ends anytime in January through May is assigned the preceding year value.  So, for example, if fiscal year ends on May 31, 2020, it will be identified as fiscal year 2019 (June 1, 2019 through May 31, 2020).   The rule is basically whichever calendar year has the bulk of the months in a fiscal year, that calendar year will be used as the fiscal year identifier.  And ties, i.e. fiscal years ending in June, are assigned the latter year.

 

Compustat used to have (I suppose it still does, but I don't have access currently) a variable FY or FYM, with values 1 through 12 identifying the last month of the company's fiscal year, so you could construct the actual calendar date ranges of a fiscal year using fiscal_year (FYEAR) and FY.   Compustat has also added an actual calendar date (variable DATADATE) in the Comlpustat Fundamentals database for quarterly and annual statements, but perhaps that's not in Execucomp.  

 

The upshot is if you had a person leave a company on, say Feb 1, 2020, but the company's FY 2019 was June 1, 2019 through May 31, 2020, your test as I understand it would wrongly assume the person left the company AFTER the fiscal year was over.

 

Depending on the actual needs for your research, you may need to consider generating actual calendar dates for each GVKEY fiscal year, either the joining with tables containing DATADATE, or by finding and using FY to generate calendar date ranges.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PaigeMiller
Diamond | Level 26

Thanks, @mkeintz . I also thought about fiscal years not aligning with calendar years, and how it is can be different from company to company. I leave it up to the original poster @Jarvin99 to clarify this, as this issue regarding fiscal years is something that he has not  mentioned (and really should have been mentioned in the first post).

--
Paige Miller
Jarvin99
Obsidian | Level 7

Thank you both! I went back to rectify my mistake of the fiscal year, fiscal begin date, and fiscal end date. Below are my new codes:

data ciqt.company_matching9;
	set ciqt.company_matching9;
	fyrbegyear = year(fyrbegdt);
	fyrbegmonth = month(fyrbegdt);
	fyrbegday = day(fyrbegdt);
	fyrendyear = year(fyrenddt);
	fyrendmonth = month(fyrenddt);
	fyrendday = day(fyrenddt);
run;


data ciqt.company_matching10;
	set ciqt.company_matching9;
	if not missing(startyear) and not missing(startmonh) and not missing(startday) then do;
		if fyrbegyear>=startyear and fyrbegmonth>=startmonth and fyrbegday>=startday then output;
	end;
	else if not missing(startyear) and not missing(startmonh) and missing(startday) then do;
		if fyrbegyear>=startyear and fyrbegmonth>=startmonth then output;
	end;
	else if not missing(startyear) and missing(startmonh) then do;
		if fyrbegyear>=startyear then output;
	end;
	else if missing(startyear) then output;
run;

I think my logic is correct, but I do not know why, for example:

an observation of fyrbegdt=01JAN2010, startyear=2010, startmonth=10, and missing startday is still kept in my dataset in the end.

 

Please let me know if I should provide a subset of my raw data. Thank you.

Tom
Super User Tom
Super User

If you have the fiscal year begin and end dates as dates then just use them in your comparisons.  No need to break them into components and rebuild a date out of it.

Jarvin99
Obsidian | Level 7

Emmm...I need to do it because my startyear, startmonth, and startday can be missing. So in cases where my startmonth and startday are missing, I can only compare startyear with the year of the fiscal date. Same thing for the cases where startmonth and startyear are non-missing but the startday is missing.

FYI, startyear, startmonth and startday are from an employee's start job date whose form can be yyyy-mm, yyyy, or yyyy-mm-dd, depending on the information available.

Tom
Super User Tom
Super User

I don't think many companies change FY in the middle of a month, so for the START dates that are only missing the day of the month just assume the first of the month.

 

And for the START dates that only include a year what is the logic you want to use?  Do you assume it is the same as the FY?  Do you want to assume that START was the first of the year?  The first of July?  The first of the month that company uses to start a new FY?

mkeintz
PROC Star

Before looking at your data, look at your log.  If the code you posted is an accurate copy of your program, then this should appear in the log:

 

NOTE: Variable startmonh is uninitialized.

You mis-spelled startmonth at least once in your code, so the statement

 

 else if not missing(startyear) and missing(startmonh) then do;
    if fyrbegyear>=startyear then output;
  end;

so even if previous IF conditions are not met your sample observation would qualify for the last one.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1095 views
  • 2 likes
  • 4 in conversation