Hi All,
I've got an abbreviated dataset named narating with the following:
gvkey year month rating rtchg
1004 2001 8 . .
1004 2001 9 BBB 1
1004 2001 10 . .
1004 2001 12 . .
1004 2002 1 BBB- 1
1004 2002 2 . .
1004 2003 1 . .
1004 2003 2 BBB- 0
1004 2003 3 . .
1004 2003 4 BB- 1
1004 2003 5 . .
1004 2003 6 BB- 0
1004 2003 7 . .
1036 2000 8 . .
1036 2000 9 BBB 1
1036 2000 10 . .
1036 2000 11 . .
1036 2000 12 BBB 0
The data above are monthly data with some records holding input for rating and rtchg and some not. All records have gvkey, year and month. My second data set named compq has:
gvkey year month LEV
1004 2000 2 0.87595
1004 2000 5 0.87637
1004 2000 8 0.8868
1004 2000 11 0.88743
1004 2001 2 0.88669
1004 2001 5 0.8683
1004 2001 8 0.89654
1004 2001 11 0.8987
1004 2002 2 0.8626
1004 2002 5 0.88574
1004 2002 8 0.89588
1004 2002 11 0.89536
1004 2003 2 0.89408
1004 2003 5 0.88452
1004 2003 8 0.88299
1004 2003 11 0.86728
1004 2004 2 0.88613
1004 2004 5 0.87953
1004 2004 8 0.87573
1004 2004 11 0.8751
1036 1999 3 0.54841
1036 1999 6 0.55955
1036 1999 9 0.5813
1036 1999 12 0.57969
1036 2000 3 0.61093
1036 2000 6 0.60252
1036 2000 9 0.60428
1036 2000 12 0.60025
These data are quarterly data and notice that the first firm has quarter-ends of 2, 5, 8, and 11 whereas the second firm has quarter-ends of (the typical) 3, 6, 9, and 12. Now what I want to do is join these such that if the monthly data fits in naturally between two quarters, then the monthly data will be put on the same line as the NEXT quarter-end. If the monthly data naturally is already at the same time as quarter-end, keep it there. When I use the following code:
proc sql;
create table narating as
select a.*, b.*
from naratings as a
left join
compq as b
on a.gvkey=b.gvkey and a.year=b.year and a.month=b.month;
quit;
I get the following output:
gvkey year month LEV rating rtchg
1004 2001 1 . . .
1004 2001 2 0.88669 . .
1004 2001 3 . . .
1004 2001 4 . . .
1004 2001 5 0.8683 . .
1004 2001 6 . . .
1004 2001 7 . . .
1004 2001 8 0.89654 . .
1004 2001 9 . BBB 1
1004 2001 10 . . .
1004 2001 11 0.8987 . .
1004 2001 12 . . .
1004 2002 1 . BBB- 1
1004 2002 2 0.8626 . .
1004 2002 3 . . .
1004 2002 4 . . .
1004 2002 5 0.88574 . .
1004 2002 6 . . .
1004 2002 7 . . .
1004 2002 8 0.89588 . .
1004 2002 9 . . .
1004 2002 10 . . .
1004 2002 11 0.89536 . .
1004 2002 12 . . .
1004 2003 1 . . .
1004 2003 2 0.89408 BBB- 0
1004 2003 3 . . .
1004 2003 4 . BB- 1
1004 2003 5 0.88452 . .
1004 2003 6 . BB- 0
1004 2003 7 . . .
1004 2003 8 0.88299 . .
1004 2003 9 . . .
1004 2003 10 . . .
1004 2003 11 0.86728 . .
1004 2003 12 . . .
1036 2000 1 . . .
1036 2000 2 . . .
1036 2000 3 0.61093 . .
1036 2000 4 . . .
1036 2000 5 . . .
1036 2000 6 0.60252 . .
1036 2000 7 . . .
1036 2000 8 . . .
1036 2000 9 0.60428 BBB 1
1036 2000 10 . . .
1036 2000 11 . . .
1036 2000 12 0.60025 BBB 0
If you look at the records that are emboldened and underlined, they are stuck on months between quarter-ends and need to be moved to quarter-ends to look like this:
gvkey year month LEV rating rtchg
1004 2001 2 0.88669 . .
1004 2001 5 0.8683 . .
1004 2001 8 0.89654 . .
1004 2001 11 0.8987 BBB 1
1004 2002 2 0.8626 BBB- 1
1004 2002 5 0.88574 . .
1004 2002 8 0.89588 . .
1004 2002 11 0.89536 . .
1004 2003 2 0.89408 BBB- 0
1004 2003 5 0.88452 BB- 1
1004 2003 8 0.88299 BB- 0
1004 2003 11 0.86728 . .
1004 2004 2 0.88613 . .
1004 2004 5 0.87953 . .
1004 2004 8 0.87573 . .
1004 2004 11 0.8751 . .
1036 1999 3 0.54841 . .
1036 1999 6 0.55955 . .
1036 1999 9 0.5813 . .
1036 1999 12 0.57969 . .
1036 2000 3 0.61093 . .
1036 2000 6 0.60252 . .
1036 2000 9 0.60428 BBB 1
1036 2000 12 0.60025 BBB 0
Notice that I have nothing showing but quarterly data showing now. Thank you for any help you can provide!
Joe
Do you only want the quarter ends in your outcome, or all months from naratings? If you only want the quarter ends, you need to anchor your join on compq (right join), and not on naratings (left join).
Hi Kurt,
I want the final result/dataset to only show quarter ends; however, I don't want to lose the monthly data. If monthly data match up on quarter ends (same year, same month), that's great! But if the monthly data are between quarter ends, I need the monthly data moved to the quarter end and shown there.
Thanks for your thoughts and help.
Joe
Then try this:
data naratings;
input gvkey :$4. year month rating $ rtchg;
date = mdy(month,1,year);
datalines;
1004 2001 8 . .
1004 2001 9 BBB 1
1004 2001 10 . .
1004 2001 12 . .
1004 2002 1 BBB- 1
1004 2002 2 . .
1004 2003 1 . .
1004 2003 2 BBB- 0
1004 2003 3 . .
1004 2003 4 BB- 1
1004 2003 5 . .
1004 2003 6 BB- 0
1004 2003 7 . .
1036 2000 8 . .
1036 2000 9 BBB 1
1036 2000 10 . .
1036 2000 11 . .
1036 2000 12 BBB 0
;
data compq;
input gvkey :$4. year month LEV;
date = mdy(month,1,year);
datalines;
1004 2000 2 0.87595
1004 2000 5 0.87637
1004 2000 8 0.8868
1004 2000 11 0.88743
1004 2001 2 0.88669
1004 2001 5 0.8683
1004 2001 8 0.89654
1004 2001 11 0.8987
1004 2002 2 0.8626
1004 2002 5 0.88574
1004 2002 8 0.89588
1004 2002 11 0.89536
1004 2003 2 0.89408
1004 2003 5 0.88452
1004 2003 8 0.88299
1004 2003 11 0.86728
1004 2004 2 0.88613
1004 2004 5 0.87953
1004 2004 8 0.87573
1004 2004 11 0.8751
1036 1999 3 0.54841
1036 1999 6 0.55955
1036 1999 9 0.5813
1036 1999 12 0.57969
1036 2000 3 0.61093
1036 2000 6 0.60252
1036 2000 9 0.60428
1036 2000 12 0.60025
;
proc sql;
create table shift as
select
b.gvkey,
b.year,
b.month,
a.rating,
a.rtchg
from naratings (where=(rating ne "")) a
left join compq b
on a.gvkey = b.gvkey and a.date le b.date
group by b.gvkey, b.date
having a.date = max(a.date)
;
create table want as
select
a.gvkey,
a.year,
a.month,
a.lev,
b.rating,
b.rtchg
from compq a
left join shift b
on a.gvkey = b.gvkey and a.year = b.year and a.month = b.month
;
quit;
Hi Kurt,
The two datasets that I originally posted here were actually just very small output portions of what were already in naratings and compq at that point in my SAS program. Thus, both naratings and compq have many more variables and observations in them and, as I indicated before, I desire to move my monthly data in naratings to fit appropriately at the next quarter-end in compq as mentioned before (in the case of the dates not coinciding with quarter-ends). Since I did not have the date variable (date=mdy(month,1,year)); referenced by you, I created that variable in both of these datasets and then made a slight adjustment to your code:
proc sql;
create table shift as
select a.*, b.*
from naratings (where=(rating ne "")) a
left join compq b
on a.gvkey = b.gvkey and a.date le b.date
group by b.gvkey, b.date
having a.date = max(a.date)
;
create table want as
select a.*, b.*
from compq a
left join shift b
on a.gvkey = b.gvkey and a.year = b.year and a.month = b.month
;
quit;
On the log file with the creation of the first table (shift), we have:
WARNING: Variable gvkey already exists on file WORK.SHIFT.
WARNING: Variable datadate already exists on file WORK.SHIFT.
WARNING: Variable cusip already exists on file WORK.SHIFT.
WARNING: Variable year already exists on file WORK.SHIFT.
WARNING: Variable month already exists on file WORK.SHIFT.
WARNING: Variable date already exists on file WORK.SHIFT.
With the creation of the second file (want), there is similar output but with many more variables. A small portion of the beginning of it is provided:
WARNING: Variable gvkey already exists on file WORK.WANT.
WARNING: Variable datadate already exists on file WORK.WANT.
WARNING: Variable cusip already exists on file WORK.WANT.
WARNING: Variable year already exists on file WORK.WANT.
WARNING: Variable month already exists on file WORK.WANT.
WARNING: Variable date already exists on file WORK.WANT.
WARNING: Variable fyearq already exists on file WORK.WANT.
WARNING: Variable fqtr already exists on file WORK.WANT.
WARNING: Variable fyr already exists on file WORK.WANT.
Note that the time references of year and month are indicated above. Am I correct to assume at this point that everything lines up well on date, but not year and month? Thank you for your help!
Joe
Do NOT use asterisks in the select of joins, always use specific lists of variables to include, and use table aliases to define from which dataset to take the variables.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.