Hello! I have a coding question for SAS 9.4.
I am trying adjust my quarterly ROA variable for each firm by industry means for each quarter. Thus, my end goal is to have IAROA= ROA - Industry Average for each quarter for each firm.
For a visual, I have a primary dataset that looks similar to this :
Firm Year.Quarter Industry ROA
1 2003.1 01 0.0125
1 2003.2 01 0.0359
1 2003.3 01 0.0451
1 2003.4 01 0.0215
2 2003.1 02 0.125
2 2003.2 02 0.135
2 2003.3 02 0.097
2 2003.4 02 0.102
3 2003.1 01 0.029
3 2003.2 01 0.024
3 2003.3 01 0.035
3 2003.4 01 0.028
... and so forth through 2014 and 75 industry types.
Then, I have found the industry ROA means for each quarter (2003-2014). using PROC MEANS. Specifically I use
proc means data=data.master mean noprint;
output out=IROA mean=;
by industry;
class yearquarter;
var roa ;
So, for example, I have:
Industry year.quarter Industry ROA (IROA)
01 2003.1 .0171
01 2003.2 .0172
01 2003.3 .0173
01 2003.4 .0174
01 2004.1
01 2004.2
01 2004.3
01 2004.4
02 2003.1 .111
02 2003.2 .112
02 2003.3 .113
02 2003.4 .114
02 2004.1
02 2004.2
02 2004.3
02 2004.4
... and so forth over all industries through 2014.
My end goal is to have:
Firm Year.Quarter Industry ROA IROA Industry Adjusted ROA
1 2003.1 01 0.0125 .0171 -.0046 (= .0125-.0171)
1 2003.2 01 0.0359 .0172 .0187 (= .0359-.0172)
1 2003.3 01 0.0451 .0173 .0278
1 2003.4 01 0.0215 .0174 .0041
2 2003.1 02 0.125 .111 .014
2 2003.2 02 0.135 .112 .023
2 2003.3 02 0.097 .113 -.016
2 2003.4 02 0.102 .114 -.012
3 2003.1 01 0.029 .0171 .0119
3 2003.2 01 0.024 .0172 .0068
3 2003.3 01 0.035 .0173 .0177
3 2003.4 01 0.028 .0174 .0106
As you can see, all firms will have the IROA value depending on what industry they belong to and the year.quarter of the observation. For example, Firms 1 and 3 belong to the same industry, so IROA takes on the same value for the same quarters.
Since I have thousand of firms, 75ish industries, and my data spans 11 years, it is not feasible for me to code this manually.
A) Is there a macro that works well?
or
B) I can find the quarterly industry means, but I have not been able to merge it back into the primary dataset so every firm with the matching industry and quarter display the correct value for the industry roa variable. Only the first firm receives the industry variable value an the rest of the firms have "."
Please let me know if there is any pertinent information missing that will allow you to help me with this code piece!
I thank you in advance!!!
Hi @klndsy,
I think, your code should work in principle (it could be simplified here and there, though) if the data are what they should be. The output dataset in your attached screenshot, however, does not look like the result of a merge by SIC2 YEAR, but rather like the result of a merge by SIC2 YQ (see obs. no. 1553 in particular!). One possible explanation is that some variable values shown in the screenshot are in fact formatted values. If, for example, variable YEAR contains (varying!) SAS date values (like those in YQ) which are only formatted with format YEAR., a merge by SIC2 YEAR could indeed yield results very similar to those shown in the screenshot.
Comparing YQ with QUARTER we see sort of an inconsistency which could impact your quarterly results as well: Variable YQ contains SAS date values, but sometimes they are varying within one quarter: YQ=15764='28FEB2003'd in obs. no. 1553 and YQ=15795='31MAR2003'd in obs. no. 1554 belong to the same quarter, but only a variable with a constant value (such as '2003.1') throughout a quarter would be suitable for merging by quarter. Again, it is not clear if variable QUARTER contains values such as 2003.1 (numeric or character?) or if it contains SAS date values, formatted with format YYQP..
I think, once this issue is clarified, you are very close to getting the intended results.
Just to suggest an alternative approach: You could let PROC STDIZE compute both the IROA values and the differences ROA - IROA. Please find below some sample code. However, you are so close to the finish line with your more elementary approach that you may want to continue on this route.
/* Create test data */
data have;
input firm yearquarter industry $ ROA;
cards;
1 2003.1 01 0.0125
1 2003.2 01 0.0359
1 2003.3 01 0.0451
1 2003.4 01 0.0215
2 2003.1 02 0.125
2 2003.2 02 0.135
2 2003.3 02 0.097
2 2003.4 02 0.102
3 2003.1 01 0.029
3 2003.2 01 0.024
3 2003.3 01 0.035
3 2003.4 01 0.028
;
/* Sort input data and perform computations */
proc sort data=have;
by industry yearquarter;
run;
proc stdize data=have method=mean oprefix sprefix=IndAdj
out=have_adj outstat=stats(where=(_type_='LOCATION') rename=(ROA=IROA));
var ROA;
by industry yearquarter;
run;
/* Merge firm-quarter level with industry-quarter level data and restore original sort order */
data want;
merge have_adj
stats(drop=_type_);
by industry yearquarter;
label IndAdjROA='Industry Adjusted ROA';
run;
proc sort data=want;
by firm yearquarter;
run;
I'll be back tomorrow (CET).
Post your code. You should be able to merge the results back in and then do your substraction in a data step.
I would recommend perhaps a SQL join and then you could even do the subtraction in one step. If you want, you can even calculate them all in one step.
Hi Reeza; thank you for your reply.
So far, I have been trying this with annual ROA (ideally I would like quarterly), so my code reflects that. I figured once I get it figured out with annual, then I can apply it to my quarterly data. Additionally, sic2 is the industry variable.
I have done:
proc sort data=compustatrevised3; by sic2 year; run;
proc means data=compustatrevised3 mean noprint;
output out=IROA mean=;
by sic2;
class year;
var roa roa2;
run;
data IROA2;
set IROA (drop=_freq_);
if _type_=0 then delete;
rename roa=indroa roa2=indroa2;
run;
proc sort data=compustatrevised3; by sic2 year;
proc sort data=IROA2; by sic2 year; run;
data trialmerge;
merge compustatrevised3 IROA2;
by sic2 year;
run;
I have tried multiple variations of the merge command, but this is the one that gets me closest. However, it only fills the industry roa mean variable (indroa) for the first few firms of each year. Even then, I can't exactly figure out the problem because most of the time it will only fill the first firm, but sometimes it will fill 2, 3 or 4 firms with the industry roa value! I attached a screenshot (word doc) of my merged output problem.
Is your primary data sample reflective of your actual data format/structure?
Yes, but it is currently sorted differently and includes a lot more variables.
I have tens of thousands of firms spanning many different industries over a period of 2003-2014 with quarterly data. Each firm has a unique ROA each quarter (with some firms missing observations based on data availability).
My actual data set is sorted by industry(sic2) then year.quarter.
For a visual, my actual dataset looks similarly sorted to this (plus many other variables) :
Firm Year.Quarter Industry ROA
1 2003.1 01 0.0125
3 2003.1 01 0.029
1 2003.2 01 0.0359
3 2003.2 01 0.024
1 2003.3 01 0.0451
3 2003.3 01 0.035
1 2003.4 01 0.0215
3 2003.4 01 0.028
2 2003.1 02 0.125
2 2003.2 02 0.135
2 2003.3 02 0.097
2 2003.4 02 0.102
Hi @klndsy,
I think, your code should work in principle (it could be simplified here and there, though) if the data are what they should be. The output dataset in your attached screenshot, however, does not look like the result of a merge by SIC2 YEAR, but rather like the result of a merge by SIC2 YQ (see obs. no. 1553 in particular!). One possible explanation is that some variable values shown in the screenshot are in fact formatted values. If, for example, variable YEAR contains (varying!) SAS date values (like those in YQ) which are only formatted with format YEAR., a merge by SIC2 YEAR could indeed yield results very similar to those shown in the screenshot.
Comparing YQ with QUARTER we see sort of an inconsistency which could impact your quarterly results as well: Variable YQ contains SAS date values, but sometimes they are varying within one quarter: YQ=15764='28FEB2003'd in obs. no. 1553 and YQ=15795='31MAR2003'd in obs. no. 1554 belong to the same quarter, but only a variable with a constant value (such as '2003.1') throughout a quarter would be suitable for merging by quarter. Again, it is not clear if variable QUARTER contains values such as 2003.1 (numeric or character?) or if it contains SAS date values, formatted with format YYQP..
I think, once this issue is clarified, you are very close to getting the intended results.
Just to suggest an alternative approach: You could let PROC STDIZE compute both the IROA values and the differences ROA - IROA. Please find below some sample code. However, you are so close to the finish line with your more elementary approach that you may want to continue on this route.
/* Create test data */
data have;
input firm yearquarter industry $ ROA;
cards;
1 2003.1 01 0.0125
1 2003.2 01 0.0359
1 2003.3 01 0.0451
1 2003.4 01 0.0215
2 2003.1 02 0.125
2 2003.2 02 0.135
2 2003.3 02 0.097
2 2003.4 02 0.102
3 2003.1 01 0.029
3 2003.2 01 0.024
3 2003.3 01 0.035
3 2003.4 01 0.028
;
/* Sort input data and perform computations */
proc sort data=have;
by industry yearquarter;
run;
proc stdize data=have method=mean oprefix sprefix=IndAdj
out=have_adj outstat=stats(where=(_type_='LOCATION') rename=(ROA=IROA));
var ROA;
by industry yearquarter;
run;
/* Merge firm-quarter level with industry-quarter level data and restore original sort order */
data want;
merge have_adj
stats(drop=_type_);
by industry yearquarter;
label IndAdjROA='Industry Adjusted ROA';
run;
proc sort data=want;
by firm yearquarter;
run;
I'll be back tomorrow (CET).
Hi FreelandReinhard! Thank you for your time! Yes, my variables Year and YQ are both formatted variables from datadate which has the format 01012003. I thought I took care of this using put and input, but clearly I did not! I have exhausted everything I can think of with this issue. Do you have any suggestions???
You can do all calculations with one Proc SQL: Industry average ROA are calculated in subquery:
proc sql;
create table want as
select P.Firm,
P.YearQuarter,
P.Industry,
P.ROA,
I.IROA,
(P.roa-I.IROA) as IAROA
from primary P
left join (select Sub.Industry, Sub.YearQuarter, mean(Sub.ROA) as IROA from Primary Sub group by Sub.Industry, Sub.YearQuarter) I
on P.Industry = I.Industry
and P.YearQuarter = I.YearQuarter
order p.Firm, P.YearQuarter;
quit;
@AskoLötjönen: Sure, this would be the most straightforward approach. It just seemed that the OP is more familiar with the data step and PROC MEANS.
I think, the easiest and safest way to prepare the aggregations and merges by year and by quarter is to create variables which contain year and quarter (incl. year), respectively.
Example:
/* Sample data */
data old;
date='24FEB2016'd; /* numeric variable with values like 20508, possibly derived */
run; /* from a raw date value like rawdate='24022016' by using */
/* the INPUT function: date=input(rawdate, ddmmyy8.) */
/* To be created */
data new;
set old;
year=year(date); /* numeric variable with values like 2016 */
quarter=put(date, yyqp.); /* character variable with values like '2016.1' */
run;
You say that YQ is formatted. My impression was rather that YQ is unformatted (screenshot shows SAS date values), but QUARTER could be formatted.
Y'all were so wonderful and helpful. It is all working beautifully now. Thank you very much!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.