Hi,
I am looking for sas procedure or sas function that similar to excel sumifs function.
I would to sum _FREQ_ in terms of msa_code_all and Auditor_Key where previousyear< yearend <= yearend
So for msa_code_all(10100) and Auditor_Key(27) sum of freq should be 8(5+2+1) because yearends of three rows are smaller thanpreviousyear.
Sumifs function is in G column =SUMIFS(D$2:D$21,$A$2:$A$21,"="&$A3,$B$2:$B$21,"="&$B3,$C$2:$C$21,"="&$C3,$E$2:$E$21,"<="&$E3,$E$2:$E$21,">"&$F3).
So far, I came up with the below code but does not work. Can you help me?
proc summary data=example nway;
class Auditor_Key msa_code_all
var _FREQ_;
where previousyear < yearend <= yearend ;
outputout=testsum sum=;
run;
msa_code_all | SIC2 | Auditor_Key | _FREQ_ | yearend | previousyear | Sum of Freq |
10100 | 67 | 27 | 5 | 20100301 | 20090301 | 8 |
10100 | 67 | 27 | 2 | 20100201 | 20090201 | 0 |
10100 | 67 | 27 | 1 | 20100101 | 20090101 | 1 |
10100 | 67 | 27 | 1 | 20031201 | 20021201 | 1 |
10180 | 36 | 846 | 1 | 20040901 | 20030901 | 1 |
10180 | 36 | 846 | 1 | 20030901 | 20020901 | 1 |
10180 | 46 | 846 | 1 | 20011201 | 20001201 | 1 |
10180 | 60 | 846 | 2 | 20110601 | 20100601 | 2 |
10180 | 60 | 846 | 1 | 20100601 | 20090601 | 1 |
10180 | 60 | 846 | 2 | 20090601 | 20080601 | 2 |
10180 | 60 | 846 | 1 | 20080601 | 20070601 | 1 |
10180 | 60 | 846 | 1 | 20070601 | 20060601 | 1 |
10180 | 60 | 846 | 1 | 20060601 | 20050601 | 1 |
10180 | 51 | 1091 | 1 | 20041201 | 20031201 | 1 |
10180 | 51 | 1091 | 1 | 20031201 | 20021201 | 1 |
10180 | 51 | 1091 | 1 | 20021201 | 20011201 | 1 |
10180 | 67 | 1091 | 1 | 20041201 | 20031201 | 1 |
10180 | 67 | 1091 | 1 | 20031201 | 20021201 | 1 |
10180 | 67 | 1091 | 1 | 20021201 | 20011201 | 1 |
10420 | 7 | 2 | 2 | 20111201 | 20101201 | 2 |
One easy way is to merge the data with itself. This doesn't seem to get the same values, but perhaps the boundary conditions on the date ranges is wrong?
data have ;
row+1;
input msa_code_all SIC2 Auditor_Key _FREQ_ yearend previousyear what;
informat yearend previousyear yymmdd8. ;
format yearend previousyear yymmdd10. ;
cards;
10100 67 27 5 20100301 20090301 8
10100 67 27 2 20100201 20090201 0
10100 67 27 1 20100101 20090101 1
10100 67 27 1 20031201 20021201 1
10180 36 846 1 20040901 20030901 1
10180 36 846 1 20030901 20020901 1
10180 46 846 1 20011201 20001201 1
10180 60 846 2 20110601 20100601 2
10180 60 846 1 20100601 20090601 1
10180 60 846 2 20090601 20080601 2
10180 60 846 1 20080601 20070601 1
10180 60 846 1 20070601 20060601 1
10180 60 846 1 20060601 20050601 1
10180 51 1091 1 20041201 20031201 1
10180 51 1091 1 20031201 20021201 1
10180 51 1091 1 20021201 20011201 1
10180 67 1091 1 20041201 20031201 1
10180 67 1091 1 20031201 20021201 1
10180 67 1091 1 20021201 20011201 1
10420 7 2 2 20111201 20101201 2
run;
proc sql noprint ;
create table want as
select distinct a.*,sum(b._freq_) as newvar
from have a
, have b
where a.msa_code_all = b.msa_code_all
and a.SIC2 and b.SIC2
and a.Auditor_Key and b.Auditor_Key
and b.yearend between a.yearend and a.previousyear
group by a.msa_code_all
, a.SIC2
, a.Auditor_Key
, a.yearend
, a.previousyear
order by a.row
;
quit;
proc print data=want ;
run;
I'm not following. Can you simplify such that you display what you have and what you want?
sorry Reeza!
I want to get the same values in Sum of Freq column using sas code becuase I got that values with excel sumifs function.
my code does not work. do you know how to get values in Sum of Freq column using sas code?
I don't understand the logic of how that column is calculated.
Seems to me the issue is what records to include.
The first group defined by MSA*SIC*AUDITOR has 4 records that total to 9 not 8.
If you want to only include 3 of the 4 records for the first group what logic is excluding the fourth record?
The last column in your report does not make any sense given the data.
Sorry about confusion.
Time matters.. 4th records's yearend(20031201) is not between first records' previousyear(20090301)and yearend(20100301).
First three records' yearends(20100301,20100201,20100101)
are between first records' previousyear(20090301)and yearend(20100301).
One easy way is to merge the data with itself. This doesn't seem to get the same values, but perhaps the boundary conditions on the date ranges is wrong?
data have ;
row+1;
input msa_code_all SIC2 Auditor_Key _FREQ_ yearend previousyear what;
informat yearend previousyear yymmdd8. ;
format yearend previousyear yymmdd10. ;
cards;
10100 67 27 5 20100301 20090301 8
10100 67 27 2 20100201 20090201 0
10100 67 27 1 20100101 20090101 1
10100 67 27 1 20031201 20021201 1
10180 36 846 1 20040901 20030901 1
10180 36 846 1 20030901 20020901 1
10180 46 846 1 20011201 20001201 1
10180 60 846 2 20110601 20100601 2
10180 60 846 1 20100601 20090601 1
10180 60 846 2 20090601 20080601 2
10180 60 846 1 20080601 20070601 1
10180 60 846 1 20070601 20060601 1
10180 60 846 1 20060601 20050601 1
10180 51 1091 1 20041201 20031201 1
10180 51 1091 1 20031201 20021201 1
10180 51 1091 1 20021201 20011201 1
10180 67 1091 1 20041201 20031201 1
10180 67 1091 1 20031201 20021201 1
10180 67 1091 1 20021201 20011201 1
10420 7 2 2 20111201 20101201 2
run;
proc sql noprint ;
create table want as
select distinct a.*,sum(b._freq_) as newvar
from have a
, have b
where a.msa_code_all = b.msa_code_all
and a.SIC2 and b.SIC2
and a.Auditor_Key and b.Auditor_Key
and b.yearend between a.yearend and a.previousyear
group by a.msa_code_all
, a.SIC2
, a.Auditor_Key
, a.yearend
, a.previousyear
order by a.row
;
quit;
proc print data=want ;
run;
Oh my good~ It works!!!
Tom! Thank you so much!!! I changed the boundary conditions on the date ranges to this.
a.previousyear < b.yearend <= a.yearend
If the records are physically grouped by msa_code_all/sic2/auditor_key, then this is a forward rolling window, where the window size varies, depending on the yearend and previousyear values.
I ran this double DOW, which should be faster and take less disk space. The ouput was identical to the SQL program output (with the OP's revised boundaries, and the equality constraints I mentioned elsewhere).
If you expect more than 20 records in a given BY group, then increase the array sizes.
data want2;
array yr{20};
array prvyr{20};
array sumfrq {20};
do rec=1 to 20;
sumfrq{rec}=0;
end;
do rec=1 by 1 until (last.auditor_key);
set have;
by msa_code_all sic2 auditor_key notsorted;
yr{rec}=yearend;
prvyr{rec}= previousyear;
do J=rec to 1 by -1 while (prvyr{J}<yr{rec}<=yr{J});
sumfrq{J}=sumfrq{J}+_freq_;
end;
end;
do rec=1 by 1 until (last.auditor_key);
set have;
by msa_code_all sic2 auditor_key notsorted;
newvar2=sumfrq{rec};
output;
end;
drop rec yr: prvyr: sumfrq: J ;
run;
I think the line "and a.sic2 and b.sic2" should be "and a.sic2=b.sic2".
Same with "and a.auditor_key and b.auditor_key". Should be "and a.auditor_key=b.auditor_key", yes?
MK
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.