BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JPARK
Calcite | Level 5

Hi,

I am looking for sas procedure or sas function that similar to excel sumifs function.

I would to sum _FREQ_ in terms of msa_code_all and Auditor_Key where previousyear< yearend <= yearend

So for   msa_code_all(10100) and Auditor_Key(27) sum of freq should be 8(5+2+1) because yearends of three rows are smaller thanpreviousyear.

Sumifs function is in G column =SUMIFS(D$2:D$21,$A$2:$A$21,"="&$A3,$B$2:$B$21,"="&$B3,$C$2:$C$21,"="&$C3,$E$2:$E$21,"<="&$E3,$E$2:$E$21,">"&$F3).

So far, I came up with the below code but does not work. Can you help me?

proc summary data=example nway;

class Auditor_Key msa_code_all

var _FREQ_;

where previousyear < yearend <= yearend ; 

outputout=testsum sum=;

run;

msa_code_allSIC2Auditor_Key_FREQ_yearendpreviousyearSum of Freq
101006727520100301200903018
101006727220100201200902010
101006727120100101200901011
101006727120031201200212011
1018036846120040901200309011
1018036846120030901200209011
1018046846120011201200012011
1018060846220110601201006012
1018060846120100601200906011
1018060846220090601200806012
1018060846120080601200706011
1018060846120070601200606011
1018060846120060601200506011
10180511091120041201200312011
10180511091120031201200212011
10180511091120021201200112011
10180671091120041201200312011
10180671091120031201200212011
10180671091120021201200112011
1042072220111201201012012
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

One easy way is to merge the data with itself.  This doesn't seem to get the same values, but perhaps the boundary conditions on the date ranges is wrong?

data have ;

  row+1;

  input msa_code_all SIC2 Auditor_Key _FREQ_ yearend previousyear what;

  informat yearend previousyear yymmdd8. ;

  format yearend previousyear yymmdd10. ;

cards;

10100 67 27 5 20100301 20090301 8

10100 67 27 2 20100201 20090201 0

10100 67 27 1 20100101 20090101 1

10100 67 27 1 20031201 20021201 1

10180 36 846 1 20040901 20030901 1

10180 36 846 1 20030901 20020901 1

10180 46 846 1 20011201 20001201 1

10180 60 846 2 20110601 20100601 2

10180 60 846 1 20100601 20090601 1

10180 60 846 2 20090601 20080601 2

10180 60 846 1 20080601 20070601 1

10180 60 846 1 20070601 20060601 1

10180 60 846 1 20060601 20050601 1

10180 51 1091 1 20041201 20031201 1

10180 51 1091 1 20031201 20021201 1

10180 51 1091 1 20021201 20011201 1

10180 67 1091 1 20041201 20031201 1

10180 67 1091 1 20031201 20021201 1

10180 67 1091 1 20021201 20011201 1

10420 7 2 2 20111201 20101201 2

run;

proc sql noprint ;

  create table want as

  select distinct a.*,sum(b._freq_) as newvar

  from have a

     , have b

  where a.msa_code_all = b.msa_code_all

    and a.SIC2 and b.SIC2

    and a.Auditor_Key and b.Auditor_Key

    and b.yearend between a.yearend and a.previousyear

group by a.msa_code_all

        , a.SIC2

        , a.Auditor_Key

        , a.yearend

        , a.previousyear

  order by a.row

  ;

quit;

proc print data=want ;

run;

View solution in original post

10 REPLIES 10
Reeza
Super User

I'm not following. Can you simplify such that you display what you have and what you want?

JPARK
Calcite | Level 5

sorry Reeza!

I want to get the same values in Sum of Freq column using sas code becuase I got that values with excel sumifs function.

my code does not work. do you know how to get values in Sum of Freq column using sas code?

Reeza
Super User

I don't understand the logic of how that column is calculated.

Tom
Super User Tom
Super User

Seems to me the issue is what records to include.

The first group defined by MSA*SIC*AUDITOR has 4 records that total to 9 not 8.

If you want to only include 3 of the 4 records for the first group what logic is excluding the fourth record?

The last column in your report does not make any sense given the data.

JPARK
Calcite | Level 5

Sorry about confusion.

Time matters.. 4th records's yearend(20031201) is not between first records' previousyear(20090301)and yearend(20100301).

First three records' yearends(20100301,20100201,20100101)

are between first records' previousyear(20090301)and yearend(20100301).

Tom
Super User Tom
Super User

One easy way is to merge the data with itself.  This doesn't seem to get the same values, but perhaps the boundary conditions on the date ranges is wrong?

data have ;

  row+1;

  input msa_code_all SIC2 Auditor_Key _FREQ_ yearend previousyear what;

  informat yearend previousyear yymmdd8. ;

  format yearend previousyear yymmdd10. ;

cards;

10100 67 27 5 20100301 20090301 8

10100 67 27 2 20100201 20090201 0

10100 67 27 1 20100101 20090101 1

10100 67 27 1 20031201 20021201 1

10180 36 846 1 20040901 20030901 1

10180 36 846 1 20030901 20020901 1

10180 46 846 1 20011201 20001201 1

10180 60 846 2 20110601 20100601 2

10180 60 846 1 20100601 20090601 1

10180 60 846 2 20090601 20080601 2

10180 60 846 1 20080601 20070601 1

10180 60 846 1 20070601 20060601 1

10180 60 846 1 20060601 20050601 1

10180 51 1091 1 20041201 20031201 1

10180 51 1091 1 20031201 20021201 1

10180 51 1091 1 20021201 20011201 1

10180 67 1091 1 20041201 20031201 1

10180 67 1091 1 20031201 20021201 1

10180 67 1091 1 20021201 20011201 1

10420 7 2 2 20111201 20101201 2

run;

proc sql noprint ;

  create table want as

  select distinct a.*,sum(b._freq_) as newvar

  from have a

     , have b

  where a.msa_code_all = b.msa_code_all

    and a.SIC2 and b.SIC2

    and a.Auditor_Key and b.Auditor_Key

    and b.yearend between a.yearend and a.previousyear

group by a.msa_code_all

        , a.SIC2

        , a.Auditor_Key

        , a.yearend

        , a.previousyear

  order by a.row

  ;

quit;

proc print data=want ;

run;

JPARK
Calcite | Level 5

Oh my good~ It works!!!

Tom! Thank you so much!!! I changed  the boundary conditions on the date ranges to this.

  a.previousyear < b.yearend <= a.yearend

CharlFM
Calcite | Level 5
Hi, first off thanks a lot for the post! This helped me out a lot as well! However, do you think there is a way to improve the performance? I am working with a data set of about 1.4gb and as you can imagine, it takes quite some time to run. I also need to run this with multiple conditions - the one condition increases the size a lot (I guess) and results in an error:
ERROR: Insufficient space in file WORK.'SASTMP-000000447'n.UTILITY.
ERROR: File WORK.'SASTMP-000000447'n.UTILITY is damaged. I/O processing did not complete.

If you could please suggest something, it would be great!
mkeintz
PROC Star

If the records are physically grouped by  msa_code_all/sic2/auditor_key, then this is a forward rolling window, where the window size varies, depending on the yearend and previousyear values.

 

I ran this double DOW, which should be faster and take less disk space.  The ouput was identical to the SQL program  output (with the OP's revised boundaries, and the equality constraints I mentioned elsewhere).

 

If you expect more than 20 records in a given BY group, then increase the array sizes. 

 

data want2;
  array yr{20};
  array prvyr{20};
  array sumfrq  {20};
  do rec=1 to 20;
    sumfrq{rec}=0;
  end;

  do rec=1 by 1 until (last.auditor_key);
    set have;
    by msa_code_all sic2 auditor_key  notsorted;
    yr{rec}=yearend;
    prvyr{rec}= previousyear;
    do J=rec to 1 by -1  while (prvyr{J}<yr{rec}<=yr{J});
      sumfrq{J}=sumfrq{J}+_freq_;
      end;
  end;

  do rec=1 by 1 until (last.auditor_key);
    set have;
    by msa_code_all sic2 auditor_key  notsorted;
    newvar2=sumfrq{rec};
    output;
  end;
  drop  rec yr: prvyr: sumfrq: J ;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

I think the line "and a.sic2 and b.sic2"  should be  "and a.sic2=b.sic2".

 

Same with "and a.auditor_key and b.auditor_key".  Should be "and a.auditor_key=b.auditor_key", yes?

 

MK

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 10233 views
  • 1 like
  • 5 in conversation