Aggregating data in SAS

Accepted Solution Solved
Reply
Contributor
Posts: 30
Accepted Solution

Aggregating data in SAS

Hi all,

I need some help for a code to aggregate data in SAS, i have not done it before i am unable to find help on SAS help as well. I want to aggregate a data which has a daily frequency to weekly (or monthly). I have a data set as follows,

IDyear monthdayincomehours_worked
120091110010
12009122008
12009131509
12009141657
120091535010
1200916905.5
12009171206.7
12009111359.8
120091214010
12009131604
12009141005.5
12009151506.7
12009161659.8
12009173508
22009118010
2200912758
22009133509
22009141207
220091520010
22009161505.5
22009171656.7
22009111609.8
220091215010
22009131654
22009141705.5
22009151506.7
22009161309.8
22009171008

I want to aggregate such a data by ID and days (to weekly) to obtain the dataset below,

IDyear monthweekincomehours_worked
1200911167.85718.028571
1200912171.42867.685714
2200911162.85718.028571
2200912146.42867.685714

Please advice .

Jessica


Accepted Solutions
Solution
‎05-26-2014 01:21 PM
Trusted Advisor
Posts: 1,228

Re: Aggregating data in SAS

Posted in reply to Jessica98

Hi,

Using proc tabulate.

Thanks,

Naeem

data want;
set have;
by id year;
if first.year then week=0;
if day=1 then week+1;
run;

proc tabulate data=want;
class id year month week;
var income hours_worked;
table id*year*month*week,(income hours_worked)*mean=' '*format=8.6;
run;

View solution in original post


All Replies
Respected Advisor
Posts: 4,920

Re: Aggregating data in SAS

Posted in reply to Jessica98

I assumed you wanted week numbers within a year and that the first day of the first week of the year is numbered 1  :

data have;

input

ID year month day income hours_worked;

datalines;

1 2009 1 1 100 10

1 2009 1 2 200 8

1 2009 1 3 150 9

1 2009 1 4 165 7

1 2009 1 5 350 10

1 2009 1 6 90 5.5

1 2009 1 7 120 6.7

1 2009 1 1 135 9.8

1 2009 1 2 140 10

1 2009 1 3 160 4

1 2009 1 4 100 5.5

1 2009 1 5 150 6.7

1 2009 1 6 165 9.8

1 2009 1 7 350 8

2 2009 1 1 80 10

2 2009 1 2 75 8

2 2009 1 3 350 9

2 2009 1 4 120 7

2 2009 1 5 200 10

2 2009 1 6 150 5.5

2 2009 1 7 165 6.7

2 2009 1 1 160 9.8

2 2009 1 2 150 10

2 2009 1 3 165 4

2 2009 1 4 170 5.5

2 2009 1 5 150 6.7

2 2009 1 6 130 9.8

2 2009 1 7 100 8

;

data weeks / view=weeks;

set have; by id year;

if first.year then week = 0;

if day=1 then week + 1;

run;

proc sql;

create table want as

select id, year, month, week, mean(income) as income,

    mean(hours_worked) as hours_worked

from weeks

group by id, year, month, week;

drop view weeks;

select * from want;

quit;

PG

PG
Contributor
Posts: 30

Re: Aggregating data in SAS

Hi PG,

Thanks a lot for your help, the code works but unfortunately i need the results to be a bit different. If i i run your code i get as follows,

IDyearmonthweekincomehours_worked
1200911....
1200922....
1200933....
1200944....
1200955....
1200966....
1200977....
1200988....
1200999....
120091010....
120091111....
1201011....
1201022....
1201033....
1201044....
1201054....
1201075....
1201086....
1201097....
12010108....
12010119....
120101210....
2200951....
2200962....
2200973....
2200984....
2200995....
22009106....
22009117....
2201011....
2201022....
2201033....
2201044....
2201055....
2201066....
2201077....
2201088....
2201099....
220101010....
220101111....
220101212....

What i want is for the data to be aggregated by weeks for every month by ID for example , for the first month for a given year the first seven days is the first week, the next seven the second and so on.

IDyearmonthweekincomehours_worked
1200911....
1200912....
1200913....
1200914....
1200921....
1200922....

Jessica

Super Contributor
Posts: 297

Re: Aggregating data in SAS

Posted in reply to Jessica98

You could use a hash object, although there are limitations depending on the amount of RAM you have available, but it should be reasonably efficient otherwise.

DATA _NULL_;

  DECLARE HASH HH (ORDERED:'YES');

  HH.DEFINEKEY ('ID','YEAR','MONTH','WEEK') ;

  HH.DEFINEDATA ('ID','YEAR','MONTH','WEEK'/*,'SUMINCOME'*/,'AVG_INCOME','AVG_HOURS') ;

  HH.DEFINEDONE () ;

  DO UNTIL (DONE);

  SET HAVE END=DONE;

  BY ID YEAR MONTH;

  IF FIRST.MONTH THEN WEEK = 0;

  IF DAY=1 THEN WEEK + 1;

  IF HH.FIND() ~= 0 THEN DO;

  COUNT      = 1;

  SUMINCOME  = INCOME;

  SUMHOURS   = HOURS_WORKED;

  AVG_INCOME = SUMINCOME/COUNT;

  AVG_HOURS  = SUMHOURS/COUNT;

  HH.ADD();

  OUTPUT;

     END;

  ELSE IF HH.FIND() = 0 THEN DO;

  COUNT+1;

  SUMINCOME  = SUM(SUMINCOME,INCOME);

  SUMHOURS   = SUM(SUMHOURS,HOURS_WORKED);

  AVG_INCOME = SUMINCOME/COUNT;

  AVG_HOURS  = SUMHOURS/COUNT;

  HH.REPLACE();

  OUTPUT;

       END;

  END;

  HH.OUTPUT (DATASET: 'WANT') ;

RUN;

Super User
Posts: 7,781

Re: Aggregating data in SAS

Posted in reply to Jessica98

Your problem has a perfect SAS solution in PROC SUMMARY:

data have1;

set have;

by id year;

if first.year then week = 0;

if day=1 then week + 1;

run;

proc summary data=have1;

by id year week;

var income hours_worked;

output

  out=want (drop=_freq_ _type_)

  mean=

;

run;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 30

Re: Aggregating data in SAS

Posted in reply to KurtBremser

Hi,

that was simple and helpful, however i think when you put day=1 to indicate the increment in week, it makes a month=1 week. I would need to increment week for every 7 days right?

Super User
Posts: 7,781

Re: Aggregating data in SAS

Posted in reply to Jessica98

It depends on the original state of the data and the desired results.

If you can safely assume that the data ist sorted by id and date (and all dates have a record), then simply incrementing week everytime day = 1 will suffice to give you the "week of the year".

If you do have a real date in your base dataset, you can use a SAS function to derive the week of the year from that.

My main point was to show how easily the aggregation is done with PROC SUMMARY. Determining the week depends on what you want to achieve. If you want to start the weeks with 1 in every month, then you will also need to have the month in the BY statement in PROC SUMMARY.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Trusted Advisor
Posts: 3,212

Re: Aggregating data in SAS

Posted in reply to Jessica98

You could use proc tabulate with an output=  option like 25404 - PROC TABULATE data set output  surpressing the print output is also possible.  

---->-- ja karman --<-----
Contributor
Posts: 30

Re: Aggregating data in SAS

Thanks jaap karman,

The code you gave me is taking only the first days fare for all the first weeks, i think there is some problem counting the weeks. I think a slight modification of the code by PG stats would be helpful

jessica

Trusted Advisor
Posts: 3,212

Re: Aggregating data in SAS

Posted in reply to Jessica98

I proposed proc tabulate as you are wanting some calculations for groups. That is exactly what  tabulate is designed to do.
The example of the data is not your data but you are getting the idea. The week-numbers could be a problem by definition unless you are aware of that definition (ISO 8601).

---->-- ja karman --<-----
Trusted Advisor
Posts: 3,212

Re: Aggregating data in SAS

Posted in reply to Jessica98

Nice that "Proc summary" was a success "proc tabulate" and "proc report" are the successors-enhancements on that.

The date processing is more nastier not by the technical issue-s but by a lot of used conventions that are not aligned (confusing).
That is why I mentioned ISO8601 http://support.sas.com/resources/papers/proceedings13/026-2013.pdf . The first ay of a week Sunday or Monday. The first week of a year starts when the Thursday/Friday are workable.  Week numbers can span years and months.
A lot more of those kind to be experienced remember the milo-testing and time-traveling.

---->-- ja karman --<-----
Respected Advisor
Posts: 4,920

Re: Aggregating data in SAS

Posted in reply to Jessica98

A small change and it works by month.

data have;

input

ID year month day income hours_worked;

datalines;

1 2009 1 1 100 10

1 2009 1 2 200 8

1 2009 1 3 150 9

1 2009 1 4 165 7

1 2009 1 5 350 10

1 2009 1 6 90 5.5

1 2009 1 7 120 6.7

1 2009 1 1 135 9.8

1 2009 1 2 140 10

1 2009 1 3 160 4

1 2009 1 4 100 5.5

1 2009 1 5 150 6.7

1 2009 1 6 165 9.8

1 2009 1 7 350 8

2 2009 1 1 80 10

2 2009 1 2 75 8

2 2009 1 3 350 9

2 2009 1 4 120 7

2 2009 1 5 200 10

2 2009 1 6 150 5.5

2 2009 1 7 165 6.7

2 2009 1 1 160 9.8

2 2009 1 2 150 10

2 2009 1 3 165 4

2 2009 1 4 170 5.5

2 2009 1 5 150 6.7

2 2009 1 6 130 9.8

2 2009 1 7 100 8

;

data weeks / view=weeks;

set have; by id year month;

if first.month then week = 0;

if day=1 then week + 1;

run;

title "With SQL";

proc sql;

create table want as

select id, year, month, week, mean(income) as income,

    mean(hours_worked) as hours_worked

from weeks

group by id, year, month, week;

*drop view weeks;

select * from want;

quit;

title "With Proc Summary";

proc summary data=weeks;

by id year month week;

var income hours_worked;

output out=want(drop=_Smiley Happy mean=income hours_worked;

run;

proc print data=want noobs; run;

PG

PG
Super User
Posts: 19,789

Re: Aggregating data in SAS

Posted in reply to Jessica98

I'll suggest using a format to calculate your weeks. This way you only make one pass of your data, which is helpful if it's a large data set.

proc format;

    value weekfmt

    1 - 7 = "Week 1"

    8 - 14 = "Week 2"

    15 - 21 = "Week 3"

    22 - 28 = "Week 4"

    29 - 31 = "Week 5";

run;

data have;

input

ID year month day income hours_worked;

datalines;

1 2009 1 1 100 10

1 2009 1 2 200 8

1 2009 1 3 150 9

1 2009 1 4 165 7

1 2009 1 5 350 10

1 2009 1 6 90 5.5

1 2009 1 7 120 6.7

1 2009 1 8 135 9.8

1 2009 1 9 140 10

1 2009 1 10 160 4

1 2009 1 11 100 5.5

1 2009 1 12 150 6.7

1 2009 1 13 165 9.8

1 2009 1 14 350 8

2 2009 1 1 80 10

2 2009 1 2 75 8

2 2009 1 3 350 9

2 2009 1 4 120 7

2 2009 1 5 200 10

2 2009 1 6 150 5.5

2 2009 1 7 165 6.7

2 2009 2 1 160 9.8

2 2009 2 2 150 10

2 2009 2 3 165 4

2 2009 2 4 170 5.5

2 2009 2 5 150 6.7

2 2009 2 6 130 9.8

2 2009 2 7 100 8

;

proc means data=have;

class id year month day;

ways 4;

format day weekfmt.;

var income hours_worked;

output

  out=want (drop=_freq_ _type_)

  mean=

;

run;

Frequent Contributor
Posts: 75

Re: Aggregating data in SAS

Posted in reply to Jessica98

Jessica - This works as per your requirement...!

data income;

input ID $ year $ month $ day $ income hours_worked;

datalines;

1     2009  1     1     100   10

1     2009  1     2     200   8

1     2009  1     3     150   9

1     2009  1     4     165   7

1     2009  1     5     350   10

1     2009  1     6     90    5.5

1     2009  1     7     120   6.7

1     2009  1     1     135   9.8

1     2009  1     2     140   10

1     2009  1     3     160   4

1     2009  1     4     100   5.5

1     2009  1     5     150   6.7

1     2009  1     6     165   9.8

1     2009  1     7     350   8

2     2009  1     1     80    10

2     2009  1     2     75    8

2     2009  1     3     350   9

2     2009  1     4     120   7

2     2009  1     5     200   10

2     2009  1     6     150   5.5

2     2009  1     7     165   6.7

2     2009  1     1     160   9.8

2     2009  1     2     150   10

2     2009  1     3     165   4

2     2009  1     4     170   5.5

2     2009  1     5     150   6.7

2     2009  1     6     130   9.8

2     2009  1     7     100   8

;

run;

proc sort data = income;

by id year month;

run;

data include_week;

set income;

by id year month;

retain count week;

if first.id and first.year and first.month then count=0;

count=count+1;

week=ceil(count/7);

run;

proc sql;

select id, year, month, week, avg(income) as weekly_income, avg(hours_worked) as weekly_hours_worked

from include_week

group by id, year, month, week

;

quit;

Regards

Dhana


Solution
‎05-26-2014 01:21 PM
Trusted Advisor
Posts: 1,228

Re: Aggregating data in SAS

Posted in reply to Jessica98

Hi,

Using proc tabulate.

Thanks,

Naeem

data want;
set have;
by id year;
if first.year then week=0;
if day=1 then week+1;
run;

proc tabulate data=want;
class id year month week;
var income hours_worked;
table id*year*month*week,(income hours_worked)*mean=' '*format=8.6;
run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 521 views
  • 6 likes
  • 8 in conversation