Quartz | Level 8

## Aggregating dataset by date, including missing dates

Hi,

I have a dataset, work.PIC, which contains transactions (\$ amounts) per Address and Date. This dataset is sorted by Address and Date.

Sample Dataset:

`  Address    Date      Amount Property 1 31/01/2014 -770  Property 1 31/01/2014 -550  Property 1 31/01/2014 -20000  Property 1 31/01/2014 -52329.6  Property 1 31/07/2014 -660  Property 1 31/10/2014 -1738  Property 1 30/06/2015 -845.32  Property 1 30/06/2015 845.32  Property 1 30/06/2015 -845.32  Property 1 31/12/2015 845.32  Property 1 31/12/2015 -845.32  Property 1 31/12/2015 845.32  Property 2 31/01/2014 -1320  Property 2 31/01/2014 -2895.99  Property 2 31/01/2014 -653.32  Property 2 31/01/2014 -40000  Property 2 31/01/2014 -121954.8  Property 2 30/09/2015 -3500  Property 3 31/01/2014 -1200  Property 3 31/01/2014 -28080  Property 3 31/01/2014 -130483  Property 3 30/11/2015 -885  Property 3 31/12/2015 885`

For each address, I would like to create a running total per Address, with output at each Date. I have accomplished this through the code below:

Current code:

```data work.PIC_agg (keep = Address Date Value);
set work.PIC;
if First.Address then Value = 0;
value + Amount;
if Last.Date then output;
run;```

Problem:

This issue with the above is that it then only contains information for each property on a date where transactions occurred e.g. Property 1 will only have entries for Jan14, Jul14, Oct14, Jun15, and Dec15.

I would like to have a value for each date i.e. from Jan14 through to Dec15. If there is no transaction in a given month, then the value equals the previous month's total.

Proposed methodology:

Store each unique element of the Date variable in an array (DateArray)

Create a new variable for each element in DateArray

For each element in DataArray, sum over all transactions for a particular Address, where the Date<= DateArray

Store the resultant number in the correct variable for each Address

Any ideas on where the above methodology is sound, and, if so, how to implement. If not, could someone perhaps suggest a better method?

Please let me know if the above is unclear.

Many thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Rhodochrosite | Level 12

## Re: Aggregating dataset by date, including missing dates

Hi,

This newly modified code, has a running total (value), as well as, original monthy total

``````data have;
length address \$10 date 8 amount 8;
format date ddmmyy10.;
INPUT @1 address \$10. @12 date :ddmmyy10. amount :best.;
datalines;
Property 1 31/01/2014 -770
Property 1 31/01/2014 -550
Property 1 31/01/2014 -20000
Property 1 31/01/2014 -52329.6
Property 1 31/07/2014 -660
Property 1 31/10/2014 -1738
Property 1 30/06/2015 -845.32
Property 1 30/06/2015 845.32
Property 1 30/06/2015 -845.32
Property 1 31/12/2015 845.32
Property 1 31/12/2015 -845.32
Property 1 31/12/2015 845.32
Property 2 31/01/2014 -1320
Property 2 31/01/2014 -2895.99
Property 2 31/01/2014 -653.32
Property 2 31/01/2014 -40000
Property 2 31/01/2014 -121954.8
Property 2 30/09/2015 -3500
Property 3 31/01/2014 -1200
Property 3 31/01/2014 -28080
Property 3 31/01/2014 -130483
Property 3 30/11/2015 -885
Property 3 31/12/2015 885
;
run;

/* Create aggregate by address,date */
proc summary data=have nway;
var	amount;
output out=have_sum(drop=_:) sum=;
run;

/* For every address, ensure there are no monthly gaps */

length curr_dt curr_amt curr_val prev_dt prev_amt value prev_val 8;
retain curr_dt curr_amt curr_val prev_dt prev_amt value prev_val;
format curr_dt prev_dt ddmmyy10.;

SET have_sum;

do;
value = amount;
output; /* ensure it doesn't get left out */
end;
else if (first.date) then
value + amount;

prev_dt= lag(date);
prev_val= lag(value);
prev_amt= lag(amount);
put prev_dt= prev_val= prev_amt=;

/* Find the monthly gap */
diff= INTCK('month',prev_dt,date);
*put diff=;

if (diff > 0) then
do; /* Fill-in missing months */

/* Preserve current record values */
curr_dt = date;
curr_val = value;
curr_amt = amount;

/* Output missing months */
do i=1 to (diff-1) by 1;
date=intnx('month',prev_dt,i,'E');
amount=prev_amt;
value=prev_val;
OUTPUT;
end;

/* Don't forget current record */
date=curr_dt;
amount=curr_amt;
value=curr_val;
OUTPUT;

end; /* Fill-in missing months */
end;
run;
Proc print data=want; run;``````

Hope if gives you what you are looking for,

Ahmed

27 REPLIES 27
Opal | Level 21

## Re: Aggregating dataset by date, including missing dates

You say you have a variable PERIOD, but you haven't shown us anything about that variable. You'll likely get more responses if you show the forum your example have and want datasets .. each in the form of datasteps and posted using the {i} icon.

Art, CEO, AnalystFinder.com

Rhodochrosite | Level 12

## Re: Aggregating dataset by date, including missing dates

Hi, this uses an array but not in the way (I think) that you proposed. There probably is a more efficient INTNX-related metod, but  this fills in the missing months ...

* aggregate amounts by address and date;

proc summary data=x nway;
var amount;
output out=xx (drop=_:) sum=;
run;

* fill in the missing months;

data y;
array _a(0:1000);
do j=1 by 1 until (last.address);
set xx;
if first.address then first = date;
_a(date - first) = amount;
end;
do j=0 to (date - first);
value = sum(value,_a(j));
date = first+j;
if day(date+1) eq 1 then output;
end;
run;

Portion of output ...

``````DATA SET: y

Property1    31/01/2014     -73649.60
Property1    28/02/2014     -73649.60
Property1    31/03/2014     -73649.60
Property1    30/04/2014     -73649.60
Property1    31/05/2014     -73649.60
Property1    30/06/2014     -73649.60
Property1    31/07/2014     -74309.60
Property1    31/08/2014     -74309.60
Property1    30/09/2014     -74309.60
Property1    31/10/2014     -76047.60
Property1    30/11/2014     -76047.60
Property1    31/12/2014     -76047.60
Property1    31/01/2015     -76047.60
Property1    28/02/2015     -76047.60
Property1    31/03/2015     -76047.60
Property1    30/04/2015     -76047.60
Property1    31/05/2015     -76047.60
Property1    30/06/2015     -76892.92
Property1    31/07/2015     -76892.92
Property1    31/08/2015     -76892.92
Property1    30/09/2015     -76892.92
Property1    31/10/2015     -76892.92
Property1    30/11/2015     -76892.92
Property1    31/12/2015     -76047.60
``````

Quartz | Level 8

## Re: Aggregating dataset by date, including missing dates

Hi,
Thank you for the response, very helpful!

1) Is there any way I could get the resultant dataset to contain each month up to a given period? For example, using the code you kindly provided, Property 1 data goes up to 31Dec15, while Property 2 data only goes up to 30Sep15. Ideally I would like all the properties to generate up to the same month, if that makes sense?

2) Would you be able to walk me through what your code is doing? It clearly works, but I'm quite new to SAS and am not able to follow it.

Rhodochrosite | Level 12

## Re: Aggregating dataset by date, including missing dates

For maximum date in the data set for all PROPERTIES ...

data y;
retain max_date;
array _a(0:1000);
do j=1 by 1 until (last.address);
set xx;
if first.address then first = date;
_a(date - first) = amount;
max_date = max(max_date, date);
end;
do j=0 to max_date-first;
value = sum(value,_a(j));
date = first+j;
if day(date+1) eq 1 then output;
end;
run;

Send me an email ... msz03albany.edu ... and I'll do the walk through for you.

Quartz | Level 8

## Re: Aggregating dataset by date, including missing dates

Many thanks, the above works perfectly.

Tried sending you an email by it can't send, as it can't resolve the domain name in msz03albany.edu?
Rhodochrosite | Level 12

## Re: Aggregating dataset by date, including missing dates

Hi, still needs a couple fixes, only worked since the  max date is in the first group of addresses.

Will fix later and send to you. My email is msz03@albany.edu

Quartz | Level 8

## Re: Aggregating dataset by date, including missing dates

Hi,

I'm looking at this again today, and just noticed I'm getting some strange results.

If I use the dataset below:

```Address                 Date            Amount1000 Sumac Drive	31/05/2014	250
1000 Sumac Drive	30/06/2014	184252
1000 Sumac Drive	31/07/2014	4100
1000 Sumac Drive	31/08/2014	2572
1000 Sumac Drive	28/02/2015	402.98
1000 Sumac Drive	31/08/2015	-933.87
1000 Sumac Drive	30/09/2015	6520.72
1000 Sumac Drive	31/12/2015	-2632.98```

I'm getting the following output:

```1000 Sumac Drive	31/05/2014	-2382.98
1000 Sumac Drive	30/06/2014	181869.02
1000 Sumac Drive	31/07/2014	185969.02
1000 Sumac Drive	31/08/2014	188541.02
1000 Sumac Drive	30/09/2014	188541.02
1000 Sumac Drive	31/10/2014	188541.02
1000 Sumac Drive	30/11/2014	188541.02
1000 Sumac Drive	31/12/2014	188541.02
1000 Sumac Drive	31/01/2015	188541.02
1000 Sumac Drive	28/02/2015	188944
1000 Sumac Drive	31/03/2015	188944
1000 Sumac Drive	30/04/2015	188944
1000 Sumac Drive	31/05/2015	188944
1000 Sumac Drive	30/06/2015	188944
1000 Sumac Drive	31/07/2015	188944
1000 Sumac Drive	31/08/2015	188010.13
1000 Sumac Drive	30/09/2015	194530.85
1000 Sumac Drive	31/10/2015	194530.85
1000 Sumac Drive	30/11/2015	194530.85
1000 Sumac Drive	31/12/2015	191897.87
1000 Sumac Drive	31/01/2016	191897.87
1000 Sumac Drive	29/02/2016	191897.87
1000 Sumac Drive	31/03/2016	191897.87
1000 Sumac Drive	30/04/2016	191897.87
1000 Sumac Drive	31/05/2016	191897.87
1000 Sumac Drive	30/06/2016	191897.87
1000 Sumac Drive	31/07/2016	191897.87
1000 Sumac Drive	31/08/2016	191897.87
1000 Sumac Drive	30/09/2016	191897.87
1000 Sumac Drive	31/10/2016	191897.87
1000 Sumac Drive	30/11/2016	191897.87
1000 Sumac Drive	31/12/2016	191897.87
1000 Sumac Drive	31/01/2017	191897.87
1000 Sumac Drive	28/02/2017	191897.87
1000 Sumac Drive	31/03/2017	191897.87
1000 Sumac Drive	30/04/2017	191897.87
1000 Sumac Drive	31/05/2017	191897.87
1000 Sumac Drive	30/06/2017	191897.87
1000 Sumac Drive	31/07/2017	191897.87	```

I can't work out why it is misbehaving?

Any insight would be amazing!

Thanks!

Quartz | Level 8

## Re: Aggregating dataset by date, including missing dates

If it helps at all, it seems to be aggregating the first and last lines of the first dataset into the first line of second dataset, I just don't know why
Rhodochrosite | Level 12

## Re: Aggregating dataset by date, including missing dates

Mike,
Check my 2nd answer (07/10/17 5:xx PM) to your original post. I had included a sample output, that may just match what you are looking for?

Ahmed
Rhodochrosite | Level 12

## Re: Aggregating dataset by date, including missing dates

Hi Mike,

Would the following work for you?

```data have;
length address \$10 date 8 amount 8;
format date ddmmyy10.;
INPUT @1 address \$10. @12 date :ddmmyy10. amount :best.;
datalines;
Property 1 31/01/2014 -770
Property 1 31/01/2014 -550
Property 1 31/01/2014 -20000
Property 1 31/01/2014 -52329.6
Property 1 31/07/2014 -660
Property 1 31/10/2014 -1738
Property 1 30/06/2015 -845.32
Property 1 30/06/2015 845.32
Property 1 30/06/2015 -845.32
Property 1 31/12/2015 845.32
Property 1 31/12/2015 -845.32
Property 1 31/12/2015 845.32
Property 2 31/01/2014 -1320
Property 2 31/01/2014 -2895.99
Property 2 31/01/2014 -653.32
Property 2 31/01/2014 -40000
Property 2 31/01/2014 -121954.8
Property 2 30/09/2015 -3500
Property 3 31/01/2014 -1200
Property 3 31/01/2014 -28080
Property 3 31/01/2014 -130483
Property 3 30/11/2015 -885
Property 3 31/12/2015 885
;
run;

/* Using the /SPARSE option to fill-in the missing dates for the properties */
proc freq data=have;
table address*date / SPARSE out=all list noprint;
run;

/* Combine the original data with only the filler records (count=0) */
data final_v;
SET have all(where=(count=0));
run;

/* Produce the final aggregation table */
proc summary data=final_v nway;
var	amount;
output out=want(drop=_:) sum=;
run;```

Hope this helps,

Ahmed

Quartz | Level 8

## Re: Aggregating dataset by date, including missing dates

Hey, thanks for the response. It says "Proc freq not found". Is it part of Base SAS?
Super User

## Re: Aggregating dataset by date, including missing dates

@MikeFranz wrote:
Hey, thanks for the response. It says "Proc freq not found". Is it part of Base SAS?

Yes it is. Check your code for other errors.

Quartz | Level 8

## Re: Aggregating dataset by date, including missing dates

Hi,

My code is:

```proc freq data=work.pic;
table address*Date / SPARSE out=all list noprint;
run;```

which is copy paste from the solution, with only the dataset name changed.

I'm getting the following errors:

``` ERROR: The FREQ procedure cannot be run.

I have been not been able to run PROC FREQ in the past. I had just assumed it was because it wasn't part of Base SAS.

If it helps, running PROC SETINIT yields the following:

``` 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
55
56         PROC SETINIT;
57
58         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
Original site validation data
Current version: 9.04.01M3P062415
System birthday:   19JAN2017.
Operating System:   WX64_SV .
Product expiration dates:
---Base SAS Software                                                                                    14JAN2018
---SAS/GRAPH                                                                                            14JAN2018
---SAS Integration Technologies                                                                         14JAN2018
---SAS/Secure 168-bit                                                                                   14JAN2018
---SAS/Secure Windows                                                                                   14JAN2018
---SAS/ACCESS Interface to PC Files                                                                     14JAN2018
---SAS/ACCESS Interface to ODBC                                                                         14JAN2018
---SAS Workspace Server for Local Access                                                                14JAN2018
---High Performance Suite                                                                               14JAN2018
---SAS Search and Indexing Server                                                                       14JAN2018
---SAS Web Crawler Server                                                                               14JAN2018
---SAS LASR Analytic Server                                                                             14JAN2018
---SAS Visual Analytics Hub                                                                             14JAN2018
---SAS Visual Analytics Services                                                                        14JAN2018
---SAS Visual Analytics Base                                                                            13JAN2018
---Advanced Programming for LASR Analytic Server                                                        14JAN2018
---SAS Visual Analytics Server Components                                                               14JAN2018
---Visual Analytics Explorer                                                                            14JAN2018  ```

Thanks!

Opal | Level 21

## Re: Aggregating dataset by date, including missing dates

Contact SAS tech support. They'll probably just walk you through re-installing SAS, but you definitey should have access to proc freq. I have to wonder what else you might be missing.

Art, CEO, AnalystFinder.com

Discussion stats
• 27 replies
• 1943 views
• 0 likes
• 5 in conversation