Aggregating dataset by date, including missing dates

Accepted Solution Solved
Reply
Contributor
Posts: 28
Accepted Solution

Aggregating dataset by date, including missing dates

[ Edited ]

Hi,

 

I have a dataset, work.PIC, which contains transactions ($ amounts) per Address and Date. This dataset is sorted by Address and Date.

 

Sample Dataset:

  Address    Date      Amount
Property 1 31/01/2014 -770
Property 1 31/01/2014 -550
Property 1 31/01/2014 -20000
Property 1 31/01/2014 -52329.6
Property 1 31/07/2014 -660
Property 1 31/10/2014 -1738
Property 1 30/06/2015 -845.32
Property 1 30/06/2015 845.32
Property 1 30/06/2015 -845.32
Property 1 31/12/2015 845.32
Property 1 31/12/2015 -845.32
Property 1 31/12/2015 845.32
Property 2 31/01/2014 -1320
Property 2 31/01/2014 -2895.99
Property 2 31/01/2014 -653.32
Property 2 31/01/2014 -40000
Property 2 31/01/2014 -121954.8
Property 2 30/09/2015 -3500
Property 3 31/01/2014 -1200
Property 3 31/01/2014 -28080
Property 3 31/01/2014 -130483
Property 3 30/11/2015 -885
Property 3 31/12/2015 885
 

For each address, I would like to create a running total per Address, with output at each Date. I have accomplished this through the code below:

 

Current code: 

data work.PIC_agg (keep = Address Date Value);
	set work.PIC;
	by Address Date;
	if First.Address then Value = 0;
	value + Amount;
	if Last.Date then output;
run;

 

 

Problem:

This issue with the above is that it then only contains information for each property on a date where transactions occurred e.g. Property 1 will only have entries for Jan14, Jul14, Oct14, Jun15, and Dec15.

 

I would like to have a value for each date i.e. from Jan14 through to Dec15. If there is no transaction in a given month, then the value equals the previous month's total.

 

 

Proposed methodology:

Store each unique element of the Date variable in an array (DateArray)

Create a new variable for each element in DateArray

For each element in DataArray, sum over all transactions for a particular Address, where the Date<= DateArray

Store the resultant number in the correct variable for each Address

 

 

Any ideas on where the above methodology is sound, and, if so, how to implement. If not, could someone perhaps suggest a better method?

 

Please let me know if the above is unclear.

 

Many thanks!

 


Accepted Solutions
Solution
‎07-20-2017 09:30 AM
Regular Contributor
Posts: 216

Re: Aggregating dataset by date, including missing dates

Posted in reply to MikeFranz

Hi,

This newly modified code, has a running total (value), as well as, original monthy total

data have;
length address $10 date 8 amount 8;
format date ddmmyy10.;
INPUT @1 address $10. @12 date :ddmmyy10. amount :best.;
datalines;
Property 1 31/01/2014 -770 
Property 1 31/01/2014 -550 
Property 1 31/01/2014 -20000 
Property 1 31/01/2014 -52329.6 
Property 1 31/07/2014 -660 
Property 1 31/10/2014 -1738 
Property 1 30/06/2015 -845.32 
Property 1 30/06/2015 845.32 
Property 1 30/06/2015 -845.32 
Property 1 31/12/2015 845.32 
Property 1 31/12/2015 -845.32 
Property 1 31/12/2015 845.32 
Property 2 31/01/2014 -1320 
Property 2 31/01/2014 -2895.99 
Property 2 31/01/2014 -653.32 
Property 2 31/01/2014 -40000 
Property 2 31/01/2014 -121954.8 
Property 2 30/09/2015 -3500 
Property 3 31/01/2014 -1200 
Property 3 31/01/2014 -28080 
Property 3 31/01/2014 -130483 
Property 3 30/11/2015 -885 
Property 3 31/12/2015 885
;
run;

/* Create aggregate by address,date */
proc summary data=have nway;
  class address date;
  var	amount;
  output out=have_sum(drop=_:) sum=;
run;

/* For every address, ensure there are no monthly gaps */
data want(KEEP=address date amount value);

	length curr_dt curr_amt curr_val prev_dt prev_amt value prev_val 8;
	retain curr_dt curr_amt curr_val prev_dt prev_amt value prev_val;
	format curr_dt prev_dt ddmmyy10.;

	do until (last.address);

		SET have_sum;
		by address date;

		if (first.address) then
		do;
			value = amount;
			output; /* ensure it doesn't get left out */
		end;
		else if (first.date) then
			value + amount;

		prev_dt= lag(date);
		prev_val= lag(value);
		prev_amt= lag(amount);
		put prev_dt= prev_val= prev_amt=;

		/* Find the monthly gap */
		diff= INTCK('month',prev_dt,date);
		*put diff=;

		if (diff > 0) then
		do; /* Fill-in missing months */

			/* Preserve current record values */
			curr_dt = date;
			curr_val = value;
			curr_amt = amount;

			/* Output missing months */
			do i=1 to (diff-1) by 1;
			  date=intnx('month',prev_dt,i,'E');
			  amount=prev_amt;
			  value=prev_val;
			  OUTPUT;
			end;

			/* Don't forget current record */
			date=curr_dt;
			amount=curr_amt;
			value=curr_val;
			OUTPUT;

		end; /* Fill-in missing months */
	end;
run;
Proc print data=want; run;

Hope if gives you what you are looking for,

 

Ahmed 

View solution in original post


All Replies
PROC Star
Posts: 7,467

Re: Aggregating dataset by date, including missing dates

Posted in reply to MikeFranz

You say you have a variable PERIOD, but you haven't shown us anything about that variable. You'll likely get more responses if you show the forum your example have and want datasets .. each in the form of datasteps and posted using the {i} icon.

 

Art, CEO, AnalystFinder.com

 

Valued Guide
Posts: 765

Re: Aggregating dataset by date, including missing dates

Posted in reply to MikeFranz

Hi, this uses an array but not in the way (I think) that you proposed. There probably is a more efficient INTNX-related metod, but  this fills in the missing months ...

 

* aggregate amounts by address and date;

proc summary data=x nway;
class address date;
var amount;
output out=xx (drop=_Smiley Happy sum=;
run;

 

* fill in the missing months;

data y;
array _a(0:1000);
do j=1 by 1 until (last.address);
  set xx;
  by address;
  if first.address then first = date;
  _a(date - first) = amount;
end;
do j=0 to (date - first);
  value = sum(value,_a(j));
  date = first+j;
  if day(date+1) eq 1 then output;
end;
keep address date value;
run;

 

Portion of output ...

DATA SET: y

 address           date       value

Property1    31/01/2014     -73649.60
Property1    28/02/2014     -73649.60
Property1    31/03/2014     -73649.60
Property1    30/04/2014     -73649.60
Property1    31/05/2014     -73649.60
Property1    30/06/2014     -73649.60
Property1    31/07/2014     -74309.60
Property1    31/08/2014     -74309.60
Property1    30/09/2014     -74309.60
Property1    31/10/2014     -76047.60
Property1    30/11/2014     -76047.60
Property1    31/12/2014     -76047.60
Property1    31/01/2015     -76047.60
Property1    28/02/2015     -76047.60
Property1    31/03/2015     -76047.60
Property1    30/04/2015     -76047.60
Property1    31/05/2015     -76047.60
Property1    30/06/2015     -76892.92
Property1    31/07/2015     -76892.92
Property1    31/08/2015     -76892.92
Property1    30/09/2015     -76892.92
Property1    31/10/2015     -76892.92
Property1    30/11/2015     -76892.92
Property1    31/12/2015     -76047.60

 

 

 

 

Contributor
Posts: 28

Re: Aggregating dataset by date, including missing dates

Hi,
Thank you for the response, very helpful!
Couple of follow up questions:

1) Is there any way I could get the resultant dataset to contain each month up to a given period? For example, using the code you kindly provided, Property 1 data goes up to 31Dec15, while Property 2 data only goes up to 30Sep15. Ideally I would like all the properties to generate up to the same month, if that makes sense?

2) Would you be able to walk me through what your code is doing? It clearly works, but I'm quite new to SAS and am not able to follow it.

Thanks again for your help!
Valued Guide
Posts: 765

Re: Aggregating dataset by date, including missing dates

Posted in reply to MikeFranz

For maximum date in the data set for all PROPERTIES ...

 

data y;
retain max_date;
array _a(0:1000);
do j=1 by 1 until (last.address);
  set xx;
  by address;
  if first.address then first = date;
  _a(date - first) = amount;
  max_date = max(max_date, date);
end;
do j=0 to max_date-first;
  value = sum(value,_a(j));
  date = first+j;
  if day(date+1) eq 1 then output;
end;
keep address date value;
run;

 

Send me an email ... msz03albany.edu ... and I'll do the walk through for you.

Contributor
Posts: 28

Re: Aggregating dataset by date, including missing dates

Many thanks, the above works perfectly.

Tried sending you an email by it can't send, as it can't resolve the domain name in msz03albany.edu?
Valued Guide
Posts: 765

Re: Aggregating dataset by date, including missing dates

Posted in reply to MikeFranz

Hi, still needs a couple fixes, only worked since the  max date is in the first group of addresses.

 

Will fix later and send to you. My email is msz03@albany.edu

Contributor
Posts: 28

Re: Aggregating dataset by date, including missing dates

Hi,

 

I'm looking at this again today, and just noticed I'm getting some strange results.

If I use the dataset below:

Address                 Date            Amount
1000 Sumac Drive 31/05/2014 250 1000 Sumac Drive 30/06/2014 184252 1000 Sumac Drive 31/07/2014 4100 1000 Sumac Drive 31/08/2014 2572 1000 Sumac Drive 28/02/2015 402.98 1000 Sumac Drive 31/08/2015 -933.87 1000 Sumac Drive 30/09/2015 6520.72 1000 Sumac Drive 31/12/2015 -2632.98

I'm getting the following output:

1000 Sumac Drive	31/05/2014	-2382.98	
1000 Sumac Drive	30/06/2014	181869.02	
1000 Sumac Drive	31/07/2014	185969.02	
1000 Sumac Drive	31/08/2014	188541.02	
1000 Sumac Drive	30/09/2014	188541.02	
1000 Sumac Drive	31/10/2014	188541.02	
1000 Sumac Drive	30/11/2014	188541.02	
1000 Sumac Drive	31/12/2014	188541.02	
1000 Sumac Drive	31/01/2015	188541.02	
1000 Sumac Drive	28/02/2015	188944	
1000 Sumac Drive	31/03/2015	188944	
1000 Sumac Drive	30/04/2015	188944	
1000 Sumac Drive	31/05/2015	188944	
1000 Sumac Drive	30/06/2015	188944	
1000 Sumac Drive	31/07/2015	188944	
1000 Sumac Drive	31/08/2015	188010.13	
1000 Sumac Drive	30/09/2015	194530.85	
1000 Sumac Drive	31/10/2015	194530.85	
1000 Sumac Drive	30/11/2015	194530.85	
1000 Sumac Drive	31/12/2015	191897.87	
1000 Sumac Drive	31/01/2016	191897.87	
1000 Sumac Drive	29/02/2016	191897.87	
1000 Sumac Drive	31/03/2016	191897.87	
1000 Sumac Drive	30/04/2016	191897.87	
1000 Sumac Drive	31/05/2016	191897.87	
1000 Sumac Drive	30/06/2016	191897.87	
1000 Sumac Drive	31/07/2016	191897.87	
1000 Sumac Drive	31/08/2016	191897.87	
1000 Sumac Drive	30/09/2016	191897.87	
1000 Sumac Drive	31/10/2016	191897.87	
1000 Sumac Drive	30/11/2016	191897.87	
1000 Sumac Drive	31/12/2016	191897.87	
1000 Sumac Drive	31/01/2017	191897.87	
1000 Sumac Drive	28/02/2017	191897.87	
1000 Sumac Drive	31/03/2017	191897.87	
1000 Sumac Drive	30/04/2017	191897.87	
1000 Sumac Drive	31/05/2017	191897.87	
1000 Sumac Drive	30/06/2017	191897.87	
1000 Sumac Drive	31/07/2017	191897.87	

I can't work out why it is misbehaving?

 

Any insight would be amazing!

 

Thanks!

 

 

 

 

 

 

 

Contributor
Posts: 28

Re: Aggregating dataset by date, including missing dates

Posted in reply to MikeFranz
If it helps at all, it seems to be aggregating the first and last lines of the first dataset into the first line of second dataset, I just don't know why
Regular Contributor
Posts: 216

Re: Aggregating dataset by date, including missing dates

Posted in reply to MikeFranz
Mike,
Check my 2nd answer (07/10/17 5:xx PM) to your original post. I had included a sample output, that may just match what you are looking for?

Ahmed
Regular Contributor
Posts: 216

Re: Aggregating dataset by date, including missing dates

Posted in reply to MikeFranz

Hi Mike,

Would the following work for you?

 

data have;
length address $10 date 8 amount 8;
format date ddmmyy10.;
INPUT @1 address $10. @12 date :ddmmyy10. amount :best.;
datalines;
Property 1 31/01/2014 -770 
Property 1 31/01/2014 -550 
Property 1 31/01/2014 -20000 
Property 1 31/01/2014 -52329.6 
Property 1 31/07/2014 -660 
Property 1 31/10/2014 -1738 
Property 1 30/06/2015 -845.32 
Property 1 30/06/2015 845.32 
Property 1 30/06/2015 -845.32 
Property 1 31/12/2015 845.32 
Property 1 31/12/2015 -845.32 
Property 1 31/12/2015 845.32 
Property 2 31/01/2014 -1320 
Property 2 31/01/2014 -2895.99 
Property 2 31/01/2014 -653.32 
Property 2 31/01/2014 -40000 
Property 2 31/01/2014 -121954.8 
Property 2 30/09/2015 -3500 
Property 3 31/01/2014 -1200 
Property 3 31/01/2014 -28080 
Property 3 31/01/2014 -130483 
Property 3 30/11/2015 -885 
Property 3 31/12/2015 885
;
run;

/* Using the /SPARSE option to fill-in the missing dates for the properties */
proc freq data=have;
  table address*date / SPARSE out=all list noprint;
run;

/* Combine the original data with only the filler records (count=0) */
data final_v;
  SET have all(where=(count=0));
run;

/* Produce the final aggregation table */
proc summary data=final_v nway;
  class address date;
  var	amount;
  output out=want(drop=_:) sum=;
run;

Hope this helps,

Ahmed

 

Contributor
Posts: 28

Re: Aggregating dataset by date, including missing dates

Posted in reply to AhmedAl_Attar
Hey, thanks for the response. It says "Proc freq not found". Is it part of Base SAS?
Super User
Posts: 19,770

Re: Aggregating dataset by date, including missing dates

Posted in reply to MikeFranz

MikeFranz wrote:
Hey, thanks for the response. It says "Proc freq not found". Is it part of Base SAS?

Yes it is. Check your code for other errors.

Contributor
Posts: 28

Re: Aggregating dataset by date, including missing dates

Hi,

 

My code is:

 

proc freq data=work.pic;
  table address*Date / SPARSE out=all list noprint;
run;

which is copy paste from the solution, with only the dataset name changed.

 

I'm getting the following errors:

 

 
 ERROR: The FREQ procedure cannot be run.
 ERROR: Procedure FREQ not found.

I have been not been able to run PROC FREQ in the past. I had just assumed it was because it wasn't part of Base SAS.

 

If it helps, running PROC SETINIT yields the following:

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 55         
 56         PROC SETINIT;
 57         
 58         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 Original site validation data
 Current version: 9.04.01M3P062415
 System birthday:   19JAN2017.
 Operating System:   WX64_SV .
 Product expiration dates:
 ---Base SAS Software                                                                                    14JAN2018  
 ---SAS/GRAPH                                                                                            14JAN2018  
 ---SAS Integration Technologies                                                                         14JAN2018  
 ---SAS/Secure 168-bit                                                                                   14JAN2018  
 ---SAS/Secure Windows                                                                                   14JAN2018  
 ---SAS/ACCESS Interface to PC Files                                                                     14JAN2018  
 ---SAS/ACCESS Interface to ODBC                                                                         14JAN2018  
 ---SAS Workspace Server for Local Access                                                                14JAN2018  
 ---High Performance Suite                                                                               14JAN2018  
 ---SAS Search and Indexing Server                                                                       14JAN2018  
 ---SAS Web Crawler Server                                                                               14JAN2018  
 ---SAS LASR Analytic Server                                                                             14JAN2018  
 ---SAS Visual Analytics Hub                                                                             14JAN2018  
 ---SAS Visual Analytics Services                                                                        14JAN2018  
 ---SAS Visual Analytics Base                                                                            13JAN2018  
 ---Advanced Programming for LASR Analytic Server                                                        14JAN2018  
 ---SAS Visual Analytics Server Components                                                               14JAN2018  
 ---Visual Analytics Explorer                                                                            14JAN2018  

Thanks!

 
PROC Star
Posts: 7,467

Re: Aggregating dataset by date, including missing dates

Posted in reply to MikeFranz

Contact SAS tech support. They'll probably just walk you through re-installing SAS, but you definitey should have access to proc freq. I have to wonder what else you might be missing.

 

Art, CEO, AnalystFinder.com

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 27 replies
  • 290 views
  • 0 likes
  • 5 in conversation