DATA Step, Macro, Functions and more

Help: Use First four columns as an input and have to output last three columns.

Reply
Occasional Contributor
Posts: 13

Help: Use First four columns as an input and have to output last three columns.

Hi,

I need help building this logic.

 

The input columns in a dataset are below:

 

DIM1DIM2DATETIMEAMOUNT
71123401JUN2017:00:15:10:00000070
73123401JUN2017:00:15:30:00000020
74123401JUN2017:00:17:18:00000083
76123401JUN2017:00:30:13:00000080
77123401JUN2017:00:30:13:00000021
78123401JUN2017:00:30:14:00000025
80123401JUN2017:00:31:09:000000100
91456701JUN2017:09:20:11:000000687.5
91456701JUN2017:09:20:11:0000003750
22567801JUN2017:09:18:59:000000200
22567801JUN2017:09:18:59:000000200
22567801JUN2017:09:18:59:000000225
22567801JUN2017:09:18:59:000000175
22567801JUN2017:09:18:59:000000175
22567801JUN2017:09:18:59:000000175
65678901JUN2017:05:45:22:0000001014.02
66678901JUN2017:05:45:22:000000100
101234601JUN2017:11:05:51:0000001500
101234601JUN2017:11:05:51:0000001500
101234601JUN2017:11:05:51:0000001500
101234601JUN2017:11:05:51:0000001500
101234601JUN2017:11:05:51:0000001500
101234601JUN2017:11:05:51:0000001300
102234601JUN2017:11:06:49:0000001200
102234601JUN2017:11:06:49:0000001280
105234601JUN2017:11:07:16:0000001400

 

and I am expecting the output dataset as like below (with additional three variables with cumulative counts and amount created)

 

DIM1DIM2DATETIMEAMOUNTAMOUNT CUMULATIVECOUNTS CUMULATIVETXN COUNTS
71123401JUN2017:00:15:10:00000070000
73123401JUN2017:00:15:30:000000207011
74123401JUN2017:00:17:18:000000839022
76123401JUN2017:00:30:13:0000008017333
77123401JUN2017:00:30:13:0000002125344
78123401JUN2017:00:30:14:0000002527455
80123401JUN2017:00:31:09:00000010029966
91456701JUN2017:09:20:11:000000687.5000
91456701JUN2017:09:20:11:0000003750000
22567801JUN2017:09:18:59:000000200000
22567801JUN2017:09:18:59:000000200000
22567801JUN2017:09:18:59:000000225000
22567801JUN2017:09:18:59:000000175000
22567801JUN2017:09:18:59:000000175000
22567801JUN2017:09:18:59:000000175000
65678901JUN2017:05:45:22:0000001014.02000
66678901JUN2017:05:45:22:0000001001014.0211
101234601JUN2017:11:05:51:0000001500000
101234601JUN2017:11:05:51:0000001500000
101234601JUN2017:11:05:51:0000001500000
101234601JUN2017:11:05:51:0000001500000
101234601JUN2017:11:05:51:0000001500000
101234601JUN2017:11:05:51:0000001300000
102234601JUN2017:11:06:49:0000001200880061
102234601JUN2017:11:06:49:0000001280880061
105234601JUN2017:11:07:16:00000014001128082

 

Thanks for the help.

Super User
Posts: 24,028

Re: Help: Use First four columns as an input and have to output last three columns.

Posted in reply to InnovativeMinds

Look at PROC EXPAND

 


InnovativeMinds wrote:

Hi,

I need help building this logic.

 

The input columns in a dataset are below:

 

DIM1 DIM2 DATETIME AMOUNT
71 1234 01JUN2017:00:15:10:000000 70
73 1234 01JUN2017:00:15:30:000000 20
74 1234 01JUN2017:00:17:18:000000 83
76 1234 01JUN2017:00:30:13:000000 80
77 1234 01JUN2017:00:30:13:000000 21
78 1234 01JUN2017:00:30:14:000000 25
80 1234 01JUN2017:00:31:09:000000 100
91 4567 01JUN2017:09:20:11:000000 687.5
91 4567 01JUN2017:09:20:11:000000 3750
22 5678 01JUN2017:09:18:59:000000 200
22 5678 01JUN2017:09:18:59:000000 200
22 5678 01JUN2017:09:18:59:000000 225
22 5678 01JUN2017:09:18:59:000000 175
22 5678 01JUN2017:09:18:59:000000 175
22 5678 01JUN2017:09:18:59:000000 175
65 6789 01JUN2017:05:45:22:000000 1014.02
66 6789 01JUN2017:05:45:22:000000 100
101 2346 01JUN2017:11:05:51:000000 1500
101 2346 01JUN2017:11:05:51:000000 1500
101 2346 01JUN2017:11:05:51:000000 1500
101 2346 01JUN2017:11:05:51:000000 1500
101 2346 01JUN2017:11:05:51:000000 1500
101 2346 01JUN2017:11:05:51:000000 1300
102 2346 01JUN2017:11:06:49:000000 1200
102 2346 01JUN2017:11:06:49:000000 1280
105 2346 01JUN2017:11:07:16:000000 1400

 

and I am expecting the output dataset as like below (with additional three variables with cumulative counts and amount created)

 

DIM1 DIM2 DATETIME AMOUNT AMOUNT CUMULATIVE COUNTS CUMULATIVE TXN COUNTS
71 1234 01JUN2017:00:15:10:000000 70 0 0 0
73 1234 01JUN2017:00:15:30:000000 20 70 1 1
74 1234 01JUN2017:00:17:18:000000 83 90 2 2
76 1234 01JUN2017:00:30:13:000000 80 173 3 3
77 1234 01JUN2017:00:30:13:000000 21 253 4 4
78 1234 01JUN2017:00:30:14:000000 25 274 5 5
80 1234 01JUN2017:00:31:09:000000 100 299 6 6
91 4567 01JUN2017:09:20:11:000000 687.5 0 0 0
91 4567 01JUN2017:09:20:11:000000 3750 0 0 0
22 5678 01JUN2017:09:18:59:000000 200 0 0 0
22 5678 01JUN2017:09:18:59:000000 200 0 0 0
22 5678 01JUN2017:09:18:59:000000 225 0 0 0
22 5678 01JUN2017:09:18:59:000000 175 0 0 0
22 5678 01JUN2017:09:18:59:000000 175 0 0 0
22 5678 01JUN2017:09:18:59:000000 175 0 0 0
65 6789 01JUN2017:05:45:22:000000 1014.02 0 0 0
66 6789 01JUN2017:05:45:22:000000 100 1014.02 1 1
101 2346 01JUN2017:11:05:51:000000 1500 0 0 0
101 2346 01JUN2017:11:05:51:000000 1500 0 0 0
101 2346 01JUN2017:11:05:51:000000 1500 0 0 0
101 2346 01JUN2017:11:05:51:000000 1500 0 0 0
101 2346 01JUN2017:11:05:51:000000 1500 0 0 0
101 2346 01JUN2017:11:05:51:000000 1300 0 0 0
102 2346 01JUN2017:11:06:49:000000 1200 8800 6 1
102 2346 01JUN2017:11:06:49:000000 1280 8800 6 1
105 2346 01JUN2017:11:07:16:000000 1400 11280 8 2

 

Thanks for the help.


 

New Contributor
Posts: 3

Re: Help: Use First four columns as an input and have to output last three columns.

[ Edited ]

Assuming I understand the logic for your last two rows correctly the code below should do what you need.

 

data want(drop=DIM2_Start);
	set have;
	by DIM2 notsorted DIM1 notsorted;
	retain Amount_Cumulative Counts_Cumulative TXN_Counts DIM2_Start;

	if First.DIM2 then do;
		Amount_Cumulative = 0;
		Counts_Cumulative = 0;
		TXN_Counts = 0;
		DIM2_Start = _N_; *Used to track the observation number within each DIM2 value;
	end; else if First.DIM1 then do;
		Counts_Cumulative = _N_ - DIM2_Start;
		TXN_Counts + 1;
	end;
	output;
	Amount_cumulative + amount;
run;
Super User
Posts: 6,939

Re: Help: Use First four columns as an input and have to output last three columns.

Posted in reply to InnovativeMinds

I'm not sure I can figure out the logic for the final two columns.  Here's something for the first new column:

 

data want;

set have;

by DIM2 notsorted;

if first.DIM2 then amount_cumulative=0;

output;

amount_cumulative + amount;

run;

Ask a Question
Discussion stats
  • 3 replies
  • 143 views
  • 0 likes
  • 4 in conversation