Hi,
I need help building this logic.
The input columns in a dataset are below:
DIM1 | DIM2 | DATETIME | AMOUNT |
71 | 1234 | 01JUN2017:00:15:10:000000 | 70 |
73 | 1234 | 01JUN2017:00:15:30:000000 | 20 |
74 | 1234 | 01JUN2017:00:17:18:000000 | 83 |
76 | 1234 | 01JUN2017:00:30:13:000000 | 80 |
77 | 1234 | 01JUN2017:00:30:13:000000 | 21 |
78 | 1234 | 01JUN2017:00:30:14:000000 | 25 |
80 | 1234 | 01JUN2017:00:31:09:000000 | 100 |
91 | 4567 | 01JUN2017:09:20:11:000000 | 687.5 |
91 | 4567 | 01JUN2017:09:20:11:000000 | 3750 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 200 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 200 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 225 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 |
65 | 6789 | 01JUN2017:05:45:22:000000 | 1014.02 |
66 | 6789 | 01JUN2017:05:45:22:000000 | 100 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1300 |
102 | 2346 | 01JUN2017:11:06:49:000000 | 1200 |
102 | 2346 | 01JUN2017:11:06:49:000000 | 1280 |
105 | 2346 | 01JUN2017:11:07:16:000000 | 1400 |
and I am expecting the output dataset as like below (with additional three variables with cumulative counts and amount created)
DIM1 | DIM2 | DATETIME | AMOUNT | AMOUNT CUMULATIVE | COUNTS CUMULATIVE | TXN COUNTS |
71 | 1234 | 01JUN2017:00:15:10:000000 | 70 | 0 | 0 | 0 |
73 | 1234 | 01JUN2017:00:15:30:000000 | 20 | 70 | 1 | 1 |
74 | 1234 | 01JUN2017:00:17:18:000000 | 83 | 90 | 2 | 2 |
76 | 1234 | 01JUN2017:00:30:13:000000 | 80 | 173 | 3 | 3 |
77 | 1234 | 01JUN2017:00:30:13:000000 | 21 | 253 | 4 | 4 |
78 | 1234 | 01JUN2017:00:30:14:000000 | 25 | 274 | 5 | 5 |
80 | 1234 | 01JUN2017:00:31:09:000000 | 100 | 299 | 6 | 6 |
91 | 4567 | 01JUN2017:09:20:11:000000 | 687.5 | 0 | 0 | 0 |
91 | 4567 | 01JUN2017:09:20:11:000000 | 3750 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 200 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 200 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 225 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 | 0 | 0 | 0 |
65 | 6789 | 01JUN2017:05:45:22:000000 | 1014.02 | 0 | 0 | 0 |
66 | 6789 | 01JUN2017:05:45:22:000000 | 100 | 1014.02 | 1 | 1 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 | 0 | 0 | 0 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 | 0 | 0 | 0 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 | 0 | 0 | 0 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 | 0 | 0 | 0 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 | 0 | 0 | 0 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1300 | 0 | 0 | 0 |
102 | 2346 | 01JUN2017:11:06:49:000000 | 1200 | 8800 | 6 | 1 |
102 | 2346 | 01JUN2017:11:06:49:000000 | 1280 | 8800 | 6 | 1 |
105 | 2346 | 01JUN2017:11:07:16:000000 | 1400 | 11280 | 8 | 2 |
Thanks for the help.
Look at PROC EXPAND
@InnovativeMinds wrote:
Hi,
I need help building this logic.
The input columns in a dataset are below:
DIM1 DIM2 DATETIME AMOUNT 71 1234 01JUN2017:00:15:10:000000 70 73 1234 01JUN2017:00:15:30:000000 20 74 1234 01JUN2017:00:17:18:000000 83 76 1234 01JUN2017:00:30:13:000000 80 77 1234 01JUN2017:00:30:13:000000 21 78 1234 01JUN2017:00:30:14:000000 25 80 1234 01JUN2017:00:31:09:000000 100 91 4567 01JUN2017:09:20:11:000000 687.5 91 4567 01JUN2017:09:20:11:000000 3750 22 5678 01JUN2017:09:18:59:000000 200 22 5678 01JUN2017:09:18:59:000000 200 22 5678 01JUN2017:09:18:59:000000 225 22 5678 01JUN2017:09:18:59:000000 175 22 5678 01JUN2017:09:18:59:000000 175 22 5678 01JUN2017:09:18:59:000000 175 65 6789 01JUN2017:05:45:22:000000 1014.02 66 6789 01JUN2017:05:45:22:000000 100 101 2346 01JUN2017:11:05:51:000000 1500 101 2346 01JUN2017:11:05:51:000000 1500 101 2346 01JUN2017:11:05:51:000000 1500 101 2346 01JUN2017:11:05:51:000000 1500 101 2346 01JUN2017:11:05:51:000000 1500 101 2346 01JUN2017:11:05:51:000000 1300 102 2346 01JUN2017:11:06:49:000000 1200 102 2346 01JUN2017:11:06:49:000000 1280 105 2346 01JUN2017:11:07:16:000000 1400
and I am expecting the output dataset as like below (with additional three variables with cumulative counts and amount created)
DIM1 DIM2 DATETIME AMOUNT AMOUNT CUMULATIVE COUNTS CUMULATIVE TXN COUNTS 71 1234 01JUN2017:00:15:10:000000 70 0 0 0 73 1234 01JUN2017:00:15:30:000000 20 70 1 1 74 1234 01JUN2017:00:17:18:000000 83 90 2 2 76 1234 01JUN2017:00:30:13:000000 80 173 3 3 77 1234 01JUN2017:00:30:13:000000 21 253 4 4 78 1234 01JUN2017:00:30:14:000000 25 274 5 5 80 1234 01JUN2017:00:31:09:000000 100 299 6 6 91 4567 01JUN2017:09:20:11:000000 687.5 0 0 0 91 4567 01JUN2017:09:20:11:000000 3750 0 0 0 22 5678 01JUN2017:09:18:59:000000 200 0 0 0 22 5678 01JUN2017:09:18:59:000000 200 0 0 0 22 5678 01JUN2017:09:18:59:000000 225 0 0 0 22 5678 01JUN2017:09:18:59:000000 175 0 0 0 22 5678 01JUN2017:09:18:59:000000 175 0 0 0 22 5678 01JUN2017:09:18:59:000000 175 0 0 0 65 6789 01JUN2017:05:45:22:000000 1014.02 0 0 0 66 6789 01JUN2017:05:45:22:000000 100 1014.02 1 1 101 2346 01JUN2017:11:05:51:000000 1500 0 0 0 101 2346 01JUN2017:11:05:51:000000 1500 0 0 0 101 2346 01JUN2017:11:05:51:000000 1500 0 0 0 101 2346 01JUN2017:11:05:51:000000 1500 0 0 0 101 2346 01JUN2017:11:05:51:000000 1500 0 0 0 101 2346 01JUN2017:11:05:51:000000 1300 0 0 0 102 2346 01JUN2017:11:06:49:000000 1200 8800 6 1 102 2346 01JUN2017:11:06:49:000000 1280 8800 6 1 105 2346 01JUN2017:11:07:16:000000 1400 11280 8 2
Thanks for the help.
Assuming I understand the logic for your last two rows correctly the code below should do what you need.
data want(drop=DIM2_Start);
set have;
by DIM2 notsorted DIM1 notsorted;
retain Amount_Cumulative Counts_Cumulative TXN_Counts DIM2_Start;
if First.DIM2 then do;
Amount_Cumulative = 0;
Counts_Cumulative = 0;
TXN_Counts = 0;
DIM2_Start = _N_; *Used to track the observation number within each DIM2 value;
end; else if First.DIM1 then do;
Counts_Cumulative = _N_ - DIM2_Start;
TXN_Counts + 1;
end;
output;
Amount_cumulative + amount;
run;
I'm not sure I can figure out the logic for the final two columns. Here's something for the first new column:
data want;
set have;
by DIM2 notsorted;
if first.DIM2 then amount_cumulative=0;
output;
amount_cumulative + amount;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.