Hi,
I need help building this logic.
The input columns in a dataset are below:
DIM1 | DIM2 | DATETIME | AMOUNT |
71 | 1234 | 01JUN2017:00:15:10:000000 | 70 |
73 | 1234 | 01JUN2017:00:15:30:000000 | 20 |
74 | 1234 | 01JUN2017:00:17:18:000000 | 83 |
76 | 1234 | 01JUN2017:00:30:13:000000 | 80 |
77 | 1234 | 01JUN2017:00:30:13:000000 | 21 |
78 | 1234 | 01JUN2017:00:30:14:000000 | 25 |
80 | 1234 | 01JUN2017:00:31:09:000000 | 100 |
91 | 4567 | 01JUN2017:09:20:11:000000 | 687.5 |
91 | 4567 | 01JUN2017:09:20:11:000000 | 3750 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 200 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 200 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 225 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 |
65 | 6789 | 01JUN2017:05:45:22:000000 | 1014.02 |
66 | 6789 | 01JUN2017:05:45:22:000000 | 100 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1300 |
102 | 2346 | 01JUN2017:11:06:49:000000 | 1200 |
102 | 2346 | 01JUN2017:11:06:49:000000 | 1280 |
105 | 2346 | 01JUN2017:11:07:16:000000 | 1400 |
and I am expecting the output dataset as like below (with additional three variables with cumulative counts and amount created)
DIM1 | DIM2 | DATETIME | AMOUNT | AMOUNT CUMULATIVE | COUNTS CUMULATIVE | TXN COUNTS |
71 | 1234 | 01JUN2017:00:15:10:000000 | 70 | 0 | 0 | 0 |
73 | 1234 | 01JUN2017:00:15:30:000000 | 20 | 70 | 1 | 1 |
74 | 1234 | 01JUN2017:00:17:18:000000 | 83 | 90 | 2 | 2 |
76 | 1234 | 01JUN2017:00:30:13:000000 | 80 | 173 | 3 | 3 |
77 | 1234 | 01JUN2017:00:30:13:000000 | 21 | 253 | 4 | 4 |
78 | 1234 | 01JUN2017:00:30:14:000000 | 25 | 274 | 5 | 5 |
80 | 1234 | 01JUN2017:00:31:09:000000 | 100 | 299 | 6 | 6 |
91 | 4567 | 01JUN2017:09:20:11:000000 | 687.5 | 0 | 0 | 0 |
91 | 4567 | 01JUN2017:09:20:11:000000 | 3750 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 200 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 200 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 225 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 | 0 | 0 | 0 |
22 | 5678 | 01JUN2017:09:18:59:000000 | 175 | 0 | 0 | 0 |
65 | 6789 | 01JUN2017:05:45:22:000000 | 1014.02 | 0 | 0 | 0 |
66 | 6789 | 01JUN2017:05:45:22:000000 | 100 | 1014.02 | 1 | 1 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 | 0 | 0 | 0 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 | 0 | 0 | 0 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 | 0 | 0 | 0 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 | 0 | 0 | 0 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1500 | 0 | 0 | 0 |
101 | 2346 | 01JUN2017:11:05:51:000000 | 1300 | 0 | 0 | 0 |
102 | 2346 | 01JUN2017:11:06:49:000000 | 1200 | 8800 | 6 | 1 |
102 | 2346 | 01JUN2017:11:06:49:000000 | 1280 | 8800 | 6 | 1 |
105 | 2346 | 01JUN2017:11:07:16:000000 | 1400 | 11280 | 8 | 2 |
Thanks for the help.
Look at PROC EXPAND
@InnovativeMinds wrote:
Hi,
I need help building this logic.
The input columns in a dataset are below:
DIM1 DIM2 DATETIME AMOUNT 71 1234 01JUN2017:00:15:10:000000 70 73 1234 01JUN2017:00:15:30:000000 20 74 1234 01JUN2017:00:17:18:000000 83 76 1234 01JUN2017:00:30:13:000000 80 77 1234 01JUN2017:00:30:13:000000 21 78 1234 01JUN2017:00:30:14:000000 25 80 1234 01JUN2017:00:31:09:000000 100 91 4567 01JUN2017:09:20:11:000000 687.5 91 4567 01JUN2017:09:20:11:000000 3750 22 5678 01JUN2017:09:18:59:000000 200 22 5678 01JUN2017:09:18:59:000000 200 22 5678 01JUN2017:09:18:59:000000 225 22 5678 01JUN2017:09:18:59:000000 175 22 5678 01JUN2017:09:18:59:000000 175 22 5678 01JUN2017:09:18:59:000000 175 65 6789 01JUN2017:05:45:22:000000 1014.02 66 6789 01JUN2017:05:45:22:000000 100 101 2346 01JUN2017:11:05:51:000000 1500 101 2346 01JUN2017:11:05:51:000000 1500 101 2346 01JUN2017:11:05:51:000000 1500 101 2346 01JUN2017:11:05:51:000000 1500 101 2346 01JUN2017:11:05:51:000000 1500 101 2346 01JUN2017:11:05:51:000000 1300 102 2346 01JUN2017:11:06:49:000000 1200 102 2346 01JUN2017:11:06:49:000000 1280 105 2346 01JUN2017:11:07:16:000000 1400
and I am expecting the output dataset as like below (with additional three variables with cumulative counts and amount created)
DIM1 DIM2 DATETIME AMOUNT AMOUNT CUMULATIVE COUNTS CUMULATIVE TXN COUNTS 71 1234 01JUN2017:00:15:10:000000 70 0 0 0 73 1234 01JUN2017:00:15:30:000000 20 70 1 1 74 1234 01JUN2017:00:17:18:000000 83 90 2 2 76 1234 01JUN2017:00:30:13:000000 80 173 3 3 77 1234 01JUN2017:00:30:13:000000 21 253 4 4 78 1234 01JUN2017:00:30:14:000000 25 274 5 5 80 1234 01JUN2017:00:31:09:000000 100 299 6 6 91 4567 01JUN2017:09:20:11:000000 687.5 0 0 0 91 4567 01JUN2017:09:20:11:000000 3750 0 0 0 22 5678 01JUN2017:09:18:59:000000 200 0 0 0 22 5678 01JUN2017:09:18:59:000000 200 0 0 0 22 5678 01JUN2017:09:18:59:000000 225 0 0 0 22 5678 01JUN2017:09:18:59:000000 175 0 0 0 22 5678 01JUN2017:09:18:59:000000 175 0 0 0 22 5678 01JUN2017:09:18:59:000000 175 0 0 0 65 6789 01JUN2017:05:45:22:000000 1014.02 0 0 0 66 6789 01JUN2017:05:45:22:000000 100 1014.02 1 1 101 2346 01JUN2017:11:05:51:000000 1500 0 0 0 101 2346 01JUN2017:11:05:51:000000 1500 0 0 0 101 2346 01JUN2017:11:05:51:000000 1500 0 0 0 101 2346 01JUN2017:11:05:51:000000 1500 0 0 0 101 2346 01JUN2017:11:05:51:000000 1500 0 0 0 101 2346 01JUN2017:11:05:51:000000 1300 0 0 0 102 2346 01JUN2017:11:06:49:000000 1200 8800 6 1 102 2346 01JUN2017:11:06:49:000000 1280 8800 6 1 105 2346 01JUN2017:11:07:16:000000 1400 11280 8 2
Thanks for the help.
Assuming I understand the logic for your last two rows correctly the code below should do what you need.
data want(drop=DIM2_Start);
set have;
by DIM2 notsorted DIM1 notsorted;
retain Amount_Cumulative Counts_Cumulative TXN_Counts DIM2_Start;
if First.DIM2 then do;
Amount_Cumulative = 0;
Counts_Cumulative = 0;
TXN_Counts = 0;
DIM2_Start = _N_; *Used to track the observation number within each DIM2 value;
end; else if First.DIM1 then do;
Counts_Cumulative = _N_ - DIM2_Start;
TXN_Counts + 1;
end;
output;
Amount_cumulative + amount;
run;
I'm not sure I can figure out the logic for the final two columns. Here's something for the first new column:
data want;
set have;
by DIM2 notsorted;
if first.DIM2 then amount_cumulative=0;
output;
amount_cumulative + amount;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.