Hi everyone,
Here's a sample dataset.
Obs | SeqNum | Amount | NegSign | CES_APPL |
10400 | 1700553755 | 1035 72 | FCD | |
10401 | 1700553756 | 53760 90 | DDA | |
10402 | 1700553757 | 27746 05 | DDA | |
10403 | 1700553758 | 65240 21 | AGL | |
10404 | 1700553759 | 5910 97 | - | ABC |
10405 | 1700553760 | 91130 71 | FCD | |
10406 | 1700553761 | 10888 23 | DDA | |
10407 | 1700553762 | 2622 13 | DDA | |
10408 | 1700553763 | 124066 24 | DDA | |
10409 | 1700553764 | 7067 77 | DDA | |
10410 | 1700553765 | 40 21 | DDA | |
10411 | 1700553766 | 5831 73 | DDA | |
10412 | 1700553767 | 1030 98 | - | AGL |
10413 | 1700553768 | 43648 52 | DDA | |
10414 | 1700553769 | 18695 12 | FCD | |
10415 | 1700553770 | 9439 83 | XYZ | |
10416 | 1700553771 | 12219 36 | DDA | |
10417 | 1700553772 | 7256 50 | DDA | |
10418 | 1700553773 | 40698 54 | DDA | |
10419 | 1700553774 | 16211 15 | - | DDA |
10420 | 1700553775 | 600 00 | DDA | |
10421 | 1700553776 | 53760 90 | DDA |
I'd like to output some rows which meet the following logic (in this case, obs#10404 through obs#10411) :
(1) The first row to output: If NegSign = (-) and CES_APPL = ABC or AGL
(2) And then ALL following rows: CES_APPL = DDA or FCD. If it contains something else other than these 2 codes, don't output the whole chunk.
For example:
-Obs#10400 through obs#10403 will NOT be outpout since there is no starting row with NegSign = (-).
-Obs#10404 through obs#10411 will be OUTPUT since obs#10404 NegSign=(-) and CES_APPL = ABC, and then ALL the following rows CES_APPL = DDA or FCD.
-However, obs#10412 through obs#10418 will NOT be output even though obs#10412 NegSign = (-) and CES_APPL = AGL, but one of the following rows (obs#10415) CES_APPL = XYZ. Not ALL CES_APPL = DDA or FCD. It has to be ALL or NOTHING within the block.
-Obs#10419 through obs#10421 will NOT be output since obs#10419 NegSign = (-), but CES_APPL ≠ ABC or AGL.
So, my idea is to mark the starting row which meet the criteria (NegSign = (-) and CES_APPL = ABC or AGL.
Then, check criteria (2) for the next row only if (1) is met. Then output, otherwise, don't output.
Stop when the next row NegSign = (-), then go back to check (1) again.
I use Base SAS 9.4.
I'm not sure this logic could be done in SAS or not.
Any help would be greatly appreciated.
Thanks,
Jeff
It helps to make two passes through the data. For example:
data pass1;
set have;
if NegSign = '-' then block + 1;
run;
Then make a second pass, using BLOCK to determine which observations "belong" together:
data want;
output_flag = 'Y';
do until (last.block);
set pass1;
by block;
if first.block then do;
if NegSign = ' ' then output_flag = 'N';
if CES_APPL not in ('ABC', 'AGL') then output_flag = 'N';
end;
else if CES_APPL not in ('DDA', 'FCD') then output_flag = 'N';
end;
do until (last.block);
set pass1;
by block;
if output_flag = 'Y' then output;
end;
drop output_flag block;
run;
It helps to make two passes through the data. For example:
data pass1;
set have;
if NegSign = '-' then block + 1;
run;
Then make a second pass, using BLOCK to determine which observations "belong" together:
data want;
output_flag = 'Y';
do until (last.block);
set pass1;
by block;
if first.block then do;
if NegSign = ' ' then output_flag = 'N';
if CES_APPL not in ('ABC', 'AGL') then output_flag = 'N';
end;
else if CES_APPL not in ('DDA', 'FCD') then output_flag = 'N';
end;
do until (last.block);
set pass1;
by block;
if output_flag = 'Y' then output;
end;
drop output_flag block;
run;
Thanks so much, Astounding. It works perfectly ^_^
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.