@mkeintz solution is more efficient, as long as you understand it 🙂
No need to sort and lag- and then re-sort to original order.
Just use a suitably define DO loop containing a SET statement to read ahead to find the desired value
data have;
input w :$20. value;
datalines;
abate 200
abbreviations 50
abdomen 20
abdominal 110
abelcet 100
aberrant 100
abiraterone 100
abl1 100
ablation 210
abnormal 30
abo 780
abscission 260
absence 100
absenteeism 140
absorption 130
abstinence 150
abstracting 110
academia 210
academic 30
academy 600
acceleration 180
access 130
accessibility 100
accident 200
accidents 230
accountability 50
accounting 100
accreditation 230
acculturation 30
ace 600
acetates 920
acetazolamide 310
acetic 110
acetone 210
acetyl 50
acetylaminofluorene 200
acetylatio 230
acetylation 200
bupropion 550
burden 560
burkitt's 230
burn 300
burns 140
businesses 550
butanoic 800
butterflies 500
butyrates 100
butyric 450
butyryl 800
bypass 330
bystander 300
c 300
c2 130
c2h2 150
c3h 250
c57bl 720
run;
data want (drop=next_:);
set have;
length nw $20; retain nw ' ';
if value<=200 and nw<w then do until (next_v>200 and next_w>=w);
if eof=0 then set have (rename=(w=next_w value=next_v)) end=eof;
else do; next_w='ZZZZZZ'; next_v=201; end;
nw=lag(next_w);
end;
else if value>200 then nw=w;
run;
xxx
You don't want the second SET statement to prematurely end the data step, which it would do if it attempts to read beyond the end-of-file. So the SET statement inside the loop has an "end=eof" parameter, which sets the dummy variable EOF to 1 only if the last record is in hand.
So I use "if eof=0 then set" to avoid reading beyond end of file. But of course the loop containing the data set might not have the UNTIL expression satisfied at the end-of-file. So the "if eof=0 then set" is followed by ELSE setting next_w to 'ZZZZZZ' and next_v>200 thereby satisfying the until expression and getting out of the loop.
data have;
informat w $30. value 8.;
input w $ value;
cards;
abate 200
abbreviations 50
abdomen 20
abdominal 110
abelcet 100
aberrant 100
abiraterone 100
abl1 100
ablation 210
abnormal 30
abo 780
abscission 260
absence 100
absenteeism 140
absorption 130
abstinence 150
abstracting 110
academia 210
academic 30
academy 600
acceleration 180
access 130
accessibility 100
accident 200
accidents 230
accountability 50
accounting 100
accreditation 230
acculturation 30
ace 600
acetates 920
acetazolamide 310
acetic 110
acetone 210
acetyl 50
acetylaminofluorene 200
acetylatio 230
acetylation 200
bupropion 550
burden 560
burkitt's 230
burn 300
burns 140
businesses 550
butanoic 800
butterflies 500
butyrates 100
butyric 450
butyryl 800
bypass 330
bystander 300
c 300
c2 130
c2h2 150
c3h 250
c57bl 720
;
run;
data temp;
set have;
flag=ifn(value le 200,1,0);
run;
data want;
do until(last.flag);
set temp;
by flag notsorted;
end;
if value le 200 then do;_w=w;yes=1;end;
do until(last.flag);
set temp;
by flag notsorted;
new=ifc(yes,_w,w);
output;
end;
drop yes _w flag;
run;
Nice,
suggest this tweak for the second do group:
_w=ifc(value>200,' ',w); /* instead of if value le 200 then do;_w=w;yes=1;end;*/
do until(last.flag);
set temp;
by flag notsorted;
new=coalescec(_w,w); /* new=ifc(yes,_w,w);*/
output;
end;
Just for the record, I finally had a chance to check the solution that I posted originally. It seems to work fine. If it's not giving you the answer you expect, I wouldn't mind seeing your log and why the results are different than what you wanted. Perhaps I just never understood the problem.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.