Dear,
I need to flag the OBS based on a sort order. Is there a way to flag the prior OBS with non blank value if a blank value is present for the obs. Eg. For ID=3, in my output i am flagging last OBS which is blank value. In my output, i need to skip it and flag the prior non blank OBS .Please help . Thank you.
data one;
input id visit test $ val ptnum date $19.;
datalines;
1 1 temp 88 2 2010-04-27
1 1 temp 90 2 2010-04-28
1 1 temp 92 2 2010-04-29
2 1 temp 100 2 2010-05-27
2 1 temp 98 2 2010-05-28
2 1 temp 96 2 2010-05-29
3 1 temp 100 2 2010-05-27
3 1 temp 98 2 2010-05-28
3 1 temp . 2 2010-05-29
;
proc sort data=one;
by id test visit ptnum date;
run;
data two;
set one;
by id test visit ptnum date;
if last.ptnum then flag='Y';
run;
output need;
1 1 temp 88 2 2010-04-27
1 1 temp 90 2 2010-04-28
1 1 temp 92 2 2010-04-29 Y
2 1 temp 100 2 2010-05-27
2 1 temp 98 2 2010-05-28
2 1 temp 96 2 2010-05-29 Y
3 1 temp 100 2 2010-05-27
3 1 temp 98 2 2010-05-28 Y
3 1 temp . 2 2010-05-29
OUTPUT GETTING
1 1 temp 88 2 2010-04-27
1 1 temp 90 2 2010-04-28
1 1 temp 92 2 2010-04-29 Y
2 1 temp 100 2 2010-05-27
2 1 temp 98 2 2010-05-28
2 1 temp 96 2 2010-05-29 Y
3 1 temp 100 2 2010-05-27
3 1 temp 98 2 2010-05-28
3 1 temp . 2 2010-05-29 Y
Probably the easiest way is to sort differently:
proc sort data=one;
by id test visit ptnum descending date;
run;
data two;
set one;
by id test visit ptnum descending date;
retain ever_flagged;
if first.ptnum then ever_flagged=0;
if val > . then ever_flagged + 1;
if ever_flagged=1 then do;
flag='Y';
ever_flagged + 1;
end;
drop ever_flagged;
run;
If you would like, you can always put the data back into the order you want:
proc sort data=two;
by id test visit ptnum date;
run;
data one; input id visit test $ val ptnum date $19.; datalines; 1 1 temp 88 2 2010-04-27 1 1 temp 90 2 2010-04-28 1 1 temp 92 2 2010-04-29 2 1 temp 100 2 2010-05-27 2 1 temp 98 2 2010-05-28 2 1 temp 96 2 2010-05-29 3 1 temp 100 2 2010-05-27 3 1 temp 98 2 2010-05-28 3 1 temp . 2 2010-05-29 ; data want; do i=1 by 1 until(last.test); set one; by id visit test; if not missing(val) then n=i; end; do j=1 by 1 until(last.test); set one; by id visit test; flag=' '; if j=n then flag='Y'; output; end; drop n i j; run;
Another option:
data one; input id visit test $ val ptnum date $19.; datalines; 1 1 temp 88 2 2010-04-27 1 1 temp 90 2 2010-04-28 1 1 temp 92 2 2010-04-29 2 1 temp 100 2 2010-05-27 2 1 temp 98 2 2010-05-28 2 1 temp 96 2 2010-05-29 3 1 temp 100 2 2010-05-27 3 1 temp 98 2 2010-05-28 3 1 temp . 2 2010-05-29 ; run; proc sql; create table WANT as select A.*, B.FLAG from WORK.ONE A left join (select ID, VISIT, TEST, PTNUM, max(input(DATE,yymmdd10.)) as MDATE, "Y" as FLAG from (select * from WORK.ONE where VAL ne .) group by ID,VISIT,TEST,PTNUM) B on A.ID=B.ID and A.VISIT=B.VISIT and A.TEST=B.TEST and A.PTNUM=B.PTNUM and A.DATE=put(MDATE,yymmdd10.); quit;
Hellow, it wer me, I would do it like this
data one;
input id visit test $ val ptnum date $19.;
datalines;
1 1 temp 88 2 2010-04-27
1 1 temp 90 2 2010-04-28
1 1 temp 92 2 2010-04-29
2 1 temp 100 2 2010-05-27
2 1 temp 98 2 2010-05-28
2 1 temp 96 2 2010-05-29
3 1 temp 100 2 2010-05-27
3 1 temp 98 2 2010-05-28
3 1 temp . 2 2010-05-29
;
run ;
/* Insert step1 */
data ONE ;
set ONE ;
DUMMY_DATE = ^(missing(VAL)) * input(DATE, yymmdd10.) ;
run ;
proc sort data=one;
by id test visit ptnum DUMMY_DATE;
run;
data two(drop=DUMMY_DATE);
set one;
by id test visit ptnum DUMMY_DATE;
if last.ptnum then flag='Y';
run;
/* Insert step2 */
proc sort data=two;
by id test visit ptnum date;
run;
First, I try to resist the atavistic impulse to sort a data set by descending order as a means of looking ahead, especially if you're going to re-sort by ascending order - very expensive for large data sets.
You want to flag the last non-missing record for each by group:
data v_need / view=v_need;
set one;
by id test visit ptnum;
where val^=.;
if last.ptnum;
flag='Y';
run;
data two;
merge one v_need;
by id test visit ptnum date;
run;
** This looks like two passes through the data, but V_NEED is a data set view, not a data set file. That means it is not written to disk, and is activated until called in the DATA TWO step - at which point it is streamed to the data two step. As a result, in data two, there wlil be two simultaneous streams of input from dataset ONE. In turn that means that ONE is physically being accessed only once, because the disk input blocks for one stream will be cached in memory and used by the other stream, all thanks to your operating system.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.