data ex3 ;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
I have one data set so now assign 'y' for which is maximum result value by test wise which is come first like see below data set
id | test | visit | result | flag |
101 | rbc | visit1 | 222 |
|
101 | rbc | visit2 | 222 |
|
101 | rbc | visit3 | 300 | y |
101 | wbc | visit1 | 222 |
|
101 | wbc | visit2 | 222 |
|
101 | wbc | visit3 | 300 | y |
101 | wbc | visit4 | 300 |
|
102 | rbc | visit1 | 222 |
|
102 | rbc | visit2 | 222 |
|
102 | rbc | visit3 | 300 | y |
102 | wbc | visit1 | 222 |
|
102 | wbc | visit2 | 222 |
|
102 | wbc | visit3 | 300 |
|
102 | wbc | visit4 | 400 | y |
102 | wbc | visit5 | 400 |
|
data ex3 ;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbx visit1 221
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit6 500
101 wbc visit1 222
105 rbc visit1 222
105 rbc visit2 222
105 rbc visit3 300
109 wbc visit1 222
108 wbc visit2 222
109 wbc visit3 300
;
data ex3_key;
set ex3;
mykey = compress(id||test);
run;
proc sort data=ex3_key;
by mykey;
run;
data want(drop=mykey);
set ex3_key;
by mykey;
if last.mykey then flag = "Y";
run;
the sample data has been modified to create more cases for testing.
For each ID/TEST group, you want to determine the maximum result, and then flag the first record having that maximum.
You can read each group twice - the first time to determine the maximum, and then, knowing the maximum, reread to flag the first instance of that maximum. Also during the re-read do the output.
data have;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
data want (drop=maxresult);
/* First pass - read the group to establish MAXRESULT*/
do until (last.test);
set have;
by id test;
maxresult=max(maxresult,result);
end;
/* Second pass - flag the first instance of result=maxresult*/
do until (last.test);
set have;
by id test;
if result=maxresult then do;
flag='Y';
maxresult=.X; *Set a value that result will never take*;
end;
else flag=' ';
output;
end;
run;
Assuming that your data set is sorted by id test visit
proc summary data=ex3;
class id test;
var result;
output out=max_result max=max_result;
run;
data want;
merge ex3 max_result;
by id test;
prev_result=lag(result);
if result=max_result and result^=prev_result then flag="Y";
run;
data ex3 ;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
proc sql;
create table want(drop=mm) as
select *, ifc(visit=min(visit) and mm=result,'Y',' ') as Flag
from (select *,ifn(max(result)=result, max(result),.) as mm from ex3 group by id,test)
group by id, test, mm
order by id, test, visit;
quit;
You can accomplish this just by BY group processing and sorts.
Sort the data such that the largest values is either on top or on the bottom, so you're sorting by ID, TEST, Result value and then date so that the earliest date is first or last depending on your logic.
data have;
input id $ test $ visit $ result;
cards;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
proc sort data=have;
by id test result descending visit;
run;
data want;
set have;
by id test;
if last.test then
flag='Y';
run;
proc sort data=want;
by id test visit;
run;
Hi @Reeza This is the Approach I would use however 'visit10' (if it can exist, I didn't find a mention) would cause some difficulty. Maybe rip the number out and input it for the sort.
If your data are already sorted by id/test, and are in the desired order within each group, then it's beneficial to avoid proc sort, which would then need to be followed by a data step, and probably a re-sort into original order. That's three passes through the data. And in this case to make the re-sort work, you would need to take preliminary measures. Editted note: Oops, wrong on the "preliminary measures" requirement. @SuzanneDorinski shows that a proc sort option negates that need.
Consider interleaving data set HAVE with itself - group by group - a single step, and effectively a single pass through the data. This is analogous to the double "do until (last.test);" that I offered earlier, but might be a little more self-evident:
data have;
input id $ test $ visit $ result;
cards;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
data want (drop=_:);
set have (in=firstpass) have (in=secondpass);
by id test;
retain _maxresult;
if first.test then _maxresult=result; /*Was "if first.id"*/
else if firstpass then _maxresult=max(_maxresult,result);
if secondpass;
if result=_maxresult then flag='Y';
if flag='Y' then _maxresult=.X; /*Assign an impossible value for result*/
run;
This program reads each ID/TEST group twice. The first pass generates the retained variable _maxresult. The second passes uses _maxresult to identify the flagged record. Note the "if secondpass;" is a subsetting if statement, so only the secondpass records are output, meaning no duplicate records.
Although there are two "passes" through the data, you won't be doubling the disk channel activity. Since the two passes are tightly synchronized, they are almost always reading observations from the same disk block, which would have been cached in memory by the operating system.
PROC SORT has some options that might help. You can try SORTSEQ=LINGUISTIC(NUMERIC_COLLATION=ON).
Below I modified @Reeza's example to use linguistic sorting. Note that I changed two rows in the input to see if this works.
data have;
input id $ test $ visit $ result;
cards;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit14 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit115 400
;
proc sort data=have sortseq=linguistic(numeric_collation=on);
by id test result descending visit;
run;
data want;
set have;
by id test;
if last.test then
flag='Y';
run;
proc sort data=want sortseq=linguistic(numeric_collation=on);
by id test visit;
run;
proc print data=want;
run;
@DanielLangley wrote:
Hi @Reeza This is the Approach I would use however 'visit10' (if it can exist, I didn't find a mention) would cause some difficulty. Maybe rip the number out and input it for the sort.
Proc Sort has the option SORTSEQ =Linguistic with the sub-option of Numeric_collation to sort such moderately poorly designed data values.
data example; input val $; datalines; visit10 visit3 visit1 visit100 ; proc sort data=example sortseq=linguistic (numeric_collation=on); by val; run;
there are different approaches mentioned by experts, but not first. so I tried with first.
data have;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
proc sort data=have;
by id test descending result visit;
run;
data want;
set have;
by id test descending result visit;
if first.test then flag='Y';
run;
proc sort data=want;
by id test visit;
run;
data have;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
data want;
do until(last.test);
set have ;
by id test notsorted;
max=max(max,result);
end;
do until(last.test);
set have ;
by id test notsorted;
flag=' ';
if result=max and not found then do;flag='Y';found=1;end;
output;
end;
drop max found;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.