BookmarkSubscribeRSS Feed
thanikondharish
Fluorite | Level 6


data ex3 ;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;

I have one data set so now assign 'y' for which is maximum result value by test wise which is come first like see below data set

id

test

visit

result

flag

101

rbc

visit1

222

 

101

rbc

visit2

222

 

101

rbc

visit3

300

y

101

wbc

visit1

222

 

101

wbc

visit2

222

 

101

wbc

visit3

300

y

101

wbc

visit4

300

 

102

rbc

visit1

222

 

102

rbc

visit2

222

 

102

rbc

visit3

300

y

102

wbc

visit1

222

 

102

wbc

visit2

222

 

102

wbc

visit3

300

 

102

wbc

visit4

400

y

102

wbc

visit5

400

 

13 REPLIES 13
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13
data ex3 ;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbx visit1 221
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit6 500
101 wbc visit1 222
105 rbc visit1 222
105 rbc visit2 222
105 rbc visit3 300
109 wbc visit1 222
108 wbc visit2 222
109 wbc visit3 300

;
data ex3_key;
	set ex3;
	mykey = compress(id||test);
run;
proc sort data=ex3_key;
	by mykey;
run;
data want(drop=mykey);
	set ex3_key;
	by mykey;
	if last.mykey then flag = "Y";
run;


the sample data has been modified to create more cases for testing.

 

thanikondharish
Fluorite | Level 6
i got this but we need to give priority first maximum value see second
dataset
Reeza
Super User
@VDD why do you create concatenated keys? The BY statement can take multiple variables.
mkeintz
PROC Star

For each ID/TEST group, you want to determine the maximum result, and then flag the first record having that maximum.

 

You can read each group twice - the first time to determine the maximum, and then, knowing the maximum, reread to flag the first instance of that maximum.  Also during the re-read do the output.

 

data have;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
data want (drop=maxresult);
  /* First pass - read the group to establish MAXRESULT*/
  do until (last.test);
    set have;
    by id test;
    maxresult=max(maxresult,result);
  end;
  /* Second pass - flag the first instance of result=maxresult*/
  do until (last.test);
    set have;
    by id test;
    if result=maxresult then do;
      flag='Y';
      maxresult=.X; *Set a value that result will never take*;
    end;
    else flag=' ';
    output;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PaigeMiller
Diamond | Level 26

Assuming that your data set is sorted by id test visit

 

proc summary data=ex3;
    class id test;
    var result;
    output out=max_result max=max_result;
run;
data want; 
    merge ex3 max_result;
    by id test;
    prev_result=lag(result);
    if result=max_result and result^=prev_result then flag="Y";
run;
--
Paige Miller
novinosrin
Tourmaline | Level 20
data ex3 ;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;

proc sql;
create table want(drop=mm) as
select *, ifc(visit=min(visit) and mm=result,'Y',' ') as Flag
from (select *,ifn(max(result)=result, max(result),.) as mm from ex3 group by id,test)
group by id, test, mm 
order by id, test, visit;
quit;
Reeza
Super User

You can accomplish this just by BY group processing and sorts. 

 

Sort the data such that the largest values is either on top or on the bottom, so you're sorting by ID, TEST, Result value and then date so that the earliest date is first or last depending on your logic. 

 

data have;
    input id $ test $ visit $ result;
    cards;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;

proc sort data=have;
    by id test result descending visit;
run;

data want;
    set have;
    by id test;

    if last.test then
        flag='Y';
run;

proc sort data=want;
    by id test visit;
run;
DanielLangley
Quartz | Level 8

Hi @Reeza This is the Approach I would use however 'visit10' (if it can exist, I didn't find a mention) would cause some difficulty. Maybe rip the number out and input it for the sort.

 

mkeintz
PROC Star

If your data are already sorted by id/test, and are in the desired order within each group, then it's beneficial to avoid proc sort, which would then need to be followed by a data step, and probably a re-sort into original order.  That's three passes through the data.  And in this case to make the re-sort work, you would need to take preliminary measures.  Editted note: Oops, wrong on the "preliminary measures" requirement.  @SuzanneDorinski shows that a proc sort option negates that need. 

 

Consider interleaving data set HAVE with itself - group by group - a single step, and effectively a single pass through the data.  This is analogous to the double "do until (last.test);" that I offered earlier, but might be a little more self-evident:

 

data have;
    input id $ test $ visit $ result;
    cards;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
data want (drop=_:);
  set have (in=firstpass) have (in=secondpass);
  by id test;
  retain _maxresult;
  if first.test then _maxresult=result;   /*Was "if first.id"*/
  else if firstpass then _maxresult=max(_maxresult,result);

  if secondpass;
  if result=_maxresult then flag='Y';
  if flag='Y' then _maxresult=.X; /*Assign an impossible value for result*/
run;

 

This program reads each ID/TEST group twice.  The first pass generates the retained variable _maxresult.  The second passes uses _maxresult to identify the flagged record.   Note the "if secondpass;" is a subsetting if statement, so only the secondpass records are output, meaning no duplicate records.

 

Although there are two "passes" through the data, you won't be doubling the disk channel activity.  Since the two passes are tightly synchronized, they are almost always reading observations from the same disk block, which would have been cached in memory by the operating system.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SuzanneDorinski
Lapis Lazuli | Level 10

PROC SORT has some options that might help.  You can try SORTSEQ=LINGUISTIC(NUMERIC_COLLATION=ON).

 

Below I modified @Reeza's example to use linguistic sorting.  Note that I changed two rows in the input to see if this works.

 

data have;
    input id $ test $ visit $ result;
    cards;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit14 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit115 400
;

proc sort data=have sortseq=linguistic(numeric_collation=on);
    by id test result descending visit;
run;

data want;
    set have;
    by id test;

    if last.test then
        flag='Y';
run;

proc sort data=want sortseq=linguistic(numeric_collation=on);
    by id test visit;
run;

proc print data=want;
run;

 

ballardw
Super User

@DanielLangley wrote:

Hi @Reeza This is the Approach I would use however 'visit10' (if it can exist, I didn't find a mention) would cause some difficulty. Maybe rip the number out and input it for the sort.

 


Proc Sort has the option SORTSEQ =Linguistic with the sub-option of Numeric_collation to sort such moderately poorly designed data values.

 

data example;
   input val $;
datalines;
visit10
visit3
visit1
visit100
;

proc sort data=example sortseq=linguistic (numeric_collation=on);
  by val;
run;
Jagadishkatam
Amethyst | Level 16

there are different approaches mentioned by experts, but not first. so I tried with first. 

 

data have;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;

proc sort data=have;
by id test descending result visit;
run;

data want;
set have;
by id test descending result visit;
if first.test then flag='Y';
run;

proc sort data=want;
by id test visit;
run;

Thanks,
Jag
Ksharp
Super User
data have;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
data want;
 do until(last.test); 
  set have ;
  by id test notsorted;
  max=max(max,result);
 end;
 do until(last.test); 
  set have ;
  by id test notsorted;
  flag=' ';
  if result=max and not found then do;flag='Y';found=1;end;
  output;
 end;
 drop max found;
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 3178 views
  • 10 likes
  • 11 in conversation