calculate first maximum result

thanikondharish · Posted 12-25-2018 10:22 AM

data ex3 ;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;

I have one data set so now assign 'y' for which is maximum result value by test wise which is come first like see below data set

id	test	visit	result	flag
101	rbc	visit1	222
101	rbc	visit2	222
101	rbc	visit3	300	y
101	wbc	visit1	222
101	wbc	visit2	222
101	wbc	visit3	300	y
101	wbc	visit4	300
102	rbc	visit1	222
102	rbc	visit2	222
102	rbc	visit3	300	y
102	wbc	visit1	222
102	wbc	visit2	222
102	wbc	visit3	300
102	wbc	visit4	400	y
102	wbc	visit5	400

VDD · Posted 12-25-2018 11:09 AM

data ex3 ;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbx visit1 221
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit6 500
101 wbc visit1 222
105 rbc visit1 222
105 rbc visit2 222
105 rbc visit3 300
109 wbc visit1 222
108 wbc visit2 222
109 wbc visit3 300

;
data ex3_key;
	set ex3;
	mykey = compress(id||test);
run;
proc sort data=ex3_key;
	by mykey;
run;
data want(drop=mykey);
	set ex3_key;
	by mykey;
	if last.mykey then flag = "Y";
run;

the sample data has been modified to create more cases for testing.

thanikondharish · Posted 12-25-2018 11:29 AM

i got this but we need to give priority first maximum value see second
dataset

Reeza · Posted 12-25-2018 02:41 PM

@VDD why do you create concatenated keys? The BY statement can take multiple variables.

mkeintz · Posted 12-25-2018 12:38 PM

For each ID/TEST group, you want to determine the maximum result, and then flag the first record having that maximum.

You can read each group twice - the first time to determine the maximum, and then, knowing the maximum, reread to flag the first instance of that maximum. Also during the re-read do the output.

data have;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
data want (drop=maxresult);
  /* First pass - read the group to establish MAXRESULT*/
  do until (last.test);
    set have;
    by id test;
    maxresult=max(maxresult,result);
  end;
  /* Second pass - flag the first instance of result=maxresult*/
  do until (last.test);
    set have;
    by id test;
    if result=maxresult then do;
      flag='Y';
      maxresult=.X; *Set a value that result will never take*;
    end;
    else flag=' ';
    output;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PaigeMiller · Posted 12-25-2018 01:02 PM

Assuming that your data set is sorted by id test visit

proc summary data=ex3;
    class id test;
    var result;
    output out=max_result max=max_result;
run;
data want; 
    merge ex3 max_result;
    by id test;
    prev_result=lag(result);
    if result=max_result and result^=prev_result then flag="Y";
run;

--
Paige Miller

novinosrin · Posted 12-25-2018 01:07 PM

data ex3 ;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;

proc sql;
create table want(drop=mm) as
select *, ifc(visit=min(visit) and mm=result,'Y',' ') as Flag
from (select *,ifn(max(result)=result, max(result),.) as mm from ex3 group by id,test)
group by id, test, mm 
order by id, test, visit;
quit;

Reeza · Posted 12-25-2018 02:50 PM

You can accomplish this just by BY group processing and sorts.

Sort the data such that the largest values is either on top or on the bottom, so you're sorting by ID, TEST, Result value and then date so that the earliest date is first or last depending on your logic.

data have;
    input id $ test $ visit $ result;
    cards;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;

proc sort data=have;
    by id test result descending visit;
run;

data want;
    set have;
    by id test;

    if last.test then
        flag='Y';
run;

proc sort data=want;
    by id test visit;
run;

DanielLangley · Posted 12-27-2018 09:31 AM

Hi @Reeza This is the Approach I would use however 'visit10' (if it can exist, I didn't find a mention) would cause some difficulty. Maybe rip the number out and input it for the sort.

mkeintz · Posted 12-27-2018 12:40 PM

If your data are already sorted by id/test, and are in the desired order within each group, then it's beneficial to avoid proc sort, which would then need to be followed by a data step, and probably a re-sort into original order. That's three passes through the data. And in this case to make the re-sort work, you would need to take preliminary measures. Editted note: Oops, wrong on the "preliminary measures" requirement. @SuzanneDorinski shows that a proc sort option negates that need.

Consider interleaving data set HAVE with itself - group by group - a single step, and effectively a single pass through the data. This is analogous to the double "do until (last.test);" that I offered earlier, but might be a little more self-evident:

data have;
    input id $ test $ visit $ result;
    cards;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
data want (drop=_:);
  set have (in=firstpass) have (in=secondpass);
  by id test;
  retain _maxresult;
  if first.test then _maxresult=result;   /*Was "if first.id"*/
  else if firstpass then _maxresult=max(_maxresult,result);

  if secondpass;
  if result=_maxresult then flag='Y';
  if flag='Y' then _maxresult=.X; /*Assign an impossible value for result*/
run;

This program reads each ID/TEST group twice. The first pass generates the retained variable _maxresult. The second passes uses _maxresult to identify the flagged record. Note the "if secondpass;" is a subsetting if statement, so only the secondpass records are output, meaning no duplicate records.

Although there are two "passes" through the data, you won't be doubling the disk channel activity. Since the two passes are tightly synchronized, they are almost always reading observations from the same disk block, which would have been cached in memory by the operating system.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SuzanneDorinski · Posted 12-27-2018 01:29 PM

PROC SORT has some options that might help. You can try SORTSEQ=LINGUISTIC(NUMERIC_COLLATION=ON).

Below I modified @Reeza's example to use linguistic sorting. Note that I changed two rows in the input to see if this works.

data have;
    input id $ test $ visit $ result;
    cards;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit14 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit115 400
;

proc sort data=have sortseq=linguistic(numeric_collation=on);
    by id test result descending visit;
run;

data want;
    set have;
    by id test;

    if last.test then
        flag='Y';
run;

proc sort data=want sortseq=linguistic(numeric_collation=on);
    by id test visit;
run;

proc print data=want;
run;

ballardw · Posted 12-31-2018 12:35 PM

@DanielLangley wrote:

Hi @Reeza This is the Approach I would use however 'visit10' (if it can exist, I didn't find a mention) would cause some difficulty. Maybe rip the number out and input it for the sort.

Proc Sort has the option SORTSEQ =Linguistic with the sub-option of Numeric_collation to sort such moderately poorly designed data values.

data example;
   input val $;
datalines;
visit10
visit3
visit1
visit100
;

proc sort data=example sortseq=linguistic (numeric_collation=on);
  by val;
run;

Jagadishkatam · Posted 12-25-2018 03:04 PM

there are different approaches mentioned by experts, but not first. so I tried with first.

data have;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;

proc sort data=have;
by id test descending result visit;
run;

data want;
set have;
by id test descending result visit;
if first.test then flag='Y';
run;

proc sort data=want;
by id test visit;
run;

Thanks,
Jag

Ksharp · Posted 12-26-2018 07:26 AM

data have;
input id $ test $ visit $ result ;
cards ;
101 rbc visit1 222
101 rbc visit2 222
101 rbc visit3 300
101 wbc visit1 222
101 wbc visit2 222
101 wbc visit3 300
101 wbc visit4 300
102 rbc visit1 222
102 rbc visit2 222
102 rbc visit3 300
102 wbc visit1 222
102 wbc visit2 222
102 wbc visit3 300
102 wbc visit4 400
102 wbc visit5 400
;
data want;
 do until(last.test); 
  set have ;
  by id test notsorted;
  max=max(max,result);
 end;
 do until(last.test); 
  set have ;
  by id test notsorted;
  flag=' ';
  if result=max and not found then do;flag='Y';found=1;end;
  output;
 end;
 drop max found;
run;

calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Re: calculate first maximum result

Registration is open

Registration is open

SAS Training: Just a Click Away