DATA Step, Macro, Functions and more

Finding the second largest number across columns

Reply
Occasional Contributor
Posts: 16

Finding the second largest number across columns

Hi everyone!

 

Conceptually, I've run into an issue with my code and I'm wondering if anyone's done something similar/has an idea how to do this task?  

 

Essentially I have a variable for the first non-fatal event in a month (i.e. Event1_date1) followed by two variables for a second non-fatal event within the month (i.e. Event1_date2) or a fatal event after the first (i.e. FEvent_date1).  My data is organized by person-time for each month of the study.  A sample of the data is below:

 

data test1;
  infile cards dlm=',';
  input ID MonthYear  Event1_date1  Event1_date2  FEvent_date1;
  cards;
1, 1, 3000, , 
1, 2, , ,
1, 3, 3060, , 3100
1, 4, , , 
2, 8, 2090, 3040, 
2, 9, ,
2, 10, , , 3190 3, 1, 2935, 2945, 2965 3, 2, , , ;

I am trying to figure out a way to calculate the time to an event (either non-fatal or fatal) after the initial non-fatal event.  Originally I was thinking of running a datastep that subtracted the first in either Event1_date2 or FEvent_date1, but then realized this would ignore if there's an earlier nonfatal event in the Event1_date1 column (see case 1).

 

Does anyone have an idea how to find the second largest number across 3 variables, then both subtract this from the initial value and count the months between this event and the initial event?

 

This is what I've been working with so far, with little success.

 

data test2;
do until (last.id); 
set test1
2nd_largest=whichn(largest(2,of col(Event1_date1, Event1_date2, FEvent_date1)),of col(Event1_date1, Event1_date2, FEvent_date2)); event2=vname(col(2nd_largest));
end; run;

data test3;
do until (last.id);
set test2;
time2event2=.;
IF first.Event1_date THEN time2event2= Event1_date1 - event2;

time2event2_mth=.;
IF first.id THEN counter=0;
IF counter=0 THEN do;
counter=1;
ELSE if Event1_date1<event2 THEN DO;
counter+1;
END;
end;

 

Super User
Super User
Posts: 7,942

Re: Finding the second largest number across columns

You can simply put them into an array then call sortn() to sort them, hence the second element would be the seoncond smallest if existing:

data test1;
  infile cards dlm=',';
  input ID MonthYear  Event1_date1  Event1_date2  FEvent_date1;
  cards;
1, 1, 3000, , 
1, 2, , ,
1, 3, 3060, , 3100
1, 4, , , 
2, 8, 2090, 3040, 
2, 9, ,
2, 10, , , 3190
3, 1, 2935, 2945, 2965
3, 2, , ,
;
run;
data want;
  set test1;
  array vals{*} event1_date1 event1_date2 fevent_date1;
  call sortn(of vals{*});
  second_large=ifn(vals{2}=.,vals{3},vals{2});
run;

However, that being said, you will likely find your task a lot easier if you normalise your data (have the dates going down the page, rather than across).

Occasional Contributor
Posts: 16

Re: Finding the second largest number across columns

Unfortunately, I can't normalize that way since we're trying to keep the observations in Person-Month (also I can't combine my variables since some represent different types of events).

 

Using this code, would I follow by using the "second_large" variable for the data step to identify time since first event and months since first event? (I'm still stuck on how to get the months since first event to work.)

Super User
Posts: 19,770

Re: Finding the second largest number across columns

For number of months subtract and divide by 365.25/12. 

 

Nmonths = ( Date1- date2) /(365.25/12);

 

'month' is not a standard measure of time as hour or days, so this at least standardizes the definition. 

SAS Super FREQ
Posts: 708

Re: Finding the second largest number across columns

hi

 

I might not have fully understand the task, but there are some handy functions that will help you with this.

 

The example below uses the LARGEST function to the return the largest as well as the second largest values.

Using the WHICHN function will return the position of the value within the array i this matter in any way.

 

data test1;
  infile cards dlm=',';
  input ID MonthYear  Event1_date1  Event1_date2  FEvent_date1;

  array xevents{*} Event1_date1  Event1_date2  FEvent_date1;

  nEvents = n(of xevents{*});
  largestEvent = largest(1, of xevents{*});
  largestEvent2 = largest(2, of xevents{*});
  iLargestEvent = WHICHN(largestEvent, OF xevents{*});
  iLargestEvent2 = WHICHN(largestEvent2, OF xevents{*});

  format event: fevent: largest: date9.;

  cards;
1, 1, 3000, , 
1, 2, , ,
1, 3, 3060, , 3100
1, 4, , , 
2, 8, 2090, 3040, 
2, 9, ,
2, 10, , , 3190
3, 1, 2935, 2945, 2965
3, 2, , ,
;

 

Super User
Posts: 11,343

Re: Finding the second largest number across columns

You should provide an example of the desired output given the exampe input data set. I'm not sure whether you are actually looking across observations or note from the words you are using.

Occasional Contributor
Posts: 16

Re: Finding the second largest number across columns

data want;
  infile cards dlm=',';
  input ID MonthYear  Event1_date1  Event1_date2  FEvent_date1 time2event2 time2event2_mth;
  cards;
1, 1, 3000, , , 60, 2
1, 2, , , , 60, 2
1, 3, 3060, , 3100, 60, 2
1, 4, , , , 60, 2
2, 8, 2090, 3040, , 50, 0
2, 9, , , 50, 0
2, 10, , , 3190, 50, 0 3, 1, 2935, 2945, 2965, 10, 0 3, 2, , , , 10, 0 ;

 

so I would be hoping for something like this- where the variables for time2event2 and time2event2_mth are added.  Time2event2 would calculate the difference between the first date value and the second largest date value for that id and the time2event2_mth would calculate the change in monthyear_r2 variable between the two dates.

 

sorry for the confusion- does that make sense? / is this possible?

Occasional Contributor
Posts: 16

Re: Finding the second largest number across columns

Correction: (changed a couple values in the original dummy dataset)

data want;
  infile cards dlm=',';
  input ID MonthYear  Event1_date1  Event1_date2  FEvent_date1 time2event2 time2event2_mth;
  cards;
1, 1, 3100, , , 60, 2
1, 2, , , , 60, 2
1, 3, 3040, , 3000, 60, 2
1, 4, , , , 60, 2
2, 8, 3040, 3010, , 30, 0
2, 9, , , 50, 0
2, 10, , , 2090, 50, 0 3, 1, 2965, 2955, 2935, 10, 0 3, 2, , , , 10, 0 ;
PROC Star
Posts: 1,759

Re: Finding the second largest number across columns

Your needs are still very confusing and it looks like  your data is wrong.

If I understood, just merge this back:

 

data HAVE;
  infile cards dlm=',';
  input ID MonthYear  Event1_date1  Event1_date2  FEvent_date1;
  cards;
1, 1, 3000,     , 
1, 2,     ,     ,
1, 3, 3060,     , 3100
1, 4,     ,     , 
2, 8, 3010, 3040, 
2, 9,     ,     ,
2, 10,    ,     , 3190
3, 1, 2935, 2945, 2965
3, 2,     ,     ,
run;

* I consider you only ever get 3 events;
data WANT;
  set HAVE;
  by ID;
  retain FIRST_DTE FIRST_MTH;
  if first.ID then do;
    call missing(time2event2,time2event2_mth);
    FIRST_DTE=Event1_date1;
    FIRST_MTH=MonthYear;
  end;
  if Event1_date2 
   | (Event1_date1 & Event1_date1 ne FIRST_DTE) then do;
    time2event2=coalesce(Event1_date2,Event1_date1)-FIRST_DTE;
    time2event2_mth=MonthYear-FIRST_MTH;
    output;
  end;
  keep time2event2 time2event2_MTH;
run;

proc print noobs; run;



time2event2 time2event2_mth
60 2
30 0
10 0
Ask a Question
Discussion stats
  • 8 replies
  • 195 views
  • 0 likes
  • 6 in conversation