Solved: Re: Set the same Index for repeated rows (dates)

NewUsrStat · Posted 06-26-2024 06:19 AM

Hi guys,

suppose to have the following dataset DB. Is there a way to get DB1? In other words where there are repeated (identical) dates the Index should be 1. NB: all IDs have only one record (row) where Index = 1.

Thank you in advance

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Index;
  format Admission date9. Discharge date9.;
cards;
0001  13JAN2017 25JAN2017  0 
0001  13JAN2017 25JAN2017  1
0001  22FEB2018 03MAR2018  0   
0001  30JAN2019 04MAR2019  0  
0002  01DEC2018 14DEC2018  1 
0002  01DEC2018 14DEC2018  0 
0003  09JAN2016 25JAN2016  1 
0003  29JAN2018 12FEB2018  0; 
run;


data DB1;
  input ID :$20. Admission :date09. Discharge :date09. Index;
  format Admission date9. Discharge date9.;
cards;
0001  13JAN2017 25JAN2017  1 
0001  13JAN2017 25JAN2017  1
0001  22FEB2018 03MAR2018  0   
0001  30JAN2019 04MAR2019  0  
0002  01DEC2018 14DEC2018  1 
0002  01DEC2018 14DEC2018  1 
0003  09JAN2016 25JAN2016  1 
0003  29JAN2018 12FEB2018  0; 
run;

Tom · Posted 06-26-2024 02:44 PM

@NewUsrStat wrote:
So if the dates are duplicated and at least for one record Index =1 then also index should be duplicated to be 1 otherwise nothing to do.

So if any of the observations in the group have a 1 then value should be one?

That is essentially a MAX() operation across all of the observations in the group.

proc sql ;
create table want as 
  select DB.ID
       , DB.Admission
       , DB.Discharge
       , DB.Index
       , max(DB.Index) as New_Index
  from DB
  group by DB.ID
         , DB.Admission
;
quit;

Result

Note: You might need to make new test data if you want to actually test this new logic. For example a group of observations with the same ID and admission date where none of them currently have INDEX=1.

View solution in original post

PaigeMiller · Posted 06-26-2024 07:20 AM

We have already shown you solutions on two very similar data sets in these threads: (thread 1 thread 2)

Can you show us what you have tried for this problem?

--
Paige Miller

Tom · Posted 06-26-2024 09:44 AM

Are you saying you want to new variable to be TRUE when there are multiple observations for the same ID and admission date?

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Index;
  format Admission Discharge date9.;
cards;
0001  13JAN2017 25JAN2017  0 
0001  13JAN2017 25JAN2017  1
0001  22FEB2018 03MAR2018  0   
0001  30JAN2019 04MAR2019  0  
0002  01DEC2018 14DEC2018  1 
0002  01DEC2018 14DEC2018  0 
0003  09JAN2016 25JAN2016  1 
0003  29JAN2018 12FEB2018  0
; 

data want;
  set db ;
  by id admission ;
  new_flag = not (first.admission and last.admission);
run;

Result

Obs     ID     Admission    Discharge    Index    new_flag

 1     0001    13JAN2017    25JAN2017      0          1
 2     0001    13JAN2017    25JAN2017      1          1
 3     0001    22FEB2018    03MAR2018      0          0
 4     0001    30JAN2019    04MAR2019      0          0
 5     0002    01DEC2018    14DEC2018      1          1
 6     0002    01DEC2018    14DEC2018      0          1
 7     0003    09JAN2016    25JAN2016      1          0
 8     0003    29JAN2018    12FEB2018      0          0

NewUsrStat · Posted 06-26-2024 11:49 AM

Hi Tom thank you for your help.
I tried to apply your code but there are cases where dates are duplicated and Index = 0 that with your code become flagged 1 while they will stay = 0.

Tom · Posted 06-26-2024 12:56 PM

@NewUsrStat wrote:
Hi Tom thank you for your help.
I tried to apply your code but there are cases where dates are duplicated and Index = 0 that with your code become flagged 1 while they will stay = 0.

So you need to actually explain what the rule is for calculating this new variable.

NewUsrStat · Posted 06-26-2024 01:56 PM

So if the dates are duplicated and at least for one record Index =1 then also index should be duplicated to be 1 otherwise nothing to do.

Tom · Posted 06-26-2024 02:44 PM

@NewUsrStat wrote:
So if the dates are duplicated and at least for one record Index =1 then also index should be duplicated to be 1 otherwise nothing to do.

So if any of the observations in the group have a 1 then value should be one?

That is essentially a MAX() operation across all of the observations in the group.

proc sql ;
create table want as 
  select DB.ID
       , DB.Admission
       , DB.Discharge
       , DB.Index
       , max(DB.Index) as New_Index
  from DB
  group by DB.ID
         , DB.Admission
;
quit;

Result

Note: You might need to make new test data if you want to actually test this new logic. For example a group of observations with the same ID and admission date where none of them currently have INDEX=1.