BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
nharuka
Obsidian | Level 7

I have multiple observations per person. What I'm trying to do is if COMPDT = '.' for all rows for that same ID, then I just want to keep one of the obs(they're duplicates in this case). However, if COMPFL='Y' then I want ESDT=COMPDT and ETDT=ETDT where 

DSSEQ =1 for that id. How can I do this?

 

Current data

ID    DSSEQ    ESDT         ETDT          COMPFL   COMPDT
001     1      01Jan2016    01Jan2016       N        .
001     2      01Jan2016    01Jan2016       N        .
002     4      07JUL2014    07JUL2014       N        .  
002     2      13NOV2012    13NOV2012       Y      13NOV2012
002     3      06DEC2018    06DEC2018       N        .
002     1      14FEB2020    14FEB2020       N        .
003     2       25OCT2018   25OCT2018       N        .       
003     1       25OCT2018   25OCT2018       N        .      

Wanted data

ID    DSSEQ    ESDT         ETDT          
001     1      01Jan2016    01Jan2016              
002     2      13NOV2012    14FEB2020            
003     1       25OCT2018   25OCT2018             

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Assuming that every ID has exactly one DSSEQ=1 record, and that any ID which has a non-missing COMPDT has only one such record, then a self-MERGE with a selective KEEP parameter will work just fine: 

 

data want (drop=compfl compdt);
  merge have (where=(dsseq=1))
        have (keep=id dsseq compdt where=(compdt^=.));
  by id;
  if compdt^=. then esdt=compdt;
run;

 

This also assume data are sorted by ID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

1 REPLY 1
mkeintz
PROC Star

Assuming that every ID has exactly one DSSEQ=1 record, and that any ID which has a non-missing COMPDT has only one such record, then a self-MERGE with a selective KEEP parameter will work just fine: 

 

data want (drop=compfl compdt);
  merge have (where=(dsseq=1))
        have (keep=id dsseq compdt where=(compdt^=.));
  by id;
  if compdt^=. then esdt=compdt;
run;

 

This also assume data are sorted by ID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 778 views
  • 1 like
  • 2 in conversation