BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
nharuka
Obsidian | Level 7

I have multiple observations per person. What I'm trying to do is if COMPDT = '.' for all rows for that same ID, then I just want to keep one of the obs(they're duplicates in this case). However, if COMPFL='Y' then I want ESDT=COMPDT and ETDT=ETDT where 

DSSEQ =1 for that id. How can I do this?

 

Current data

ID    DSSEQ    ESDT         ETDT          COMPFL   COMPDT
001     1      01Jan2016    01Jan2016       N        .
001     2      01Jan2016    01Jan2016       N        .
002     4      07JUL2014    07JUL2014       N        .  
002     2      13NOV2012    13NOV2012       Y      13NOV2012
002     3      06DEC2018    06DEC2018       N        .
002     1      14FEB2020    14FEB2020       N        .
003     2       25OCT2018   25OCT2018       N        .       
003     1       25OCT2018   25OCT2018       N        .      

Wanted data

ID    DSSEQ    ESDT         ETDT          
001     1      01Jan2016    01Jan2016              
002     2      13NOV2012    14FEB2020            
003     1       25OCT2018   25OCT2018             

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Assuming that every ID has exactly one DSSEQ=1 record, and that any ID which has a non-missing COMPDT has only one such record, then a self-MERGE with a selective KEEP parameter will work just fine: 

 

data want (drop=compfl compdt);
  merge have (where=(dsseq=1))
        have (keep=id dsseq compdt where=(compdt^=.));
  by id;
  if compdt^=. then esdt=compdt;
run;

 

This also assume data are sorted by ID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

1 REPLY 1
mkeintz
PROC Star

Assuming that every ID has exactly one DSSEQ=1 record, and that any ID which has a non-missing COMPDT has only one such record, then a self-MERGE with a selective KEEP parameter will work just fine: 

 

data want (drop=compfl compdt);
  merge have (where=(dsseq=1))
        have (keep=id dsseq compdt where=(compdt^=.));
  by id;
  if compdt^=. then esdt=compdt;
run;

 

This also assume data are sorted by ID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 216 views
  • 1 like
  • 2 in conversation