<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to find irrelevant rows and delete them? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-irrelevant-rows-and-delete-them/m-p/917893#M361576</link>
    <description>I don't fully understand the logic for the 2 scenarios. Please provide additional details. In the 2 scenarios you describe there are always 2 ID(rows). Is that the limit or could there be 3 or more rows with the same ID?&lt;BR /&gt;Is the full table sorted in any way? If yes, how?&lt;BR /&gt;Would it be OK to sort the dataset have by ID and Event_Date before eliminating rows?</description>
    <pubDate>Mon, 26 Feb 2024 11:42:15 GMT</pubDate>
    <dc:creator>JosvanderVelden</dc:creator>
    <dc:date>2024-02-26T11:42:15Z</dc:date>
    <item>
      <title>How to find irrelevant rows and delete them?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-irrelevant-rows-and-delete-them/m-p/917878#M361567</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;I have Data about PHI claims. that contain next variables:&lt;BR /&gt;ID - Id of the client,&lt;BR /&gt;Event_Date - The date when claim started,&lt;BR /&gt;QTR - the quarter of the Event_date,&lt;BR /&gt;Payment_To - The date the claim was paud until,&lt;BR /&gt;Monthes_To_Pay - how much monthes the claim survived.&lt;/P&gt;
&lt;P&gt;This data is pretty dirty, because there is not always inpyted the correct evemt_day, sometimes this date is wrong or fictive.&lt;BR /&gt;I need to clean this up.&lt;/P&gt;
&lt;P&gt;Here is the example of data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data Have;
Input ID Event_Date QTR Payment_To Monthes_To_Pay ;
datalines;
251595 23/03/2011 01/03/2011 21/09/2011 6
277681 20/06/2004 01/06/2004 30/04/2005 10
277681 15/12/2004 01/12/2004 28/02/2005 2
283044 21/11/2004 01/12/2004 31/07/2006 20
285726 17/01/2008 01/03/2008 31/01/2009 12
311449 01/11/2004 01/12/2004 31/12/2005 13
311449 03/02/2005 01/03/2005 31/07/2005 5
253333 07/11/2005 01/12/2005 31/10/2010 59
253333 12/12/2007 01/12/2007 30/09/2011 45

;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;BR /&gt;As we can see For IDs 277681 and 311449 there are two rows for each.&lt;BR /&gt;the event_date of the second row of each ID is within payment period and Payment_to as well, and so we conclude that event date in the second row is wrong.&lt;BR /&gt;I need to find out such rows and delete them because they are not relevant to my data.&lt;/P&gt;
&lt;P&gt;For ID 253333 there is another story - we have two event dates, where second date within payment period of first event.&lt;BR /&gt;For such rows I need to keep only the first event(it happened first) and assighn the latest Payment_to as it is continious claim.&lt;/P&gt;
&lt;P&gt;Want:&lt;BR /&gt;ID |Event_Date |QTR |Payment_To |Monthes_To_Pay &lt;BR /&gt;-------|---------------|---------------|---------------|--------------&lt;BR /&gt;251595 |23/03/2011 |01/03/2011 |21/09/2011 |6&lt;BR /&gt;277681 |20/06/2004 |01/06/2004 |30/04/2005 |10&lt;BR /&gt;283044 |21/11/2004 |01/12/2004 |31/07/2006 |20&lt;BR /&gt;285726 |17/01/2008 |01/03/2008 |31/01/2009 |12&lt;BR /&gt;311449 |01/11/2004 |01/12/2004 |31/12/2005 |13&lt;BR /&gt;253333 |07/11/2005 |01/12/2005 |30/09/2011 |70&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The full table contain up to 100,000 rows.&lt;/P&gt;
&lt;P&gt;Any help how can I handele these two scenarios?&lt;BR /&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Feb 2024 09:00:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-irrelevant-rows-and-delete-them/m-p/917878#M361567</guid>
      <dc:creator>IgorR</dc:creator>
      <dc:date>2024-02-26T09:00:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to find irrelevant rows and delete them?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-irrelevant-rows-and-delete-them/m-p/917893#M361576</link>
      <description>I don't fully understand the logic for the 2 scenarios. Please provide additional details. In the 2 scenarios you describe there are always 2 ID(rows). Is that the limit or could there be 3 or more rows with the same ID?&lt;BR /&gt;Is the full table sorted in any way? If yes, how?&lt;BR /&gt;Would it be OK to sort the dataset have by ID and Event_Date before eliminating rows?</description>
      <pubDate>Mon, 26 Feb 2024 11:42:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-irrelevant-rows-and-delete-them/m-p/917893#M361576</guid>
      <dc:creator>JosvanderVelden</dc:creator>
      <dc:date>2024-02-26T11:42:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to find irrelevant rows and delete them?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-irrelevant-rows-and-delete-them/m-p/917929#M361588</link>
      <description>&lt;P&gt;You should provide example data in a data step that runs correctly.&lt;/P&gt;
&lt;P&gt;Example of running your code:&lt;/P&gt;
&lt;PRE&gt;138  Data Have;
139  Input ID Event_Date QTR Payment_To Monthes_To_Pay ;
140  datalines;

NOTE: Invalid data for Event_Date in line 141 8-17.
NOTE: Invalid data for QTR in line 141 19-28.
NOTE: Invalid data for Payment_To in line 141 30-39.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----
141        251595 23/03/2011 01/03/2011 21/09/2011 6
ID=251595 Event_Date=. QTR=. Payment_To=. Monthes_To_Pay=6 _ERROR_=1 _N_=1
NOTE: Invalid data for Event_Date in line 142 8-17.
NOTE: Invalid data for QTR in line 142 19-28.
NOTE: Invalid data for Payment_To in line 142 30-39.
142        277681 20/06/2004 01/06/2004 30/04/2005 10

&lt;/PRE&gt;
&lt;P&gt;Your input statement attempts to read dates a simple numbers erroneously.&lt;/P&gt;
&lt;P&gt;If your values are actual dates then you should include an informat to read them properly and format to display them as needed.&lt;/P&gt;
&lt;PRE&gt;Data Have;
Input ID Event_Date :ddmmyy10. QTR :ddmmyy10.  Payment_To :ddmmyy10. Monthes_To_Pay ;
format Event_date QTR Payment_to ddmmyy10. ;
datalines;
251595 23/03/2011 01/03/2011 21/09/2011 6
277681 20/06/2004 01/06/2004 30/04/2005 10
277681 15/12/2004 01/12/2004 28/02/2005 2
283044 21/11/2004 01/12/2004 31/07/2006 20
285726 17/01/2008 01/03/2008 31/01/2009 12
311449 01/11/2004 01/12/2004 31/12/2005 13
311449 03/02/2005 01/03/2005 31/07/2005 5
253333 07/11/2005 01/12/2005 31/10/2010 59
253333 12/12/2007 01/12/2007 30/09/2011 45
;
run;&lt;/PRE&gt;
&lt;P&gt;Which does unfortunately bring up the question of your variable values. Are they actually dates or strings?&lt;/P&gt;</description>
      <pubDate>Mon, 26 Feb 2024 18:15:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-irrelevant-rows-and-delete-them/m-p/917929#M361588</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-02-26T18:15:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to find irrelevant rows and delete them?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-irrelevant-rows-and-delete-them/m-p/917931#M361590</link>
      <description>&lt;P&gt;Let's make sure I fully understand your request.&amp;nbsp;&amp;nbsp;Here is what I think you mean&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;If the payment_to date of the current obs follows the event_date of the next obs, then if neccessary, update the payment_to_date (and monthestopay) in the current obs.&amp;nbsp; "If neccessary mean if the next payment_to date comes after the current payment_to date.&amp;nbsp; You'll see the code for this using the data from the MERGE statement.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;If the current event_date precedes the previous payment_to date, then delete.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The code below:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Assumes all obs for a given ID to be in a single group.&lt;/LI&gt;
&lt;LI&gt;Obs within each ID to be sorted by EVENT_DATE.&lt;/LI&gt;
&lt;LI&gt;Does not accommodate more than 2 consecutive overlapping observations.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  input ID Event_Date QTR Payment_To Monthes_To_Pay ;
  informat event_date qtr payment_to ddmmyy10.;
  format  event_date qtr payment_to date9.  ddmmyy10. ;
datalines;
251595 23/03/2011 01/03/2011 21/09/2011 6
277681 20/06/2004 01/06/2004 30/04/2005 10
277681 15/12/2004 01/12/2004 28/02/2005 2
283044 21/11/2004 01/12/2004 31/07/2006 20
285726 17/01/2008 01/03/2008 31/01/2009 12
311449 01/11/2004 01/12/2004 31/12/2005 13
311449 03/02/2005 01/03/2005 31/07/2005 5
253333 07/11/2005 01/12/2005 31/10/2010 59
253333 12/12/2007 01/12/2007 30/09/2011 45
run;


data want;
  set have (keep=id);
  by id notsorted;
  merge /*Current obs */ have 
        /*Next obs    */ have  (firstobs=2 keep=event_date payment_to 
                             rename=(event_date=nxt_ev payment_to=nxt_pt));

  if last.id=0 and nxt_ev &amp;lt;= payment_to then do;  /*Check whether payment_to should be extended?*/
    payment_to=max(payment_to,nxt_pt);
    monthes_to_pay=intck('month',event_date,payment_to);
  end;

  if first.id=0 and event_date&amp;lt;=lag(payment_to) then delete;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;BTW,&amp;nbsp;thanks for trying to provide sample data in the form of a DATA step.&amp;nbsp; But please make sure the DATA step works, by testing your code before submitting.&amp;nbsp; When I ran your sample code, it generated missing values for all the date variables, due to the absence informat specifications for the INPUT ting the date variables.&amp;nbsp; &amp;nbsp; I've put the informats in my code above.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 26 Feb 2024 18:17:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-irrelevant-rows-and-delete-them/m-p/917931#M361590</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2024-02-26T18:17:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to find irrelevant rows and delete them?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-irrelevant-rows-and-delete-them/m-p/917988#M361615</link>
      <description>&lt;P&gt;I have added a final row to your sample data set.&amp;nbsp; As a result the id&amp;nbsp;253333 has three consecutive obs that overlap to make a single date range.&amp;nbsp; This code will accommodate that situation:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  input ID Event_Date QTR Payment_To Monthes_To_Pay ;
  informat event_date qtr payment_to ddmmyy10.;
  format  event_date qtr payment_to ddmmyy10. ;
datalines;
251595 23/03/2011 01/03/2011 21/09/2011 6
277681 20/06/2004 01/06/2004 30/04/2005 10
277681 15/12/2004 01/12/2004 28/02/2005 2
283044 21/11/2004 01/12/2004 31/07/2006 20
285726 17/01/2008 01/03/2008 31/01/2009 12
311449 01/11/2004 01/12/2004 31/12/2005 13
311449 03/02/2005 01/03/2005 31/07/2005 5
253333 07/11/2005 01/12/2005 31/10/2010 59
253333 12/12/2007 01/12/2007 30/09/2011 45
253333 12/12/2008 01/12/2008 30/09/2012 45
run;

data want (drop=i n nxt_: extended_:);
  /*Read all overlaps and generate extended payment_to */
  do n=1 by 1 until (last.id or nxt_ev&amp;gt;extended_pt);
    set have (keep=id);
    by id notsorted;
    merge /*Current obs */ have 
          /*Next obs    */ have  (firstobs=2 keep=event_date payment_to 
                           rename=(event_date=nxt_ev payment_to=nxt_pt)) ;
    if extended_pt=. then extended_pt=payment_to;
    if last.id=0 and nxt_ev&amp;lt;=extended_pt then extended_pt=max(extended_pt,nxt_pt);
  end;
  /*Reread the same N obs, output only the first, but with the extended payment_to */
  do i=1 to n;
    set have;
    if i=1 then do;
      payment_to=extended_pt;
      monthes_to_pay=intck('month',event_date,payment_to);
      output;
    end;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 27 Feb 2024 03:11:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-irrelevant-rows-and-delete-them/m-p/917988#M361615</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2024-02-27T03:11:39Z</dc:date>
    </item>
  </channel>
</rss>

