<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Selecting distinct rows based on one variable but obtaining the rows with the most data in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Selecting-distinct-rows-based-on-one-variable-but-obtaining-the/m-p/783792#M250011</link>
    <description>&lt;P&gt;Create a completeness measure for each row (for example, a count of non-missing variables) and then keep, for each patient, the row with the highest level of completeness.&lt;/P&gt;
&lt;P&gt;This of course assumes you cannot 'fix' the data by filling in values that are missing in one row from the patient's other rows to form a single, more complete row instead.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/408641"&gt;@Celina1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I am wondering what the best approach is for creating a data set that has only one row for each patient, but maintains the row for that patient with the most data. For example I started with this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sql;&lt;BR /&gt;create table table_two&lt;BR /&gt;as select distinct patientid, condition, diagnosis, DateofBirth, sex, address, city, zip, county from master_file;&lt;BR /&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I want one row for every patientid, but if one row is complete (or more complete) for all the variables and another is missing all the demographics for example, I would want to keep the row with the most information.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any guidance on the best approach is greatly appreciated.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 02 Dec 2021 21:52:19 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2021-12-02T21:52:19Z</dc:date>
    <item>
      <title>Selecting distinct rows based on one variable but obtaining the rows with the most data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Selecting-distinct-rows-based-on-one-variable-but-obtaining-the/m-p/783789#M250010</link>
      <description>&lt;P&gt;I am wondering what the best approach is for creating a data set that has only one row for each patient, but maintains the row for that patient with the most data. For example I started with this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;create table table_two&lt;BR /&gt;as select distinct patientid, condition, diagnosis, DateofBirth, sex, address, city, zip, county from master_file;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want one row for every patientid, but if one row is complete (or more complete) for all the variables and another is missing all the demographics for example, I would want to keep the row with the most information.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any guidance on the best approach is greatly appreciated.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Dec 2021 21:39:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Selecting-distinct-rows-based-on-one-variable-but-obtaining-the/m-p/783789#M250010</guid>
      <dc:creator>Celina1</dc:creator>
      <dc:date>2021-12-02T21:39:58Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting distinct rows based on one variable but obtaining the rows with the most data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Selecting-distinct-rows-based-on-one-variable-but-obtaining-the/m-p/783792#M250011</link>
      <description>&lt;P&gt;Create a completeness measure for each row and then take the one with the highest level of completeness.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This of course assumes you cannot 'fix' the data and fill in data that's missing from other rows to form a complete/more complete row instead.&lt;/P&gt;
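&lt;P&gt;A minimal sketch of the completeness approach, under some assumptions: "most complete" is taken to mean "fewest missing values" across the variables listed in the question, all variables count equally, and the names scored and n_missing are hypothetical helpers that are not in the original post. If some variables matter more than others, or if special codes stand in for missing values, the measure would need to be adjusted.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 1: score each row by how many of the listed variables are missing. */
/* CMISS counts missing values across both numeric and character arguments. */
data scored;
    set master_file;
    n_missing = cmiss(of condition diagnosis DateofBirth sex address city zip county);
run;

/* Step 2: within each patient, put the row with the fewest missing values first. */
proc sort data=scored;
    by patientid n_missing;
run;

/* Step 3: keep only the first (most complete) row for each patient. */
data table_two;
    set scored;
    by patientid;
    if first.patientid;
    drop n_missing;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;When two rows for the same patient tie on n_missing, this keeps whichever one sorts first, so add another BY variable (a date, for instance, if one exists) if the choice between tied rows matters.&lt;/P&gt;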
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/408641"&gt;@Celina1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I am wondering what the best approach is for creating a data set that has only one row for each patient, but maintains the row for that patient with the most data. For example I started with this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sql;&lt;BR /&gt;create table table_two&lt;BR /&gt;as select distinct patientid, condition, diagnosis, DateofBirth, sex, address, city, zip, county from master_file;&lt;BR /&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I want one row for every patientid, but if one row is complete (or more complete) for all the variables and another is missing all the demographics for example, I would want to keep the row with the most information.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any guidance on the best approach is greatly appreciated.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Dec 2021 21:52:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Selecting-distinct-rows-based-on-one-variable-but-obtaining-the/m-p/783792#M250011</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-12-02T21:52:19Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting distinct rows based on one variable but obtaining the rows with the most data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Selecting-distinct-rows-based-on-one-variable-but-obtaining-the/m-p/783808#M250019</link>
      <description>&lt;P&gt;Example data and a clear definition of "most data" or "more complete" would be extremely useful for a complete answer.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I can easily envision a data set with 100 variables in which a record has values for 95 or more of them but still is not complete, because one or two specific variables are missing.&lt;/P&gt;
&lt;P&gt;Or records in which many variables hold the code for "not actually recorded for some reason".&lt;/P&gt;</description>
      <pubDate>Fri, 03 Dec 2021 01:00:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Selecting-distinct-rows-based-on-one-variable-but-obtaining-the/m-p/783808#M250019</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-12-03T01:00:21Z</dc:date>
    </item>
  </channel>
</rss>

