<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Only keep rows with highest value in one column for each unique pair of values in two other colu in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/938093#M368528</link>
    <description>I used uppercase for WORK and SOMEDATASET and now that part is working for me too. Thank you!</description>
    <pubDate>Fri, 02 Aug 2024 16:04:16 GMT</pubDate>
    <dc:creator>Wickedestjr</dc:creator>
    <dc:date>2024-08-02T16:04:16Z</dc:date>
    <item>
      <title>Only keep rows with highest value in one column for each unique pair of values in two other columns?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/937130#M368222</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a dataset like this with hundreds of additional columns and thousands of rows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;First_name | Last_name | Version | Total_payment&lt;/P&gt;&lt;P&gt;Scott | Smith | 1 | $100&lt;/P&gt;&lt;P&gt;John | Smith | 3 | $10&lt;/P&gt;&lt;P&gt;Michael | Scott | 2 | $50&lt;/P&gt;&lt;P&gt;Michael | Scott | 3 | $75&lt;/P&gt;&lt;P&gt;Michael | Scott | 4 | $100&lt;/P&gt;&lt;P&gt;Michael | Jordan | 1 | $20&lt;/P&gt;&lt;P&gt;Michael | Jordan | 2 | $50&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm looking for a (hopefully) slick way of updating the table so it only includes rows for the latest version of each unique First/Last name combination, like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;First_name | Last_name | Version | Total_payment&lt;/P&gt;&lt;P&gt;Scott | Smith | 1 | $100&lt;/P&gt;&lt;P&gt;John | Smith | 3 | $10&lt;/P&gt;&lt;P&gt;Michael | Scott | 4 | $100&lt;/P&gt;&lt;P&gt;Michael | Jordan | 2 | $50&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The dataset is also not necessarily sorted in any way like I have it presented here. Is there any simple way of achieving this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jul 2024 14:59:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/937130#M368222</guid>
      <dc:creator>Wickedestjr</dc:creator>
      <dc:date>2024-07-25T14:59:03Z</dc:date>
    </item>
    <item>
      <title>Re: Only keep rows with highest value in one column for each unique pair of values in two other colu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/937148#M368228</link>
      <description>&lt;P&gt;If by "I'm looking for a (hopefully) slick way of updating the table so it only includes rows for the latest version of each unique First/Last name combination" and "latest" means "last appearing in the data. The a data step with BY group processing:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data have;
   infile datalines dlm='|';
   informat First_name  Last_name $25. Version 4. Total_payment comma10.;
   input First_name  Last_name  Version  Total_payment ;
   format total_payment dollar10.;
datalines;
Scott | Smith | 1 | $100
John | Smith | 3 | $10
Michael | Scott | 2 | $50
Michael | Scott | 3 | $75
Michael | Scott | 4 | $100
Michael | Jordan | 1 | $20
Michael | Jordan | 2 | $50
;

data want;
   set have;
   by First_name  Last_name notsorted;
   if last.last_name;
run;&lt;/PRE&gt;
&lt;P&gt;The NOTSORTED option allows use of BY variables that are not in sort order. When your data is grouped by the BY variables this works as use of a BY statement creates automatic variables that indicate the First or Last of each group, referenced with Last.variable or First.variable. These variables have numeric values of 1 (true) or 0(false). So the subsetting if is only true for last of the last_name values associated with the first_name.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If your data is &lt;STRONG&gt;not&lt;/STRONG&gt; already grouped that way then I think you would sort the Have by the name variables plus the Version&amp;nbsp; or other variable that is supposed to have the highest value variable prior to the Want data set.&lt;/P&gt;
&lt;P&gt;Note that you did not mention which variable and in your limited example both Version and Total_payment have the "highest value" shown for the wanted output data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jul 2024 16:20:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/937148#M368228</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-07-25T16:20:06Z</dc:date>
    </item>
    <item>
      <title>Re: Only keep rows with highest value in one column for each unique pair of values in two other colu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/937153#M368230</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/459807"&gt;@Wickedestjr&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a dataset like this with hundreds of additional columns and thousands of rows:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;First_name | Last_name | Version | Total_payment&lt;/P&gt;
&lt;P&gt;Scott | Smith | 1 | $100&lt;/P&gt;
&lt;P&gt;John | Smith | 3 | $10&lt;/P&gt;
&lt;P&gt;Michael | Scott | 2 | $50&lt;/P&gt;
&lt;P&gt;Michael | Scott | 3 | $75&lt;/P&gt;
&lt;P&gt;Michael | Scott | 4 | $100&lt;/P&gt;
&lt;P&gt;Michael | Jordan | 1 | $20&lt;/P&gt;
&lt;P&gt;Michael | Jordan | 2 | $50&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm looking for a (hopefully) slick way of updating the table so it only includes rows for the latest version of each unique First/Last name combination, like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;First_name | Last_name | Version | Total_payment&lt;/P&gt;
&lt;P&gt;Scott | Smith | 1 | $100&lt;/P&gt;
&lt;P&gt;John | Smith | 3 | $10&lt;/P&gt;
&lt;P&gt;Michael | Scott | 4 | $100&lt;/P&gt;
&lt;P&gt;Michael | Jordan | 2 | $50&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The dataset is also not necessarily sorted in any way like I have it presented here. Is there any simple way of achieving this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;If the dataset is "not necessarily sorted", I presume that a given firstname/lastname combination is not always presented as consecutive observations, yes?&amp;nbsp; &amp;nbsp;If so, then what do you mean by "latest"?&amp;nbsp; Do you mean the obs with the highest VERSION value?&amp;nbsp; &amp;nbsp;Or do you simply mean the last obs encountered for a given firstname/lastname?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's code that keeps the observation with the highest VERSION value:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data _null_;
  if _n_=1 then do;
    if 0 then set have;
    declare hash h (dataset:'have (obs=0)',ordered:'a');
      h.definekey('first_name','last_name');
      h.definedata(all:'Y');
      h.definedone();
  end;
  set have (rename=(version=_vers))  end=end_of_have;
  if h.find()^=0 then call missing(version);
  if _vers&amp;gt;version then do;
     version=_vers;
     h.replace();
  end;
  if end_of_have then h.output(dataset:'want');
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you just want the last obs enountered, then:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  set have;
  declare hash h (dataset:'have',duplicate:'r',ordered:'a');
    h.definekey('first_name','last_name');
    h.definedata(all:'Y');
    h.definedone();
  h.output(dataset:'want');
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jul 2024 17:50:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/937153#M368230</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2024-07-25T17:50:59Z</dc:date>
    </item>
    <item>
      <title>Re: Only keep rows with highest value in one column for each unique pair of values in two other colu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/937165#M368235</link>
      <description>&lt;P&gt;If you are looking for a truly slick way to do this, you can use PROC SUMMARY.&amp;nbsp; The "trick" is to use the ID statement, as below:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc summary data=have nway;
  class first_name last_name;
  output out=want;
  id version  other...variables...here ;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The ID statement tells proc summary to provide the ID variable with the highest value within each first_name/last_name combination.&amp;nbsp; But instead of a true ID variable you want the highest VERSION value.&amp;nbsp; So put it at the leftmost position in the ID statement, then list all the other variables excluding first_name and last_name.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In your example, there is only one other variable, so you could use:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc summary data=have nway;
  class first_name last_name;
  output out=want;
  id version  total_payment ;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And if the list of other variables is long, you can use PROC SQL to generate it for you, as a macro variable VARLIST:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql noprint;
  select name 
  into :varlist separated by ' '
  from dictionary.columns  
  where libname='WORK' and memname='HAVE'
       and not lowcase(name) in ('first_name','last_name','version')
  ;
quit;
%put &amp;amp;=varlist;

proc summary data=have nway;
  class first_name last_name;
  output out=want;
  id version &amp;amp;varlist  ;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jul 2024 18:30:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/937165#M368235</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2024-07-25T18:30:43Z</dc:date>
    </item>
    <item>
      <title>Re: Only keep rows with highest value in one column for each unique pair of values in two other colu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/938001#M368493</link>
      <description>&lt;P&gt;Thank you for sharing this. All the proc summary code is working for me. I'm just having trouble running the proc sql that creates the macro variable VARLIST.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I run that portion it says "NOTE: No rows were selected." and "WARNING: Apparent symbolic reference VARLIST not resolved."&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Do you know why I might be experiencing that?&lt;/P&gt;</description>
      <pubDate>Thu, 01 Aug 2024 19:18:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/938001#M368493</guid>
      <dc:creator>Wickedestjr</dc:creator>
      <dc:date>2024-08-01T19:18:52Z</dc:date>
    </item>
    <item>
      <title>Re: Only keep rows with highest value in one column for each unique pair of values in two other colu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/938022#M368502</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/459807"&gt;@Wickedestjr&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Thank you for sharing this. All the proc summary code is working for me. I'm just having trouble running the proc sql that creates the macro variable VARLIST.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I run that portion it says "NOTE: No rows were selected." and "WARNING: Apparent symbolic reference VARLIST not resolved."&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you know why I might be experiencing that?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Share your code from the log along with all the messages of creating VARLIST.&lt;/P&gt;
&lt;P&gt;It is extremely likely that you did not provide the correct name of either the LIBNAME or MEMNAME (data set) those two are stored in the dictionary tables in upper case. So if you used LIBNAME='work' and MEMNAME='somedataset' there were no matches. If those were the names of your library and memname the code would be&lt;/P&gt;
&lt;P&gt;LIBNAME='WORK' and MEMNAME='SOMEDATASET'&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;with no matches for the library and data set name then no variable names would be returned and VARLIST would be empty, nothing to "resolve".&lt;/P&gt;</description>
      <pubDate>Thu, 01 Aug 2024 21:46:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/938022#M368502</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-08-01T21:46:22Z</dc:date>
    </item>
    <item>
      <title>Re: Only keep rows with highest value in one column for each unique pair of values in two other colu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/938093#M368528</link>
      <description>I used uppercase for WORK and SOMEDATASET and now that part is working for me too. Thank you!</description>
      <pubDate>Fri, 02 Aug 2024 16:04:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Only-keep-rows-with-highest-value-in-one-column-for-each-unique/m-p/938093#M368528</guid>
      <dc:creator>Wickedestjr</dc:creator>
      <dc:date>2024-08-02T16:04:16Z</dc:date>
    </item>
  </channel>
</rss>

