<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic If IF condition satisfied anywhere in a group, then last observation of that group gets tagged in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/If-IF-condition-satisfied-anywhere-in-a-group-then-last/m-p/867557#M342647</link>
    <description>&lt;P&gt;Hi! I'm a first time poster, but 10 minutes of online searching didn't yield a similar result, so I apologize if I just failed at finding a similar, past post w/ a solution.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a dataset called 'untaggedData' that is comprised of two columns: 'id' and 'year'. I've sorted it by id first and then by year (descending). Most id's have multiple years.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My goal is to label the oldest year (bottom row) for any given id where the label tells me whether or not a &amp;gt;= 2 year difference occurs in any consecutive years for that given id.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;While I have years of coding experience in other languages like Python and VBA, I'm new to SAS and its many unfamiliar mannerisms. The following code seemed like a line-efficient way to accomplish my goal, but I'm discovering that mixing the lag() function with IF-THEN statements is not intuitive.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;data taggedData;
	set work.untaggedData;
	category = "A";
	if lag(id) = id and lag(year) - year &amp;gt;= 2 then category = "B";
	if lag(id) = id and lag(category) = "B" then category = "B";
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Here's a suitable mock-up of the result my inadequate code:&lt;/P&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;ID&lt;/TD&gt;&lt;TD&gt;YEAR&lt;/TD&gt;&lt;TD&gt;CATEGORY&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2022&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2021&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2006&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2006&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2005&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2003&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2003&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2003&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2002&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;2022&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;2021&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;So I don't understand why SAS is not labeling all subsequent rows (per ID) with "B" after it occurs the first time?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Though my primary question is how to achieve my goal? How would you go about labeling&amp;nbsp;the oldest year (bottom row) for any given id so that it says whether or not a &amp;gt;= 2 year difference occurs in any consecutive years for that given id?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Zach&lt;/P&gt;</description>
    <pubDate>Fri, 31 Mar 2023 20:11:12 GMT</pubDate>
    <dc:creator>zcoop</dc:creator>
    <dc:date>2023-03-31T20:11:12Z</dc:date>
    <item>
      <title>If IF condition satisfied anywhere in a group, then last observation of that group gets tagged</title>
      <link>https://communities.sas.com/t5/SAS-Programming/If-IF-condition-satisfied-anywhere-in-a-group-then-last/m-p/867557#M342647</link>
      <description>&lt;P&gt;Hi! I'm a first time poster, but 10 minutes of online searching didn't yield a similar result, so I apologize if I just failed at finding a similar, past post w/ a solution.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a dataset called 'untaggedData' that is comprised of two columns: 'id' and 'year'. I've sorted it by id first and then by year (descending). Most id's have multiple years.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My goal is to label the oldest year (bottom row) for any given id where the label tells me whether or not a &amp;gt;= 2 year difference occurs in any consecutive years for that given id.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;While I have years of coding experience in other languages like Python and VBA, I'm new to SAS and its many unfamiliar mannerisms. The following code seemed like a line-efficient way to accomplish my goal, but I'm discovering that mixing the lag() function with IF-THEN statements is not intuitive.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;data taggedData;
	set work.untaggedData;
	category = "A";
	if lag(id) = id and lag(year) - year &amp;gt;= 2 then category = "B";
	if lag(id) = id and lag(category) = "B" then category = "B";
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Here's a suitable mock-up of the result my inadequate code:&lt;/P&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;ID&lt;/TD&gt;&lt;TD&gt;YEAR&lt;/TD&gt;&lt;TD&gt;CATEGORY&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2022&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2021&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2006&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2006&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2005&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2003&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2003&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2003&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2002&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;2022&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;2021&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;So I don't understand why SAS is not labeling all subsequent rows (per ID) with "B" after it occurs the first time?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Though my primary question is how to achieve my goal? How would you go about labeling&amp;nbsp;the oldest year (bottom row) for any given id so that it says whether or not a &amp;gt;= 2 year difference occurs in any consecutive years for that given id?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Zach&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2023 20:11:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/If-IF-condition-satisfied-anywhere-in-a-group-then-last/m-p/867557#M342647</guid>
      <dc:creator>zcoop</dc:creator>
      <dc:date>2023-03-31T20:11:12Z</dc:date>
    </item>
    <item>
      <title>Re: If IF condition satisfied anywhere in a group, then last observation of that group gets tagged</title>
      <link>https://communities.sas.com/t5/SAS-Programming/If-IF-condition-satisfied-anywhere-in-a-group-then-last/m-p/867565#M342652</link>
      <description>&lt;P&gt;The LAG() function sets up a queue, where an element is taken from the "top" and a new one inserted at the "bottom"&amp;nbsp;&lt;EM&gt;whenever the function is called&lt;/EM&gt;. So, calling the function conditionally is usually a bad idea, with confusing results.&lt;/P&gt;
&lt;P&gt;But you call it unconditionally, which is good.&lt;/P&gt;
&lt;P&gt;But at the time you call it for lag(category), category will be "A" if the first IF was not true, and that "A" goes into the queue.&lt;/P&gt;
&lt;P&gt;Use RETAIN and BY instead.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data taggedData;
set work.untaggedData;
by id;
retain category;
if first.id then category = "A";
if not first.id and lag(year) - year &amp;gt;= 2 then category = "B";
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2023 20:54:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/If-IF-condition-satisfied-anywhere-in-a-group-then-last/m-p/867565#M342652</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2023-03-31T20:54:13Z</dc:date>
    </item>
  </channel>
</rss>

