<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Categorize observations based on cumulative sums for two variables in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770161#M244307</link>
    <description>&lt;P&gt;Dear SAS experts&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would like to create a categorical variable that groups consecutive observations until their running sums reach certain values for two numeric variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Given this simple dataset:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data example;
input value1 value2;
datalines;
2 5
3 5
6 10
2 4
3 5
2 5
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I want to categorize the observations (in the order shown here) based on running sums. When the running sum reaches &amp;gt;=5 for value1 AND &amp;gt;=10 for value2, a category is complete. Then the process starts over. The resulting dataset should look like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;value1  value2  cat_var
2       5       1
3       5       1
6       10      2
2       4       3
3       5       3
2       5       3&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For category 1, it takes two observations before the conditions described above are met. For category 2, the conditions are already met by the 3&lt;SUP&gt;rd&lt;/SUP&gt; observation alone, so category 2 contains only one observation. For category 3, the condition for value1 is met at the 5&lt;SUP&gt;th&lt;/SUP&gt; observation, but the condition for value2 is not met until the 6&lt;SUP&gt;th&lt;/SUP&gt; observation, so category 3 contains three observations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does anyone have a suggestion on which syntax I can use to create such a categorical variable?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you&lt;/P&gt;</description>
    <pubDate>Fri, 24 Sep 2021 08:51:21 GMT</pubDate>
    <dc:creator>mgrasmussen</dc:creator>
    <dc:date>2021-09-24T08:51:21Z</dc:date>
    <item>
      <title>Categorize observations based on cumulative sums for two variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770161#M244307</link>
      <description>&lt;P&gt;Dear SAS experts&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would like to create a categorical variable that groups consecutive observations until their running sums reach certain values for two numeric variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Given this simple dataset:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data example;
input value1 value2;
datalines;
2 5
3 5
6 10
2 4
3 5
2 5
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I want to categorize the observations (in the order shown here) based on running sums. When the running sum reaches &amp;gt;=5 for value1 AND &amp;gt;=10 for value2, a category is complete. Then the process starts over. The resulting dataset should look like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;value1  value2  cat_var
2       5       1
3       5       1
6       10      2
2       4       3
3       5       3
2       5       3&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For category 1, it takes two observations before the conditions described above are met. For category 2, the conditions are already met by the 3&lt;SUP&gt;rd&lt;/SUP&gt; observation alone, so category 2 contains only one observation. For category 3, the condition for value1 is met at the 5&lt;SUP&gt;th&lt;/SUP&gt; observation, but the condition for value2 is not met until the 6&lt;SUP&gt;th&lt;/SUP&gt; observation, so category 3 contains three observations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does anyone have a suggestion on which syntax I can use to create such a categorical variable?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Fri, 24 Sep 2021 08:51:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770161#M244307</guid>
      <dc:creator>mgrasmussen</dc:creator>
      <dc:date>2021-09-24T08:51:21Z</dc:date>
    </item>
    <item>
      <title>Re: Categorize observations based on cumulative sums for two variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770163#M244308</link>
      <description>&lt;P&gt;Try this&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data example;
input value1 value2;
datalines;
2 5
3 5
6 10
2 4
3 5
2 5
;

data want(drop = s:);
   set example;
   
   if not s1 then cat_var + 1;
   
   s1 + value1;
   s2 + value2;
   
   if s1 ge 5 and s2 ge 10 then do;
      s1 = 0; s2 = 0;
   end;
   
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Result:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;value1  value2  cat_var
2       5       1
3       5       1
6       10      2
2       4       3
3       5       3
2       5       3&lt;/PRE&gt;</description>
      <pubDate>Fri, 24 Sep 2021 09:02:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770163#M244308</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2021-09-24T09:02:29Z</dc:date>
    </item>
    <item>
      <title>Re: Categorize observations based on cumulative sums for two variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770170#M244311</link>
      <description>&lt;P&gt;Dear PeterClemmensen&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It appears to work in this small example and I will try it out in my large dataset which 'example' is meant to reflect.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please briefly walk me through your code? I tried to understand it (a few comments and questions are below), but I do not understand all of it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want(drop = s:); /* Drop all variables whose names start with s */
set example;

if not s1 then cat_var + 1; /* cat_var is created and increases by 1 when the condition to the left is met. What is meant by "if not s1"? */

s1 + value1; /* The cumulative sums of value1 and value2 are computed in the new variables s1 and s2 */
s2 + value2;

if s1 ge 5 and s2 ge 10 then do; /* This must have something to do with the categorization */
s1 = 0; s2 = 0;
end;

run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Fri, 24 Sep 2021 09:29:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770170#M244311</guid>
      <dc:creator>mgrasmussen</dc:creator>
      <dc:date>2021-09-24T09:29:23Z</dc:date>
    </item>
    <item>
      <title>Re: Categorize observations based on cumulative sums for two variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770173#M244313</link>
      <description>&lt;P&gt;Because of the SUM statements&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if not s1 then cat_var + 1;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;and&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;s1 + value1;
s2 + value2;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;both s1 and s2 are automatically retained.&lt;/P&gt;
&lt;P&gt;In the first iteration of the data step, s1 will still be missing, which is considered as a &lt;EM&gt;boolean value of false&lt;/EM&gt;; at the end of a category, s1 is set to zero, which is also a boolean value of false, so in both these cases, the increment of the cat_var variable is done.&lt;/P&gt;
&lt;P&gt;Since a SUM &lt;EM&gt;statement&lt;/EM&gt; also works like the SUM &lt;EM&gt;function&lt;/EM&gt;, missing values are considered as zero, so the retained variables effectively start with zeroes.&lt;/P&gt;
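&lt;P&gt;Put together, the data step from the earlier reply can be annotated roughly like this. The code is unchanged; only the comments are added, as a sketch of how each statement contributes:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want(drop = s:);   /* drop the helper sums s1 and s2 from the output */
   set example;

   /* s1 is false (zero or missing) on the first row and right after a reset, */
   /* so a new category starts and cat_var is incremented                     */
   if not s1 then cat_var + 1;

   /* SUM statements: s1 and s2 are retained across rows, */
   /* and a missing addend is ignored                      */
   s1 + value1;
   s2 + value2;

   /* both thresholds reached: reset the sums so the next row opens a new category */
   if s1 ge 5 and s2 ge 10 then do;
      s1 = 0; s2 = 0;
   end;
run;&lt;/CODE&gt;&lt;/PRE&gt;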
&lt;P&gt;These are the underlying principles that let this code work.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Sep 2021 09:37:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770173#M244313</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-09-24T09:37:31Z</dc:date>
    </item>
    <item>
      <title>Re: Categorize observations based on cumulative sums for two variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770175#M244315</link>
      <description>&lt;P&gt;Hey Kurt&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I appreciate the explanation. It is starting to make sense to me.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, I did not understand:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;"Since a SUM &lt;EM&gt;statement&lt;/EM&gt; also works like the SUM &lt;EM&gt;function&lt;/EM&gt;, missing values are considered as zero, so the retained variables effectively start with zeroes."&lt;/P&gt;
&lt;P&gt;Where do missing values come into play in the syntax in question?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Fri, 24 Sep 2021 09:45:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770175#M244315</guid>
      <dc:creator>mgrasmussen</dc:creator>
      <dc:date>2021-09-24T09:45:36Z</dc:date>
    </item>
    <item>
      <title>Re: Categorize observations based on cumulative sums for two variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770177#M244317</link>
      <description>&lt;P&gt;Are you referring here to the first iteration, where s1 is missing but will be treated as 0?&lt;/P&gt;</description>
      <pubDate>Fri, 24 Sep 2021 09:50:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770177#M244317</guid>
      <dc:creator>mgrasmussen</dc:creator>
      <dc:date>2021-09-24T09:50:30Z</dc:date>
    </item>
    <item>
      <title>Re: Categorize observations based on cumulative sums for two variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770183#M244319</link>
      <description>&lt;P&gt;Unless stated explicitly otherwise (a RETAIN statement can also set an initial value), retained variables start as missing values when the data step starts executing. This is what I was referring to.&lt;/P&gt;
&lt;P&gt;You can see the operation of a SUM statement with this simple code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
input s;
s1 + s;
datalines;
.
0
1
2
;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 24 Sep 2021 10:01:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770183#M244319</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-09-24T10:01:16Z</dc:date>
    </item>
    <item>
      <title>Re: Categorize observations based on cumulative sums for two variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770184#M244320</link>
      <description>&lt;P&gt;Another quick piece of code to illustrate boolean values:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data another_test;
input s;
length result $5;
if s then result = "True"; else result = "False";
datalines;
.
0
.a
1
-1
0.0003
;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A missing value (including the "special missing" values) or zero is false; anything else is true.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Sep 2021 10:06:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770184#M244320</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-09-24T10:06:23Z</dc:date>
    </item>
    <item>
      <title>Re: Categorize observations based on cumulative sums for two variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770185#M244321</link>
      <description>&lt;P&gt;This is great. Makes a lot more sense now.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Fri, 24 Sep 2021 10:08:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770185#M244321</guid>
      <dc:creator>mgrasmussen</dc:creator>
      <dc:date>2021-09-24T10:08:52Z</dc:date>
    </item>
    <item>
      <title>Re: Categorize observations based on cumulative sums for two variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770190#M244324</link>
      <description>&lt;P&gt;Hey Kurt&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please help me modify the code so that it works when the first values of 'value1' and 'value2' are both 0?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If I run the code below, the first observation will be its own category, which it should not be.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data example;
input value1 value2;
datalines;
0 0
2 5
3 5
6 10
2 4
3 5
2 5
;

data want (drop = s:);
set example;

if not s1 then cat_var + 1;

s1 + value1;
s2 + value2;

if s1 ge 5 and s2 ge 10 then do;
s1 = 0; s2 = 0;
end;

run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Fri, 24 Sep 2021 11:02:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Categorize-observations-based-on-cumulative-sums-for-two/m-p/770190#M244324</guid>
      <dc:creator>mgrasmussen</dc:creator>
      <dc:date>2021-09-24T11:02:58Z</dc:date>
    </item>
  </channel>
</rss>

