<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Splitting dataset by column and limiting output in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Splitting-dataset-by-column-and-limiting-output/m-p/589889#M168779</link>
    <description>&lt;P&gt;You report that the "obs=1000" option in this code&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create table county_Retail_Store_1 as
  select
  from county_data (obs=1000)
  where closest_store = 'Retail_Store_1' and Retail_Store_1 &amp;lt; 10;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;is &lt;EM&gt;&lt;STRONG&gt;"throwing off the where clause and I am getting results for distances &amp;gt; 10 as well"&lt;/STRONG&gt;&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are you saying that in the absence of the "obs=1000", you only get observations with &lt;EM&gt;Retail_Store_1&amp;lt;10&lt;/EM&gt;, but in using the obs= parameter you start getting instances of &lt;EM&gt;Retail_Store_1&amp;gt;=10&lt;/EM&gt;?&amp;nbsp; I'll believe it when I see it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In your particular example code, you don't have an variables listed in the select clause (or even "select *").&amp;nbsp; SQL would throw an error and stop processing the request.&amp;nbsp; If you already had a table named County_retail_store_1, it would not be replaced.&amp;nbsp; And if that prior table had distances over 10, they would still be there, wholly unrelated to any "obs=1000" parameter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By the way, you could create all three datasets in one process, using a DATA step, as in:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ret1 ret2 ret3;
  set county_data;

  if closest_store='Retail_Store_1' and retail_store_1&amp;lt;10 then output ret1; else
  if closest_store='Retail_Store_2' and retail_store_1&amp;lt;10 then output ret2; else
  if closest_store='Retail_Store_3' and retail_store_1&amp;lt;10 then output ret3; 
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;To replicate this via PROC SQL, you'd have to read the data three times, one for each result table.&amp;nbsp; But the code above reads it only once to produce the same 3 "tables".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edited correction - the 2nd and 3rd IF statement used the wrong variables in the "&amp;lt;10" comparison.&amp;nbsp; It should be:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ret1 ret2 ret3;
  set county_data;

  if closest_store='Retail_Store_1' and retail_store_1&amp;lt;10 then output ret1; else
  if closest_store='Retail_Store_2' and retail_store_2&amp;lt;10 then output ret2; else
  if closest_store='Retail_Store_3' and retail_store_3&amp;lt;10 then output ret3; 
run;

&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Thu, 19 Sep 2019 18:33:46 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2019-09-19T18:33:46Z</dc:date>
    <item>
      <title>Splitting dataset by column and limiting output</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Splitting-dataset-by-column-and-limiting-output/m-p/589881#M168774</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a large&amp;nbsp;dataset containing distances of different stores to a customer. My dataset looks something like this:&lt;/P&gt;&lt;P&gt;Customer_ID&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Retail_Store_1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;Retail_Store_2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Retail_Store_3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Closest_Store&lt;/P&gt;&lt;P&gt;1234&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.56&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.78&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;3.1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Retail_Store_1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to split this dataset into 3 datasets for each retail store. I want to split by closest store and Retail_Store_# &amp;lt; 10 miles.&lt;/P&gt;&lt;P&gt;I used the obs=1000 to limit it but it is affecting my where clause. There are scenarios where the dataset is less than 1000 and I want to add more records to it. I want to check the remaining records in the original dataset and check again for the closest store and less than 10 miles&amp;nbsp;until count reaches 1000 or there is no more records left.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is my code to search for the closest store:&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;%macro&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; find_closest_store(county, numrows);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;%let&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; county = &amp;amp;county;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;%let&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; numrows = &amp;amp;numrows;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data closest_store;&lt;/P&gt;&lt;P&gt;set &amp;amp;county.data&lt;/P&gt;&lt;P&gt;array Retail_Store(&amp;amp;numrows) Retail_Store_1-Retail_Store_&amp;amp;numrows;&lt;/P&gt;&lt;P&gt;&amp;amp;county._min_value=min(of Retail_Event(*));&lt;/P&gt;&lt;P&gt;closest_store=vname(Retail_Store(whichn(&amp;amp;county._min_value, of Retail_Store(*))));&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;%mend&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; find_closest_store;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;%&lt;STRONG&gt;&lt;I&gt;find_closest_store&lt;/I&gt;&lt;/STRONG&gt;(county1, &lt;/FONT&gt;&lt;STRONG&gt;&lt;FONT color="#008080" face="Courier New" size="3"&gt;10&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;My code for split (I had a macro but it was a long code so I just took a piece of it)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;proc&lt;/STRONG&gt;&lt;/FONT&gt; &lt;STRONG&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;sql&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;create&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;table&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; county_Retail_Store_1 &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;as&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;select&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;from&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; county_data (obs=1000)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;where&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; closest_store = &lt;/FONT&gt;&lt;FONT color="#800080" face="Courier New" size="3"&gt;'Retail_Store_1'&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;and&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; Retail_Store_1 &amp;lt; &lt;/FONT&gt;&lt;STRONG&gt;&lt;FONT color="#008080" face="Courier New" size="3"&gt;10&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;quit&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, this obs statement is throwing off the where clause and I am getting results for distances &amp;gt; 10 as well.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;These are the steps that I want to take but I am running into issues.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Assign the closest store to each member. (I did that successfully)&lt;/LI&gt;&lt;LI&gt;Split the event data per the closest store but limit the count to 1000&lt;STRONG&gt; (need help with this - not getting good results)&lt;/STRONG&gt;&lt;/LI&gt;&lt;LI&gt;If less than 1000, search for more members if available and calculate the closest store again and&amp;nbsp;if within 10 miles of the desired store, add those members until it reaches 1000 or no more records left.&amp;nbsp;&amp;nbsp;&lt;STRONG&gt; (Need help with this as well).&lt;/STRONG&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Any help will be appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Many Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 19 Sep 2019 01:45:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Splitting-dataset-by-column-and-limiting-output/m-p/589881#M168774</guid>
      <dc:creator>mraza12</dc:creator>
      <dc:date>2019-09-19T01:45:37Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting dataset by column and limiting output</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Splitting-dataset-by-column-and-limiting-output/m-p/589889#M168779</link>
      <description>&lt;P&gt;You report that the "obs=1000" option in this code&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create table county_Retail_Store_1 as
  select
  from county_data (obs=1000)
  where closest_store = 'Retail_Store_1' and Retail_Store_1 &amp;lt; 10;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;is &lt;EM&gt;&lt;STRONG&gt;"throwing off the where clause and I am getting results for distances &amp;gt; 10 as well"&lt;/STRONG&gt;&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are you saying that in the absence of the "obs=1000", you only get observations with &lt;EM&gt;Retail_Store_1&amp;lt;10&lt;/EM&gt;, but in using the obs= parameter you start getting instances of &lt;EM&gt;Retail_Store_1&amp;gt;=10&lt;/EM&gt;?&amp;nbsp; I'll believe it when I see it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In your particular example code, you don't have an variables listed in the select clause (or even "select *").&amp;nbsp; SQL would throw an error and stop processing the request.&amp;nbsp; If you already had a table named County_retail_store_1, it would not be replaced.&amp;nbsp; And if that prior table had distances over 10, they would still be there, wholly unrelated to any "obs=1000" parameter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By the way, you could create all three datasets in one process, using a DATA step, as in:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ret1 ret2 ret3;
  set county_data;

  if closest_store='Retail_Store_1' and retail_store_1&amp;lt;10 then output ret1; else
  if closest_store='Retail_Store_2' and retail_store_1&amp;lt;10 then output ret2; else
  if closest_store='Retail_Store_3' and retail_store_1&amp;lt;10 then output ret3; 
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;To replicate this via PROC SQL, you'd have to read the data three times, one for each result table.&amp;nbsp; But the code above reads it only once to produce the same 3 "tables".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edited correction - the 2nd and 3rd IF statement used the wrong variables in the "&amp;lt;10" comparison.&amp;nbsp; It should be:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ret1 ret2 ret3;
  set county_data;

  if closest_store='Retail_Store_1' and retail_store_1&amp;lt;10 then output ret1; else
  if closest_store='Retail_Store_2' and retail_store_2&amp;lt;10 then output ret2; else
  if closest_store='Retail_Store_3' and retail_store_3&amp;lt;10 then output ret3; 
run;

&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 19 Sep 2019 18:33:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Splitting-dataset-by-column-and-limiting-output/m-p/589889#M168779</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2019-09-19T18:33:46Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting dataset by column and limiting output</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Splitting-dataset-by-column-and-limiting-output/m-p/590113#M168857</link>
      <description>&lt;P&gt;Thanks for your response. I actually wrote that piece of code in a hurry. I had a * after select. I did see results for more than 10. I am not sure why it would give me that because when I remove the (obs=1000), I do get proper results.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks a lot for your suggestion on data step instead proc sql, I will use that instead.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Sep 2019 16:54:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Splitting-dataset-by-column-and-limiting-output/m-p/590113#M168857</guid>
      <dc:creator>mraza12</dc:creator>
      <dc:date>2019-09-19T16:54:58Z</dc:date>
    </item>
  </channel>
</rss>

