<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: split a data set to multiple data sets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/split-a-data-set-to-multiple-data-sets/m-p/614337#M179558</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159549"&gt;@Ronein&lt;/a&gt;: Beating a dead horse, eh?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If your input is sorted by your categorical variable, cool. If not, index it to facilitate BY processing. Then use hash extrusion to get what you want. Example (the categorical variable below being ORIGIN):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have (index = (origin)) ;                                                                                                          
  set sashelp.cars ;                                                                                                                    
run ;                                                                                                                                   
                                                                                                                                        
data _null_ ;                                                                                                                           
  if _n_ = 1 then do ;                                                                                                                  
    dcl hash h (dataset: "have (obs=0)", multidata:"y") ;                                                                               
    h.definekey ("origin") ;                                                                                                            
    h.definedata (all: "y") ;                                                                                                           
    h.definedone () ;                                                                                                                   
  end ;                                                                                                                                 
  do until (last.origin) ;                                                                                                              
    set have ;                                                                                                                          
    by origin ;                                                                                                                         
    h.add() ;                                                                                                                           
  end ;                                                                                                                                 
  h.output (dataset: catx ("_", "ds", origin)) ;                                                                                        
  h.clear() ;                                                                                                                           
run ;                       
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The alternative is to write code to pre-read the input and assemble a DATA statement with the requisite output file names plus a SELECT block or bunch of IF-THEN-ELSE statements for each distinct value of the categorical variable; then execute the generated code. The generation can be done by a macro, SQL, CALL EXECUTE, PUT/%INC, etc. Example (SQL):&lt;/P&gt;
&lt;PRE&gt;proc sql noprint ;                                                                                                                      
  select distinct                                                                                                                       
    catx ("_", "ds", origin)                                                                                                            
  , catx (" ", "when (", quote (trim (origin)), ") output", catx ("_", "ds", origin))                                                   
  into :filelist separated by ' '                                                                                                       
     , :whenlist separated by ';'                                                                                                       
 from  have                                                                                                                             
 ;                                                                                                                                      
quit ;                                                                                                                                  
                                                                                                                                        
data &amp;amp;filelist ;                                                                                                                        
  set have ;                                                                                                                            
  select (origin) ;                                                                                                                     
    &amp;amp;whenlist ;                                                                                                                         
    otherwise ;                                                                                                                         
  end ;                                                                                                                                 
run ;                           
&lt;/PRE&gt;
&lt;P&gt;Both methods have their pros and cons:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;In the hash case, the largest BY group with all its data must fit into the hash table memory. If the source file is split into a large number of small files, not a problem.&lt;/LI&gt;
&lt;LI&gt;But it can become a problem with the other method, as too many data sets in the DATA statement may run the step out of buffer memory. Besides, the assembled code pieces may not fit into the FILELIST and WHENLIST macro variables, in which case a different way of assembling code devoid of this limitation (e.g., PUT/%INC) can be used. &amp;nbsp;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Note that in the hash case, the splitting can be done using hash-of-hashes without the source file having to be either sorted or indexed (the method invented by &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12477"&gt;@RichardDeVen&lt;/a&gt;&amp;nbsp;in 2004 in the wake of SUGI in Montreal). However, in this case, memory must be plentiful enough to fit the entire file in the hash table memory.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&lt;/P&gt;</description>
    <pubDate>Mon, 30 Dec 2019 04:34:13 GMT</pubDate>
    <dc:creator>hashman</dc:creator>
    <dc:date>2019-12-30T04:34:13Z</dc:date>
    <item>
      <title>split a data set to multiple data sets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-a-data-set-to-multiple-data-sets/m-p/614295#M179531</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;
&lt;P&gt;What is the way to split a data set to multiple data sets by value of a categorical variable?&lt;/P&gt;
&lt;P&gt;Let's say that my data set called RawTbl has a categorical field called group.&lt;/P&gt;
&lt;P&gt;LEt's say that the number of the values that&amp;nbsp;group field get is not fixed (It means that each day that&amp;nbsp;RawTbl&amp;nbsp; data set is published there are different number of categories).&lt;/P&gt;
&lt;P&gt;The target is to split data set&amp;nbsp;RawTbl&amp;nbsp; into multiple data sets (for each value of firld group there will be a seprate data set).&lt;/P&gt;
&lt;P&gt;The name of&amp;nbsp; each data set will be&amp;nbsp; new_name_of_the_group&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 29 Dec 2019 17:51:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-a-data-set-to-multiple-data-sets/m-p/614295#M179531</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2019-12-29T17:51:31Z</dc:date>
    </item>
    <item>
      <title>Re: split a data set to multiple data sets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-a-data-set-to-multiple-data-sets/m-p/614303#M179538</link>
      <description>&lt;P&gt;Please refer to&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://blogs.sas.com/content/sasdummy/2015/01/26/how-to-split-one-data-set-into-many/" target="_blank"&gt;https://blogs.sas.com/content/sasdummy/2015/01/26/how-to-split-one-data-set-into-many/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.sascommunity.org/wiki/Split_Data_into_Subsets" target="_blank"&gt;http://www.sascommunity.org/wiki/Split_Data_into_Subsets&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 29 Dec 2019 18:25:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-a-data-set-to-multiple-data-sets/m-p/614303#M179538</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-12-29T18:25:03Z</dc:date>
    </item>
    <item>
      <title>Re: split a data set to multiple data sets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-a-data-set-to-multiple-data-sets/m-p/614337#M179558</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159549"&gt;@Ronein&lt;/a&gt;: Beating a dead horse, eh?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If your input is sorted by your categorical variable, cool. If not, index it to facilitate BY processing. Then use hash extrusion to get what you want. Example (the categorical variable below being ORIGIN):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have (index = (origin)) ;                                                                                                          
  set sashelp.cars ;                                                                                                                    
run ;                                                                                                                                   
                                                                                                                                        
data _null_ ;                                                                                                                           
  if _n_ = 1 then do ;                                                                                                                  
    dcl hash h (dataset: "have (obs=0)", multidata:"y") ;                                                                               
    h.definekey ("origin") ;                                                                                                            
    h.definedata (all: "y") ;                                                                                                           
    h.definedone () ;                                                                                                                   
  end ;                                                                                                                                 
  do until (last.origin) ;                                                                                                              
    set have ;                                                                                                                          
    by origin ;                                                                                                                         
    h.add() ;                                                                                                                           
  end ;                                                                                                                                 
  h.output (dataset: catx ("_", "ds", origin)) ;                                                                                        
  h.clear() ;                                                                                                                           
run ;                       
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The alternative is to write code to pre-read the input and assemble a DATA statement with the requisite output file names plus a SELECT block or bunch of IF-THEN-ELSE statements for each distinct value of the categorical variable; then execute the generated code. The generation can be done by a macro, SQL, CALL EXECUTE, PUT/%INC, etc. Example (SQL):&lt;/P&gt;
&lt;PRE&gt;proc sql noprint ;                                                                                                                      
  select distinct                                                                                                                       
    catx ("_", "ds", origin)                                                                                                            
  , catx (" ", "when (", quote (trim (origin)), ") output", catx ("_", "ds", origin))                                                   
  into :filelist separated by ' '                                                                                                       
     , :whenlist separated by ';'                                                                                                       
 from  have                                                                                                                             
 ;                                                                                                                                      
quit ;                                                                                                                                  
                                                                                                                                        
data &amp;amp;filelist ;                                                                                                                        
  set have ;                                                                                                                            
  select (origin) ;                                                                                                                     
    &amp;amp;whenlist ;                                                                                                                         
    otherwise ;                                                                                                                         
  end ;                                                                                                                                 
run ;                           
&lt;/PRE&gt;
&lt;P&gt;Both methods have their pros and cons:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;In the hash case, the largest BY group with all its data must fit into the hash table memory. If the source file is split into a large number of small files, not a problem.&lt;/LI&gt;
&lt;LI&gt;But it can become a problem with the other method, as too many data sets in the DATA statement may run the step out of buffer memory. Besides, the assembled code pieces may not fit into the FILELIST and WHENLIST macro variables, in which case a different way of assembling code devoid of this limitation (e.g., PUT/%INC) can be used. &amp;nbsp;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Note that in the hash case, the splitting can be done using hash-of-hashes without the source file having to be either sorted or indexed (the method invented by &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12477"&gt;@RichardDeVen&lt;/a&gt;&amp;nbsp;in 2004 in the wake of SUGI in Montreal). However, in this case, memory must be plentiful enough to fit the entire file in the hash table memory.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Dec 2019 04:34:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-a-data-set-to-multiple-data-sets/m-p/614337#M179558</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2019-12-30T04:34:13Z</dc:date>
    </item>
    <item>
      <title>Re: split a data set to multiple data sets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-a-data-set-to-multiple-data-sets/m-p/614381#M179574</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205"&gt;@novinosrin&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.sascommunity.org/wiki/Split_Data_into_Subsets" target="_blank" rel="noopener"&gt;http://www.sascommunity.org/wiki/Split_Data_into_Subsets&lt;/A&gt;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Hey, &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159549"&gt;@Ronein&lt;/a&gt; , please read the above wiki article, where it says &lt;/P&gt;
&lt;H2&gt;"B&lt;SPAN class="mw-headline"&gt;est Practice: Just Don't Do It&lt;/SPAN&gt;"&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The reason it says "Don't Do it" is that there are rarely good reasons to do this, and plenty of drawbacks. Don't make your life harder than it has to be.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Dec 2019 12:47:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-a-data-set-to-multiple-data-sets/m-p/614381#M179574</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-12-30T12:47:56Z</dc:date>
    </item>
  </channel>
</rss>

