<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Data set split by category variable: SAS CODE in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672370#M202061</link>
    <description>&lt;P&gt;You needn't create new tables from this.&lt;/P&gt;
&lt;P&gt;For example, to generate samples:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;proc surveyselect data=HAVE(where=(length(NAME)=3))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sun, 26 Jul 2020 05:27:44 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2020-07-26T05:27:44Z</dc:date>
    <item>
      <title>Data set split by category variable: SAS CODE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672269#M202013</link>
      <description>&lt;P&gt;Hi folks,&lt;/P&gt;&lt;P&gt;I need a little SAS coding help. At first I thought I just needed to split a column, but then I realized I need to split the entire dataset into several datasets according to the length of the text in the name column.&lt;/P&gt;&lt;P&gt;My columns are: name, score, weight&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The example dataset "ALL_NAMES" has a column "NAME", which holds four-letter names and three-letter names.&amp;nbsp;I want to split the data into two datasets according to the length of the person's name.&lt;/P&gt;&lt;P&gt;NAME| tom, jeff, bob, sam, fran, biff, tran, ned, jill&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I want is two different tables: name3 &amp;amp; name4&lt;/P&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;NAME&lt;/TD&gt;&lt;TD&gt;Score&lt;/TD&gt;&lt;TD&gt;Weight&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;tom&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;100&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;bob&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;90&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;sam&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;80&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;ned&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;120&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;dataset: Name3&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;TABLE 
border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;Name&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;Score&lt;/TD&gt;&lt;TD&gt;Weight&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;jeff&lt;/TD&gt;&lt;TD&gt;25&lt;/TD&gt;&lt;TD&gt;80&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;fran&lt;/TD&gt;&lt;TD&gt;50&lt;/TD&gt;&lt;TD&gt;125&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;biff&lt;/TD&gt;&lt;TD&gt;100&lt;/TD&gt;&lt;TD&gt;100&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;tran&lt;/TD&gt;&lt;TD&gt;70&lt;/TD&gt;&lt;TD&gt;80&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;jill&lt;/TD&gt;&lt;TD&gt;65&lt;/TD&gt;&lt;TD&gt;90&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;dataset: Name4&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help would be appreciated.&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jul 2020 05:38:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672269#M202013</guid>
      <dc:creator>TronicLaine</dc:creator>
      <dc:date>2020-07-25T05:38:36Z</dc:date>
    </item>
    <item>
      <title>Re: Data set split by category variable: SAS CODE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672272#M202014</link>
      <description>&lt;P&gt;This is relatively easy to do but I'm not going to show how until you can show me what is easier or more accurately done with multiple data sets than with a single set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For almost any purpose, I suspect that adding a variable holding the name length to the existing data would be sufficient. That is easily done with the LENGTH function.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Data want;
   set have;
   namelength = length(name);
run;&lt;/PRE&gt;
&lt;P&gt;Likely later processing would involve sorting by the Namelength and using BY group processing.&lt;/P&gt;
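&lt;P&gt;A minimal sketch of that idea, assuming the WANT data set created above and the numeric Score variable from the question (PROC MEANS is just one of the procedures that support BY groups):&lt;/P&gt;
&lt;PRE&gt;/* BY-group processing needs the data sorted by the BY variable */
proc sort data=want;
   by namelength;
run;

/* One set of statistics per name length, with no length hard-coded */
proc means data=want mean min max n;
   by namelength;
   var score;
run;&lt;/PRE&gt;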
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One reason I say this: the next time you get a data set that needs similar processing, if more name lengths are involved you will have to rewrite your code to 1) create additional data sets and 2) process each of those data sets with more new code.&lt;/P&gt;
&lt;P&gt;A By group approach will adjust based on the values of the data without having to rewrite code (if done in a reasonable manner).&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jul 2020 06:15:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672272#M202014</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-07-25T06:15:22Z</dc:date>
    </item>
    <item>
      <title>Re: Data set split by category variable: SAS CODE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672280#M202021</link>
      <description>&lt;P&gt;I strongly question why you'd need to do this.&lt;/P&gt;
&lt;P&gt;In any case, something like this should work:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;data L3 L4;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; set HAVE;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; if length(NAME)=3 then output L3;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; if length(NAME)=4 then output L4;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jul 2020 09:15:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672280#M202021</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-07-25T09:15:30Z</dc:date>
    </item>
    <item>
      <title>Re: Data set split by category variable: SAS CODE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672311#M202040</link>
      <description>&lt;P&gt;&lt;A href="https://blogs.sas.com/content/sasdummy/2015/01/26/how-to-split-one-data-set-into-many/" target="_self"&gt;This blog post covers it&lt;/A&gt; -- but I agree with others, you might be better off not splitting the data and instead using filters or other processing to save on I/O and storage.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But, sometimes you need to split the data because your "customers" need it that way.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For cases other than "by category", check out this new post from&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/51532"&gt;@LeonidBatkhan&lt;/a&gt;:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &lt;A href="https://blogs.sas.com/content/sgf/2020/07/23/splitting-a-data-set-into-smaller-data-sets/" target="_self"&gt;Splitting a data set into smaller data sets&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jul 2020 13:31:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672311#M202040</guid>
      <dc:creator>ChrisHemedinger</dc:creator>
      <dc:date>2020-07-25T13:31:38Z</dc:date>
    </item>
    <item>
      <title>Re: Data set split by category variable: SAS CODE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672315#M202042</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data have;
input NAME $ Score Weight;
cards;
tom 5 100
bob 7 90
sam 6 80
ned 2 120
jeff 25 80
fran 50 125
biff 100 100
tran 70 80
jill 65 90
;

data temp;
 set have;
 l=lengthn(name);
run;

proc sql;
 create index l on temp(l);
quit;


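/* Write one data set per name length (name3, name4, ...) */
data _null_;
 if _n_=1 then do;
  /* multidata:'y' keeps every row that shares the same key */
  dcl hash H (multidata:'y') ;
  h.definekey  ("l") ;
  h.definedata ("name", "score", "weight") ;
  h.definedone () ;
 end;
 /* Empty the hash, then load one BY group of l (read via the index) */
 do _n_=h.clear() by 0 until(last.l);
  set temp;
  by l;
  h.add();
 end;
 /* The output data set name is built from the group's length value */
 h.output(dataset:cats('name',l));
run;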
 &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 25 Jul 2020 13:57:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672315#M202042</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2020-07-25T13:57:15Z</dc:date>
    </item>
    <item>
      <title>Re: Data set split by category variable: SAS CODE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672369#M202060</link>
      <description>&lt;P&gt;Reason for the split:&lt;BR /&gt;My plan, after separating the data by the length of the name, is to compare the scores of the three-letter names with those of the four-letter names by sampling from each set.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;By what manner:&lt;BR /&gt;I then want to make a table of 100 samples from each group (three- or four-letter names), where each sample consists of 10 selected rows (the original dataset has 500+ rows).&amp;nbsp;&lt;BR /&gt;I want to figure out which group would score better on average, using the total scores of samples of 10.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can I reasonably bet that Tom's group will win, or should I choose Jill's group?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 26 Jul 2020 05:20:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672369#M202060</guid>
      <dc:creator>TronicLaine</dc:creator>
      <dc:date>2020-07-26T05:20:56Z</dc:date>
    </item>
    <item>
      <title>Re: Data set split by category variable: SAS CODE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672370#M202061</link>
      <description>&lt;P&gt;You needn't create new tables from this.&lt;/P&gt;
&lt;P&gt;For example, to generate samples:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;proc surveyselect data=HAVE(where=(length(NAME)=3))&lt;/FONT&gt;&lt;/P&gt;
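&lt;P&gt;A fuller sketch of such a call, assuming the 100 samples of 10 rows described in the question. The output name SAMPLES3 and the seed value are made up for illustration, and the filtered data must hold at least 10 three-letter names for SAMPSIZE=10 to work:&lt;/P&gt;
&lt;PRE&gt;proc surveyselect data=HAVE(where=(length(NAME)=3))
                  out=SAMPLES3   /* hypothetical output data set name */
                  method=srs     /* simple random sampling without replacement */
                  sampsize=10    /* 10 rows per sample */
                  reps=100       /* 100 independent samples */
                  seed=12345;    /* arbitrary seed, only for reproducibility */
run;&lt;/PRE&gt;
&lt;P&gt;With REPS=, the output data set should carry a Replicate variable that identifies each sample.&lt;/P&gt;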
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 26 Jul 2020 05:27:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672370#M202061</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-07-26T05:27:44Z</dc:date>
    </item>
    <item>
      <title>Re: Data set split by category variable: SAS CODE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672385#M202072</link>
      <description>&lt;P&gt;As stated above, creating separate data sets is not necessary, and will just cause extra work to get the comparisons computed properly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can do sampling in a number of ways in SAS, for example PROC SURVEYSELECT with the STRATA statement for your two groups, or just assigning numbers randomly to the observations.&lt;/P&gt;
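&lt;P&gt;A minimal sketch of the STRATA approach, assuming a data set HAVE with NAME and the sample sizes from the question. The names HAVE_LEN and SAMPLES, and the seed, are hypothetical:&lt;/P&gt;
&lt;PRE&gt;/* Add the grouping variable instead of splitting the data */
data have_len;
   set have;
   namelength = length(name);
run;

/* The STRATA statement expects the input sorted by the strata variable */
proc sort data=have_len;
   by namelength;
run;

/* 100 replicates, each drawing 10 rows from every name-length group */
proc surveyselect data=have_len out=samples
                  method=srs sampsize=10 reps=100 seed=12345;
   strata namelength;
run;&lt;/PRE&gt;
&lt;P&gt;Each stratum needs at least 10 rows for this to run, so it suits the full 500+ row data rather than the small example.&lt;/P&gt;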
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But then, the actual comparison of means ought to be done on one data set, for example using PROC TTEST.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;DO NOT SPLIT THE DATA!&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 26 Jul 2020 10:50:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672385#M202072</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2020-07-26T10:50:21Z</dc:date>
    </item>
    <item>
      <title>Re: Data set split by category variable: SAS CODE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672386#M202073</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/339102"&gt;@TronicLaine&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Reason for the split:&lt;BR /&gt;My plan, after separating the data by the length of the name, is to compare the scores of the three-letter names with those of the four-letter names by sampling from each set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By what manner:&lt;BR /&gt;I then want to make a table of 100 samples from each group (three- or four-letter names), where each sample consists of 10 selected rows (the original dataset has 500+ rows).&amp;nbsp;&lt;BR /&gt;I want to figure out which group would score better on average, using the total scores of samples of 10.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can I reasonably bet that Tom's group will win, or should I choose Jill's group?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Some examples of "comparison" in long form:&lt;/P&gt;
&lt;PRE&gt;data have;
   input NAME $ Score Weight;
   namelength = length(name);
datalines;
tom 5 100
bob 7 90
sam 6 80
ned 2 120
jeff 25 80
fran 50 125
biff 100 100
tran 70 80
jill 65 90
;

proc ttest data=have;
   class namelength;
   var score weight;
run;
/* or, if Weight is meant to be a weighting variable */
proc ttest data=have;
   class namelength;
   var score;
   weight weight;
run;

proc tabulate data=have;
   class namelength;
   var score weight;
   table namelength,
         (score weight) *(mean min max std n)
   ;
run;&lt;/PRE&gt;
&lt;P&gt;PROC SURVEYSELECT can add a variable that flags the selected observations (the OUTALL option does this), which you can then filter on.&lt;/P&gt;</description>
      <pubDate>Sun, 26 Jul 2020 11:01:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-set-split-by-category-variable-SAS-CODE/m-p/672386#M202073</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-07-26T11:01:31Z</dc:date>
    </item>
  </channel>
</rss>

