<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Create Index to use it in by group processing in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Create-Index-to-use-it-in-by-group-processing/m-p/577744#M163744</link>
    <description>&lt;P&gt;Thanks for providing the code! It works as I wanted it to.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding the time efficiencies, I thought that using index is faster. Please see below the times I am getting:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;NOTE: There were 748528 observations read from the data set WORK.SORTDATA.
NOTE: The data set WORK.ORIGDATA has 748528 observations and 213 variables.
NOTE: Compressing data set WORK.ORIGDATA decreased size by 86.35 percent. 
      Compressed is 10219 pages; un-compressed would require 74854 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           4:41.85
      cpu time            3:14.87&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;NOTE: Composite index _mylist has been defined.
17         run;

NOTE: MODIFY was successful for WORK.INDEXDATA.DATA.

NOTE: PROCEDURE DATASETS used (Total process time):
      real time           50.81 seconds
      cpu time            29.16 seconds
      &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And the times for the second data step is always the same. Maybe there are 2-3 seconds differences.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any thoughts on that? Am I missing anything? Are there any other disadvantages?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks a lot!&lt;/P&gt;</description>
    <pubDate>Tue, 30 Jul 2019 13:59:16 GMT</pubDate>
    <dc:creator>Zatere</dc:creator>
    <dc:date>2019-07-30T13:59:16Z</dc:date>
    <item>
      <title>Create Index to use it in by group processing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-Index-to-use-it-in-by-group-processing/m-p/577659#M163708</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What I am really trying to achieve is to replace the PROC SORT with a more time efficient way.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I came across the idea to use index. That is my original code:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt; proc sort data=MyData;
      by CustID BranchNumber BrancType Territory;                    
   run;

   data MyData;
      set MyData;
      by CustID BranchNumber BrancType Territory;                  
      count;	
      if first.Territory then count = 0;

      if not first.Territory then do;
         count+1;
         CustID = strip(CustID)!!'_'!!strip(put(count,z4.));
      end;
   run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And this is what I have tried to do:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc datasets library=work;
modify MyData;
index create _mylist=(CustID BranchNumber BrancType Territory);
run;

data MyData;
      set MyData;
      by CustID BranchNumber BrancType Territory;                  
      count;	
      if first.Territory then count = 0;

      if not first.Territory then do;
         count+1;
         CustID = strip(CustID)!!'_'!!strip(put(count,z4.));
      end;
   run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However it does not work even though there are duplicates CustIDs per Territory.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are there any ideas, alternatives and/or reasons why it does not work?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Jul 2019 11:02:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-Index-to-use-it-in-by-group-processing/m-p/577659#M163708</guid>
      <dc:creator>Zatere</dc:creator>
      <dc:date>2019-07-30T11:02:46Z</dc:date>
    </item>
    <item>
      <title>Re: Create Index to use it in by group processing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-Index-to-use-it-in-by-group-processing/m-p/577678#M163710</link>
      <description>&lt;P&gt;First of all: indexes won't help with performance. The creation of the index already involves a sort, and when you process the whole dataset, you need to read the dataset (in random mode!)&amp;nbsp;&lt;EM&gt;and&lt;/EM&gt; the index, which slows you down quite a lot. Indexes are only good (performancewise) for selecting small subsets of the whole data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But a small demonstration with one of the SASHELP datasets shows that both methods provide the same result:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data cars;
set sashelp.cars;
run;

proc sort
  data=cars
  out=cars1
;
by origin make;
run;

data want1 (keep=origin make);
set cars1;
by origin make;
retain count;
if first.make then count = 0;
else do;
  count + 1;
  origin = catx('_',origin,count);
end;
run;

proc datasets lib=work nolist;
modify cars;
index create myind=(origin make);
quit;

data want2 (keep=origin make);
set cars;
by origin make;
retain count;
if first.make then count = 0;
else do;
  count + 1;
  origin = catx('_',origin,count);
end;
run;

proc compare
  base=want1
  compare=want2
  criterion=0.00001
  method=relative
  out=comp(label="Compare Data for WANT1 and WANT2")
  outstats=stat(label="Compare Data Summary Statistics for WANT1 and WANT2")
  outdif
  outnoequal
;
var make origin;
with make origin;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Log:&lt;/P&gt;
&lt;PRE&gt;27         data cars;
28         set sashelp.cars;
29         run;

NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set WORK.CARS has 428 observations and 15 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.00 seconds
      

30         
31         proc sort
32           data=cars
33           out=cars1
34         ;
35         by origin make;
36         run;

NOTE: There were 428 observations read from the data set WORK.CARS.
NOTE: The data set WORK.CARS1 has 428 observations and 15 variables.
NOTE: PROZEDUR SORT used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
      

37         
38         data want1 (keep=origin make);
39         set cars1;
40         by origin make;
41         retain count;
2                                                          Das SAS System                               07:31 Tuesday, July 30, 2019

42         if first.make then count = 0;
43         else do;
44           count + 1;
45           origin = catx('_',origin,count);
46         end;
47         run;

NOTE: There were 428 observations read from the data set WORK.CARS1.
NOTE: The data set WORK.WANT1 has 428 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.00 seconds
      

48         
49         proc datasets lib=work nolist;
50         modify cars;
51         index create myind=(origin make);
NOTE: Composite index myind has been defined.
52         quit;

NOTE: MODIFY was successful for WORK.CARS.DATA.
NOTE: PROZEDUR DATASETS used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
      

53         
54         data want2 (keep=origin make);
55         set cars;
56         by origin make;
57         retain count;
58         if first.make then count = 0;
59         else do;
60           count + 1;
61           origin = catx('_',origin,count);
62         end;
63         run;

NOTE: There were 428 observations read from the data set WORK.CARS.
NOTE: The data set WORK.WANT2 has 428 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
      

64         
65         proc compare
66           base=want1
67           compare=want2
68           criterion=0.00001
69           method=relative
70           out=comp(label="Compare Data for WANT1 and WANT2")
71           outstats=stat(label="Compare Data Summary Statistics for WANT1 and WANT2")
72           outdif
73           outnoequal
74         ;
75         var make origin;
3                                                          Das SAS System                               07:31 Tuesday, July 30, 2019

76         with make origin;
77         run;

NOTE: There were 428 observations read from the data set WORK.WANT1.
NOTE: There were 428 observations read from the data set WORK.WANT2.
NOTE: The data set WORK.COMP has 0 observations and 4 variables.
NOTE: The data set WORK.STAT has 0 observations and 7 variables.
NOTE: The PROCEDURE COMPARE printed page 1.
NOTE: PROZEDUR COMPARE used (Total process time):
      real time           0.07 seconds
      cpu time            0.00 seconds
 &lt;/PRE&gt;
&lt;P&gt;You can see that the second data step is slightly slower than the first, and that the combined time for index creation + data step is more than the combined time for sort and data step.&lt;/P&gt;
&lt;P&gt;And part of the result:&lt;/P&gt;
&lt;PRE&gt;NOTE: No unequal values were found. All values compared are exactly equal.&lt;/PRE&gt;
&lt;P&gt;shows that the resulting datasets are identical.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If that does not align with what you found, please supply sufficient example data (in a data step with datalines) to illustrate your issue.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Jul 2019 11:28:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-Index-to-use-it-in-by-group-processing/m-p/577678#M163710</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-07-30T11:28:42Z</dc:date>
    </item>
    <item>
      <title>Re: Create Index to use it in by group processing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-Index-to-use-it-in-by-group-processing/m-p/577744#M163744</link>
      <description>&lt;P&gt;Thanks for providing the code! It works as I wanted it to.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding the time efficiencies, I thought that using index is faster. Please see below the times I am getting:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;NOTE: There were 748528 observations read from the data set WORK.SORTDATA.
NOTE: The data set WORK.ORIGDATA has 748528 observations and 213 variables.
NOTE: Compressing data set WORK.ORIGDATA decreased size by 86.35 percent. 
      Compressed is 10219 pages; un-compressed would require 74854 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           4:41.85
      cpu time            3:14.87&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;NOTE: Composite index _mylist has been defined.
17         run;

NOTE: MODIFY was successful for WORK.INDEXDATA.DATA.

NOTE: PROCEDURE DATASETS used (Total process time):
      real time           50.81 seconds
      cpu time            29.16 seconds
      &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And the times for the second data step is always the same. Maybe there are 2-3 seconds differences.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any thoughts on that? Am I missing anything? Are there any other disadvantages?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks a lot!&lt;/P&gt;</description>
      <pubDate>Tue, 30 Jul 2019 13:59:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-Index-to-use-it-in-by-group-processing/m-p/577744#M163744</guid>
      <dc:creator>Zatere</dc:creator>
      <dc:date>2019-07-30T13:59:16Z</dc:date>
    </item>
  </channel>
</rss>

