<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Index data sets before Merge in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749715#M235663</link>
    <description>&lt;P&gt;SASHELP datasets are too small for indexes to be of any use. Also it is good policy not to tamper with the original SASHELP datasets so they work the same for all users.&lt;/P&gt;</description>
    <pubDate>Tue, 22 Jun 2021 20:34:04 GMT</pubDate>
    <dc:creator>SASKiwi</dc:creator>
    <dc:date>2021-06-22T20:34:04Z</dc:date>
    <item>
      <title>Index data sets before Merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749700#M235654</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Is is better to index data sets before merging with Data step ?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is is better to index data sets before merging with Proc SQL ?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's say that I want to do inner Join of SASHELP.prdsal12&amp;nbsp; with SASHELP.us_data.&lt;/P&gt;
&lt;P&gt;The merge will be done by&amp;nbsp; state field from&amp;nbsp;&amp;nbsp; SASHELP.prdsal12&amp;nbsp; with statename field from&amp;nbsp;SASHELP.us_data.&lt;/P&gt;
&lt;P&gt;How Can I know if data set&amp;nbsp;SASHELP.prdsal12&amp;nbsp; is already indexed?&lt;/P&gt;
&lt;P&gt;How Can I know if data set SASHELP.us_data &amp;nbsp;is already indexed?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jun 2021 20:09:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749700#M235654</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2021-06-22T20:09:27Z</dc:date>
    </item>
    <item>
      <title>Re: Index data sets before Merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749707#M235659</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159549"&gt;@Ronein&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Is is better to index data sets before merging with Data step ?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is is better to index data sets before merging with Proc SQL ?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's say that I want to do inner Join of SASHELP.prdsal12&amp;nbsp; with SASHELP.us_data.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Better than what? Not doing an Index? Depends on data set size, for smaller data sets it's not worth creating an index and creating the index may take more time than the merge.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159549"&gt;@Ronein&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How Can I know if data set&amp;nbsp;SASHELP.prdsal12&amp;nbsp; is already indexed?&lt;/P&gt;
&lt;P&gt;How Can I know if data set SASHELP.us_data &amp;nbsp;is already indexed?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Run a proc contents on the data set and examine the output. One of the displayed information is if any index exists. You can also query SASHELP.VTABLE to see if any indexes are present.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;The CONTENTS Procedure

Data Set Name	SASHELP.PRDSAL2	Observations	23040
Member Type	DATA	Variables	11
Engine	V9	&lt;FONT size="4" color="#FF0000"&gt;&lt;STRONG&gt;Indexes	0&lt;/STRONG&gt;&lt;/FONT&gt;
Created	10/24/2018 20:14:37	Observation Length	120
Last Modified	10/24/2018 20:14:37	Deleted Observations	0
Protection	 	Compressed	NO
Data Set Type	 	&lt;FONT size="4" color="#FF0000"&gt;&lt;STRONG&gt;Sorted	YES&lt;/STRONG&gt;&lt;/FONT&gt;
Label	Furniture sales data	 	 
Data Representation	SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64	 	 
Encoding	us-ascii ASCII (ANSI)	 	 &lt;/PRE&gt;
&lt;P&gt;Or query the SASHELP.VTABLE or VINDEX data sets.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc contents data=SASHELP.prdsal2;
run;

proc sql;
create table want as
select * from sashelp.vindex 
where libname='SASHELP' 
and memname = 'PRDSAL2';
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 22 Jun 2021 20:21:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749707#M235659</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-06-22T20:21:37Z</dc:date>
    </item>
    <item>
      <title>Re: Index data sets before Merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749714#M235662</link>
      <description>&lt;P&gt;"Better" would need some description on what is considered "better". Indexing takes resources. With large data sets it may take a fair amount of time to add the index(s). With small sets the performance difference of indexed vs non-indexed may be trivial.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can tell if sets are indexed with Proc Contents, which reports number of indexes, or Dictionary.Tables "Type of Indexes", or Dictionary.Indexes which will have the name of the index and more information about the index.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jun 2021 20:26:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749714#M235662</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-06-22T20:26:55Z</dc:date>
    </item>
    <item>
      <title>Re: Index data sets before Merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749715#M235663</link>
      <description>&lt;P&gt;SASHELP datasets are too small for indexes to be of any use. Also it is good policy not to tamper with the original SASHELP datasets so they work the same for all users.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jun 2021 20:34:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749715#M235663</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2021-06-22T20:34:04Z</dc:date>
    </item>
    <item>
      <title>Re: Index data sets before Merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749718#M235665</link>
      <description>I assumed you were referring to PRDSAL2 by the way, I don't have a dataset name PRDSAL12.</description>
      <pubDate>Tue, 22 Jun 2021 20:36:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749718#M235665</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-06-22T20:36:15Z</dc:date>
    </item>
    <item>
      <title>Re: Index data sets before Merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749736#M235673</link>
      <description>&lt;P&gt;If you are doing a join with SQL, you don't need to sort by state first.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jun 2021 21:55:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749736#M235673</guid>
      <dc:creator>tarheel13</dc:creator>
      <dc:date>2021-06-22T21:55:19Z</dc:date>
    </item>
    <item>
      <title>Re: Index data sets before Merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749759#M235683</link>
      <description>&lt;P&gt;I'll address just one of the ignored points here.&amp;nbsp; When would you index a data set before a DATA step MERGE?&amp;nbsp; Absolutely never, unless you have no choice.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The MERGE will be horribly inefficient, if you use an index to read in the observations in order.&amp;nbsp; Ian Whitlock once wrote a Coder's Corner paper along the lines of "Why Did This Code Take 24 Hours to Run?"&amp;nbsp; I can give you technical details about why this is true, if you are interested.&amp;nbsp; Just sort your data, then merge it.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 00:00:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749759#M235683</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2021-06-23T00:00:32Z</dc:date>
    </item>
    <item>
      <title>Re: Index data sets before Merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749778#M235694</link>
      <description>&lt;P&gt;To shine a light from another view point than the valid ones already mentioned:&lt;/P&gt;
&lt;P&gt;Production tables, as we assume they will be used over and over, should be optimised for their use. Typically, they should be sorted by the most frequent merge key(s), indexed by other useful merge key(s), indexed by frequent query key(s) (such as: give me all records matching condition A and B). The idea is that the cost of creating the table will be offset many times by repeatedly using these sort and index properties.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Additionally, production tables could have other desirable properties such as: suitable compression, buffer size, password protection, labelling.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To further answer your question, index relative efficiency depends on many factors such as observation length, cardinality, expected subset size (do you expect to extract 1% or 100% of the table?). So asking if having an index is better (than what?) is like asking how long a piece of string is: It depends.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that merging in a data step requires either a sorted source dataset, and indexed source data set or an SPDE source data set. SQL does not have these requirements, but only because it will be doing the sorting itself: The sorting still takes place.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 04:06:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749778#M235694</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-06-23T04:06:53Z</dc:date>
    </item>
    <item>
      <title>Re: Index data sets before Merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749785#M235700</link>
      <description>&lt;P&gt;Indexes only help when you need to extract a small (10% or so) subset of a dataset. When more (or all) of the dataset is needed, indexes will slow down the merge to a point where sorting + merging will be faster than just merging with the index.&lt;/P&gt;
&lt;P&gt;This is because reading the index file as such will cause additional I/O, and reading the dataset is done in random order, which is always slower than sequential.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 05:53:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749785#M235700</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-06-23T05:53:12Z</dc:date>
    </item>
    <item>
      <title>Re: Index data sets before Merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749788#M235703</link>
      <description>&lt;P&gt;Just add the 10% figure is a rule-of-thumb only.&lt;/P&gt;
&lt;P&gt;A solid state storage will not suffer from random access as much as disks, and disks also depend on the RAID level.&lt;/P&gt;
&lt;P&gt;A narrow table will be less expensive to sort than a wide table, for a given number of observations.&lt;/P&gt;
&lt;P&gt;And so on. Many factors to consider. As usual.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 06:15:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749788#M235703</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-06-23T06:15:25Z</dc:date>
    </item>
    <item>
      <title>Re: Index data sets before Merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749790#M235704</link>
      <description>&lt;P&gt;What one must never forget: on a physical level, SAS does not read &lt;EM&gt;observations&lt;/EM&gt;, but &lt;EM&gt;dataset pages&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;So, if you consider a dataset with an observation size of 1000 and the typical page size of 128K, you have more than 100 obs in a page, and statistically that means that even a small subset of obs will have to read the whole physical dataset. And that's why indexes usually slow everything down, unless obs size approaches page size.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 06:37:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Index-data-sets-before-Merge/m-p/749790#M235704</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-06-23T06:37:36Z</dc:date>
    </item>
  </channel>
</rss>

