<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Model table with big text field in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/657950#M197183</link>
    <description>&lt;P&gt;Hello, I'm designing a Data Mart solution on SAS. I have a couple of design question. The main fact table is a list of tickets. By business request each update to one of this ticket overwrites the last one instead of adding a row. This is the first problem, I'm afraid that an update of such a large table is more expensive than simply adding a row (we're talking of 100M distinct tickets).&lt;/P&gt;&lt;P&gt;Another problem is that one field of this table consist of a text description of the ticket status. The variable lenght is defined at 2000 and multiplied for 100M rows I'm worry it will generate a table so big it'll be difficult to handle.&lt;/P&gt;&lt;P&gt;How can I manage the text field? Do I keep it in the main fact table or do I treat it like a separated dimension (even if it has the same cardinality as the fact_table?)&lt;/P&gt;&lt;P&gt;I'm thinking that many of the analytical jobs that use this table downstream don't need the text field so keeping it in a different table could improve join elaboration time but I also understand it's bad design.&lt;/P&gt;&lt;P&gt;What's your advice? Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 12 Jun 2020 10:52:35 GMT</pubDate>
    <dc:creator>Crysis85</dc:creator>
    <dc:date>2020-06-12T10:52:35Z</dc:date>
    <item>
      <title>Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/657950#M197183</link>
      <description>&lt;P&gt;Hello, I'm designing a Data Mart solution on SAS. I have a couple of design question. The main fact table is a list of tickets. By business request each update to one of this ticket overwrites the last one instead of adding a row. This is the first problem, I'm afraid that an update of such a large table is more expensive than simply adding a row (we're talking of 100M distinct tickets).&lt;/P&gt;&lt;P&gt;Another problem is that one field of this table consist of a text description of the ticket status. The variable lenght is defined at 2000 and multiplied for 100M rows I'm worry it will generate a table so big it'll be difficult to handle.&lt;/P&gt;&lt;P&gt;How can I manage the text field? Do I keep it in the main fact table or do I treat it like a separated dimension (even if it has the same cardinality as the fact_table?)&lt;/P&gt;&lt;P&gt;I'm thinking that many of the analytical jobs that use this table downstream don't need the text field so keeping it in a different table could improve join elaboration time but I also understand it's bad design.&lt;/P&gt;&lt;P&gt;What's your advice? Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 12 Jun 2020 10:52:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/657950#M197183</guid>
      <dc:creator>Crysis85</dc:creator>
      <dc:date>2020-06-12T10:52:35Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/657958#M197184</link>
      <description>&lt;P&gt;&lt;SPAN&gt;&amp;gt;&lt;EM&gt; This is the first problem, I'm afraid that an update of such a large table is more expensive that simply adding a row (we're talking of 100M distinct tickets).&lt;/EM&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Yes it is more expensive. It looks like you have no choice though. Even if you append you need to disable the existing entry. How many updates a day on this table?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am unsure I understand the rest of the text. Paragraphs help legibility too.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Jun 2020 08:27:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/657958#M197184</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-06-12T08:27:57Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/658016#M197194</link>
      <description>&lt;P&gt;Thanks, I rewrote my question, I hope it's more readable now.&lt;/P&gt;&lt;P&gt;I have about 90k tickets coming from operational systems daily, about 10% of which are updates, but it varies a lot (100k new ticket the other day and only 4k yesterday)&lt;/P&gt;</description>
      <pubDate>Fri, 12 Jun 2020 13:44:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/658016#M197194</guid>
      <dc:creator>Crysis85</dc:creator>
      <dc:date>2020-06-12T13:44:56Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/658088#M197221</link>
      <description>&lt;P&gt;Many of the procedures will ignore variables not specifically listed. Exceptions are a few like Proc Print or Proc Freq that process every variable unless restricted with either data set options or list of variables explicitly to use such on a Var statement in Proc Print or Tables statement in Proc Freq.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One of the nice things about SAS are the data set options that let you filter variables or records for any other purpose. So you can ignore undesired fields by Dropping them when not wanted. &lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Proc anyproc&amp;nbsp;&amp;nbsp; data=mybigdataset (drop=longtextvariable);&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;is pseudo&amp;nbsp; code for removing the variable if it might be a problem.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Jun 2020 14:39:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/658088#M197221</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-06-12T14:39:17Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/658239#M197276</link>
      <description>Thanks for clarifying.&lt;BR /&gt;This looks like a good match for SPDE's strengths. &lt;BR /&gt;- The indexing performance is much better. Perfect for these daily updates of less than 1% of the table &lt;BR /&gt;- The compression is much better. Perfect for that long field. I typically use compress=binary but try what's best for your data. And use partsize=500g if the table is stored wholly in one location. These options go in the libname statement. &lt;BR /&gt;- Whether you need one or two tables depends on the usage as you have pointed out. One is simpler to manage, and avoids a massive index-based join when you need the text. Test how much a single table slows down the updates and the common analytics jobs, and how slow the join is when the text is needed, and decide. Figure-based decisions are best. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 13 Jun 2020 03:52:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/658239#M197276</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-06-13T03:52:56Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/658241#M197278</link>
      <description>&lt;P&gt;Will this be a SAS or a database table? If database - which one?&lt;/P&gt;
&lt;P&gt;Are the text descriptions "free form" or is there a limited number of possible descriptions which you could store in a reference table?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have 10% updates: Is that eventually only driven by a few columns that are often changing? If so then you could consider to store the data in two tables with one table having the frequently changing columns. If stored as SAS tables then both tables could have an identical sort order so they are easily combined (could be a data step view).&lt;/P&gt;</description>
      <pubDate>Sat, 13 Jun 2020 04:20:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/658241#M197278</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-06-13T04:20:44Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/658242#M197279</link>
      <description>&lt;P&gt;How I understand things even if dropping the variables you will still have to read the data initially (if a SAS table). It's just not adding the variables to the PDV.&lt;/P&gt;</description>
      <pubDate>Sat, 13 Jun 2020 04:13:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/658242#M197279</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-06-13T04:13:40Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/660056#M197616</link>
      <description>&lt;P&gt;As a first step I'll&amp;nbsp; load the updated data from Oracle in a staging area then I'll update the tickets in the main table based on new data so the answer is SAS dataset. The description is free form I'm afraid.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Jun 2020 14:52:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/660056#M197616</guid>
      <dc:creator>Crysis85</dc:creator>
      <dc:date>2020-06-16T14:52:21Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/660611#M197687</link>
      <description>&lt;P&gt;So have you tried to use SPDE? What difference did it make?&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2020 04:03:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/660611#M197687</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-06-17T04:03:15Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/660654#M197689</link>
      <description>&lt;P&gt;Then giving SPDE a shot like&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;&amp;nbsp;proposes is definitely worth it.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2020 04:09:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/660654#M197689</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-06-17T04:09:13Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/661944#M197838</link>
      <description>&lt;P&gt;I see you chose the SPDE reply as the solution. Would you care to share your findings?&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2020 22:52:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/661944#M197838</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-06-17T22:52:23Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/663506#M198077</link>
      <description>&lt;P&gt;Yes we're still in design phase, some new requirements are rolling in so It'll be a while before implementation, but I'll definitely try this approach.&lt;/P&gt;&lt;P&gt;It's even possible that the main stakeholder changed his mind about wanting to keep the history of past ticket status. In this eventuality SPDE is no longer a viable option? (Plus we'd have to deal with possibly a 300M rows fact table assuming about 3 updates for ticket)&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jun 2020 14:34:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/663506#M198077</guid>
      <dc:creator>Crysis85</dc:creator>
      <dc:date>2020-06-19T14:34:57Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/663681#M198172</link>
      <description>SPDE is often a better option.You should benchmark and pick what works best.&lt;BR /&gt;</description>
      <pubDate>Sat, 20 Jun 2020 06:31:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/663681#M198172</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-06-20T06:31:11Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/663682#M198173</link>
      <description>&lt;P&gt;How often do you get such updates/inserts? Is that some overnight process or are you considering intraday processing. If it's the later then you might also face some table locking challenges.&lt;/P&gt;</description>
      <pubDate>Sat, 20 Jun 2020 06:42:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/663682#M198173</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-06-20T06:42:48Z</dc:date>
    </item>
    <item>
      <title>Re: Model table with big text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/663939#M198302</link>
      <description>&lt;P&gt;Definitely overnight process. The process that is up currently (developed by another company) runs for half a day, that's why we are starting from scratch, revisiting requisites and try to optimize the solution.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We'll try both engine options and see what works best then. Thanks again&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jun 2020 09:39:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Model-table-with-big-text-field/m-p/663939#M198302</guid>
      <dc:creator>Crysis85</dc:creator>
      <dc:date>2020-06-22T09:39:56Z</dc:date>
    </item>
  </channel>
</rss>

