<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS Parquet Engine Character Columns in SAS Studio</title>
    <link>https://communities.sas.com/t5/SAS-Studio/SAS-Parquet-Engine-Character-Columns/m-p/971594#M11440</link>
    <description>&lt;P&gt;Have you checked out the&amp;nbsp;&lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/v_065/enghdff/p12rwps1tjlz0zn1cnvobnjeu96x.htm" target="_blank" rel="noopener"&gt;CHAR_COLUMN_LIMIT&lt;/A&gt; option? By default it is 32767, but you could set it to the maximum length you expect across all of your character columns. It can be applied as both a LIBNAME option and a data set option, so you could vary it by data set.&lt;/P&gt;</description>
    <pubDate>Mon, 28 Jul 2025 21:49:44 GMT</pubDate>
    <dc:creator>SASKiwi</dc:creator>
    <dc:date>2025-07-28T21:49:44Z</dc:date>
    <item>
      <title>SAS Parquet Engine Character Columns</title>
      <link>https://communities.sas.com/t5/SAS-Studio/SAS-Parquet-Engine-Character-Columns/m-p/971552#M11438</link>
      <description>&lt;P&gt;When loading Parquet files in SAS Studio, the String columns are inferred to be&amp;nbsp;&lt;SPAN&gt;32767 characters wide, even if the actual values are much shorter.&lt;BR /&gt;This results in tables that are far too big to fit into memory and makes working with Parquet files unmanageable.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Does the Parquet engine have an option to automatically infer character column sizes? We work with some files that have thousands of columns, so manually setting each column size would be impractical.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Code used:&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;libname myprqt parquet "&amp;amp;sasworkdir" ;&lt;/PRE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="AndrewM_0-1753700066768.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/108583i0D8A3DA750368BF8/image-size/medium?v=v2&amp;amp;px=400" role="button" title="AndrewM_0-1753700066768.png" alt="AndrewM_0-1753700066768.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Jul 2025 11:00:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/SAS-Parquet-Engine-Character-Columns/m-p/971552#M11438</guid>
      <dc:creator>Andrew-M</dc:creator>
      <dc:date>2025-07-28T11:00:48Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Parquet Engine Character Columns</title>
      <link>https://communities.sas.com/t5/SAS-Studio/SAS-Parquet-Engine-Character-Columns/m-p/971557#M11439</link>
      <description>&lt;P&gt;Note that it is not just an issue with Parquet files.&amp;nbsp; The same problem occurs when connecting to databases that allow the use of a STRING or VARCHAR variable type with undefined maximum length.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So a general solution that could use the same syntax to decide what lengths to use for character variables read from external databases would be very valuable.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Jul 2025 12:17:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/SAS-Parquet-Engine-Character-Columns/m-p/971557#M11439</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2025-07-28T12:17:28Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Parquet Engine Character Columns</title>
      <link>https://communities.sas.com/t5/SAS-Studio/SAS-Parquet-Engine-Character-Columns/m-p/971594#M11440</link>
      <description>&lt;P&gt;Have you checked out the&amp;nbsp;&lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/v_065/enghdff/p12rwps1tjlz0zn1cnvobnjeu96x.htm" target="_blank" rel="noopener"&gt;CHAR_COLUMN_LIMIT&lt;/A&gt; option? By default it is 32767, but you could set it to the maximum length you expect across all of your character columns. It can be applied as both a LIBNAME option and a data set option, so you could vary it by data set.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Jul 2025 21:49:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/SAS-Parquet-Engine-Character-Columns/m-p/971594#M11440</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2025-07-28T21:49:44Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Parquet Engine Character Columns</title>
      <link>https://communities.sas.com/t5/SAS-Studio/SAS-Parquet-Engine-Character-Columns/m-p/971597#M11441</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Note that it is not just an issue with Parquet files.&amp;nbsp; The same problem occurs when connecting to databases that allow the use of a STRING or VARCHAR variable type with undefined maximum length.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So a general solution that could use the same syntax to decide what lengths to use for character variables read from external databases would be very valuable.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To have such an option is certainly worth proposing as a new idea. I'll vote for it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/469107"&gt;@Andrew-M&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What you describe will certainly cause performance issues. It shouldn't cause memory issues, though, because a lot of SAS processing under compute loads only a single row at a time into memory.&lt;/P&gt;
&lt;P&gt;Under CAS, the STRING data type is mapped to VARCHAR(*) and consumes only as much memory as the actual data requires (plus some overhead for VARCHAR). I also seem to recall (I can't test this) that CAS actually "knows" the maximum string length stored in a VARCHAR(*) column and uses that value to create a CHAR variable when the data is moved from in-memory CAS to a .sas7bdat table under compute.&lt;/P&gt;
&lt;P&gt;It's not pretty, but one way to go could be to first load the Parquet data into CAS and then move it from CAS to SAS Compute.&lt;/P&gt;
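&lt;P&gt;A rough, untested sketch of that two-step route is below. The CAS session name, the caslib name and path, and the table names are all placeholders made up for illustration, and whether the resulting CHAR lengths really match the longest stored values is exactly the part I can't confirm.&lt;/P&gt;
&lt;PRE&gt;/* Sketch only: mysess, pqlib, /data/parquet and mytable are hypothetical names */
cas mysess;                                        /* start a CAS session */
caslib pqlib datasource=(srctype="path") path="/data/parquet";

/* Step 1: load the Parquet file into CAS (strings become VARCHAR, as noted above) */
proc casutil incaslib="pqlib" outcaslib="casuser";
  load casdata="mytable.parquet" casout="mytable" replace;
quit;

/* Step 2: copy the in-memory table to a compute-side data set */
libname mycas cas caslib="casuser";
data work.mytable;
  set mycas.mytable;
run;

/* Check the character lengths that were actually assigned */
proc contents data=work.mytable;
run;&lt;/PRE&gt;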
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 29 Jul 2025 03:17:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/SAS-Parquet-Engine-Character-Columns/m-p/971597#M11441</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2025-07-29T03:17:17Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Parquet Engine Character Columns</title>
      <link>https://communities.sas.com/t5/SAS-Studio/SAS-Parquet-Engine-Character-Columns/m-p/971601#M11442</link>
      <description>Thanks, that's good advice. I'll let my team know about this option.&lt;BR /&gt;&lt;BR /&gt;Here is the syntax I've used:&lt;BR /&gt;`libname pqt parquet "&amp;amp;sasworkdir" CHAR_COLUMN_LIMIT=8;`&lt;BR /&gt;&lt;BR /&gt;Note that this silently truncates longer values without issuing any warnings.</description>
      <pubDate>Tue, 29 Jul 2025 08:01:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/SAS-Parquet-Engine-Character-Columns/m-p/971601#M11442</guid>
      <dc:creator>Andrew-M</dc:creator>
      <dc:date>2025-07-29T08:01:53Z</dc:date>
    </item>
  </channel>
</rss>

