<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Sorting Very Large Files with SAS in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Sorting-Very-Large-Files-with-SAS/m-p/958954#M83908</link>
    <description>Sorting Very Large Files with SAS</description>
    <pubDate>Tue, 11 Feb 2025 21:37:32 GMT</pubDate>
    <dc:creator>SAS242424</dc:creator>
    <dc:date>2025-02-11T21:37:32Z</dc:date>
    <item>
      <title>Sorting Very Large Files with SAS</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Sorting-Very-Large-Files-with-SAS/m-p/958954#M83908</link>
      <description>&lt;H2&gt;&lt;STRONG&gt;Introduction&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;Class action lawsuits often require handling extremely large datasets, sometimes exceeding 100 million records with more than 50 columns. Efficiently sorting these files is critical for analysis and reporting. In this report, I discuss my experience sorting such large files using SAS on an MSI laptop equipped with an Nvidia GPU.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;Hardware and Software Setup &lt;/STRONG&gt;&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Laptop: &lt;/STRONG&gt;MSI with an Nvidia GPU and an Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz (reported as 2.59 GHz)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;RAM: &lt;/STRONG&gt;16.0 GB (15.8 GB usable)&lt;/LI&gt;
&lt;LI&gt;64-bit operating system, x64-based processor&lt;/LI&gt;
&lt;LI&gt;MSI (Micro-Star International) is a well-known manufacturer of high-performance laptops, particularly suited for gaming and data-intensive tasks. The model used in this case features high-speed processing and robust cooling capabilities.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;GPU: &lt;/STRONG&gt;The laptop includes an &lt;STRONG&gt;Nvidia GPU (Graphics Processing Unit)&lt;/STRONG&gt;. GPUs are primarily used for graphics rendering and can accelerate parallel workloads in software written to use them. PROC SORT in Base SAS runs on the CPU, so the GPU is unlikely to have affected the sort times reported here.&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Software: &lt;/STRONG&gt;SAS (Statistical Analysis System) is a powerful software suite used for data management, advanced analytics, and statistical modeling. It is widely used in industries such as healthcare, finance, and legal analytics for handling large datasets efficiently.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;File Type: &lt;/STRONG&gt;Converted to SAS dataset format (sas7bdat) for improved performance.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;
&lt;H3&gt;&lt;STRONG&gt;Initial Attempt and Challenges &lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Initially, I attempted to sort the entire file without downsizing. However, this resulted in an immediate laptop crash. While the MSI laptop recovered, I had to manually delete some work files generated during the crash to free up space and restore functionality. More RAM would have helped.&lt;/P&gt;
&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;
&lt;H3&gt;&lt;STRONG&gt;Optimizing the Sorting Process &lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;After experiencing the crash, I attempted to downsize the file multiple times to determine a manageable size for efficient sorting. Key findings include:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Sorting 25 Million Records&lt;/STRONG&gt;: Sorting this size was practical, completing in a reasonable time—approximately the time it takes to pour a cup of coffee.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Sorting 15 Million Records&lt;/STRONG&gt;: This size offered even faster sorting speeds without straining system resources.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Sorting Larger Files&lt;/STRONG&gt;: Any dataset significantly exceeding 25 million records risked performance issues or crashes.&lt;/LI&gt;
&lt;/OL&gt;
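&lt;P&gt;One way to stay within the record counts above, sketched here under assumptions rather than taken from the original run, is to sort fixed-size slices of the file and then interleave them. The names large_dataset and key_variable follow the SORTSIZE example later in this post; chunk1, chunk2, and large_sorted are hypothetical, and the sketch covers only the first 50 million records.&lt;/P&gt;
&lt;LI-CODE lang="sas"&gt;/* Sort records 1 through 25,000,000 (OBS= is the last record read). */
PROC SORT data=large_dataset(firstobs=1 obs=25000000) out=chunk1;
 BY key_variable;
RUN;

/* Sort records 25,000,001 through 50,000,000. */
PROC SORT data=large_dataset(firstobs=25000001 obs=50000000) out=chunk2;
 BY key_variable;
RUN;

/* Interleave the sorted chunks: SET with BY merges inputs that are
   already in BY order into one data set in the same order. */
DATA large_sorted;
 SET chunk1 chunk2;
 BY key_variable;
RUN;&lt;/LI-CODE&gt;
&lt;P&gt;Each additional slice needs its own PROC SORT step and an extra name on the SET statement. The interleave step reads the sorted chunks sequentially, but the chunks themselves still occupy roughly one extra copy of the data on disk.&lt;/P&gt;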
&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;
&lt;H3&gt;Advanced Sorting Techniques&lt;/H3&gt;
&lt;P&gt;Advanced sorting techniques for very large datasets were not used, but two are listed below:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Use PROC SORT with &lt;/STRONG&gt;&lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p02bhn81rn4u64n1b6l00ftdnxge.htm#n16kpz92dhej7pn1ql6tl0xbyyjc" target="_self"&gt;TAGSORT&lt;/A&gt;&amp;nbsp;&lt;STRONG&gt;option&lt;/STRONG&gt;:&lt;/P&gt;
&lt;P&gt;This can help overcome insufficient disk space issues.&lt;/P&gt;
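&lt;P&gt;A minimal sketch of the TAGSORT option, reusing the placeholder names large_dataset and key_variable from the SORTSIZE example in this post. The output name large_sorted is hypothetical, and none of these names refer to the actual case data:&lt;/P&gt;
&lt;LI-CODE lang="sas"&gt;/* TAGSORT stores only the BY variables and observation numbers ("tags")
   in the temporary sort files, then uses the sorted tags to retrieve
   the full records from the input data set. */
PROC SORT data=large_dataset out=large_sorted TAGSORT;
 BY key_variable;
RUN;&lt;/LI-CODE&gt;
&lt;P&gt;The trade-off is that TAGSORT can increase processing time, so it is mainly worth trying when temporary disk space, rather than elapsed time, is the constraint.&lt;/P&gt;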
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Utilize the &lt;/STRONG&gt;&lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lesysoptsref/n0ipa8xt1ma3h7n1wqjqr99679pg.htm" target="_self"&gt;SORTSIZE&amp;nbsp;&lt;/A&gt;&lt;STRONG&gt;option&lt;/STRONG&gt;:&lt;/P&gt;
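&lt;P&gt;SORTSIZE limits the amount of memory available to PROC SORT. Keeping the value below the physical memory available to the SAS session helps prevent unnecessary swapping. The example below uses a small 2 MB limit; when a sort needs more memory than the limit allows, SAS writes temporary utility files to disk, so treat the value as a placeholder to tune for your machine:&lt;/P&gt;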
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="sas"&gt;/* SORTSIZE= caps the memory PROC SORT may use for this step. */
PROC SORT data=large_dataset SORTSIZE=2M;
 BY key_variable;
RUN;&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Conclusion&lt;/H2&gt;
&lt;P&gt;On this MSI laptop with 16 GB of RAM, sorting was practical when each SAS dataset was limited to approximately 15 to 25 million records. Converting raw data into the SAS dataset format (sas7bdat) significantly improves sorting efficiency. Options such as TAGSORT and SORTSIZE, which were not tested here, may further improve sorting of very large datasets in class action lawsuit analytics.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Other Reports by Melvin Ott:&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://doi.org/10.21985/n2-g5nc-k574" target="_self"&gt;Leveraging SASPy for Efficient Analytics in Class Action&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://doi.org/10.21985/n2-3qek-9s30" target="_self"&gt;Sort Large Files Faster with SASPy&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Tue, 11 Feb 2025 21:37:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Sorting-Very-Large-Files-with-SAS/m-p/958954#M83908</guid>
      <dc:creator>SAS242424</dc:creator>
      <dc:date>2025-02-11T21:37:32Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting Very Large Files with SAS</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Sorting-Very-Large-Files-with-SAS/m-p/958976#M83909</link>
      <description>&lt;P&gt;Thank you for sharing your experience, Dr. Ott. For readability, I pulled the content of your PDF attachment into the body of the message so that more community members might see it. Others may comment with other sort tips and experiences.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Feb 2025 21:39:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Sorting-Very-Large-Files-with-SAS/m-p/958976#M83909</guid>
      <dc:creator>ChrisHemedinger</dc:creator>
      <dc:date>2025-02-11T21:39:33Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting Very Large Files with SAS</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Sorting-Very-Large-Files-with-SAS/m-p/958996#M83910</link>
      <description>Here is my code to sort a BIG table.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://communities.sas.com/t5/SAS-Programming/Proc-sort/m-p/955131" target="_blank"&gt;https://communities.sas.com/t5/SAS-Programming/Proc-sort/m-p/955131&lt;/A&gt;</description>
      <pubDate>Wed, 12 Feb 2025 01:43:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Sorting-Very-Large-Files-with-SAS/m-p/958996#M83910</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-02-12T01:43:49Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting Very Large Files with SAS</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Sorting-Very-Large-Files-with-SAS/m-p/959023#M83911</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/471294"&gt;@SAS242424&lt;/a&gt;&amp;nbsp;Thanks for sharing!&lt;/P&gt;
&lt;P&gt;Here are my five cents:&lt;/P&gt;
&lt;P&gt;I guess what's "right" will very much depend on your data, your environment, the requirements and the usage of your data.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I assume with "class action data" you actually want to query your data in different ways - like once per claim type, the next time per case status and then per ... If that's true then I guess no single sort order will suffice to avoid "full table scans".&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4"&gt;@ChrisHemedinger&lt;/a&gt;&amp;nbsp;was apparently too humble to cite himself but it might be worth your while to read &lt;A href="https://blogs.sas.com/content/sasdummy/2016/02/04/avoid-sorting-data-in-sas/" target="_self"&gt;Sorting data in SAS: can you skip it?&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;With your data and SAS on a laptop, it might be worth considering storing the data on your C: drive in a library that uses the SPDE engine, with indexes created to match your most common WHERE clauses or BY groups.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2025 10:00:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Sorting-Very-Large-Files-with-SAS/m-p/959023#M83911</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2025-02-12T10:00:57Z</dc:date>
    </item>
  </channel>
</rss>

