<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS generating a large file, but barely using any computer resources in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832519#M329059</link>
    <description>It's installed on VM</description>
    <pubDate>Fri, 09 Sep 2022 15:26:10 GMT</pubDate>
    <dc:creator>Wolverine</dc:creator>
    <dc:date>2022-09-09T15:26:10Z</dc:date>
    <item>
      <title>SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832329#M328967</link>
      <description>&lt;P&gt;I'm running SAS on a Windows-based virtual machine.&amp;nbsp; I used PROC SQL to join 2 files, and it took 30 hours.&amp;nbsp; The output file is about 140GB.&amp;nbsp; While this file was running, I used the Task Manager to examine CPU, Memory, and Ethernet usage, to see which of them was the bottleneck.&amp;nbsp; Because it's a VM, there are no hard drives directly connected, and Ethernet is used to transfer all data in and out.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;During the first couple of hours, it varied.&amp;nbsp; Sometimes the CPU would be running high, sometimes it would be Memory, and sometimes it would be Ethernet.&amp;nbsp; However, for many hours during the middle of the process, it didn't seem like anything was running high.&amp;nbsp; CPU was in the 3-8% range (occasionally spiking to about 30% for a second and then dropping down again), Memory held steady at 39%, and Ethernet was about 88Kbs upload and 0Kbs download.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So what is SAS doing for all those hours?&amp;nbsp; It's barely processing data, it's not holding much data in active memory, and it's not transferring much data.&amp;nbsp; It's frustrating to wait 30 hours for a file when it seems like SAS isn't doing anything!&lt;/P&gt;</description>
      <pubDate>Thu, 08 Sep 2022 15:36:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832329#M328967</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2022-09-08T15:36:58Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832334#M328968</link>
      <description>&lt;P&gt;If this SQL accesses databases, a lot of the time would be while the databases are doing the work, so SAS isn't doing anything until the DB returns the information to SAS.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Sep 2022 15:43:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832334#M328968</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2022-09-08T15:43:00Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832336#M328969</link>
      <description>&lt;P&gt;The files are .sas7bdat.&amp;nbsp; As I understand it, these are static files, so SAS is not querying a database.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Sep 2022 15:46:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832336#M328969</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2022-09-08T15:46:06Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832339#M328970</link>
      <description>&lt;P&gt;Without knowing what was requested it is kind of hard to guess what might be going on.&lt;/P&gt;
&lt;P&gt;You might want to provide the example code and the number of records in each data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 08 Sep 2022 15:50:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832339#M328970</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2022-09-08T15:50:45Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832358#M328982</link>
      <description>&lt;P&gt;It's a complicated merge because there are about 50 variables that it can match on.&amp;nbsp; The file prepost_ICD_CPT has about 54 million records and is about 82GB.&amp;nbsp; The file prepost_FLAGS has 930 records and is about 1MB.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
	Create table temp.prepost_ICD_CPT_flags
	as Select distinct a.*,b.*
	From temp.prepost_ICD_CPT a JOIN temp.prepost_FLAGS b
	On a.cpt = b.HEDIS_proc_code OR a.cpt_prof = b.HEDIS_proc_code OR
		a.icd_pr1 = b.HEDIS_proc_code OR 
		a.icd_pr2 = b.HEDIS_proc_code OR 
		a.icd_pr3 = b.HEDIS_proc_code OR 
		a.icd_pr4 = b.HEDIS_proc_code OR 
		a.icd_pr5 = b.HEDIS_proc_code OR 
		a.icd_pr6 = b.HEDIS_proc_code OR 
		a.icd_pr7 = b.HEDIS_proc_code OR 
		a.icd_pr8 = b.HEDIS_proc_code OR 
		a.icd_pr9 = b.HEDIS_proc_code OR 
		a.icd_pr10 = b.HEDIS_proc_code OR 
		a.icd_pr11 = b.HEDIS_proc_code OR 
		a.icd_pr12 = b.HEDIS_proc_code OR 
		a.icd_pr13 = b.HEDIS_proc_code OR 
		a.icd_pr14 = b.HEDIS_proc_code OR 
		a.icd_pr15 = b.HEDIS_proc_code OR 
		a.icd_pr16 = b.HEDIS_proc_code OR 
		a.icd_pr17 = b.HEDIS_proc_code OR 
		a.icd_pr18 = b.HEDIS_proc_code OR 
		a.icd_pr19 = b.HEDIS_proc_code OR 
		a.icd_pr20 = b.HEDIS_proc_code OR 
		a.icd_pr21 = b.HEDIS_proc_code OR 
		a.icd_pr22 = b.HEDIS_proc_code OR 
		a.icd_pr23 = b.HEDIS_proc_code OR 
		a.icd_pr24 = b.HEDIS_proc_code OR 
		a.icd_pr25 = b.HEDIS_proc_code OR
		a.icd_dx1_prof = b.HEDIS_proc_code OR
		a.comp_icd_DX1 = b.HEDIS_DX_code OR 
		a.comp_icd_DX2 = b.HEDIS_DX_code OR 
		a.comp_icd_DX3 = b.HEDIS_DX_code OR 
		a.comp_icd_DX4 = b.HEDIS_DX_code OR 
		a.comp_icd_DX5 = b.HEDIS_DX_code OR 
		a.comp_icd_DX6 = b.HEDIS_DX_code OR 
		a.comp_icd_DX7 = b.HEDIS_DX_code OR 
		a.comp_icd_DX8 = b.HEDIS_DX_code OR 
		a.comp_icd_DX9 = b.HEDIS_DX_code OR 
		a.comp_icd_DX10 = b.HEDIS_DX_code OR 
		a.comp_icd_DX11 = b.HEDIS_DX_code OR 
		a.comp_icd_DX12 = b.HEDIS_DX_code OR 
		a.comp_icd_DX13 = b.HEDIS_DX_code OR 
		a.comp_icd_DX14 = b.HEDIS_DX_code OR 
		a.comp_icd_DX15 = b.HEDIS_DX_code OR 
		a.comp_icd_DX16 = b.HEDIS_DX_code OR 
		a.comp_icd_DX17 = b.HEDIS_DX_code OR 
		a.comp_icd_DX18 = b.HEDIS_DX_code OR 
		a.comp_icd_DX19 = b.HEDIS_DX_code OR 
		a.comp_icd_DX20 = b.HEDIS_DX_code OR 
		a.comp_icd_DX21 = b.HEDIS_DX_code OR 
		a.comp_icd_DX22 = b.HEDIS_DX_code OR 
		a.comp_icd_DX23 = b.HEDIS_DX_code OR 
		a.comp_icd_DX24 = b.HEDIS_DX_code OR 
		a.comp_icd_DX25 = b.HEDIS_DX_code
	Where /*Delete records that don't match any of the flag proc or DX codes*/
		(a.cpt in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.cpt_prof in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr1 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr2 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr3 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr4 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr5 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr6 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr7 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr8 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr9 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr10 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr11 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr12 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr13 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr14 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr15 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr16 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr17 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr18 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr19 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr20 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr21 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr22 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr23 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr24 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_pr25 in (select HEDIS_proc_code from temp.prepost_FLAGS) OR
			a.icd_DX1_prof in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX1 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX2 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX3 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX4 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX5 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX6 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX7 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX8 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX9 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX10 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX11 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX12 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX13 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX14 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX15 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX16 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX17 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX18 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX19 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX20 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX21 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX22 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX23 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX24 in (select HEDIS_DX_code from temp.prepost_FLAGS) OR
			a.comp_icd_DX25 in (select HEDIS_DX_code from temp.prepost_FLAGS));
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 08 Sep 2022 16:59:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832358#M328982</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2022-09-08T16:59:21Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832412#M329013</link>
      <description>&lt;P&gt;Where is the TEMP SAS library stored? On your local hard drive?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your query is obviously IO bound and is seriously inefficient because of all of the sub-queries you have in your WHERE clause. The fact that you have to check over 50 variables to do your join points to a very inefficient table design. Rewriting this in a DATA step will speed this up drastically. We can't help you with that until you post some sample data for the two input tables.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 08 Sep 2022 23:22:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832412#M329013</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2022-09-08T23:22:36Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832417#M329014</link>
      <description>Agreed with &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt;, this is an inefficient query that can be optimized with a format and possibly a data step or simpler SQL. Post some sample data and show the expected logic and we can probably save you hours with your query.</description>
      <pubDate>Thu, 08 Sep 2022 21:47:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832417#M329014</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2022-09-08T21:47:25Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832437#M329025</link>
      <description>&lt;P&gt;Since you have one quite small lookup table, a DATA step using two hash objects (for the separate keys) will be much faster.&lt;/P&gt;
&lt;P&gt;And you must seriously consider a long layout for&amp;nbsp;&lt;SPAN&gt;prepost_ICD_CPT, see Maxim 19.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 09 Sep 2022 07:52:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832437#M329025</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-09-09T07:52:24Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832502#M329052</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Where is the TEMP SAS library stored? On your local hard drive?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your query is obviously IO bound and is seriously inefficient because of all of the sub-queries you have in your WHERE clause. The fact that you have to check over 50 variables to do your join points to a very inefficient table design. Rewriting this in a DATA step will speed this up drastically. We can't help you with that until you post some sample data for the two input tables.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I recognize that the syntax is very inefficient, but let me provide some context.&amp;nbsp; This is an older bit of code that I wrote a few months ago.&amp;nbsp; Originally, only proc codes were required.&amp;nbsp; So while it was slow, it still completed by the following morning.&amp;nbsp; In the months since, I worked on some similar projects and developed some better code.&amp;nbsp; I even posted about it &lt;A href="https://communities.sas.com/t5/SAS-Programming/Medical-data-matching-diagnosis-codes-with-multiple-variables/m-p/830130" target="_self"&gt;here&amp;nbsp; &lt;/A&gt;and &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt; helped me further improve my approach.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This past Friday afternoon, I got a request to update this analysis file to include DX codes.&amp;nbsp; Rather than trying to rewrite the syntax with this better approach, it was a lot faster and easier to simply expand what I had already written.&amp;nbsp; I knew it would be slow, but it had the long Labor Day weekend to run.&amp;nbsp; I remotely logged in several times over the weekend to keep an eye on it and make sure it didn't crash.&amp;nbsp; That's when I noticed the lack of computer resources being used, and that piqued my curiosity.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And that's what this post is really about.&amp;nbsp; What is SAS doing when it doesn't appear to be doing much at all?&amp;nbsp; You say that it's very IO intensive, but doesn't that imply sustained periods of near-max transfer speeds?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It's a virtual machine, so my understanding is that there is no local hard drive.&amp;nbsp; All data is transferred via Ethernet.&amp;nbsp; There is a C: drive (where Windows is stored) and an E: drive (titled "Data"), but even if they are connected locally, neither of these is sufficiently large to work as the temp directory.&lt;/P&gt;</description>
      <pubDate>Fri, 09 Sep 2022 13:52:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832502#M329052</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2022-09-09T13:52:01Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832518#M329058</link>
      <description>&lt;P&gt;Are you using SAS installed on the VM or is it on a server?&lt;/P&gt;</description>
      <pubDate>Fri, 09 Sep 2022 15:24:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832518#M329058</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2022-09-09T15:24:55Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832519#M329059</link>
      <description>It's installed on VM</description>
      <pubDate>Fri, 09 Sep 2022 15:26:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832519#M329059</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2022-09-09T15:26:10Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832590#M329095</link>
      <description>&lt;P&gt;It's the sub-queries that are adding a lot of extra IO. Every row read in the main table, requires the sub-queries to be run so its no wonder it runs so slowly. Personally I don't think it is worth persevering with this approach unless you are OK with having to run it overnight all the time.&lt;/P&gt;</description>
      <pubDate>Fri, 09 Sep 2022 22:58:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832590#M329095</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2022-09-09T22:58:02Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832601#M329100</link>
      <description>&lt;P&gt;I don't think you need the WHERE clause, which looks like it's filtering rows with a successful match.&amp;nbsp; It's just reiterating your intention to produce an inner join, which I think is already the case for an unmodified "JOIN" operation.&amp;nbsp; Others more expert than I can give a more authoritative opinion, but I believe the particular WHERE clauses you specify are totally superfluous ... but very resource costly.&amp;nbsp; So maybe just dropping the WHERE conditions would speed things up sufficiently, without changing results.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now, in a context more familiar to me: If the temp.preport_FLAGS dataset has no duplicate HEDIS_proc_code values or duplicate HEDIS_dx_code values, then this data step could be a lot faster.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data temp.prepost_ICD_CPT_flags (drop=rc i);
  set temp.prepost_ICD_CPT ;
  if _n_=1 then do;
    if 0 then set temp.preport_FLAGS;
    declare hash hash_pc (dataset:'temp.preport_FLAGS');
      hash_pc.definekey('HEDIS_proc_code');
      hash_pc.definedata(all:'Y');
      hash_pc.definedone();
    declare hash hash_dx (dataset:'temp.preport_FLAGS');
      hash_dx.definekey('HEDIS_dx_code');
      hash_dx.definedata(all:'Y');
      hash_dx.definedone();
  end;

  array icd_p {*} cpt cpt_prof icd_pr1-icd_pr25 icd_dx1_prof ;
  /*Check the 28 vars above until first HEDIS_proc_code match (i.e. rc=0)*/
  do i=1 to dim(icd_p) until (rc=0);
    rc=hash_pc.find(key:icd_p{i});
  end;

  array icd_d {*} comp_icd_DX1-comp_icd_DX25 ;
  /*If still unmatched, check the 25 vars above until first HEDIS_dx_code match */
  if rc^=0 then do i=1 to dim(icd_d) until (rc=0);
    rc=hash_dx.find(key:icd_d{i});
  end;

  if rc=0 then output; /*Only output successful matches */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;BTW this code assumes:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;all the matchable codes are numeric.&amp;nbsp; If not, the ARRAY statements need to be changed to specify character arrays.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;There are no identical variable names in the two datasets.&amp;nbsp; That's because this code would allow the preport_flags variable values to overwrite the same name variables found in the prepost_ICD_CPT, while the proc sql code would do the opposite.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;[editted addition].&amp;nbsp; If there are certain variables listed in the arrays that you expect to be significantly more likely to be matched than others, list them at the start of their corresponding arrays.&amp;nbsp; That rearrangement will find those (frequent) matches in earlier iterations of the DO loops.&amp;nbsp; &amp;nbsp;You could even make a couple of smaller, high-likelihood, arrays and do loops (one for the HEDIS_proc_code candidates, one for the HEDIS_dx_code) that you could start with, followed by arrays and loops for the remaining match candidates.&amp;nbsp;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Of course, this is untested code in the absence of sample data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 10 Sep 2022 12:58:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832601#M329100</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2022-09-10T12:58:09Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832692#M329145</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/43822"&gt;@Wolverine&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Further to what has been said in the posts above I&amp;nbsp; would like to share my observation.&lt;BR /&gt;First, having looked at the code and the output file size, the time of 30 hours is understandable. It's running on a VM would not matter (More often then not servers run on VM infrastructure).&amp;nbsp; Similarly the data being in external data bases would impact all programs and not just this one. It is the norm that external database and SAS work on different machines.&lt;BR /&gt;In this code a large dataset (54 million records) is being joined to a small dataset (930 records).&amp;nbsp;&lt;BR /&gt;Consider alternative approaches such as the one by&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp; OR from this paper&amp;nbsp; (&amp;nbsp;&lt;A href="https://support.sas.com/resources/papers/proceedings/proceedings/sugi26/p102-26.pdf" target="_blank"&gt;https://support.sas.com/resources/papers/proceedings/proceedings/sugi26/p102-26.pdf&lt;/A&gt;&amp;nbsp;).&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 10 Sep 2022 22:26:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832692#M329145</guid>
      <dc:creator>Sajid01</dc:creator>
      <dc:date>2022-09-10T22:26:39Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832710#M329148</link>
      <description>&lt;P&gt;Here a coding variant to the hash lookup approach as mentioned by&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;and for which&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;already posted some code.&amp;nbsp;&lt;BR /&gt;Fully untested of course.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want(drop=_:);

  if _n_=1 then
    do;
      if 0 then set temp.prepost_FLAGS;
      dcl hash h_proc (dataset:"temp.prepost_FLAGS");
      h_proc.defineKey('HEDIS_proc_code');
      h_proc.defineData(all:'y');
      h_proc.defineDone();

      dcl hash h_dx(dataset:"temp.prepost_FLAGS");
      h_dx.defineKey('HEDIS_DX_code');
      h_dx.defineData(all:'y');
      h_dx.defineDone();
    end;
  call missing(of _all_);

  set temp.prepost_ICD_CPT;

  array _a_proc {*} icd_pr1-icd_pr25 icd_dx1_prof;
  do _i=1 to dim(_a_proc);
    if h_proc.find(key:_a_proc[_i])=0 then 
      do;
        output;
        return;
      end;
  end;

  array _a_dx {*} comp_icd_DX1-comp_icd_DX25;
  do _i=1 to dim(_a_dx);
    if h_dx.find(key:_a_dx[_i])=0 then 
      do;
        output;
        return;
      end;
  end;

run;

/* in case there can be duplicates in temp.prepost_ICD_CPT here mimicking the DISTINCT from the SQL */
proc sort data=want;
  by _all_;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The code above writes a row to output as soon as it finds a matching flag and then stops further checks (return statement).&amp;nbsp;&lt;BR /&gt;If for a single row there can be multiple flags that match and you want to write out the row multiple times then you would have to remove the RETURN; statements.&lt;/P&gt;</description>
      <pubDate>Sun, 11 Sep 2022 02:07:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832710#M329148</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2022-09-11T02:07:10Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832759#M329181</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The code you present includes this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;  array _a_proc {*} icd_pr1-icd_pr25 icd_dx1_prof;
  do _i=1 to dim(_a_proc);
    if h_proc.find(key:_a_proc[_i])=0 then 
      do;
        output;
        return;
      end;
  end;

  array _a_dx {*} comp_icd_DX1-comp_icd_DX25;
  do _i=1 to dim(_a_dx);
    if h_dx.find(key:_a_dx[_i])=0 then 
      do;
        output;
        return;
      end;
  end;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;which is presented as an alternative to my code:&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;  array icd_p {*} cpt cpt_prof icd_pr1-icd_pr25 icd_dx1_prof ;
  /*Check the 28 vars above until first HEDIS_proc_code match (i.e. rc=0)*/
  do i=1 to dim(icd_p) until (rc=0);
    rc=hash_pc.find(key:icd_p{i});
  end;

  array icd_d {*} comp_icd_DX1-comp_icd_DX25 ;
  /*If still unmatched, check the 25 vars above until first HEDIS_dx_code match */
  if rc^=0 then do i=1 to dim(icd_d) until (rc=0);
    rc=hash_dx.find(key:icd_d{i});
  end;

  if rc=0 then output; /*Only output successful matches */&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;These are logically equivalent in that each stops looping once a match is found.&amp;nbsp; They both do the same number of loop iterations.&amp;nbsp; Your code may make the loop-stopping more self-evident, but I prefer the economy of the "until ..." expression to produce the same effect.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And the "call missing(of _all_)" statement is not harmful (and is good use in some circumstances).&amp;nbsp; But it is superfluous in this case, since only successful hash "find's" are output, meaning there will be no inadvertent output of previously-"found" values from the prepost_flags dataset.&lt;/P&gt;</description>
      <pubDate>Sun, 11 Sep 2022 18:21:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832759#M329181</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2022-09-11T18:21:24Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832921#M329255</link>
      <description>&lt;P&gt;I appreciate all the suggestions, and I should have some time this week to do some comparison testing among these approaches.&amp;nbsp; I'm going to be doing plenty of similar work in the future, so I really want to find the best solution.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But what I'm really trying to understand in this post is what's going on "under the hood".&amp;nbsp; For example, if my original approach is very IO intensive, I would expect to see sustained periods of near-max transfer speeds.&amp;nbsp; Or if it requires a lot of processing, I would expect high CPU usage.&amp;nbsp; But there were 8+ hour stretches where CPU &amp;lt;10%, RAM usage was &amp;lt;40%, and transfer speeds were slower than an old dial-up modem!&amp;nbsp; What is SAS doing when it doesn't appear to be doing much at all?&lt;/P&gt;</description>
      <pubDate>Mon, 12 Sep 2022 15:20:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832921#M329255</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2022-09-12T15:20:49Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832992#M329289</link>
      <description>&lt;P&gt;Without knowing more about your VM setup, its hard to know what's going on. IO is definitely the problem, not CPU or memory. Do you have Task Manager available on your VM? I would expect to see a lot of activity in the SAS WORK utility files but these days Windows Explorer won't necessarily show real time updates.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 12 Sep 2022 20:01:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/832992#M329289</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2022-09-12T20:01:04Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/833857#M329673</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Without knowing more about your VM setup, its hard to know what's going on. IO is definitely the problem, not CPU or memory. Do you have Task Manager available on your VM? I would expect to see a lot of activity in the SAS WORK utility files but these days Windows Explorer won't necessarily show real time updates.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I have access to Task Manager (which is where I pulled the performance numbers from), but not Resource Manager (which provides more in-depth performance numbers).&amp;nbsp; I do see SAS using a lot of space in the D:\Data drive, and I assume that is where the WORK utility files are stored.&amp;nbsp; Still, with the amount of space being used and the amount of time it takes to do it, it works out to about 4.7MB/s transfer speed.&amp;nbsp; That's less than half of my home internet speed.&amp;nbsp; That still seems pretty slow for a "local" drive.&amp;nbsp; I should also note that when the match is first submitted, SAS will transfer data at upwards of 50MB/s to the temp directory for nearly an hour.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Anyway, I've been benchmarking various approaches.&amp;nbsp; I've tried a few variants of the PROC SQL merge that involve altering the Where clause (including removing it altogether).&amp;nbsp; It doesn't seem to make much difference, as every version has taken 27-30hrs to run.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm now attempting to use the &lt;A href="https://communities.sas.com/t5/SAS-Programming/Medical-data-matching-diagnosis-codes-with-multiple-variables/m-p/830130" target="_self"&gt;approach that I mentioned earlier &lt;/A&gt;&amp;nbsp;from &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;. &amp;nbsp;&amp;nbsp; I've added a macro to it to import the code lists, and added a %global statement to initialize the code list variables.&amp;nbsp; However, it seems to initialize them as numeric variables when they should be character, and that causes an error when I try to match the data to the code lists in the arrays.&amp;nbsp; Here is the syntax with a couple of the lists to be created:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro import(sheet);

/*global statement initializes one or more macro variables.*/
%global &amp;amp;sheet._DX &amp;amp;sheet._pr;

PROC SQL noprint;
	Select quote(HEDIS_DX_code) into :&amp;amp;sheet._DX separated by ", "
	From temp.cpt_hyst_&amp;amp;sheet.; 
QUIT;

PROC SQL noprint;
	Select quote(HEDIS_proc_code) into :&amp;amp;sheet._pr separated by ", "
	From temp.cpt_hyst_&amp;amp;sheet.; 
QUIT;

%mend;
%import (cpt_hyst_all)
%import (cpt_hyst_lap_all)

DATA temp.hyst_prepost_ICD_CPT_flags_TEST; SET temp.hyst_prepost_ICD_CPT;

/*Set up an array to recode missing values to 0*/
Array _arr(*) FLAG_cpt_hyst_ab FLAG_cpt_hyst_all
Do i=1 to dim(_arr);
If _arr(i)=. then _arr(i)=0;
End;
Drop i;

Array dx_codes PRINCIPAL_DIAGNOSIS_CODE DIAGNOSIS_CODE_1-DIAGNOSIS_CODE_36;
Array proc_codes PROCEDURE_CODE ICD_PROCEDURE_CODE_1-ICD_PROCEDURE_CODE_25;

do index=1 to dim(dx_codes );
if dx_codes[index] in: (&amp;amp;cpt_hyst_ab_DX.) THEN FLAG_cpt_hyst_ab=1;
if dx_codes[index] in: (&amp;amp;cpt_hyst_all_DX.) THEN FLAG_cpt_hyst_all=1;

END;

do index=1 to dim(proc_codes );
if proc_codes[index] in: (&amp;amp;cpt_hyst_ab_pr.) THEN FLAG_cpt_hyst_ab=1;
if proc_codes[index] in: (&amp;amp;cpt_hyst_all_pr.) THEN FLAG_cpt_hyst_all=1;
END;

RUN;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;And a portion of the error msg:&lt;/P&gt;
&lt;P&gt;NOTE: Invalid numeric data, 'N9961' , at line 380 column 6.&lt;BR /&gt;NOTE: Invalid numeric data, 'S372' , at line 381 column 6.&lt;BR /&gt;NOTE: Invalid numeric data, 'S350' , at line 382 column 6.&lt;BR /&gt;NOTE: Invalid numeric data, 'N99510' , at line 383 column 6.&lt;BR /&gt;NOTE: Invalid numeric data, 'T811' , at line 384 column 6.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 14:03:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/833857#M329673</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2022-09-16T14:03:44Z</dc:date>
    </item>
    <item>
      <title>Re: SAS generating a large file, but barely using any computer resources</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/833880#M329684</link>
      <description>You need to show the code from the log to tie the errors back to the code, what is line 380-384?</description>
      <pubDate>Fri, 16 Sep 2022 15:13:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-generating-a-large-file-but-barely-using-any-computer/m-p/833880#M329684</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2022-09-16T15:13:12Z</dc:date>
    </item>
  </channel>
</rss>

