<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reducing disk usage while concatenating and merging in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936298#M368053</link>
    <description>&lt;P&gt;I think this might work. (I've already tried compress, length and program reorganisation).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The IDvars are not all unique, but I can see how it would work. The IDvars are the file number (file=1 for P1 and H1, =2 for P2 and H2, etc.) and the case number within each file (each case appears once in the H (household) file and multiple times in the P (person) file).&amp;nbsp; The desired output is one record for each P record, with the H variables attached.&lt;/P&gt;</description>
    <pubDate>Thu, 18 Jul 2024 23:29:38 GMT</pubDate>
    <dc:creator>BruceBrad</dc:creator>
    <dc:date>2024-07-18T23:29:38Z</dc:date>
    <item>
      <title>Reducing disk usage while concatenating and merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936155#M368004</link>
      <description>&lt;P&gt;I have a job which concatenates many large files into two files then merges them. I'm running out of disk space. Can I concatenate and merge in the same data step to remove some of the work files? Would proc sql help, or will it also create large working files. The pseudo code below describes the current structure.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data P;
set P1 P2 ... Pn;
run;
data H;
set H1 H2 ... Hn;
run;
data M;
merge P H;
by IDvars;
run;
[output based on M]&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 18 Jul 2024 12:41:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936155#M368004</guid>
      <dc:creator>BruceBrad</dc:creator>
      <dc:date>2024-07-18T12:41:33Z</dc:date>
    </item>
    <item>
      <title>Re: Reducing disk usage while concatenating and merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936164#M368007</link>
      <description>&lt;P&gt;Do you really need all of the data?&amp;nbsp; Can you eliminate observations or variables in any of the steps?&amp;nbsp; Getting rid of data you don't need as early as you can will save the most space and time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are you using the COMPRESS option? Either the system option or the dataset option?&amp;nbsp; Does it reduce the size of the datasets?&amp;nbsp; If you have long character variables it can reduce disk usage by a lot.&amp;nbsp; But if the datasets have only a few short variables it might not help at all.&lt;/P&gt;
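&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, either form can be tried (a minimal sketch using the dataset names from the original pseudo-code, with only the first two files shown):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options compress = yes;    /* system option: applies to every dataset created afterwards */

data P (compress = yes);   /* or the dataset option, for this dataset only */
  set P1 P2;
run;&lt;/CODE&gt;&lt;/PRE&gt;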
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Did you try making the first two steps views? (Note that the merge might still require large utility space.)&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data P / view=P ;
set P1 P2 ... Pn;
run;
data H / view=H;
set H1 H2 ... Hn;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Or even the third step?&lt;/P&gt;
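&lt;P&gt;A sketch of the third step as a view as well, using the dataset names from the original pseudo-code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data M / view=M;
  merge P H;
  by IDvars;
run;&lt;/CODE&gt;&lt;/PRE&gt;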
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jul 2024 13:12:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936164#M368007</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-07-18T13:12:34Z</dc:date>
    </item>
    <item>
      <title>Re: Reducing disk usage while concatenating and merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936277#M368045</link>
      <description>&lt;P&gt;Before trying different joining strategies, I suggest you add this at the start of your program and just rerun as is to see how much further it gets you:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options compress = yes;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 18 Jul 2024 20:05:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936277#M368045</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2024-07-18T20:05:19Z</dc:date>
    </item>
    <item>
      <title>Re: Reducing disk usage while concatenating and merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936286#M368048</link>
      <description>&lt;P&gt;Let's talk about the IDVARS.&amp;nbsp; Does the same combination appear once or more than once?&amp;nbsp; If it's just once among all the P data sets, and just once among all the H data sets, you can eliminate the middleman:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
   merge p1 p2 p3 ... pn
         h1 h2 h3 ... hn;
   by idvars;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 18 Jul 2024 20:54:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936286#M368048</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2024-07-18T20:54:38Z</dc:date>
    </item>
    <item>
      <title>Re: Reducing disk usage while concatenating and merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936298#M368053</link>
      <description>&lt;P&gt;I think this might work. (I've already tried compress, length and program reorganisation).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The IDvars are not all unique, but I think I can see how it would work. The IDvars are the file numbers (file=1 for P1 and H1, =2 for P2 and H2 etc) and case numbers within each file (once each for the H (household) file, and multiple times for the P (person) file).&amp;nbsp; The desired output is one record for each P record, with the H vars attached.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jul 2024 23:29:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936298#M368053</guid>
      <dc:creator>BruceBrad</dc:creator>
      <dc:date>2024-07-18T23:29:38Z</dc:date>
    </item>
    <item>
      <title>Re: Reducing disk usage while concatenating and merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936301#M368055</link>
      <description>&lt;P&gt;Here is a test program using the single datastep. The output seems correct, but I get lots of information messages about data being overwritten. What's going on here?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro hfile(filen);
data h&amp;amp;filen;
retain file &amp;amp;filen;
do id = 1 to 5;
  v1 = 1000*file + id;
  output;
end;
run;
%mend;
%macro pfile(filen);
data p&amp;amp;filen;
retain file &amp;amp;filen;
do id = 1 to 5;
  do person = 1 to 2;
    v2 = 100000*file + 100*id + person;
    output;
  end;
end;
run;
%mend;
%hfile(1);
%pfile(1);
%hfile(2);
%pfile(2);

data merged;
merge h1 h2 p1 p2;
by file id;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 19 Jul 2024 02:33:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936301#M368055</guid>
      <dc:creator>BruceBrad</dc:creator>
      <dc:date>2024-07-19T02:33:48Z</dc:date>
    </item>
    <item>
      <title>Re: Reducing disk usage while concatenating and merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936304#M368056</link>
      <description>&lt;P&gt;Because you have V1 (and V2) that is not one of the BY variables but appears on more than one dataset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you are positive that the same values of the BY variables will never appear in both H1 and H2, then you can ignore the messages.&amp;nbsp; If you set the MSGLEVEL option to N instead of I, those messages will not be written.&lt;/P&gt;
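&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example (a minimal sketch):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options msglevel=n;   /* suppress INFO messages about overwritten variables */&lt;/CODE&gt;&lt;/PRE&gt;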
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or you could rename those variables.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jul 2024 04:00:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936304#M368056</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-07-19T04:00:36Z</dc:date>
    </item>
    <item>
      <title>Re: Reducing disk usage while concatenating and merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936307#M368057</link>
      <description>&lt;P&gt;If you are sure that every H records has at least one P record (and every P record has a matching H record) then you could re-create the merge using interleaving SET instead.&amp;nbsp; Try something like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options msglevel=i;
data test1;
  merge h1 h2 p1 p2;
  by file id;
run;

data test2;
  set h1(in=inh1 keep=file id) h2(in=inh2 keep=file id) p1(keep=file id) p2(keep=file id);
  by file id;
  if max(inh1,inh2) then do;
    set h1 h2;
    by file id;
    delete;
  end;
  set p1 p2;
  by file id;
run;

proc compare data=test1 compare=test2;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Log&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;163  options msglevel=i;
164  data test1;
165    merge h1 h2 p1 p2;
166    by file id;
167  run;

INFO: The variable v1 on data set WORK.H1 will be overwritten by data set WORK.H2.
INFO: The variable person on data set WORK.P1 will be overwritten by data set WORK.P2.
INFO: The variable v2 on data set WORK.P1 will be overwritten by data set WORK.P2.
NOTE: There were 5 observations read from the data set WORK.H1.
NOTE: There were 5 observations read from the data set WORK.H2.
NOTE: There were 10 observations read from the data set WORK.P1.
NOTE: There were 10 observations read from the data set WORK.P2.
NOTE: The data set WORK.TEST1 has 20 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


168
169  data test2;
170    set h1(in=inh1 keep=file id) h2(in=inh2 keep=file id) p1(keep=file id) p2(keep=file id);
171    by file id;
172    if max(inh1,inh2) then do;
173      set h1 h2;
174      by file id;
175      delete;
176    end;
177    set p1 p2;
178    by file id;
179  run;

NOTE: There were 5 observations read from the data set WORK.H1.
NOTE: There were 5 observations read from the data set WORK.H2.
NOTE: There were 10 observations read from the data set WORK.P1.
NOTE: There were 10 observations read from the data set WORK.P2.
NOTE: There were 5 observations read from the data set WORK.H1.
NOTE: There were 5 observations read from the data set WORK.H2.
NOTE: There were 10 observations read from the data set WORK.P1.
NOTE: There were 10 observations read from the data set WORK.P2.
NOTE: The data set WORK.TEST2 has 20 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Tom_0-1721362048977.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/98510iFC02C004BC266750/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Tom_0-1721362048977.png" alt="Tom_0-1721362048977.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jul 2024 04:07:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936307#M368057</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-07-19T04:07:37Z</dc:date>
    </item>
    <item>
      <title>Re: Reducing disk usage while concatenating and merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936479#M368094</link>
      <description>Given my data structure, doing the single merge (and suppressing the INFO messages) seems simplest. Thanks, all.</description>
      <pubDate>Sat, 20 Jul 2024 02:37:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reducing-disk-usage-while-concatenating-and-merging/m-p/936479#M368094</guid>
      <dc:creator>BruceBrad</dc:creator>
      <dc:date>2024-07-20T02:37:00Z</dc:date>
    </item>
  </channel>
</rss>

