<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Combining datasets one to many where many dataset is wide in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802310#M315839</link>
    <description>Thanks for your quick reply!&lt;BR /&gt;The issue is, my patient data has 12 diagnosis codes per row. The dataset is 15GB so I'd prefer not to transpose the data from wide to long so I only have 1 diagnosis code. Would it make sense to I modify your sql code like this:&lt;BR /&gt;on a.dxcode=b.dxcode1 or a.dxcode=b.dxcode2 etc?</description>
    <pubDate>Tue, 15 Mar 2022 18:14:08 GMT</pubDate>
    <dc:creator>Lefty</dc:creator>
    <dc:date>2022-03-15T18:14:08Z</dc:date>
    <item>
      <title>Combining datasets one to many where many dataset is wide</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802290#M315827</link>
      <description>&lt;P&gt;Hi, I am trying to combine two datasets using Base SAS 9.4. &lt;BR /&gt;The first "master" dataset is laid out like this:&lt;/P&gt;
&lt;TABLE width="388"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;DxCode&lt;/TD&gt;
&lt;TD&gt;Disease1&lt;/TD&gt;
&lt;TD&gt;Disease2&lt;/TD&gt;
&lt;TD&gt;Disease3&lt;/TD&gt;
&lt;TD&gt;…&lt;/TD&gt;
&lt;TD&gt;Disease17&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;B180&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;TD&gt;1&lt;/TD&gt;
&lt;TD&gt;…&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;B181&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;TD&gt;1&lt;/TD&gt;
&lt;TD&gt;…&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;B188&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;TD&gt;1&lt;/TD&gt;
&lt;TD&gt;…&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;etc&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I want to combine it with a "patient" data file that looks like this:&lt;/P&gt;
&lt;TABLE width="388"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;PatientID&lt;/TD&gt;
&lt;TD&gt;Date&lt;/TD&gt;
&lt;TD&gt;DxCode1&lt;/TD&gt;
&lt;TD&gt;DxCode2&lt;/TD&gt;
&lt;TD&gt;…&lt;/TD&gt;
&lt;TD&gt;DxCode12&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;1&lt;/TD&gt;
&lt;TD&gt;1/1/2017&lt;/TD&gt;
&lt;TD&gt;A180&lt;/TD&gt;
&lt;TD&gt;B180&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;D43&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;1&lt;/TD&gt;
&lt;TD&gt;1/31/2017&lt;/TD&gt;
&lt;TD&gt;C059&lt;/TD&gt;
&lt;TD&gt;C060&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;I26&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;1&lt;/TD&gt;
&lt;TD&gt;5/1/2017&lt;/TD&gt;
&lt;TD&gt;D10&lt;/TD&gt;
&lt;TD&gt;C061&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;D33&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;1&lt;/TD&gt;
&lt;TD&gt;5/1/2017&lt;/TD&gt;
&lt;TD&gt;E123&lt;/TD&gt;
&lt;TD&gt;U32&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;S23&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;2&lt;/TD&gt;
&lt;TD&gt;5/1/2017&lt;/TD&gt;
&lt;TD&gt;I27&lt;/TD&gt;
&lt;TD&gt;S44&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;P55&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;2&lt;/TD&gt;
&lt;TD&gt;5/1/2017&lt;/TD&gt;
&lt;TD&gt;S43&lt;/TD&gt;
&lt;TD&gt;I28&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;B188&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I want to include only rows of data from the "patient" file where at least one of the values of the variables DX1-DX12 is found in the "master" file and I want to know which disease that DX1-DX12 corresponds to. Ultimately, I would like the resulting data to look like this (and it could have many rows per patient if they have multiple diseases on different dates):&lt;/P&gt;
&lt;TABLE width="452"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;PatientID&lt;/TD&gt;
&lt;TD&gt;Date&lt;/TD&gt;
&lt;TD&gt;Disease1&lt;/TD&gt;
&lt;TD&gt;Disease2&lt;/TD&gt;
&lt;TD&gt;Disease3&lt;/TD&gt;
&lt;TD&gt;…&lt;/TD&gt;
&lt;TD&gt;Disease17&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;1&lt;/TD&gt;
&lt;TD&gt;1/1/2017&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;TD&gt;1&lt;/TD&gt;
&lt;TD&gt;…&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;2&lt;/TD&gt;
&lt;TD&gt;5/1/2017&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;TD&gt;1&lt;/TD&gt;
&lt;TD&gt;…&lt;/TD&gt;
&lt;TD&gt;0&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;I don't care which "dx" field the disease appeared in or which particular dx code the patient had, just whether or not a dx code within each disease was diagnosed on a particular date. I was thinking I'd use proc sql, like this:&lt;/P&gt;
&lt;P&gt;proc sql;&lt;/P&gt;
&lt;P&gt;create table want as&lt;/P&gt;
&lt;P&gt;select a.*, b.patientID, b.date&lt;/P&gt;
&lt;P&gt;from master as a&lt;/P&gt;
&lt;P&gt;inner join patient as b&lt;/P&gt;
&lt;P&gt;on a.dxcode=b.dxcode1 or b.dxcode2 or b.dxcode3&lt;/P&gt;
&lt;P&gt;or b.dxcode4 or b.dxcode5 or b.dxcode6 or b.dxcode7&lt;/P&gt;
&lt;P&gt;or b.dxcode8 or b.dxcode9 or b.dxcode10 or b.dxcode11 or b.dxcode12;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But that is taking forever (my patient file is very large!) and I'm getting the error about "The execution of this query involves performing one or more Cartesian product joins that cannot be optimized"&lt;BR /&gt;I'm hoping there's a more efficient way to code this. Thank you so much in advance!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 15 Mar 2022 17:30:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802290#M315827</guid>
      <dc:creator>Lefty</dc:creator>
      <dc:date>2022-03-15T17:30:46Z</dc:date>
    </item>
    <item>
      <title>Re: Combining datasets one to many where many dataset is wide</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802292#M315829</link>
      <description>&lt;P&gt;So you want patientID, date, and disease for each instance where a patient dxcode matches a dxcode in the master table?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edit:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;OP replied&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Yes, exactly!&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Try this then.&amp;nbsp; With the data in a more normalized format:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data master;
   input dxcode:$4. disease;
datalines;
B180 3
B181 3
B188 3
S43 4
P55 17
;
run;

data patient;
	input id mydate:mmddyy10. dxcode:$4.;
datalines;
1 1/1/2017 A180	
1 1/1/2017 B180
1 1/1/2017 D43
1 1/31/2017 C059
1 1/31/2017 C060
1 1/31/2017 I26
1 5/1/2017 D10
1 5/1/2017 C061
1 5/1/2017 D33
1 5/1/2017 E123
1 5/1/2017 U32
1 5/1/2017 S23
2 5/1/2017 I27
2 5/1/2017 S44
2 5/1/2017 P55
2 5/1/2017 S43
2 5/1/2017 I28
2 5/1/2017 B188
;
run;

&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;We can pursue the SQL solution;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
	select a.id, a.mydate format date9., a.dxcode, b.disease
	from patient a inner join master b
	on a.dxcode = b.dxcode
	order by 1, 2;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;To get&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE style="border-collapse: collapse; width: 192pt;" border="0" width="256" cellspacing="0" cellpadding="0"&gt;
&lt;TBODY&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD width="59.5625px" height="20" class="xl65" style="height: 15.0pt; width: 48pt;"&gt;id&lt;/TD&gt;
&lt;TD width="65.525px" class="xl65" style="width: 48pt;"&gt;mydate&lt;/TD&gt;
&lt;TD width="63.7375px" class="xl66" style="width: 48pt;"&gt;dxcode&lt;/TD&gt;
&lt;TD width="66.175px" class="xl65" style="width: 48pt;"&gt;disease&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD width="59.5625px" height="20" class="xl67" style="height: 15.0pt;"&gt;1&lt;/TD&gt;
&lt;TD width="65.525px" class="xl68"&gt;1-Jan-17&lt;/TD&gt;
&lt;TD width="63.7375px" class="xl69"&gt;B180&lt;/TD&gt;
&lt;TD width="66.175px" class="xl67"&gt;3&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD width="59.5625px" height="20" class="xl67" style="height: 15.0pt;"&gt;2&lt;/TD&gt;
&lt;TD width="65.525px" class="xl68"&gt;1-May-17&lt;/TD&gt;
&lt;TD width="63.7375px" class="xl69"&gt;S43&lt;/TD&gt;
&lt;TD width="66.175px" class="xl67"&gt;4&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD width="59.5625px" height="20" class="xl67" style="height: 15.0pt;"&gt;2&lt;/TD&gt;
&lt;TD width="65.525px" class="xl68"&gt;1-May-17&lt;/TD&gt;
&lt;TD width="63.7375px" class="xl69"&gt;P55&lt;/TD&gt;
&lt;TD width="66.175px" class="xl67"&gt;17&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD width="59.5625px" height="20" class="xl67" style="height: 15.0pt;"&gt;2&lt;/TD&gt;
&lt;TD width="65.525px" class="xl68"&gt;1-May-17&lt;/TD&gt;
&lt;TD width="63.7375px" class="xl69"&gt;B188&lt;/TD&gt;
&lt;TD width="66.175px" class="xl67"&gt;3&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Note I inflicted additional disease on patient 2 for testing purposes.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 15 Mar 2022 18:03:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802292#M315829</guid>
      <dc:creator>HB</dc:creator>
      <dc:date>2022-03-15T18:03:18Z</dc:date>
    </item>
    <item>
      <title>Re: Combining datasets one to many where many dataset is wide</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802293#M315830</link>
      <description>For that first table, is there always only one disease that is flagged or can multiple be flagged?</description>
      <pubDate>Tue, 15 Mar 2022 17:44:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802293#M315830</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2022-03-15T17:44:43Z</dc:date>
    </item>
    <item>
      <title>Re: Combining datasets one to many where many dataset is wide</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802296#M315832</link>
      <description>&lt;P&gt;Yes, exactly!&lt;/P&gt;</description>
      <pubDate>Tue, 15 Mar 2022 17:51:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802296#M315832</guid>
      <dc:creator>Lefty</dc:creator>
      <dc:date>2022-03-15T17:51:43Z</dc:date>
    </item>
    <item>
      <title>Re: Combining datasets one to many where many dataset is wide</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802297#M315833</link>
      <description>&lt;P&gt;Multiple diseases could be flagged and I actually do want to capture that because I want to know which dates the diseases were flagged.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Mar 2022 17:52:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802297#M315833</guid>
      <dc:creator>Lefty</dc:creator>
      <dc:date>2022-03-15T17:52:21Z</dc:date>
    </item>
    <item>
      <title>Re: Combining datasets one to many where many dataset is wide</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802309#M315838</link>
      <description>&lt;P&gt;As long as your CODES to Disease flags is small enough to fit in memory this looks like a good use for HASH() object.&lt;/P&gt;
&lt;P&gt;First let's convert your printouts into actual datasets. Let's add at least one more code that matches.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data codes;
  input DxCode :$5. Disease1-Disease4 ;
cards;
B180 0 0 1 0
B181 0 0 1 0
S43  1 0 0 0
;
 

data have;
  input PatientID Date :mmddyy. (DxCode1-DxCode3) (:$5.);
  format date yymmdd10.;
cards;
1 1/1/2017 A180 B180   D43
1 1/31/2017 C059 C060   I26
1 5/1/2017 D10 C061   D33
1 5/1/2017 E123 U32   S23
2 5/1/2017 I27 S44   P55
2 5/1/2017 S43 I28   B188
;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Now let's combine the HAVE with CODES using a hash and write out one record for every match.&amp;nbsp; Having repeating values of DATE for the same ID is confusing, but we can deal with it if you want.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data match / view=match;
  set have codes(obs=0);
  if _n_=1 then do;
    declare hash h(dataset:'codes');
    rc=h.definekey('dxcode');
    rc=h.definedata('disease1','disease2','disease3','disease4');
    rc=h.definedone();
  end;
  array dx dxcode1-dxcode3 ;
  any=0;
  do index=1 to dim(dx);
    if not h.find(key:dx[index]) then do;
      any=1;
      output;
    end;
  end;
  if not any then do;
    array flags disease1-disease4;
    do index=1 to dim(flags);
       flags[index]=0;
    end;
    output;
  end;
  drop rc index dx:;
run;

proc summary data=match;
  by patientid;
  var date any disease: ;
  output out=want max= ;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Results:&lt;/P&gt;
&lt;PRE&gt;       Patient
Obs       ID      _TYPE_    _FREQ_          Date    any    Disease1    Disease2    Disease3    Disease4

 1        1          0         4      2017-05-01     1         0           0           1           0
 2        2          0         2      2017-05-01     1         1           0           0           0

&lt;/PRE&gt;</description>
      <pubDate>Tue, 15 Mar 2022 18:19:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802309#M315838</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2022-03-15T18:19:24Z</dc:date>
    </item>
    <item>
      <title>Re: Combining datasets one to many where many dataset is wide</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802310#M315839</link>
      <description>Thanks for your quick reply!&lt;BR /&gt;The issue is, my patient data has 12 diagnosis codes per row. The dataset is 15GB so I'd prefer not to transpose the data from wide to long so I only have 1 diagnosis code. Would it make sense to I modify your sql code like this:&lt;BR /&gt;on a.dxcode=b.dxcode1 or a.dxcode=b.dxcode2 etc?</description>
      <pubDate>Tue, 15 Mar 2022 18:14:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802310#M315839</guid>
      <dc:creator>Lefty</dc:creator>
      <dc:date>2022-03-15T18:14:08Z</dc:date>
    </item>
    <item>
      <title>Re: Combining datasets one to many where many dataset is wide</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802318#M315842</link>
      <description>Holy cow that is so much easier and faster than the sql way! Thank you thank you- I was not familiar with hash but it is amazing!</description>
      <pubDate>Tue, 15 Mar 2022 18:40:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-datasets-one-to-many-where-many-dataset-is-wide/m-p/802318#M315842</guid>
      <dc:creator>Lefty</dc:creator>
      <dc:date>2022-03-15T18:40:28Z</dc:date>
    </item>
  </channel>
</rss>

