<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Merging on several variables in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/Merging-on-several-variables/m-p/264521#M7283</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/81567"&gt;@SmallKatie﻿&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Glad to hear that my solution worked for you. I wouldn't have thought that you have &amp;gt;50 million observations. My background regarding patient data is clinical research, where large studies would have several thousand patients. I guess, your data is more related to health economics or insurance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Indeed, for datasets of this size the assignment of values from (not too large) look-up tables using formats or informats is particularly convenient as it doesn't require sorting, indexing or transposing. Your specific case of a character-to-numeric assignment was ideal for a numeric informat.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 18 Apr 2016 10:12:22 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2016-04-18T10:12:22Z</dc:date>
    <item>
      <title>Merging on several variables</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Merging-on-several-variables/m-p/264408#M7278</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a large dataset (as in table A) with around 100 variables that represent different diagnosis/procedure codes. Each row represents one patient.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Table A&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;var1 &amp;nbsp; &amp;nbsp;var2 &amp;nbsp; &amp;nbsp;var3 &amp;nbsp; &amp;nbsp; … &amp;nbsp; &amp;nbsp;var100 &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;diag1 &amp;nbsp;diag2 &amp;nbsp;diag4&lt;/P&gt;&lt;P&gt;diag3 &amp;nbsp;diag5 &amp;nbsp;diag4 &amp;nbsp; &amp;nbsp; ... &amp;nbsp; &amp;nbsp;diag800&lt;/P&gt;&lt;P&gt;diag4 &amp;nbsp;diag16&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also have a separate table, with values for each of the diagnosis/procedure codes (altogether more than 10 000 different codes).&lt;/P&gt;&lt;P&gt;The values range from 1-40, so many diagnosis codes have the same value.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Table B&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;diagnosis_code &amp;nbsp; &amp;nbsp; value&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;diag1 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 12&lt;/P&gt;&lt;P&gt;diag2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;4&lt;/P&gt;&lt;P&gt;diag3 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;21&lt;/P&gt;&lt;P&gt;diag4 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;39&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to merge this into my original value for each variable, to get something like this as a result:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Table C&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;var1 &amp;nbsp; &amp;nbsp;var2 &amp;nbsp; &amp;nbsp;var3 &amp;nbsp; &amp;nbsp; … &amp;nbsp; &amp;nbsp;var100 &amp;nbsp;value_var1 value_var2 ...&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;diag1 &amp;nbsp;diag2 &amp;nbsp;diag4 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;12 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 4&lt;/P&gt;&lt;P&gt;diag3 &amp;nbsp;diag5 &amp;nbsp;diag4 &amp;nbsp; &amp;nbsp; ... &amp;nbsp; &amp;nbsp;diag800 &amp;nbsp; &amp;nbsp; &amp;nbsp;21 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 13&lt;/P&gt;&lt;P&gt;diag4 &amp;nbsp;diag16 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 39 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;19&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Of course I can merge in the Table B for each&amp;nbsp;of the 100 variables, but this will take too long.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Therefore I am thinking of constructing an array and checking for each of the values (its 40 possible values):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;array diagnosis{i} &amp;nbsp;{100} var1 - var100{&lt;/P&gt;&lt;P&gt;if diagnosis{i} in list{all variables with value 1}&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;then value_diagnosis{i} = 1&amp;nbsp;&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;if diagnosis{i} in list{all variables with value 40}&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;then value_diagnosis{i} = 40&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But this involves a lot of manual work - I am currenlty copy pasting the list of variables with a particular value.&lt;/P&gt;&lt;P&gt;Is there any more efficient way?&lt;/P&gt;&lt;P&gt;How could I create the list from the Table B in a better way?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using SAS Enterprise Guide, version 7.1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Many thanks indeed!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 17 Apr 2016 12:51:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Merging-on-several-variables/m-p/264408#M7278</guid>
      <dc:creator>SmallKatie</dc:creator>
      <dc:date>2016-04-17T12:51:52Z</dc:date>
    </item>
    <item>
      <title>Re: Merging on several variables</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Merging-on-several-variables/m-p/264411#M7279</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/81567"&gt;@SmallKatie﻿&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There are many situations where data processing is facilitated by datasets in a "long" format (many observations, but few variables).&lt;BR /&gt;The situation you describe is a case in point, but your Table A is in "wide" format (many variables, relatively few observations).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So, one approach would be to&amp;nbsp;transpose your data and then match the values to the diagnosis codes (see SAS code below). For the latter, I use PROC SQL, but after prior sorting (or indexing) you can use a data step (with a MERGE statement) as well.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Create test data */

data have_A;
input patid (var1 var2 var3) ($);
cards;
1 diag1  diag2  diag4
2 diag3  diag5  diag4
3 diag4  diag16 .
;

data have_B;
input diagnosis_code $ value;
cards;
diag1  12
diag2   4
diag3  21
diag4  39
diag5  13
diag16 19
;

/* Reshape dataset HAVE_A from wide to long format */

proc transpose data=have_A out=trans_A(rename=(col1=diagnosis_code)) name=diag_var;
by patid;
var var:;
run;

/* Match values to diagnosis codes */

proc sql;
create table want_long as
select a.*, b.value
from trans_A a natural left join have_B b
order by patid, diag_var;
quit;

/* If really necessary, return to wide format */

proc transpose data=want_long out=want_trans(drop=_:) prefix=value_;
by patid;
var value;
id diag_var;
run;

data want_wide;
merge have_A
      want_trans;
by patid;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A different approach that could be applied to both "long" and "wide" format data is to use a user-defined informat to assign the values:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Create informat from dataset HAVE_B */

data infmt;
retain fmtname '@diagv';
set have_B(rename=(diagnosis_code=start value=label));
run;

proc format cntlin=infmt;
run;

/* Assign values using the informat */

data want_wide2;
set have_A;
array       var[3]; /* Please replace 3 by 100 */
array value_var[3]; /* for your real data.     */
do i=1 to dim(var);
  value_var[i]=input(var[i], diagv.);
end;
drop i;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The result is the same as dataset WANT_WIDE.&lt;/P&gt;</description>
      <pubDate>Sun, 17 Apr 2016 14:21:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Merging-on-several-variables/m-p/264411#M7279</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-04-17T14:21:39Z</dc:date>
    </item>
    <item>
      <title>Re: Merging on several variables</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Merging-on-several-variables/m-p/264506#M7281</link>
      <description>&lt;P&gt;I would also agree with&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733"&gt;@FreelanceReinh﻿&lt;/a&gt;, your problem is caused by the choice in data structure. &amp;nbsp;Whatever code you write against that type of dataset is going to be suboptimal. &amp;nbsp;Normalise your data, then process it. &amp;nbsp;If for some report at teh end you need transposed data then transpose it then. &amp;nbsp;Remember the data you program with should be structured for your benefit, to make your code efficient and simple. &amp;nbsp;There is a reason why CDISC for instance choose to use normalised structures, for example.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Apr 2016 08:51:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Merging-on-several-variables/m-p/264506#M7281</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-04-18T08:51:41Z</dc:date>
    </item>
    <item>
      <title>Re: Merging on several variables</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Merging-on-several-variables/m-p/264512#M7282</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733"&gt;@FreelanceReinh﻿&lt;/a&gt;&lt;SPAN class="login-bold"&gt;!&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class="login-bold"&gt;Your solution worked perfectly! I ended up using&amp;nbsp;the second solution as my data is quite big (more than 50million observation) and transpose made it huge.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN class="login-bold"&gt;Thanks again!&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Apr 2016 09:33:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Merging-on-several-variables/m-p/264512#M7282</guid>
      <dc:creator>SmallKatie</dc:creator>
      <dc:date>2016-04-18T09:33:18Z</dc:date>
    </item>
    <item>
      <title>Re: Merging on several variables</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Merging-on-several-variables/m-p/264521#M7283</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/81567"&gt;@SmallKatie﻿&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Glad to hear that my solution worked for you. I wouldn't have thought that you have &amp;gt;50 million observations. My background regarding patient data is clinical research, where large studies would have several thousand patients. I guess, your data is more related to health economics or insurance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Indeed, for datasets of this size the assignment of values from (not too large) look-up tables using formats or informats is particularly convenient as it doesn't require sorting, indexing or transposing. Your specific case of a character-to-numeric assignment was ideal for a numeric informat.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Apr 2016 10:12:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Merging-on-several-variables/m-p/264521#M7283</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-04-18T10:12:22Z</dc:date>
    </item>
  </channel>
</rss>

