<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using Proc SQL to create a pair of two samples from different datasets based on variables. in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Using-Proc-SQL-to-create-a-pair-of-two-samples-from-different/m-p/410001#M100211</link>
    <description>Thank you very much! I found a Matching MACRO to help me solve the problem&lt;BR /&gt;I had.&lt;BR /&gt;</description>
    <pubDate>Thu, 02 Nov 2017 20:34:02 GMT</pubDate>
    <dc:creator>kishiyo</dc:creator>
    <dc:date>2017-11-02T20:34:02Z</dc:date>
    <item>
      <title>Using Proc SQL to create a pair of two samples from different datasets based on variables.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-Proc-SQL-to-create-a-pair-of-two-samples-from-different/m-p/408416#M99707</link>
      <description>&lt;P&gt;Hello everyone,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am working on two different datasets, and would like to borrow your wisdom:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Dataset 1: A group of treatment samples with Variables A and B&lt;/P&gt;&lt;P&gt;Dataset 2: A group of control samples with Variables A and B&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There are 47 samples in Dataset 1 (Treatment) and +100 samples in Dataset 2 (Control).&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to create 47 pairs - 47 from Dataset1 and 47 from Dataset2- with the closest value of Variables A and B.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;First of all, I want to minimize the absolute difference in variable A between treatment sample i (from Dataset1) and control sample i (from Dataset2).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Second, likewise in the previous step, I want to minimize the absolute difference in variable B between&amp;nbsp;&lt;SPAN&gt;treatment sample i (from Dataset1) and control sample i (from Dataset2).&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Based on the minimum difference in variables A and B, I would like to create a pair of treatment and control samples such that they are very close in terms of the value of variables A and B.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I tried the following code, and the resulting dataset gave 47 pairs, matched in terms of minimum difference in the variables between treatment and control samples. However, there are a lot of duplicates-&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;For example,&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;ID: A&lt;/STRONG&gt;&amp;nbsp;from Dataset 1 was paired with &lt;STRONG&gt;ID:&amp;nbsp;1&lt;/STRONG&gt; from Dataset 2 based on the matching criterion. However, &lt;STRONG&gt;ID: B&lt;/STRONG&gt; from Dataset 1 was also paired with &lt;STRONG&gt;ID:&amp;nbsp;1&lt;/STRONG&gt; from Dataset 2 because the matching variable of ID: B is very close to that of ID: R.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I would like to eliminate these duplicates (overlaps). I would like only one control sample (from Dataset 2) to be matched with one treatment sample (from Dataset 1), but codes that I came up with did not work as I want it to be.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;

create table dataset as

select a.id, a.variableA as variableA_treatment, a.variableB as variableB_treatment, b.id, b.variableA as variableA_control, b.variableB as variableB_control

from dataset1 as a left join dataset2 as b

on a.variableA &amp;gt; b.variableA or a.variableA &amp;lt; b.variableA or a.variableB &amp;gt; b.variableB or a.variableB &amp;lt; b.variableB

group by a.id

having abs(a.variableA-b.VariableA)=min(abs(a.variableA-b.variableA))
;

quit; &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;Any advice and suggestion will be appreciated.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you very much for your time.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Kishio&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 29 Oct 2017 21:57:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-Proc-SQL-to-create-a-pair-of-two-samples-from-different/m-p/408416#M99707</guid>
      <dc:creator>kishiyo</dc:creator>
      <dc:date>2017-10-29T21:57:19Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc SQL to create a pair of two samples from different datasets based on variables.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-Proc-SQL-to-create-a-pair-of-two-samples-from-different/m-p/408428#M99708</link>
      <description>&lt;P&gt;The Mayo Clinic&amp;nbsp;has a bunch of algorithms on their website that illustrates case-control matching algorithms, greedy algorithms and variable matching. And the full SAS code is included.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.mayo.edu/research/departments-divisions/department-health-sciences-research/division-biomedical-statistics-informatics/software/locally-written-sas-macros" target="_blank"&gt;http://www.mayo.edu/research/departments-divisions/department-health-sciences-research/division-biomedical-statistics-informatics/software/locally-written-sas-macros&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PS. Data management is typically for DataFlux/DI Studio questions so I'll move this post to the Base SAS forum&lt;/P&gt;</description>
      <pubDate>Sun, 29 Oct 2017 23:09:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-Proc-SQL-to-create-a-pair-of-two-samples-from-different/m-p/408428#M99708</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-10-29T23:09:27Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc SQL to create a pair of two samples from different datasets based on variables.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-Proc-SQL-to-create-a-pair-of-two-samples-from-different/m-p/410001#M100211</link>
      <description>Thank you very much! I found a Matching MACRO to help me solve the problem&lt;BR /&gt;I had.&lt;BR /&gt;</description>
      <pubDate>Thu, 02 Nov 2017 20:34:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-Proc-SQL-to-create-a-pair-of-two-samples-from-different/m-p/410001#M100211</guid>
      <dc:creator>kishiyo</dc:creator>
      <dc:date>2017-11-02T20:34:02Z</dc:date>
    </item>
  </channel>
</rss>

