<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: combine (concatenate) data sets and select unique obs in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253868#M48341</link>
    <description>Great. I will give it a try.</description>
    <pubDate>Wed, 02 Mar 2016 16:28:36 GMT</pubDate>
    <dc:creator>fengyuwuzu</dc:creator>
    <dc:date>2016-03-02T16:28:36Z</dc:date>
    <item>
      <title>combine (concatenate) data sets and select unique obs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253846#M48329</link>
      <description>&lt;P&gt;I have 7 data sets which have the same data structures (ID, age, Gender, etc).&lt;/P&gt;
&lt;P&gt;I want to combine them and then remove the duplicates. There is about 40% duplicated IDs. Some variables (like age or gender have a few missing values)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;one way is to use set operation, and concatenate them into a file, and then use proc sort and nodupkey to remove the duplicate IDs.&lt;/P&gt;
&lt;P&gt;In this way, there is a chance that I removed obs with age and gender info but kept those with missing values in age or gender.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;is there a way that I can combine them, during which the missing values in age or gender can be replaced by available values when ID is the same?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 15:45:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253846#M48329</guid>
      <dc:creator>fengyuwuzu</dc:creator>
      <dc:date>2016-03-02T15:45:09Z</dc:date>
    </item>
    <item>
      <title>Re: combine (concatenate) data sets and select unique obs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253852#M48332</link>
      <description>&lt;P&gt;Well, you could do a full join on all tables, and then do coalesce() on each column. Assuming that you don't have duplicates within each table.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 15:53:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253852#M48332</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-03-02T15:53:48Z</dc:date>
    </item>
    <item>
      <title>Re: combine (concatenate) data sets and select unique obs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253853#M48333</link>
      <description>&lt;P&gt;Missing is considered the lowest value and when sorting from low to high they end up first - and then selected over records with values.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you sort DESCENDING instead of the default ASCENDING you'll choose records with the information present. However, if you have multiple records where you're missing age in one record and sex in another record they wouldn't be combined so you'll need to modify them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can do each as a full join and use COALESCE to overwrite missing values. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There's also the UPDATE&amp;nbsp;statement, but I'm not sure it will handle multiple records properly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 15:54:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253853#M48333</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-03-02T15:54:12Z</dc:date>
    </item>
    <item>
      <title>Re: combine (concatenate) data sets and select unique obs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253855#M48334</link>
      <description>&lt;P&gt;A couple of important questions ...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can any individual data set contain more than one observation for the same ID?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If two data sets contain conflicting values, does it matter which value gets used?&amp;nbsp; (For example, age=25 in one data set and age=26 for the same ID in a different data set.)&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 16:07:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253855#M48334</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2016-03-02T16:07:58Z</dc:date>
    </item>
    <item>
      <title>Re: combine (concatenate) data sets and select unique obs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253856#M48335</link>
      <description>There are some duplicates in ID variable in 6 of the 7 sets. But within the table, if age is missing for an ID, it is also missing in the same duplidate ID.</description>
      <pubDate>Wed, 02 Mar 2016 16:08:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253856#M48335</guid>
      <dc:creator>fengyuwuzu</dc:creator>
      <dc:date>2016-03-02T16:08:09Z</dc:date>
    </item>
    <item>
      <title>Re: combine (concatenate) data sets and select unique obs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253858#M48337</link>
      <description>&lt;P&gt;Within the same table, there are some duplicated in ID. &lt;BR /&gt;&lt;BR /&gt;It is a good question. Is there a way to confirm for the same ID the age might be different? I assume they reported the same in the 7 data sets but need to make sure.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 16:17:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253858#M48337</guid>
      <dc:creator>fengyuwuzu</dc:creator>
      <dc:date>2016-03-02T16:17:59Z</dc:date>
    </item>
    <item>
      <title>Re: combine (concatenate) data sets and select unique obs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253859#M48338</link>
      <description>If the data comes from the same source, I suggest that order new cleansed data.</description>
      <pubDate>Wed, 02 Mar 2016 16:13:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253859#M48338</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-03-02T16:13:45Z</dc:date>
    </item>
    <item>
      <title>Re: combine (concatenate) data sets and select unique obs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253865#M48339</link>
      <description>&lt;P&gt;Well, cleaning the data will have to remain a separate step.&amp;nbsp; Once you have clean data, here is a method that works regardless of the number of observations per ID in each source.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data all7;&lt;/P&gt;
&lt;P&gt;set source1 source2 source3 source4 source5 source6 source7;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sort data=all7;&lt;/P&gt;
&lt;P&gt;by ID;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data want;&lt;/P&gt;
&lt;P&gt;update all7 (obs=0) all7;&lt;/P&gt;
&lt;P&gt;by ID;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The major advantage of this approach (and avoiding coalesce) is that you don't need to know the names of all the variables, and you don't need any sort of code to address each variable by name.&amp;nbsp; But this won't resolve conflicts such as different ages in different sources.&amp;nbsp; It will merely take the last nonmissing value that it locates.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 16:22:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253865#M48339</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2016-03-02T16:22:10Z</dc:date>
    </item>
    <item>
      <title>Re: combine (concatenate) data sets and select unique obs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253867#M48340</link>
      <description>Thank you Reeze. I am not familiar with COALESCE that you and LinusH pointed out. I will do some research on it. Thanks.</description>
      <pubDate>Wed, 02 Mar 2016 16:28:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253867#M48340</guid>
      <dc:creator>fengyuwuzu</dc:creator>
      <dc:date>2016-03-02T16:28:20Z</dc:date>
    </item>
    <item>
      <title>Re: combine (concatenate) data sets and select unique obs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253868#M48341</link>
      <description>Great. I will give it a try.</description>
      <pubDate>Wed, 02 Mar 2016 16:28:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/combine-concatenate-data-sets-and-select-unique-obs/m-p/253868#M48341</guid>
      <dc:creator>fengyuwuzu</dc:creator>
      <dc:date>2016-03-02T16:28:36Z</dc:date>
    </item>
  </channel>
</rss>

