<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to make sure everybody gets everything? in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372633#M11299</link>
    <description>&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;- Are REFER_ID's unique to a person?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Yes they are unique to one person. But one person can have several Clients: CL_ID&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;and a CL_ID can have several Cases: CAS_ID&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So what to do about&amp;nbsp;that? I systematecally dropped all duplicates after generating a new dataset:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;for example: 
 proc sort data=Kea.trail8A noduprecs; 
      by _all_ ; Run; &lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;and then joined it with the next set. Is that a sensible way to do it? eventually I should have everywhere the same N ? Right?&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Many thanks&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 03 Jul 2017 00:54:48 GMT</pubDate>
    <dc:creator>AnnaNZ</dc:creator>
    <dc:date>2017-07-03T00:54:48Z</dc:date>
    <item>
      <title>How to make sure everybody gets everything?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372627#M11295</link>
      <description>&lt;P&gt;I have ca 50 datasets, that can be &amp;nbsp;categorised into 3 dataset types I could say&lt;/P&gt;&lt;P&gt;the main joins are:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A: Refer_ID&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;B:Refer_ID &amp;nbsp;Cl_ID&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;C: &amp;nbsp;CL_ID Cas_ID&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am wanting to make sure every dataset has all 3 variables:&amp;nbsp;&lt;SPAN&gt;Refer_ID, CL_ID , Cas_ID to make the processing later on easier. \&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There are never the same number of Refer_ID, CL_ID or Cas_ID,&lt;/P&gt;&lt;P&gt;Because there are datasets where Refer_ID has more or less CL_ID and there are also different numbers of CAS_ID depending on the dataset.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am ‘merging' datasets with INNER JOINS, to make sure that every dataset always has all three variables: Refer_ID CL_ID Cas_ID&lt;/P&gt;&lt;P&gt;But they don't end up with the same amount of datapoints&amp;nbsp;at the end. Should they not logically all have then the same amount of datapoints?&lt;/P&gt;&lt;P&gt;How can I improve on my inner join statements? &amp;nbsp;I am wondering, whether it should be "merged' based on two variables if possible and not only based on one (&lt;/P&gt;&lt;PRE&gt;on A.CL_ID = B.CL_ID ;)&lt;/PRE&gt;&lt;P&gt;?&lt;/P&gt;&lt;P&gt;Ideas are most welcome&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;*/ INNER JOIN - Kea.trails_file_7  */;  
Proc sql;  
Create table Kea.trail7A as   
Select A. * , B.REFER_ID 
from Kea.trails_file_7 A  inner join Kea.REFERENCE2 B
 on A.CL_ID = B.CL_ID ;    
quit;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the example below:&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;trails_file_7 has CL_ID  and I use Reference2 to attach &lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;Refer_ID, CL_ID , Cas_ID to trails7 - I want to make sure that trails 7 has all three variables. &lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 02 Jul 2017 23:52:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372627#M11295</guid>
      <dc:creator>AnnaNZ</dc:creator>
      <dc:date>2017-07-02T23:52:44Z</dc:date>
    </item>
    <item>
      <title>Re: How to make sure everybody gets everything?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372629#M11296</link>
      <description>&lt;P&gt;What uniquely identifies a record/person?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jul 2017 00:23:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372629#M11296</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-07-03T00:23:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to make sure everybody gets everything?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372630#M11297</link>
      <description>&lt;P&gt;At the end it will be the REFER_ID, but not all datasets have the refer_id. Therefore I have to find ways to include it, and I thought that&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;datasets with&amp;nbsp;only CL_ID could be merged with datasets that have CL_ID and REFER_ID&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;datasets with&amp;nbsp;only CAS_ID could be merged with datasets that have CL_ID and CAS_ID and&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;datasets with CAS_ID and &amp;nbsp;CL_ID could be merged with datasets that have CL_ID or&amp;nbsp;CAS_ID&amp;nbsp;in combination with&amp;nbsp;REFER_ID&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Therefore I have built a reference dataset with&amp;nbsp; &amp;nbsp; REFER_ID,&amp;nbsp;&amp;nbsp;CL_ID and CAS_ID&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;and this would then be used to give every dataset a column with REFER_ID. To build that&amp;nbsp;reference dataset, I used the dataset that will build the spine. Therefore, this dataset should have all the REFER_IDs needed.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Is that line of thought correct? And how do I make sure my otherdatasets get all these REFER_ID attached to them ?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Many thanks&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jul 2017 00:34:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372630#M11297</guid>
      <dc:creator>AnnaNZ</dc:creator>
      <dc:date>2017-07-03T00:34:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to make sure everybody gets everything?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372631#M11298</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/144103"&gt;@AnnaNZ&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Is that line of thought correct? And how do I make sure my otherdatasets get all these REFER_ID attached to them ?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Many thanks&lt;/SPAN&gt;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;BR /&gt;This is known as creating a master 'person' list and is commonly done in data integration projects. So yes, this is the correct approach and how I'd work with it. What you do is create custom formats that will map CAS_ID and CL_ID to REFER_ID in this. But make absolutely sure that it's a one to one relationship, it may not be. For example can a REFER_ID have more than one CAS_ID or CL_ID? Are REFER_ID's unique to a person?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jul 2017 00:46:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372631#M11298</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-07-03T00:46:17Z</dc:date>
    </item>
    <item>
      <title>Re: How to make sure everybody gets everything?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372633#M11299</link>
      <description>&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;- Are REFER_ID's unique to a person?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Yes they are unique to one person. But one person can have several Clients: CL_ID&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;and a CL_ID can have several Cases: CAS_ID&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So what to do about&amp;nbsp;that? I systematecally dropped all duplicates after generating a new dataset:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;for example: 
 proc sort data=Kea.trail8A noduprecs; 
      by _all_ ; Run; &lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;and then joined it with the next set. Is that a sensible way to do it? eventually I should have everywhere the same N ? Right?&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Many thanks&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jul 2017 00:54:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372633#M11299</guid>
      <dc:creator>AnnaNZ</dc:creator>
      <dc:date>2017-07-03T00:54:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to make sure everybody gets everything?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372635#M11300</link>
      <description>&lt;P&gt;Should they all have the same number of records? I can't say to be honest. It depends more on your data than anything else, but in the end aren't you expecting a single master data set anyways, not multiples.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jul 2017 01:50:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372635#M11300</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-07-03T01:50:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to make sure everybody gets everything?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372637#M11301</link>
      <description>&lt;P&gt;yes&lt;/P&gt;&lt;P&gt;I started from scratch and joined all that have the REFER_ID and Cl_ID - to create REFERENCE&amp;nbsp;1&lt;/P&gt;&lt;P&gt;Then all with &amp;nbsp;CL_ID &amp;nbsp;and CAS_ID&lt;SPAN&gt;- to create&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;REFERENCE&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;2&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Join&amp;nbsp;&lt;SPAN&gt;REFERENCE&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;1 &amp;amp;&amp;nbsp;REFERENCE&amp;nbsp;2 to create a reference dataset for all REFERENCE_ALL&amp;nbsp;with&amp;nbsp;REFER_ID&amp;nbsp;and Cl_IDand CAS_ID&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;SPAN&gt;then join every single dataset with this REFERENCE_ALL.&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Maybe that is a better way&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jul 2017 01:56:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/How-to-make-sure-everybody-gets-everything/m-p/372637#M11301</guid>
      <dc:creator>AnnaNZ</dc:creator>
      <dc:date>2017-07-03T01:56:57Z</dc:date>
    </item>
  </channel>
</rss>

