<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Duplicate individuals information added to first individual reference in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320746#M21490</link>
    <description>&lt;P&gt;To check the duplicates, it is better to do:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data dup;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;set have;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; by Id;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; if not (first.id and last.id);&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then inspect the dup dataset and decide how to proceed.&lt;/P&gt;</description>
    <pubDate>Thu, 22 Dec 2016 15:09:57 GMT</pubDate>
    <dc:creator>Shmuel</dc:creator>
    <dc:date>2016-12-22T15:09:57Z</dc:date>
    <item>
      <title>Duplicate individuals information added to first individual reference</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320708#M21484</link>
      <description>&lt;P&gt;Folks,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm fairly new to SAS, so I'm not sure whether something like this is even possible.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm working with a dataset of over 1,000,000 individuals. I've noticed that some individuals appear more than once in my dataset, but the values held in the variables differ between the records. Is it possible to take the second record for an individual and turn its values into x new variables on that individual's first record?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;That way I would no longer have duplicate individuals&amp;nbsp;but instead extra variables for those individuals.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've attached an image of what I'm trying to do, which perhaps explains things better than what I've written.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help would be greatly appreciated.&amp;nbsp;&lt;/P&gt;&lt;BR /&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/13277i1D63780C07D8CC34/image-size/large?v=1.0&amp;amp;px=600" border="0" alt="Example.PNG" title="Example.PNG" /&gt;</description>
      <pubDate>Thu, 22 Dec 2016 11:31:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320708#M21484</guid>
      <dc:creator>Sean_OConnor</dc:creator>
      <dc:date>2016-12-22T11:31:09Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate individuals information added to first individual reference</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320712#M21485</link>
      <description>&lt;P&gt;You can absolutely&amp;nbsp;do this in SAS, but it requires some data step programming. If you haven't acquired&amp;nbsp;that skill yet, I strongly recommend that you take the free Programming 1 online training.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, data structures like this are typically awkward to work with: every analysis or data&amp;nbsp;management&amp;nbsp;step that follows has to take those anomalies into account. So I would strive to clean the data "once and for all" and determine the golden record/fields.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That said, use a data step with BY, RETAIN and conditional assignment statements. Also, you need an explicit OUTPUT statement at&amp;nbsp;the end of the BY group (hint: last.)&lt;/P&gt;</description>
      <pubDate>Thu, 22 Dec 2016 11:51:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320712#M21485</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-12-22T11:51:54Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate individuals information added to first individual reference</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320713#M21486</link>
      <description>&lt;P&gt;Are you sure you only have duplicates and not triplicates or other multiple variations?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How useful are the repetitive records/variables?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Otherwise, consider marking the first record so you can easily filter it out in future processes, while leaving your data structure the same.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Dec 2016 12:04:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320713#M21486</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-12-22T12:04:11Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate individuals information added to first individual reference</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320714#M21487</link>
      <description>&lt;P&gt;Try the code below. It assumes the data is sorted by ID and that the variables x1-x4 are all of the same type,&lt;/P&gt;
&lt;P&gt;either numeric or character (for character data, declare the xn array with $ and a length).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data want;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;set have(rename=(x1=v1 x2=v2 x3=v3 x4=v4));&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;by ID;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;array vn v1-v4; &amp;nbsp; /* incoming values of the current row */&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;array xn x1-x8; &amp;nbsp; /* x5-x8 instead of x1a-x4a; assumes at most 2 rows per ID */&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;retain i x1-x8;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;if first.id then do; &amp;nbsp; /* new ID: restart the slot counter and clear retained values */&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; i=1;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; call missing(of xn(*));&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;end;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;do j=1 to 4;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; xn(i) = vn(j);&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; i+1; &amp;nbsp; /* next free slot, carried over to the next row of the same ID */&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;end;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;if last.id then output;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;keep id x1-x8;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
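&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To try the step on something small, a made-up test dataset such as the one below could be created first. The values are illustrative only, not taken from the real data, and the variables are assumed to be numeric:&lt;/P&gt;
&lt;PRE&gt;/* hypothetical test data: ID 1 has two rows, ID 2 has one */
data have;
  input id x1 x2 x3 x4;
  datalines;
1 100 8 6 66
1 101 112 4 6
2 5 6 7 8
;
run;&lt;/PRE&gt;
&lt;P&gt;The intended result is one row per ID: ID 1 with x1-x8 = 100 8 6 66 101 112 4 6, and ID 2 with x1-x4 filled and x5-x8 missing.&lt;/P&gt;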
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Dec 2016 12:07:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320714#M21487</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2016-12-22T12:07:16Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate individuals information added to first individual reference</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320715#M21488</link>
      <description>&lt;P&gt;To add to the great advice above, what I would do in this situation is to normalise the data. &amp;nbsp;To do this you would go from:&lt;BR /&gt;ID &amp;nbsp; &amp;nbsp;X1 &amp;nbsp; &amp;nbsp;X2 &amp;nbsp; X3 &amp;nbsp; X4&lt;/P&gt;
&lt;P&gt;1 &amp;nbsp; &amp;nbsp; 100 &amp;nbsp; 8 &amp;nbsp; &amp;nbsp; 6 &amp;nbsp; &amp;nbsp; 66&lt;/P&gt;
&lt;P&gt;1 &amp;nbsp; &amp;nbsp; 101 &amp;nbsp; 112 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; 6&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To this:&lt;/P&gt;
&lt;P&gt;ID &amp;nbsp; PARAMETER &amp;nbsp; &amp;nbsp;RESULT&lt;/P&gt;
&lt;P&gt;1 &amp;nbsp; &amp;nbsp; X1 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 100&lt;/P&gt;
&lt;P&gt;1 &amp;nbsp; &amp;nbsp; X2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 8&lt;/P&gt;
&lt;P&gt;1 &amp;nbsp; &amp;nbsp; X3 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 6&lt;/P&gt;
&lt;P&gt;1 &amp;nbsp; &amp;nbsp; X4 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 66&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1 &amp;nbsp; &amp;nbsp; X1 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 101&lt;/P&gt;
&lt;P&gt;...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Something like:&lt;/P&gt;
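&lt;PRE&gt;/* number the rows within each id so the transpose keeps them apart */
data have_seq;
  set have;
  by id;
  if first.id then seq = 0;
  seq + 1;
run;

proc transpose data=have_seq out=long(rename=(_name_=parameter col1=result));
  by id seq;
  var x1--x4;
run;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It is then a very simple task to find duplicates within an id:&lt;/P&gt;
&lt;PRE&gt;proc sort data=long out=want dupout=duplicates nodupkey;
  by id parameter result;
run;&lt;/PRE&gt;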
&lt;P&gt;This will give you a dataset with unique id/param/result records. &amp;nbsp;How you then shrink that down to unique id/param depends on the logic you want to apply: maybe the highest value, the minimum, the average, the first, etc. &amp;nbsp;You haven't provided this info, so I can't say. &amp;nbsp;&amp;nbsp;&lt;/P&gt;
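&lt;P&gt;As a rough sketch only: assuming the de-duplicated long dataset is called want, that result is numeric, and, purely for illustration, that the rule is to keep the highest value per id/param (best and final are placeholder names), the collapse and the transpose back could look something like:&lt;/P&gt;
&lt;PRE&gt;/* assumed rule: keep the highest result per id / parameter */
proc summary data=want nway;
  class id parameter;
  var result;
  output out=best(drop=_type_ _freq_) max=result;
run;

/* back to one row per id, one column per parameter */
proc transpose data=best out=final(drop=_name_);
  by id;
  id parameter;
  var result;
run;&lt;/PRE&gt;
&lt;P&gt;Swapping max= for min= or mean= in the OUTPUT statement would apply a different rule; which one is right depends on what the duplicates actually represent.&lt;/P&gt;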
&lt;P&gt;Once you have the unique id/param, you can then transpose the data back up again to get a final dataset.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Dec 2016 12:12:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320715#M21488</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-12-22T12:12:14Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate individuals information added to first individual reference</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320733#M21489</link>
      <description>&lt;P&gt;I would suggest two steps that are somewhat different than what you are suggesting. &amp;nbsp;First, find out the extent of the problem. &amp;nbsp;That can be done using:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc freq data=have;&lt;/P&gt;
&lt;P&gt;tables individual / noprint out=counts;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;proc freq data=counts;&lt;/P&gt;
&lt;P&gt;tables count;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The table produced by the second PROC FREQ shows you how many individuals have 1 observation in your data set, how many have 2 observations, etc.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Secondly, instead of creating extra variables, dump the duplicates into a second data set until you figure out what you want to do with them. &amp;nbsp;For example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sort data=have;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;by individual;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data first_one extras;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;set have;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;by individual;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;if first.individual then output first_one;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;else output extras;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can easily keep the duplicate information (separately) without needing to create any extra variables. &amp;nbsp;Once you figure out the strategy for dealing with duplicates, you can plan a more detailed program.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Dec 2016 13:45:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320733#M21489</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2016-12-22T13:45:31Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate individuals information added to first individual reference</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320746#M21490</link>
      <description>&lt;P&gt;To check the duplicates, it is better to do:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data dup;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;set have;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; by Id;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; if not (first.id and last.id);&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then inspect the dup dataset and decide how to proceed.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Dec 2016 15:09:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320746#M21490</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2016-12-22T15:09:57Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate individuals information added to first individual reference</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320845#M21494</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/116786"&gt;@Sean_OConnor&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;That sounds like a data quality issue.&amp;nbsp;Or do you have versions in your table? Is there something like a version_nr or date variable(s) in your data?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;From a data modelling perspective I'd go for a two table approach. Store the best quality data or latest version in your main table, store all other versions in a second table. This way you can have as many versions as you like.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Dec 2016 23:31:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicate-individuals-information-added-to-first-individual/m-p/320845#M21494</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2016-12-22T23:31:31Z</dc:date>
    </item>
  </channel>
</rss>

