<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Refining dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/254881#M48650</link>
    <description>&lt;P&gt;No, there are more than 2 duplicates.&amp;nbsp; Here is the problem: the whole row is not a duplicate; as you can see, columns C and D are different.&amp;nbsp; I do not want them.&amp;nbsp; I only want the variables I showed in table 2.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried the sort procedure with the no-duplicates option, but it does not remove the duplicates, probably because these are not 'real' duplicates: as I mentioned above, one or two columns are different.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 07 Mar 2016 02:13:54 GMT</pubDate>
    <dc:creator>wajmsu</dc:creator>
    <dc:date>2016-03-07T02:13:54Z</dc:date>
    <item>
      <title>Refining dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/254872#M48648</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a dataset about blood sample withdrawals that contains more information than I need.&amp;nbsp; I want to remove the duplicates and change the way the data are presented.&amp;nbsp; Table 1 below is an example of the format my data are in, and I want to change it to the format shown in table 2 (neither table is a real dataset).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for your guidance!&lt;/P&gt;&lt;P&gt;Table 1: Existing format&lt;/P&gt;
&lt;TABLE&gt;&lt;TBODY&gt;
&lt;TR&gt;&lt;TD&gt;subject_id&lt;/TD&gt;&lt;TD&gt;label&lt;/TD&gt;&lt;TD&gt;sample_address&lt;/TD&gt;&lt;TD&gt;sample_id&lt;/TD&gt;&lt;TD&gt;date&lt;/TD&gt;&lt;TD&gt;subj_date&lt;/TD&gt;&lt;TD&gt;visit&lt;/TD&gt;&lt;TD&gt;sample type&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;12&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;7/23/2004&lt;/TD&gt;&lt;TD&gt;2 040723&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;Plasma&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;12&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;q&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;7/23/2004&lt;/TD&gt;&lt;TD&gt;2 040723&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;Plasma&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;12&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;e&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;2/10/2005&lt;/TD&gt;&lt;TD&gt;2 050210&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Plasma&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;12&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;D,11&lt;/TD&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;2/10/2005&lt;/TD&gt;&lt;TD&gt;2 050210&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Plasma&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;12&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;s&lt;/TD&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;TD&gt;7/23/2004&lt;/TD&gt;&lt;TD&gt;2 040723&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;Plasma&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;25&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;c&lt;/TD&gt;&lt;TD&gt;20&lt;/TD&gt;&lt;TD&gt;4/3/2007&lt;/TD&gt;&lt;TD&gt;6 070403&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;Plasma&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;25&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;x&lt;/TD&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;4/3/2007&lt;/TD&gt;&lt;TD&gt;6 070403&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;Plasma&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;25&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;D,11&lt;/TD&gt;&lt;TD&gt;16&lt;/TD&gt;&lt;TD&gt;12/10/2007&lt;/TD&gt;&lt;TD&gt;6 071210&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Plasma&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;25&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;y&lt;/TD&gt;&lt;TD&gt;18&lt;/TD&gt;&lt;TD&gt;12/10/2007&lt;/TD&gt;&lt;TD&gt;6 071210&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Plasma&lt;/TD&gt;&lt;/TR&gt;
&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Table 2: The format I need&lt;/P&gt;
&lt;TABLE&gt;&lt;TBODY&gt;
&lt;TR&gt;&lt;TD&gt;subject_id&lt;/TD&gt;&lt;TD&gt;label&lt;/TD&gt;&lt;TD&gt;date of 1st sample&lt;/TD&gt;&lt;TD&gt;date of 2nd sample&lt;/TD&gt;&lt;TD&gt;samp_typ&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;12&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;7/23/2004&lt;/TD&gt;&lt;TD&gt;2/10/2005&lt;/TD&gt;&lt;TD&gt;Plasma&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;25&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;4/3/2007&lt;/TD&gt;&lt;TD&gt;12/10/2007&lt;/TD&gt;&lt;TD&gt;Plasma&lt;/TD&gt;&lt;/TR&gt;
&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
      <pubDate>Sun, 06 Mar 2016 23:58:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/254872#M48648</guid>
      <dc:creator>wajmsu</dc:creator>
      <dc:date>2016-03-06T23:58:52Z</dc:date>
    </item>
    <item>
      <title>Re: Refining dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/254874#M48649</link>
      <description>&lt;P&gt;Is 2 the maximum number of duplicates? Or will there be an indeterminate number of samples?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You could do a proc sort, with the nodupkey option, and a proc transpose.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2016 00:08:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/254874#M48649</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-03-07T00:08:08Z</dc:date>
    </item>
    <item>
      <title>Re: Refining dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/254881#M48650</link>
      <description>&lt;P&gt;No, there are more than 2 duplicates.&amp;nbsp; Here is the problem: the whole row is not a duplicate; as you can see, columns C and D are different.&amp;nbsp; I do not want them.&amp;nbsp; I only want the variables I showed in table 2.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried the sort procedure with the no-duplicates option, but it does not remove the duplicates, probably because these are not 'real' duplicates: as I mentioned above, one or two columns are different.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2016 02:13:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/254881#M48650</guid>
      <dc:creator>wajmsu</dc:creator>
      <dc:date>2016-03-07T02:13:54Z</dc:date>
    </item>
    <item>
      <title>Re: Refining dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/254882#M48651</link>
      <description>&lt;P&gt;What are your key variables that define a duplicate? It looks like ID, Label and Type?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
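&lt;P&gt;For anyone who wants to run the code in this thread, Table 1 can be entered as a data step. This is only a sketch: the variable names id, type and date follow the shorthand used in the code in this reply (the table calls them subject_id, sample type and date), and the informats and lengths are assumptions.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Table 1 from the question, pipe-delimited so that values such as */
/* "2 040723" and "D,11" are read as single fields                  */
data have;
  infile datalines dlm='|' truncover;
  input id label sample_address :$8. sample_id
        date :mmddyy10. subj_date :$8. visit type :$8.;
  format date mmddyy10.;
  datalines;
12|2|2|2|7/23/2004|2 040723|1|Plasma
12|2|q|4|7/23/2004|2 040723|1|Plasma
12|2|e|6|2/10/2005|2 050210|2|Plasma
12|2|D,11|8|2/10/2005|2 050210|2|Plasma
12|2|s|10|7/23/2004|2 040723|1|Plasma
25|6|c|20|4/3/2007|6 070403|1|Plasma
25|6|x|14|4/3/2007|6 070403|1|Plasma
25|6|D,11|16|12/10/2007|6 071210|2|Plasma
25|6|y|18|12/10/2007|6 071210|2|Plasma
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;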
&lt;P&gt;So your proc sort would be something like:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
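&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* id, label and type stand in for your actual key variable names */
/* nodupkey keeps one row per id/label/type/date combination */
proc sort data=have nodupkey out=deduped;
by id label type date;
run;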
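
/* Optional extra step, not part of the original suggestion: a sketch for
   dropping groups that have only one date. After the nodupkey sort, each
   row in an id/label/type group is a distinct date, so a row that is both
   the first and the last of its group belongs to a single-date group. */
data deduped;
  set deduped;
  by id label type;
  if first.type and last.type then delete;
run;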


/* then run a proc transpose: one row per id/label/type, */
/* with the dates in columns date1, date2, and so on     */

proc transpose data=deduped out=want prefix=date;
by id label type;
var date;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 07 Mar 2016 02:33:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/254882#M48651</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-03-07T02:33:50Z</dc:date>
    </item>
    <item>
      <title>Re: Refining dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/254901#M48654</link>
      <description>&lt;P&gt;Would something along these lines not work? It is untested, because no test data was provided in a usable format (i.e. a data step, so that I don't have to type it all in):&lt;/P&gt;
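&lt;PRE&gt;/* BY-group processing needs the data sorted by subject_id and date */
proc sort data=have;
  by subject_id date;
run;

/* samp_type is assumed to be the name of the sample type variable */
data want (keep=subject_id label date1 date2 samp_type);
  set have;
  by subject_id date;
  retain cnt date1 date2;
  /* reset the counter and both dates at the start of each subject */
  if first.subject_id then do;
    cnt=0;
    date1=.;
    date2=.;
  end;
  /* count each distinct date once, so repeated rows for a date are skipped */
  if first.date then do;
    cnt=cnt+1;
    if cnt=1 then date1=date;
    else if cnt=2 then date2=date;
  end;
  if last.subject_id then output;
  format date1 date2 date9.;
run;&lt;/PRE&gt;</description>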
      <pubDate>Mon, 07 Mar 2016 09:29:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/254901#M48654</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-03-07T09:29:58Z</dc:date>
    </item>
    <item>
      <title>Re: Refining dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/255018#M48695</link>
      <description>&lt;P&gt;Hi Reeza,&lt;/P&gt;&lt;P&gt;I was able to reorganize the date variables.&amp;nbsp; However, when I looked at the data, I found that some IDs had both dates and some had only one.&amp;nbsp; How can I remove the IDs with only one date?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2016 17:33:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Refining-dataset/m-p/255018#M48695</guid>
      <dc:creator>wajmsu</dc:creator>
      <dc:date>2016-03-07T17:33:05Z</dc:date>
    </item>
  </channel>
</rss>

