<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic how to remove duplicate values with different lengths in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338343#M77031</link>
    <pubDate>Mon, 06 Mar 2017 03:44:27 GMT</pubDate>
    <dc:creator>knveraraju91</dc:creator>
    <dc:date>2017-03-06T03:44:27Z</dc:date>
    <item>
      <title>how to remove duplicate values with different lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338343#M77031</link>
      <description>&lt;P&gt;Dear,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In my raw data (data1) the following values are present. I used code1 to get the output I need (data2).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;code1:&lt;/P&gt;&lt;P&gt;if length(term) &amp;lt; 200 then TERM1=strip(term);&lt;BR /&gt;Else if length(term) &amp;gt;= 200 then do;&lt;BR /&gt;TERM 1= strip(prxChange("s/(.{1,200})\s.*/\1/os", 1, term));&lt;BR /&gt;end;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Then I need to remove the duplicate values:&lt;/P&gt;&lt;P&gt;code2:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Proc sort data=data2 out=data3 nodupkey;&lt;BR /&gt;by id term1;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data3 still has both observations, even though they have the same id and term1 values.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I think this is due to different lengths.&lt;/P&gt;&lt;P&gt;In data1, the term lengths are 193 and 209.&lt;/P&gt;&lt;P&gt;In data2, the term1 lengths are 193 and 194.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How can I get the same length for both values so that the duplicate is removed? Please help. Thank you.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data data1;&lt;BR /&gt;input id term $628;&lt;BR /&gt;datalines;&lt;BR /&gt;1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.&lt;BR /&gt;1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. 
Site retrained on following protocol for OO sampling.&lt;BR /&gt;Duplicate data&lt;BR /&gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data data2:&lt;BR /&gt;input id term1 $200;&lt;BR /&gt;datalines;&lt;BR /&gt;1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.&lt;BR /&gt;1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.&lt;BR /&gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Mar 2017 03:44:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338343#M77031</guid>
      <dc:creator>knveraraju91</dc:creator>
      <dc:date>2017-03-06T03:44:27Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicate values with different lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338346#M77032</link>
      <description>&lt;P&gt;Try using the function&amp;nbsp;&lt;STRONG&gt;compbl&lt;/STRONG&gt; instead of&amp;nbsp;&lt;STRONG&gt;strip&lt;/STRONG&gt;:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* code1, with compbl in place of strip */
if length(term) &amp;lt; 200 then TERM1=compbl(term);
else
TERM1= compbl(prxChange("s/(.{1,200})\s.*/\1/os", 1, term));
&lt;/CODE&gt;&lt;/PRE&gt;
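&lt;P&gt;For context, here is a minimal sketch of how the two functions differ. The sample value is made up for illustration and is not taken from the thread. STRIP removes only leading and trailing blanks, while COMPBL collapses each run of blanks into a single blank. Neither removes tabs, carriage returns, or other non-blank characters.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: raw is a hypothetical value, not the poster's data */
data _null_;
  length raw s c $40;
  raw = '  week   1   OO  sample  ';
  s = strip(raw);      /* leading and trailing blanks removed   */
  c = compbl(raw);     /* runs of blanks collapsed to one blank */
  len_s = length(s);   /* length() ignores trailing blanks      */
  len_c = length(c);
  put len_s= len_c=;   /* compare the two lengths in the log    */
run;
&lt;/CODE&gt;&lt;/PRE&gt;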
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Mar 2017 03:53:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338346#M77032</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2017-03-06T03:53:13Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicate values with different lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338356#M77036</link>
      <description>&lt;P&gt;Thank you for the support, but it did not work. The lengths of term1 still show 193 and 194.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;At one point in my program I used "options varlenchk=nowarn;". Does this affect the compbl function? Thank you.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Mar 2017 05:52:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338356#M77036</guid>
      <dc:creator>knveraraju91</dc:creator>
      <dc:date>2017-03-06T05:52:17Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicate values with different lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338359#M77038</link>
      <description>&lt;P&gt;I'm short on time right now and can't run it myself. Run the following code and check the log.&lt;/P&gt;
&lt;P&gt;In particular, check the proc compare output.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data data1;
input id term $628;
datalines;
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
;
run;
 
data data2:
input id term1 $200;
datalines;
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
;
run;

proc compare base=data1 compare=data2; run;

%macro ex;
len=length(term);
put _N_= ' Before ' len=;
if length(term) &amp;lt; 200 then TERM1=compbl(term);
Else
TERM1= compbl(prxChange("s/(.{1,200})\s.*/\1/os", 1, term));
len=length(term);
put _N_= ' After ' len=;
%mend ex;

data1x;
 set data1;
     %ex;
run;

data2x;
 set data2;
     %ex;
run;

Proc sort data=data1x out=data1y nodupkey;
by id term1;
run;

Proc sort data=data2x out=data2y nodupkey;
by id term1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
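&lt;P&gt;If the lengths still differ after this, one possibility is a non-printable character (for example a carriage return or line feed) hidden in one of the values. Whether that is the cause here is an assumption. Below is a minimal sketch for checking it, using made-up values rather than the real data: the $HEX format shows the underlying bytes, and COMPRESS with the 'kw' modifiers keeps only printable characters.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: demo and its values are hypothetical; */
/* the second value ends with a carriage return ('0D'x) */
data demo;
  length term1 $20;
  term1 = 'OO sampling.';           output;
  term1 = 'OO sampling.' || '0D'x;  output;
run;

data _null_;
  set demo;
  length clean $20;
  len       = length(term1);
  clean     = compress(term1, , 'kw');  /* k = keep, w = printable characters */
  len_clean = length(clean);
  put _n_= len= len_clean= term1= $hex26.;  /* hex dump exposes hidden bytes */
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If the cleaned values turn out to be equal, sorting by id and the cleaned variable with nodupkey should then drop the duplicate.&lt;/P&gt;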
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Mar 2017 06:20:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338359#M77038</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2017-03-06T06:20:32Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicate values with different lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338391#M77052</link>
      <description>&lt;P&gt;I have now run the code I sent you. It is mostly a copy/paste of the code you posted.&lt;/P&gt;
&lt;P&gt;There were errors, listed below:&lt;/P&gt;
&lt;P&gt;1) You missed the dot after the informat. It should be &amp;nbsp;&lt;STRONG&gt;$628.&lt;/STRONG&gt; and &lt;STRONG&gt;$200.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;2) The length of TERM is 193 characters, less than the defined length of 628.&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; To avoid the note "NOTE: SAS went to a new line when INPUT statement reached past the end of a line",&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; I added the line &amp;nbsp;&lt;STRONG&gt;infile datalines truncover;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;3) I fixed a few more typos of my own.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Run the fixed code below. The data is identical and the duplicates are filtered out:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data data1;
infile datalines truncover;
input id term $628.;
datalines;
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
;
run;
 
data data2;
infile datalines truncover;
input id term1 $200.;
datalines;
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
;
run;

proc compare base=data1 compare=data2(rename=(term1=term)); run;

%macro ex(var);
len=length(&amp;amp;var);
put _N_= ' Before ' len=;
if length(&amp;amp;var) &amp;lt; 200 then TERM1=compbl(&amp;amp;var);
Else
TERM1= compbl(prxChange("s/(.{1,200})\s.*/\1/os", 1, &amp;amp;var));
len=length(TERM1);
put _N_= ' After ' len=;
%mend ex;

data data1x;
 set data1;
     %ex(term);
run;

data data2x;
 set data2;
     %ex(term1);
run;

data data3;
  set data1x data2x;
run;
Proc sort data=data3 out=data4 nodupkey;
by id term1;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 06 Mar 2017 09:59:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338391#M77052</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2017-03-06T09:59:30Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicate values with different lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338395#M77053</link>
      <description>In addition, there is an extra blank between TERM and 1 in this line:&lt;BR /&gt;TERM 1= strip(prxChange("s/(.{1,200})\s.*/\1/os", 1, term));</description>
      <pubDate>Mon, 06 Mar 2017 10:06:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-remove-duplicate-values-with-different-lengths/m-p/338395#M77053</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2017-03-06T10:06:16Z</dc:date>
    </item>
  </channel>
</rss>

