<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: De-dup a list of values stored in a variable separated by commas in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514953#M138905</link>
    <description>&lt;P&gt;On 10 million records on my machine&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Log report&lt;/P&gt;
&lt;P&gt;1055 data have;&lt;BR /&gt;1056 infile cards truncover;&lt;BR /&gt;1057 input HAVE_ID_NUM $200.;&lt;BR /&gt;1058 do _n_=1 to 10e6;&lt;BR /&gt;1059 output;&lt;BR /&gt;1060 end;&lt;BR /&gt;1061 cards;&lt;/P&gt;
&lt;P&gt;NOTE: The data set WORK.HAVE has 10000000 observations and 1 variables.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt; real time 4.53 seconds&lt;BR /&gt; cpu time 1.81 seconds&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;1063 ;&lt;BR /&gt;1064&lt;BR /&gt;1065 data want;&lt;BR /&gt;1066 set have;&lt;BR /&gt;1067 length want $200;&lt;BR /&gt;1068 do _n_=1 to countw(HAVE_ID_NUM,',');&lt;BR /&gt;1069 temp=scan(HAVE_ID_NUM,_n_,',');&lt;BR /&gt;1070 if index((want),strip(temp))=0 then want=catx(', ',want,temp);&lt;BR /&gt;1071 end;&lt;BR /&gt;1072 drop temp;&lt;BR /&gt;1073 run;&lt;/P&gt;
&lt;P&gt;NOTE: There were 10000000 observations read from the data set WORK.HAVE.&lt;BR /&gt;NOTE: The data set WORK.WANT has 10000000 observations and 2 variables.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt; real time 1:07.06&lt;BR /&gt; cpu time 23.84 seconds&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;1074&lt;BR /&gt;1075&lt;BR /&gt;1076 data want_array_method;&lt;BR /&gt;1077 set have;&lt;BR /&gt;1078 array t(100)$10 _temporary_ ;&lt;BR /&gt;1079 length want $200;&lt;BR /&gt;1080 do _n_=1 to countw(HAVE_ID_NUM,',');&lt;BR /&gt;1081 temp=strip(scan(HAVE_ID_NUM,_n_,','));&lt;BR /&gt;1082 if strip(temp) not in t then do;&lt;BR /&gt;1083 t(_n_)=temp;&lt;BR /&gt;1084 end;&lt;BR /&gt;1085 end;&lt;BR /&gt;1086 want=catx(', ', of t(*));&lt;BR /&gt;1087 drop temp;&lt;BR /&gt;1088 run;&lt;/P&gt;
&lt;P&gt;NOTE: There were 10000000 observations read from the data set WORK.HAVE.&lt;BR /&gt;NOTE: The data set WORK.WANT_ARRAY_METHOD has 10000000 observations and 2 variables.&lt;BR /&gt;&lt;STRONG&gt;NOTE: DATA statement used (Total process time):&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt; real time 28.25 seconds&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt; cpu time 23.56 seconds&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 20 Nov 2018 23:50:49 GMT</pubDate>
    <dc:creator>novinosrin</dc:creator>
    <dc:date>2018-11-20T23:50:49Z</dc:date>
    <item>
      <title>De-dup a list of values stored in a variable separated by commas</title>
      <link>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514931#M138892</link>
      <description>&lt;P&gt;Hi.&amp;nbsp; I'm wondering if there's a way to de-dup a list of values stored in a variable.&amp;nbsp; There could be duplicates in the middle or end of the string and there could be multiple duplicates within the string.&amp;nbsp; Any&amp;nbsp;ideas for doing this? I'd really appreciate the help.&amp;nbsp; Thanks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;HAVE_ID_NUM&lt;BR /&gt;1033112065, 1629420120,&amp;nbsp;&lt;SPAN&gt;1033112065, 1629420120,&amp;nbsp;1033112065,&amp;nbsp;1033112011, 1629420120&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;WANT_ID_NUM&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;1033112065, 1629420120,&amp;nbsp;1033112011&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Nov 2018 22:14:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514931#M138892</guid>
      <dc:creator>buechler66</dc:creator>
      <dc:date>2018-11-20T22:14:01Z</dc:date>
    </item>
    <item>
      <title>Re: De-dup a list of values stored in a variable separated by commas</title>
      <link>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514936#M138896</link>
      <description>&lt;P&gt;There isn't a tool that does this exactly, but you can program it:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data want;&lt;/P&gt;
&lt;P&gt;set have;&lt;/P&gt;
&lt;P&gt;length want_id_num $ 200&amp;nbsp; one_id $ 10;&lt;/P&gt;
&lt;P&gt;do k=1 to countw(have_id_num);&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;one_id = scan(have_id_num, k, ', ');&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;if index(have_id_num, strip(one_id))=0 then have_id_num = strip(have_id_num) || ', ' || one_id;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The values in the LENGTH statement are guesses ... you have to know a little about your data to provide the right lengths.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It's untested code, so might need a small amount of tweaking.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Nov 2018 22:40:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514936#M138896</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-11-20T22:40:35Z</dc:date>
    </item>
    <item>
      <title>Re: De-dup a list of values stored in a variable separated by commas</title>
      <link>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514945#M138901</link>
      <description>&lt;P&gt;same approach but i find catx is convenient&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
infile cards truncover;
input HAVE_ID_NUM $200.;
cards;
1033112065, 1629420120, 1033112065, 1629420120, 1033112065, 1033112011, 1629420120
;

data want;
set have;
length want $200;
do _n_=1 to countw(HAVE_ID_NUM,',');
temp=scan(HAVE_ID_NUM,_n_,',');
if index((want),strip(temp))=0 then want=catx(',',want,temp);
end;
drop temp;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 20 Nov 2018 23:13:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514945#M138901</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-11-20T23:13:39Z</dc:date>
    </item>
    <item>
      <title>Re: De-dup a list of values stored in a variable separated by commas</title>
      <link>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514949#M138902</link>
      <description>&lt;P&gt;Yes, I started with CATX, but found it difficult to include the space after the comma.&amp;nbsp; There might be a way, but it was easier to switch away from CATX.&amp;nbsp; (Of course, the space after the comma might not be a rigid requirement.)&lt;/P&gt;</description>
      <pubDate>Tue, 20 Nov 2018 23:34:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514949#M138902</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-11-20T23:34:21Z</dc:date>
    </item>
    <item>
      <title>Re: De-dup a list of values stored in a variable separated by commas</title>
      <link>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514950#M138903</link>
      <description>&lt;P&gt;different approach which is reliant on the original list being comma separated.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;%let have_id_num = 1033112065, 1629420120, 1033112065, 1629420120, 1033112065, 1033112011, 1629420120;&lt;/P&gt;&lt;P&gt;proc sql ;&lt;BR /&gt;create table a as&lt;BR /&gt;select&lt;BR /&gt;&amp;amp;have_id_num&lt;BR /&gt;from&lt;BR /&gt;sashelp.air (obs=1)&lt;BR /&gt;;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;proc transpose data=a out=b;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;proc sql noprint;&lt;BR /&gt;select distinct col1 format=20. into: WANT_ID_NUM separated by ', '&lt;BR /&gt;from b;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;%put WANT_ID_NUM=&amp;amp;WANT_ID_NUM;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Nov 2018 23:43:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514950#M138903</guid>
      <dc:creator>34reqrwe</dc:creator>
      <dc:date>2018-11-20T23:43:51Z</dc:date>
    </item>
    <item>
      <title>Re: De-dup a list of values stored in a variable separated by commas</title>
      <link>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514951#M138904</link>
      <description>&lt;P&gt;space after comma&lt;/P&gt;
&lt;P&gt;want=1033112065, 1629420120, 1033112011&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;want=catx(', ',want,temp);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
infile cards truncover;
input HAVE_ID_NUM $200.;
/*do _n_=1 to 10e6;*/
/*output;*/
/*end;*/
cards;
1033112065, 1629420120, 1033112065, 1629420120, 1033112065, 1033112011, 1629420120
;

data want;
set have;
length want $200;
do _n_=1 to countw(HAVE_ID_NUM,',');
temp=scan(HAVE_ID_NUM,_n_,',');
if index((want),strip(temp))=0 then want=catx(', ',want,temp);
end;
drop temp;
run;


data want_array_method;
set have;
array t(100)$10 _temporary_ ;
length want $200;
do _n_=1 to countw(HAVE_ID_NUM,',');
temp=strip(scan(HAVE_ID_NUM,_n_,','));
if strip(temp) not in t then do;
t(_n_)=temp;
end;
end;
want=catx(', ', of t(*));
drop temp;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Nov 2018 23:45:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514951#M138904</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-11-20T23:45:50Z</dc:date>
    </item>
    <item>
      <title>Re: De-dup a list of values stored in a variable separated by commas</title>
      <link>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514953#M138905</link>
      <description>&lt;P&gt;On 10 million records on my machine&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Log report&lt;/P&gt;
&lt;P&gt;1055 data have;&lt;BR /&gt;1056 infile cards truncover;&lt;BR /&gt;1057 input HAVE_ID_NUM $200.;&lt;BR /&gt;1058 do _n_=1 to 10e6;&lt;BR /&gt;1059 output;&lt;BR /&gt;1060 end;&lt;BR /&gt;1061 cards;&lt;/P&gt;
&lt;P&gt;NOTE: The data set WORK.HAVE has 10000000 observations and 1 variables.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt; real time 4.53 seconds&lt;BR /&gt; cpu time 1.81 seconds&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;1063 ;&lt;BR /&gt;1064&lt;BR /&gt;1065 data want;&lt;BR /&gt;1066 set have;&lt;BR /&gt;1067 length want $200;&lt;BR /&gt;1068 do _n_=1 to countw(HAVE_ID_NUM,',');&lt;BR /&gt;1069 temp=scan(HAVE_ID_NUM,_n_,',');&lt;BR /&gt;1070 if index((want),strip(temp))=0 then want=catx(', ',want,temp);&lt;BR /&gt;1071 end;&lt;BR /&gt;1072 drop temp;&lt;BR /&gt;1073 run;&lt;/P&gt;
&lt;P&gt;NOTE: There were 10000000 observations read from the data set WORK.HAVE.&lt;BR /&gt;NOTE: The data set WORK.WANT has 10000000 observations and 2 variables.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt; real time 1:07.06&lt;BR /&gt; cpu time 23.84 seconds&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;1074&lt;BR /&gt;1075&lt;BR /&gt;1076 data want_array_method;&lt;BR /&gt;1077 set have;&lt;BR /&gt;1078 array t(100)$10 _temporary_ ;&lt;BR /&gt;1079 length want $200;&lt;BR /&gt;1080 do _n_=1 to countw(HAVE_ID_NUM,',');&lt;BR /&gt;1081 temp=strip(scan(HAVE_ID_NUM,_n_,','));&lt;BR /&gt;1082 if strip(temp) not in t then do;&lt;BR /&gt;1083 t(_n_)=temp;&lt;BR /&gt;1084 end;&lt;BR /&gt;1085 end;&lt;BR /&gt;1086 want=catx(', ', of t(*));&lt;BR /&gt;1087 drop temp;&lt;BR /&gt;1088 run;&lt;/P&gt;
&lt;P&gt;NOTE: There were 10000000 observations read from the data set WORK.HAVE.&lt;BR /&gt;NOTE: The data set WORK.WANT_ARRAY_METHOD has 10000000 observations and 2 variables.&lt;BR /&gt;&lt;STRONG&gt;NOTE: DATA statement used (Total process time):&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt; real time 28.25 seconds&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt; cpu time 23.56 seconds&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Nov 2018 23:50:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514953#M138905</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-11-20T23:50:49Z</dc:date>
    </item>
    <item>
      <title>Re: De-dup a list of values stored in a variable separated by commas</title>
      <link>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514956#M138907</link>
      <description>Thank you both for the solutions. I really appreciate your time!</description>
      <pubDate>Wed, 21 Nov 2018 00:06:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/514956#M138907</guid>
      <dc:creator>buechler66</dc:creator>
      <dc:date>2018-11-21T00:06:44Z</dc:date>
    </item>
    <item>
      <title>Re: De-dup a list of values stored in a variable separated by commas</title>
      <link>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/515068#M138954</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;I think better use INDEXW() or FINDW()&amp;nbsp; for this kind of scenario .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;123456,123,&lt;/P&gt;</description>
      <pubDate>Wed, 21 Nov 2018 12:46:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/515068#M138954</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-11-21T12:46:18Z</dc:date>
    </item>
    <item>
      <title>Re: De-dup a list of values stored in a variable separated by commas</title>
      <link>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/515126#M138972</link>
      <description>&lt;P&gt;Agreed.&amp;nbsp; I saw all equal length values in the post, and didn't consider other possibilities.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Nov 2018 15:13:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/De-dup-a-list-of-values-stored-in-a-variable-separated-by-commas/m-p/515126#M138972</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-11-21T15:13:06Z</dc:date>
    </item>
  </channel>
</rss>

