<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Identifying data with character variable in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/418197#M278604</link>
    <description>I separated the datasets and entered in combinations. I think there were just too many variations to account for. But thank you for all the suggestions. They were very helpful!</description>
    <pubDate>Mon, 04 Dec 2017 15:53:04 GMT</pubDate>
    <dc:creator>priya1286</dc:creator>
    <dc:date>2017-12-04T15:53:04Z</dc:date>
    <item>
      <title>Identifying data with character variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/400237#M278600</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am working with the following data:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Obs&amp;nbsp; &amp;nbsp; line1_regimen&lt;/P&gt;&lt;P&gt;0013&amp;nbsp; adali certoli hcq&lt;/P&gt;&lt;P&gt;001a&amp;nbsp; certoli hcq etaner tofa goli&lt;/P&gt;&lt;P&gt;0034&amp;nbsp; adali inflix lef mtx&lt;/P&gt;&lt;P&gt;0054&amp;nbsp; adali hcq inflix mtx&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Obs is the unique identifier, and 'line1_regimen' is the variable name.&amp;nbsp;There are about 170,000 observations in my data, with different combinations of 'adali', 'certoli', etc. The maximum number of drugs in the variable is 6. I want to identify each of the combinations; however, if line1_regimen has more than 2 drug names, I want to separate those observations into another dataset.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Currently, I am typing these combinations individually into if-then statements. Please let me know if there is a more efficient way to do this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Priya&lt;/P&gt;</description>
      <pubDate>Mon, 02 Oct 2017 12:48:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/400237#M278600</guid>
      <dc:creator>priya1286</dc:creator>
      <dc:date>2017-10-02T12:48:10Z</dc:date>
    </item>
    <item>
      <title>Re: Identifying data with character variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/400241#M278601</link>
      <description>&lt;P&gt;Can you show what are the 2 output data sets you need based on the data you have shown here?&lt;/P&gt;</description>
      <pubDate>Mon, 02 Oct 2017 13:12:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/400241#M278601</guid>
      <dc:creator>KachiM</dc:creator>
      <dc:date>2017-10-02T13:12:36Z</dc:date>
    </item>
    <item>
      <title>Re: Identifying data with character variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/400257#M278602</link>
      <description>&lt;P&gt;Please show your required output; we can't guess.&amp;nbsp; Also, please post test data in the form of a data step so we have something to run.&amp;nbsp; At a guess, from what you posted:&lt;/P&gt;
&lt;PRE&gt;/* one row per drug name found in line1_regimen */
data want (keep=regimen);
  set have;
  length regimen $2000;
  do i=1 to countw(line1_regimen," ");
    regimen=scan(line1_regimen,i," ");
    output;
  end;
run;
/* keep each distinct drug name once */
proc sort data=want nodupkey;
  by regimen;
run;&lt;/PRE&gt;</description>
      <pubDate>Mon, 02 Oct 2017 13:52:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/400257#M278602</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2017-10-02T13:52:57Z</dc:date>
    </item>
    <item>
      <title>Re: Identifying data with character variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/400258#M278603</link>
      <description>&lt;P&gt;Separating the data sets is easy:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data two many;&lt;/P&gt;
&lt;P&gt;set have;&lt;/P&gt;
&lt;P&gt;if countw(line1_regimen) &amp;gt; 2 then output many;&lt;/P&gt;
&lt;P&gt;else output two;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
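&lt;P&gt;A hedged variant of the step above, in case some drug names contain characters such as hyphens or slashes: by default COUNTW also treats several punctuation characters as delimiters, so passing a blank explicitly keeps the count tied to spaces. The variable n_drugs is an illustrative name, not part of the original data.&lt;/P&gt;
&lt;PRE&gt;data two many;
  set have;
  /* count blank-separated words only; n_drugs is a hypothetical helper variable */
  n_drugs = countw(line1_regimen, ' ');
  if n_drugs &amp;gt; 2 then output many;
  else output two;
run;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;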
&lt;P&gt;To find combinations, some important questions need to be answered to determine a good way to proceed:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How many different drug names might appear in the entire data set?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are all the drug names just a single word or could a single drug name be two words?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you need to identify all the patients who took "adali", all who took "certoli", etc.?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What other variables are part of the incoming data?&lt;/P&gt;</description>
      <pubDate>Mon, 02 Oct 2017 13:56:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/400258#M278603</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2017-10-02T13:56:05Z</dc:date>
    </item>
    <item>
      <title>Re: Identifying data with character variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/418197#M278604</link>
      <description>I separated the datasets and entered in combinations. I think there were just too many variations to account for. But thank you for all the suggestions. They were very helpful!</description>
      <pubDate>Mon, 04 Dec 2017 15:53:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/418197#M278604</guid>
      <dc:creator>priya1286</dc:creator>
      <dc:date>2017-12-04T15:53:04Z</dc:date>
    </item>
    <item>
      <title>Re: Identifying data with character variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/418410#M278605</link>
      <description>&lt;P&gt;How many drugs have you got? If not too many, you can use a binary variable to mark each combination, e.g.&lt;/P&gt;&lt;PRE&gt;data have;
infile cards truncover;
input id $4. line1_regimen $char80.;
cards;
0013 adali certoli hcq
001a certoli hcq etaner tofa goli
0034 adali inflix lef mtx
0054 adali hcq inflix mtx
;run;

data drugs;
  set have;
  length drug $10;
  do _N_=1 to countw(line1_regimen,' ');
    drug=scan(line1_regimen,_N_,' ');
    output;
    end;
  keep drug;
run;

proc sql noprint; /* put quoted drug names in macro variable */
  select distinct quote(strip(drug),'''') into :drugs separated by ' ' from drugs;
  %let noof_drugs=&amp;amp;sqlobs; /* no. of drugs */
quit;



data simple complex;
  set have;
  length comb $&amp;amp;noof_drugs;
  comb=repeat('0',&amp;amp;noof_drugs-1);
  array drugs(&amp;amp;noof_drugs) $10 _temporary_ (&amp;amp;drugs);
  do _N_=1 to countw(line1_regimen,' ');
    substr(comb,whichc(scan(line1_regimen,_N_,' '),of drugs(*)),1)='1';
    end;
  if countc(comb,'1')&amp;gt;2 then output complex;
  else output simple;
run;
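
/* A possible follow-up (a sketch, not part of the original reply):      */
/* list each combination in SIMPLE and how often it occurs, using COMB.  */
/* The %put line writes the drug order to the log, so that position k in */
/* COMB can be matched to the k-th name in the macro variable DRUGS.     */
%put Drug order in COMB: &amp;amp;drugs;

proc freq data=simple order=freq;
  tables comb / nocum;
run;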

&lt;/PRE&gt;&lt;P&gt;COMB will then mark which unique drugs are present in each regimen. ('1' in position 1 means that drug no. 1 is present, '0' means that it is not, etc.)&lt;/P&gt;</description>
      <pubDate>Tue, 05 Dec 2017 09:03:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identifying-data-with-character-variable/m-p/418410#M278605</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2017-12-05T09:03:20Z</dc:date>
    </item>
  </channel>
</rss>

