<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using a multiple arrays to classify text in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Using-a-multiple-arrays-to-classify-text/m-p/568947#M160256</link>
    <description>&lt;P&gt;If I understand your situation, you might consider reading some of those Excel columns into a data set. If you already have the 1 indicator, you should be able to join on the text values to get the indicator (or a missing value where there is no match).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You may also be better off transposing the data so that you have a single diagnosis variable plus the other identification variables. Then you don't need to work with an array.&lt;/P&gt;
&lt;P&gt;As for your issue:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;The problem is that when I run it, some entries are populating more than one category. I've double checked the raw data to make sure entries are not in fact categorized multiple times. Please help!&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You likely need to provide some actual example data as there is not enough information provided to tell why some of your code might behave that way.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The instructions here: &lt;A href="https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-data-AKA-generate/ta-p/258712" target="_blank"&gt;https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-data-AKA-generate/ta-p/258712&lt;/A&gt; show how to turn an existing SAS data set into data step code. You can paste that code into a forum code box using the {i} icon, or attach it as text, to show exactly what you have and give us something to test code against.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Remove any variables that are not directly related to this issue. Make sure to include some records that are getting the "wrong" result.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;BTW, it is not best practice to define multiple arrays for the exact same elements. You can confuse yourself and complicate the code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 25 Jun 2019 23:21:31 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2019-06-25T23:21:31Z</dc:date>
    <item>
      <title>Using a multiple arrays to classify text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-a-multiple-arrays-to-classify-text/m-p/568920#M160250</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to take free text entries and categorize them into disease categories using SAS 9.4.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My code looks like this:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data one;&lt;/P&gt;&lt;P&gt;set zero;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;array dx(6) il2a1 il2a2 il2a3 il2a4 il2a5 il2a6;&lt;BR /&gt;neo_unspec=0;&lt;BR /&gt;do i= 1 to 6;&lt;BR /&gt;if upcase (dx(i)) in ("(R) OCCIPITAL BRAIN TUMOUR"&lt;BR /&gt;"155.2 - LIVER MASS"&lt;BR /&gt;"1570 PANCREATIC MASS"&lt;BR /&gt;"1709 - TUMOR LEFT HIP"&lt;BR /&gt;"1916 - CEREBELLAR MASS"&lt;BR /&gt;"1918 BRAIN TUMOUR"&lt;BR /&gt;"1919 BRAIN TUMOR")&lt;/P&gt;&lt;P&gt;then neo_unspec=1;&lt;BR /&gt;end;&lt;BR /&gt;drop i;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;array dx1(6) il2a1 il2a2 il2a3 il2a4 il2a5 il2a6;&lt;BR /&gt;neo_malig=0;&lt;BR /&gt;do i= 1 to 6;&lt;BR /&gt;if upcase (dx1(i)) in ("CANCER OF PROSTATE (2002) WITH METS TO BONE AND LUNG"&lt;BR /&gt;"CANCER OF PROSTATE (EVIDENCE OF BONE METASTATIC DISEASE)"&lt;BR /&gt;"CANCER OF PROSTATE - 2005"&lt;BR /&gt;"CANCER OF PROSTATE - TURP JUNE 4/13"&lt;BR /&gt;"CANCER OF PROSTATE WIITH METS TO BONE"&lt;BR /&gt;"CANCER OF PROSTATE WITH BONE METASTASES, METS TO PELVIS AND LOWER ABDOMEN"&lt;BR /&gt;"CANCER OF PROSTATE WITH BONE METS"&lt;BR /&gt;"CANCER OF PROSTATE WITH METASTASES TO BONE"&lt;BR /&gt;"CANCER OF PROSTATE WITH METASTASES."&lt;BR /&gt;"CANCER OF PROSTATE WITH METS TO BACK AND LYMPH NODES.")&lt;/P&gt;&lt;P&gt;then neo_malig=1;&lt;BR /&gt;end;&lt;BR /&gt;drop i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;array dx4(6) il2a1 il2a2 il2a3 il2a4 il2a5 il2a6;&lt;BR /&gt;circulatory=0;&lt;BR /&gt;do i= 1 to 6;&lt;BR /&gt;if upcase (dx4(i)) in ("HEART BLOCK-PACEMAKER"&lt;BR /&gt;"HEART BLOCK/PACEMAKER"&lt;BR /&gt;"HEART BLOCK/TINNITUS"&lt;BR /&gt;"HEART BLOCKAGES"&lt;BR /&gt;"HEART BURN"&lt;BR /&gt;"HEART CABG"&lt;BR /&gt;"HEART CALCIFICATION"&lt;BR /&gt;"HEART CATHETERIZATION"&lt;BR /&gt;"HEART CONDITION")&lt;/P&gt;&lt;P&gt;then circulatory=1;&lt;BR /&gt;end;&lt;BR /&gt;drop i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This actually goes on with many more diagnosis categories and there are hundreds of thousands of unique text entries, but above is just an example.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;il2a1-il2a6 are the free-text diagnosis variables.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We started with an Excel spreadsheet of unique text entries, created columns for each new diagnosis category of interest and then manually went through each entry and placed a 1 in the appropriate category. Then, we sorted the Excel data and added quotation marks to the free text entries to copy/paste into SAS.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The problem is that when I run it, some entries are populating more than one category. I've double checked the raw data to make sure entries are not in fact categorized multiple times. Please help!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 25 Jun 2019 20:31:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-a-multiple-arrays-to-classify-text/m-p/568920#M160250</guid>
      <dc:creator>tstevens</dc:creator>
      <dc:date>2019-06-25T20:31:17Z</dc:date>
    </item>
    <item>
      <title>Re: Using a multiple arrays to classify text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-a-multiple-arrays-to-classify-text/m-p/568947#M160256</link>
      <description>&lt;P&gt;If I understand your situation, you might consider reading some of those Excel columns into a data set. If you already have the 1 indicator, you should be able to join on the text values to get the indicator (or a missing value where there is no match).&lt;/P&gt;
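&lt;P&gt;A minimal, untested sketch of that lookup-and-join idea follows. It assumes the data are first reshaped to one diagnosis per row, and every name other than zero and il2a1-il2a6 (dx_lookup, dx_long, dx_classified, id, slot, category) is hypothetical. The DATALINES rows stand in for the Excel sheet, which could be read in with PROC IMPORT instead.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical lookup table: one row per unique text entry with its category. */
data dx_lookup;
  infile datalines dsd truncover;
  length dx_text $80 category $12;
  input dx_text category;
  datalines;
1918 BRAIN TUMOUR,neo_unspec
CANCER OF PROSTATE WITH BONE METS,neo_malig
HEART CATHETERIZATION,circulatory
;

/* Long layout: one row per non-missing diagnosis.
   id is an assumed record key in zero; $80 is an assumed maximum text length. */
data dx_long;
  set zero;
  array dx(6) il2a1-il2a6;
  length dx_text $80;
  do slot = 1 to 6;
    dx_text = upcase(strip(dx(slot)));
    if dx_text ne '' then output;
  end;
  keep id slot dx_text;
run;

/* Exact-match join; entries with no match keep a missing category. */
proc sql;
  create table dx_classified as
  select a.id, a.slot, a.dx_text, b.category
  from dx_long as a
       left join dx_lookup as b
       on a.dx_text = b.dx_text;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because the lookup holds a single category per text entry, a given text value can only ever map to one category, and rows with a missing category show which entries were never classified.&lt;/P&gt;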
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You may also be better off transposing the data so that you have a single diagnosis variable plus the other identification variables. Then you don't need to work with an array.&lt;/P&gt;
&lt;P&gt;As for your issue:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;The problem is that when I run it, some entries are populating more than one category. I've double checked the raw data to make sure entries are not in fact categorized multiple times. Please help!&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You likely need to provide some actual example data as there is not enough information provided to tell why some of your code might behave that way.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The instructions here: &lt;A href="https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-data-AKA-generate/ta-p/258712" target="_blank"&gt;https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-data-AKA-generate/ta-p/258712&lt;/A&gt; show how to turn an existing SAS data set into data step code. You can paste that code into a forum code box using the {i} icon, or attach it as text, to show exactly what you have and give us something to test code against.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Remove any variables that are not directly related to this issue. Make sure to include some records that are getting the "wrong" result.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;BTW, it is not best practice to define multiple arrays for the exact same elements. You can confuse yourself and complicate the code.&lt;/P&gt;
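&lt;P&gt;As a rough, untested sketch of the single-array version: one array and one loop can set all of the flags. The text lists are shortened here for illustration, and dx_up is a hypothetical helper variable.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data one;
  set zero;
  array dx(6) il2a1-il2a6;   /* one array for the six free-text variables */
  neo_unspec = 0;
  neo_malig = 0;
  circulatory = 0;
  do i = 1 to 6;
    dx_up = upcase(strip(dx(i)));   /* normalize case and spacing once per entry */
    if dx_up in ("1918 BRAIN TUMOUR" "1919 BRAIN TUMOR") then neo_unspec = 1;
    else if dx_up in ("CANCER OF PROSTATE WITH BONE METS") then neo_malig = 1;
    else if dx_up in ("HEART CATHETERIZATION" "HEART CONDITION") then circulatory = 1;
  end;
  drop i dx_up;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note that the flags are set per record across all six variables, so a record whose variables hold diagnoses from different categories will still end up with more than one flag equal to 1, even when each text entry belongs to only one category.&lt;/P&gt;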
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 25 Jun 2019 23:21:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-a-multiple-arrays-to-classify-text/m-p/568947#M160256</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-06-25T23:21:31Z</dc:date>
    </item>
  </channel>
</rss>

