<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Text Processing for Cancer Types in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Text-Processing-for-Cancer-Types/m-p/796478#M255588</link>
    <description>Thank you so much, this is a lifesaver!</description>
    <pubDate>Wed, 16 Feb 2022 08:27:09 GMT</pubDate>
    <dc:creator>TC_</dc:creator>
    <dc:date>2022-02-16T08:27:09Z</dc:date>
    <item>
      <title>Text Processing for Cancer Types</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Text-Processing-for-Cancer-Types/m-p/796157#M255450</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am stuck on a data manipulation problem.&lt;/P&gt;&lt;P&gt;This is a small portion of the input data:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;data test;
	length Site $20. Histology $20. Category $10.;
	input Site $ Histology $ Category $;
	cards;
	C000-C002 9835-9836 Leukemia
	C420-C421,C424 9811-9812,9837 Leukemia
;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;And this is what I want the output to look like:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="results.png" style="width: 197px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/68463iCBF7D4ECBCD7FEAD/image-size/large?v=v2&amp;amp;px=999" role="button" title="results.png" alt="results.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;You may assume Site and Histology are both 4-digit after the text is split.&lt;/P&gt;&lt;P&gt;In case anyone is interested, the full table is &lt;A href="https://seer.cancer.gov/iccc/iccc-who2008.html" target="_self"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;I was able to split the text by comma with this &lt;A href="https://communities.sas.com/t5/General-SAS-Programming/parsing-a-character-string-into-new-variables/td-p/129189" target="_self"&gt;tutorial&lt;/A&gt;, but I was not able to expand a range into its individual values (e.g. C1-C3 to C1,C2,C3) and transpose them as shown.&lt;/P&gt;&lt;P&gt;Thank you for the help!&lt;/P&gt;</description>
      <pubDate>Mon, 14 Feb 2022 22:23:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Text-Processing-for-Cancer-Types/m-p/796157#M255450</guid>
      <dc:creator>TC_</dc:creator>
      <dc:date>2022-02-14T22:23:19Z</dc:date>
    </item>
    <item>
      <title>Re: Text Processing for Cancer Types</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Text-Processing-for-Cancer-Types/m-p/796271#M255500</link>
      <description>&lt;P&gt;I am unclear on what you are trying to do. Are you trying to read a raw text file or create one?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=""&gt;data test;
length Site $20. Histology $20. Category $10.;
input Site $ Histology $ Category $;
cards;
C000 9835 Leukemia
C000 9836 Leukemia
C001 9835 Leukemia
C001 9836 Leukemia
C002 9835 Leukemia
C002 9836 Leukemia
C420 9811 Leukemia
C420 9812 Leukemia
C420 9837 Leukemia
C421 9811 Leukemia
C421 9812 Leukemia
C421 9837 Leukemia
C424 9811 Leukemia
C424 9812 Leukemia
C424 9837 Leukemia
;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 15 Feb 2022 14:00:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Text-Processing-for-Cancer-Types/m-p/796271#M255500</guid>
      <dc:creator>CarmineVerrell</dc:creator>
      <dc:date>2022-02-15T14:00:25Z</dc:date>
    </item>
    <item>
      <title>Re: Text Processing for Cancer Types</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Text-Processing-for-Cancer-Types/m-p/796346#M255531</link>
      <description>Sorry for the confusion.&lt;BR /&gt;The SAS code shows what the input data looks like. For simplicity, I showed only a small portion of it (I also linked the full table in case you are interested).&lt;BR /&gt;The screenshot shows the output I want to produce from the input data. I can tell it involves splitting the text on a delimiter, transposing, and maybe something more.</description>
      <pubDate>Tue, 15 Feb 2022 17:13:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Text-Processing-for-Cancer-Types/m-p/796346#M255531</guid>
      <dc:creator>TC_</dc:creator>
      <dc:date>2022-02-15T17:13:01Z</dc:date>
    </item>
    <item>
      <title>Re: Text Processing for Cancer Types</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Text-Processing-for-Cancer-Types/m-p/796433#M255571</link>
      <description>&lt;P&gt;Here's one approach. This creates all combinations of Site and Histology. Site is read as a 3-digit number after the leading C, so the code would need to be adjusted if a 4-digit code were required.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;data want ;
    length Site Histology $20 ;
    set test (rename=(Site=oldSite Histology=oldHistology)) ;

    siteCommaCount=count(oldSite,',') ;
    histCommaCount=count(oldHistology,',') ;

    do i=1 to siteCommaCount+1 ; * loop for each Site group that is split by a comma (or once if only one group);

        _Site=scan(oldSite,i,',') ;

        SiteStart=input(compress(scan(_Site,1,'-'), ,'kd'),3.) ; *First Site number (as a number) ;
        SiteEnd=input(compress(scan(_Site,2,'-'), ,'kd'),3.) ; * Last Site number ;
        if SiteEnd=. then SiteEnd=SiteStart ;  *If no range, default SiteEnd to SiteStart ;
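        * Added sketch (an assumption, not part of the original answer) - a reversed range such as C005-C001 would make the loop below run zero times, so flag it in the log ;
        if SiteEnd lt SiteStart then putlog "WARNING: reversed Site range " _Site= ;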

        do s=SiteStart to SiteEnd by 1;  *Loop for each Site range (or once if no range) ;
            Site=cat("C",put(s,z3.)) ;

            do j=1 to histCommaCount+1 ;  *Repeat as above but for Histology ;

                _Histology=scan(oldHistology,j,',') ;
    
                HistStart=input(scan(_Histology,1,'-'),4.) ;
                HistEnd=input(scan(_Histology,2,'-'),4.) ;
                if HistEnd=. then HistEnd=HistStart ;

                do h=HistStart to HistEnd by 1;
                    Histology=put(h,z4.) ;
                    output ; *Output a record for each combination ;
                end ;  * to HistEnd by 1 ;

            end ; *to histCommaCount+1 ;
        end ; * to SiteEnd by 1;
    end ; *to siteCommaCount+1 ;

    keep Site Histology Category ;
run ; &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 16 Feb 2022 02:30:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Text-Processing-for-Cancer-Types/m-p/796433#M255571</guid>
      <dc:creator>seemiyah</dc:creator>
      <dc:date>2022-02-16T02:30:31Z</dc:date>
    </item>
    <item>
      <title>Re: Text Processing for Cancer Types</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Text-Processing-for-Cancer-Types/m-p/796478#M255588</link>
      <description>Thank you so much, this is a lifesaver!</description>
      <pubDate>Wed, 16 Feb 2022 08:27:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Text-Processing-for-Cancer-Types/m-p/796478#M255588</guid>
      <dc:creator>TC_</dc:creator>
      <dc:date>2022-02-16T08:27:09Z</dc:date>
    </item>
  </channel>
</rss>

