BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TC_
Obsidian | Level 7 TC_
Obsidian | Level 7

Hello,

I am stuck with some data manipulation.

This is a small portion of the input data

 

data test;
	length Site $20. Histology $20. Category $10.;
	input Site $ Histology $ Category $;
	cards;
	C000-C002 9835-9836 Leukemia
	C420-C421,C424 9811-9812,9837 Leukemia
run;

And this is what I want it to look like

 

results.png

 

You may assume Site and Histology are both 4-digit after text splitting.

In case anyone is interested, the full table is here 

 

I was able to split the text by comma with this tutorial but I was not able to convert a range to actual numbers (i.e. C1-C3 to C1,C2,C3) and transpose them in the right way as shown.

 

Thank you for the help!

1 ACCEPTED SOLUTION

Accepted Solutions
seemiyah
Fluorite | Level 6

Here's one approach. This creates all permutations of Site and Histogram. Site is set to 3 digits so the code would be need to be adjusted if a 4 digit code was required. 

 

data want ;
    length Site Histology $20 ;
    set test (rename=(Site=oldSite Histology=oldHistology)) ;

    siteCommaCount=count(oldSite,',') ;
    histCommaCount=count(oldHistology,',') ;

    do i=1 to siteCommaCount+1 ; * loop for each Site group that is split by a comma (or once if only one group);

        _Site=scan(oldSite,i,',') ;

        SiteStart=input(compress(scan(_Site,1,'-'), ,'kd'),3.) ; *First Site number (as a number) ;
        SiteEnd=input(compress(scan(_Site,2,'-'), ,'kd'),3.) ; * Last Site number ;
        if SiteEnd=. then SiteEnd=SiteStart ;  *If no range, just default SiteStart=SiteEnd ;

        do s=SiteStart to SiteEnd by 1;  *Loop for each Site range (or once if no range) ;
            Site=cat("C",put(s,z3.)) ;

            do j=1 to histCommaCount+1 ;  *Repeat as above but for Histology ;

                _Histology=scan(oldHistology,j,',') ;
    
                HistStart=input(scan(_Histology,1,'-'),4.) ;
                HistEnd=input(scan(_Histology,2,'-'),4.) ;
                if HistEnd=. then HistEnd=HistStart ;

                do h=HistStart to HistEnd by 1;
                    Histology=put(h,z4.) ;
                    output ; *Ouptut a record for each permutation ;
                end ;  *HistEnd by 1;

            end ; *to histCommaCount+1 ;
        end ; * to SiteEnd by 1;
    end ; *to siteCommaCount+1 ;

    keep Site Histology Category ;
run ; 

View solution in original post

4 REPLIES 4
CarmineVerrell
SAS Employee

I am unclear on what you are trying to do, are you trying to read a raw text file or create a text file?

 

 

data test;
length Site $20. Histology $20. Category $10.;
input Site $ Histology $ Category $;
cards;
C000 9835 Leukemia
C000 9836 Leukemia
C001 9835 Leukemia
C001 9836 Leukemia
C002 9835 Leukemia
C002 9836 Leukemia
C420 9811 Leukemia
C420 9812 Leukemia
C420 9837 Leukemia
C421 9811 Leukemia
C421 9812 Leukemia
C421 9837 Leukemia
C424 9811 Leukemia
C424 9812 Leukemia
C424 9837 Leukemia
;
run;

TC_
Obsidian | Level 7 TC_
Obsidian | Level 7
Sorry for the confusion.
The SAS code is what the input data looks like. For simplicity, I only showed a small portion of the input data (I also included the full table with a link if you are interested).
The screenshot is what I want the output to look like by processing the input data. I can tell it involves some text splitting with delimiter and transpose and maybe something more.
seemiyah
Fluorite | Level 6

Here's one approach. This creates all permutations of Site and Histogram. Site is set to 3 digits so the code would be need to be adjusted if a 4 digit code was required. 

 

data want ;
    length Site Histology $20 ;
    set test (rename=(Site=oldSite Histology=oldHistology)) ;

    siteCommaCount=count(oldSite,',') ;
    histCommaCount=count(oldHistology,',') ;

    do i=1 to siteCommaCount+1 ; * loop for each Site group that is split by a comma (or once if only one group);

        _Site=scan(oldSite,i,',') ;

        SiteStart=input(compress(scan(_Site,1,'-'), ,'kd'),3.) ; *First Site number (as a number) ;
        SiteEnd=input(compress(scan(_Site,2,'-'), ,'kd'),3.) ; * Last Site number ;
        if SiteEnd=. then SiteEnd=SiteStart ;  *If no range, just default SiteStart=SiteEnd ;

        do s=SiteStart to SiteEnd by 1;  *Loop for each Site range (or once if no range) ;
            Site=cat("C",put(s,z3.)) ;

            do j=1 to histCommaCount+1 ;  *Repeat as above but for Histology ;

                _Histology=scan(oldHistology,j,',') ;
    
                HistStart=input(scan(_Histology,1,'-'),4.) ;
                HistEnd=input(scan(_Histology,2,'-'),4.) ;
                if HistEnd=. then HistEnd=HistStart ;

                do h=HistStart to HistEnd by 1;
                    Histology=put(h,z4.) ;
                    output ; *Ouptut a record for each permutation ;
                end ;  *HistEnd by 1;

            end ; *to histCommaCount+1 ;
        end ; * to SiteEnd by 1;
    end ; *to siteCommaCount+1 ;

    keep Site Histology Category ;
run ; 
TC_
Obsidian | Level 7 TC_
Obsidian | Level 7
Thank you so much, this is a lifesaver!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 722 views
  • 2 likes
  • 3 in conversation