Hello,
I am stuck with some data manipulation.
This is a small portion of the input data
data test;
length Site $20. Histology $20. Category $10.;
input Site $ Histology $ Category $;
cards;
C000-C002 9835-9836 Leukemia
C420-C421,C424 9811-9812,9837 Leukemia
run;
And this is what I want it to look like
You may assume Site and Histology are both 4-digit after text splitting.
In case anyone is interested, the full table is here
I was able to split the text by comma with this tutorial but I was not able to convert a range to actual numbers (i.e. C1-C3 to C1,C2,C3) and transpose them in the right way as shown.
Thank you for the help!
Here's one approach. This creates all permutations of Site and Histogram. Site is set to 3 digits so the code would be need to be adjusted if a 4 digit code was required.
data want ;
length Site Histology $20 ;
set test (rename=(Site=oldSite Histology=oldHistology)) ;
siteCommaCount=count(oldSite,',') ;
histCommaCount=count(oldHistology,',') ;
do i=1 to siteCommaCount+1 ; * loop for each Site group that is split by a comma (or once if only one group);
_Site=scan(oldSite,i,',') ;
SiteStart=input(compress(scan(_Site,1,'-'), ,'kd'),3.) ; *First Site number (as a number) ;
SiteEnd=input(compress(scan(_Site,2,'-'), ,'kd'),3.) ; * Last Site number ;
if SiteEnd=. then SiteEnd=SiteStart ; *If no range, just default SiteStart=SiteEnd ;
do s=SiteStart to SiteEnd by 1; *Loop for each Site range (or once if no range) ;
Site=cat("C",put(s,z3.)) ;
do j=1 to histCommaCount+1 ; *Repeat as above but for Histology ;
_Histology=scan(oldHistology,j,',') ;
HistStart=input(scan(_Histology,1,'-'),4.) ;
HistEnd=input(scan(_Histology,2,'-'),4.) ;
if HistEnd=. then HistEnd=HistStart ;
do h=HistStart to HistEnd by 1;
Histology=put(h,z4.) ;
output ; *Ouptut a record for each permutation ;
end ; *HistEnd by 1;
end ; *to histCommaCount+1 ;
end ; * to SiteEnd by 1;
end ; *to siteCommaCount+1 ;
keep Site Histology Category ;
run ;
I am unclear on what you are trying to do, are you trying to read a raw text file or create a text file?
data test;
length Site $20. Histology $20. Category $10.;
input Site $ Histology $ Category $;
cards;
C000 9835 Leukemia
C000 9836 Leukemia
C001 9835 Leukemia
C001 9836 Leukemia
C002 9835 Leukemia
C002 9836 Leukemia
C420 9811 Leukemia
C420 9812 Leukemia
C420 9837 Leukemia
C421 9811 Leukemia
C421 9812 Leukemia
C421 9837 Leukemia
C424 9811 Leukemia
C424 9812 Leukemia
C424 9837 Leukemia
;
run;
Here's one approach. This creates all permutations of Site and Histogram. Site is set to 3 digits so the code would be need to be adjusted if a 4 digit code was required.
data want ;
length Site Histology $20 ;
set test (rename=(Site=oldSite Histology=oldHistology)) ;
siteCommaCount=count(oldSite,',') ;
histCommaCount=count(oldHistology,',') ;
do i=1 to siteCommaCount+1 ; * loop for each Site group that is split by a comma (or once if only one group);
_Site=scan(oldSite,i,',') ;
SiteStart=input(compress(scan(_Site,1,'-'), ,'kd'),3.) ; *First Site number (as a number) ;
SiteEnd=input(compress(scan(_Site,2,'-'), ,'kd'),3.) ; * Last Site number ;
if SiteEnd=. then SiteEnd=SiteStart ; *If no range, just default SiteStart=SiteEnd ;
do s=SiteStart to SiteEnd by 1; *Loop for each Site range (or once if no range) ;
Site=cat("C",put(s,z3.)) ;
do j=1 to histCommaCount+1 ; *Repeat as above but for Histology ;
_Histology=scan(oldHistology,j,',') ;
HistStart=input(scan(_Histology,1,'-'),4.) ;
HistEnd=input(scan(_Histology,2,'-'),4.) ;
if HistEnd=. then HistEnd=HistStart ;
do h=HistStart to HistEnd by 1;
Histology=put(h,z4.) ;
output ; *Ouptut a record for each permutation ;
end ; *HistEnd by 1;
end ; *to histCommaCount+1 ;
end ; * to SiteEnd by 1;
end ; *to siteCommaCount+1 ;
keep Site Histology Category ;
run ;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.