Hi SAS forum,
I have strings like this:
Heart failure, unspecified (I50.9) Osteomyelitis, unspecified (M86.9) Cardiogenic shock (R57.0)
I want to divide the strings into different variables according to the parentheses.
I saw a similar question: https://communities.sas.com/t5/SAS-Data-Management/splitting-a-variable-into-several-depending-on-pa...
and modified the code as follows:
data have;
infile cards dsd;
length name $100;
input name $;
cards;
"Heart failure, unspecified (I50.9) Osteomyelitis, unspecified (M86.9) Cardiogenic shock (R57.0)"
;
run;
data want;
set have;
DIAGNOSIS1 = scan(name,1,'(,)');
DIAGNOSISCODE1 = scan(name,2,'(,)');
DIAGNOSIS2 = scan(name,3,'(,)');
DIAGNOSISCODE2 = scan(name,4,'(,)');
DIAGNOSIS3 = scan(name,5,'(,)');
DIAGNOSISCODE3 = scan(name,6,'(,)');
run;
But it seems I have to remove the comma from the delimiter argument '(,)' to get the desired results, like this:
data want;
set have;
DIAGNOSIS1 = scan(name,1,'()');
DIAGNOSISCODE1 = scan(name,2,'()');
DIAGNOSIS2 = scan(name,3,'()');
DIAGNOSISCODE2 = scan(name,4,'()');
DIAGNOSIS3 = scan(name,5,'()');
DIAGNOSISCODE3 = scan(name,6,'()');
run;
So I have 2 questions:
1. What's the difference between these two statements:
DIAGNOSIS1 = scan(name,1,'(,)');
DIAGNOSIS1 = scan(name,1,'()');
(with/without the comma inside the parentheses?)
2. Some of my data contain Chinese characters and fullwidth commas (，), none of which appear inside the parentheses. Can I process those data with the same code?
Thanks in advance!
Use nested SCAN functions:
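data want;
set have;
length extract $10;
/* Word 1 is the text before the first "(", so start at word 2. */
do i = 2 to countw(name,"(");
/* Inner SCAN: text following the (i-1)th "("; outer SCAN: keep what precedes ")". */
extract = scan(scan(name,i,"("),1,")");
output; /* one row per code */
end;
run;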
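On the first question, which the loop above sidesteps: the third argument of SCAN is a list of delimiter characters, not a pattern. With '(,)' the string is split at every "(", "," and ")", so the comma inside "Heart failure, unspecified" also ends a word. With '()' only the parentheses split. A minimal sketch on the sample string from the question (the variable names with_comma and no_comma are made up for illustration):

```sas
/* Sketch only: compares the two delimiter lists on the sample string. */
data _null_;
  length name $100 with_comma no_comma $50;
  name = "Heart failure, unspecified (I50.9) Osteomyelitis, unspecified (M86.9) Cardiogenic shock (R57.0)";

  /* '(,)': "(", "," and ")" are each a delimiter, so the comma in the text splits too */
  with_comma = scan(name, 1, '(,)');

  /* '()': only the parentheses are delimiters, so the diagnosis text stays whole */
  no_comma = strip(scan(name, 1, '()'));

  put with_comma=;   /* Heart failure */
  put no_comma=;     /* Heart failure, unspecified */
run;
```

With '(,)' the second word is "unspecified" and the code I50.9 only shows up as word 3, which is why the first attempt put values in the wrong variables. On the second question, a fullwidth comma is a different character from the ASCII comma, so with '()' as the delimiter list it should pass through as ordinary text. That is an assumption not tested on multi-byte data here; also note that the $100 length is counted in bytes, so it may need to be larger when the text contains Chinese characters.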