If you clean up the formatting you can just use normal SAS functions like SCAN() and SUBSTR() to pull out the pieces.
So let's make some example data with an ID variable.
data have;
row+1;
infile cards truncover;
input family $200.;
cards4;
[ Wife: XXX XXXX (architect, m. 6-Oct-1924, d. 13-Jan-1949, 2 children) , Daughter: AAA (b. 1925) , Son: BBBB (b. 1928) , Wife: YYY YYYY (architect, m. 4-Oct-1952) ]
[ Wife: XXX1 XXXX1 (div., one son) , Son: AAA1, Wife: YYY1 YYYY1 (actress, m. 1986, sep. 2008, one daughter) , Daughter: BBBB1 ]
[ Father: XXX2 XXXX2 XXXXX2 , Wife: YYYY2 YYYY2(gymnast, one daughter) , Daughter: AAA2 ]
[ Father: XXXX3 XXXXX3 ("XX3", d. 1922) , Mother: ZZZ ZZZZZ (d. 1923) , Wife: YYYY3 YYY3 (m. 28-Jun-1926, three daughters) , Daughter: AAA , Daughter: BBB , Daughter: CCCCC]
[ Wife: YYYY4 (one daughter, one son) , Daughter: AAAA3 (ballerina, b. 1962) , Son: BBB2 (b. 1968) ]
[ Father: XXXX5 (lawyer/administrator) , Wife: (d. 1970, two daughters) ]
;;;;
Now we can clean it up and parse the strings into RELATIONSHIP, NAME and DETAILS.
data want;
set have;
copy=left(substr(family,2,length(family)-2));
copy=tranwrd(copy,'"','""');
copy=tranwrd(copy,'(',' "');
copy=tranwrd(copy,')','" ');
copy=compbl(copy);
do col=1 to countw(copy,',','q');
length relationship $10 name $20 details $100 ;
details=scan(copy,col,',','q');
relationship=scan(details,1,':');
name=left(scan(substr(details,length(relationship)+2),1,'"'));
if indexc(details,'"') then details = dequote(scan(details,-1,' ','q'));
else details=' ';
output;
end;
run;
So let's see what we get:
proc print;
by row family;
var col relationship name details ;
run;
Results
row=1 family=[ Wife: XXX XXXX (architect, m. 6-Oct-1924, d. 13-Jan-1949, 2 children) , Daughter: AAA (b. 1925) , Son: BBBB (b. 1928) Obs col relationship name details 1 1 Wife XXX XXXX architect, m. 6-Oct-1924, d. 13-Jan-1949, 2 children 2 2 Daughter AAA b. 1925 3 3 Son BBBB b. 1928 4 4 Wife YYY YYYY architect, m. 4-Oct-1952 row=2 Obs col relationship name details 5 1 Wife XXX1 XXXX1 div., one son 6 2 Son AAA1 7 3 Wife YYY1 YYYY1 actress, m. 1986, sep. 2008, one daughter 8 4 Daughter BBBB1 row=3 family=[ Father: XXX2 XXXX2 XXXXX2 , Wife: YYYY2 YYYY2(gymnast, one daughter) , Daughter: AAA2 ] Obs col relationship name details 9 1 Father XXX2 XXXX2 XXXXX2 10 2 Wife YYYY2 YYYY2 gymnast, one daughter 11 3 Daughter AAA2 row=4 Obs col relationship name details 12 1 Father XXXX3 XXXXX3 "XX3", d. 1922 13 2 Mother ZZZ ZZZZZ d. 1923 14 3 Wife YYYY3 YYY3 m. 28-Jun-1926, three daughters 15 4 Daughter AAA 16 5 Daughter BBB 17 6 Daughter CCCCC row=5 family=[ Wife: YYYY4 (one daughter, one son) , Daughter: AAAA3 (ballerina, b. 1962) , Son: BBB2 (b. 1968) ] Obs col relationship name details 18 1 Wife YYYY4 one daughter, one son 19 2 Daughter AAAA3 ballerina, b. 1962 20 3 Son BBB2 b. 1968 row=6 family=[ Father: XXXX5 (lawyer/administrator) , Wife: (d. 1970, two daughters) ] Obs col relationship name details 21 1 Father XXXX5 lawyer/administrator 22 2 Wife d. 1970, two daughters
Tom,
OP have already posted the real data here.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.