I am trying to get the given name and family name from the following string:
[{"name": [{"given": ["TESTG "], "family": "TESTF"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]
The following code works for the above string:
First_Name = compress(strip(scan(authors,5,':[]')),'"');
Last_Name = compress(strip(scan(authors,7,':[]}')),'"');
However, the characters are not consistent, so in some cases I have:
[{"name": [{"given": ["TESTG1 "], "given": ["TESTG2 "], "family": "TESTF1", "family": "TESTF2"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]
Is there any way I can extract the first and last names for the first person and then the first and last names for the second person?
Thanks
My take:
data HAVE;
A='[{"name": [{"given": ["TESTG1 "], "given": ["TESTG2 "], "family": "TESTF1", "family": "TESTF2"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
A='[{"name": [{"given": ["TESTG "], "family": "TESTF"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
run;
data WANT;
set HAVE;
B=compress(A,'"{}[] ');
POS1=1;
POS2=1;
do until(POS1=0);
POS1=find(B,'given:',POS1);
if POS1 then do;
FIRST=scan(substr(B,POS1),2,':,');
POS2 =find(B,'family:',POS2);
LAST =scan(substr(B,POS2),2,':,');
output;
POS1+1;
POS2+1;
end;
end;
keep FIRST LAST;
run;
1 | TESTG1 | TESTF1 |
2 | TESTG2 | TESTF2 |
3 | TESTG | TESTF |
I would key off of "given" and "family". See code, below. In the second type, there are two given names and two family names. I'm just giving you the first, but you could modify the code if you need both.
I went step by step. It's late in the day here, and I'm a bit tired. Brain not working well. You could nest the functions and make a much more compact program. No doubt @ChrisNZ or someone who knows regular expressions far better than I do will come along and do it all in 2 lines or less. 🙂
%LET Author1 = %BQUOTE([{"name": [{"given": ["TESTG "], "family": "TESTF"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]);
%LET Author2 = %BQUOTE([{"name": [{"given": ["TESTG1 "], "given": ["TESTG2 "], "family": "TESTF1", "family": "TESTF2"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]);
DATA Want;
DROP _:;
_Authors="&Author1";
_Temp = COMPRESS(_Authors,'[]{} :,');
_Start = INDEX(LOWCASE(STRIP(_Temp)), 'given');
_Start = _Start + 7;
_Temp2 = SUBSTR(_Temp, _Start);
_Length = INDEX(_Temp2, '"') - 1;
First_Name = SUBSTR(STRIP(_Temp), _Start, _Length);
_Start = INDEX(LOWCASE(STRIP(_Temp)), 'family');
_Start = _Start + 8;
_Temp2 = SUBSTR(_Temp, _Start);
_Length = INDEX(_Temp2, '"') - 1;
Last_Name = SUBSTR(STRIP(_Temp), _Start, _Length);
PUTLOG "NOTE: 1. " First_Name= Last_Name=;
OUTPUT;
_Authors="&Author2";
_Temp = COMPRESS(_Authors,'[]{} :,');
_Start = INDEX(LOWCASE(STRIP(_Temp)), 'given');
_Start = _Start + 7;
_Temp2 = SUBSTR(_Temp, _Start);
_Length = INDEX(_Temp2, '"') - 1;
First_Name = SUBSTR(STRIP(_Temp), _Start, _Length);
_Start = INDEX(LOWCASE(STRIP(_Temp)), 'family');
_Start = _Start + 8;
_Temp2 = SUBSTR(_Temp, _Start);
_Length = INDEX(_Temp2, '"') - 1;
Last_Name = SUBSTR(STRIP(_Temp), _Start, _Length);
PUTLOG "NOTE: 2. " First_Name= Last_Name=;
PUTLOG "NOTE- ";
OUTPUT;
RUN;
Results:
Jim
My take:
data HAVE;
A='[{"name": [{"given": ["TESTG1 "], "given": ["TESTG2 "], "family": "TESTF1", "family": "TESTF2"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
A='[{"name": [{"given": ["TESTG "], "family": "TESTF"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
run;
data WANT;
set HAVE;
B=compress(A,'"{}[] ');
POS1=1;
POS2=1;
do until(POS1=0);
POS1=find(B,'given:',POS1);
if POS1 then do;
FIRST=scan(substr(B,POS1),2,':,');
POS2 =find(B,'family:',POS2);
LAST =scan(substr(B,POS2),2,':,');
output;
POS1+1;
POS2+1;
end;
end;
keep FIRST LAST;
run;
1 | TESTG1 | TESTF1 |
2 | TESTG2 | TESTF2 |
3 | TESTG | TESTF |
Another way I remember reading about, that would be appropriate for parsing this string: Using the @'string' syntax of the input statement. It can probably be coded better.
data T;
infile cards truncover;
input @;
do until(LASTOBS);
set SASHELP.CLASS end=LASTOBS;
_INFILE_=catt('age:', AGE, ',name:', upcase(NAME));
input @1 @'name:' NM $ @;
output;
end;
keep NAME NM;
cards;
dummy
run;
The input statement is doing all the positioning and extracting work here.
Interesting technique using a "dummy" datalines (cards) to generate a test file. Kind of handy. I had not seen that before.
Also an interesting technique using the INPUT with @'literal'. I didn't even think of that although I have (rarely) seen it used before.
I noticed that you used a comma as a separator. I believe that if there were additional data beyond the name that you would pull in that data. For example:
I modified it to a space, and it worked as coded:
data T;
infile cards truncover;
input @;
do until(LASTOBS);
set SASHELP.CLASS end=LASTOBS;
LENGTH NM $32;
_INFILE_=catt('age:', AGE, ' name:', upcase(NAME), ' sex:', UPCASE(SEX));
PUTLOG "NOTE: " _INFILE_;
input @1 @'name:' NM $ @;
output;
end;
keep NAME NM;
LENGTH NM $32;
cards;
dummy
run;
Alternatively, one could specify a comma delimiter.
INFILE Cards DSD DLM=',';
Yes, sorry. Fussing over small details, I know, but I like to know how things work. I was curious about the comma and wondered if I understood the syntax of
input @1 @'name:' NM $ @;
so I just had to play with it. 🙂
Jim
data HAVE;
A='[{"name": [{"given": ["TESTG1 "], "given": ["TESTG2 "], "family": "TESTF1", "family": "TESTF2"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
A='[{"name": [{"given": ["TESTG "], "family": "TESTF"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
run;
data want;
set have;
id+1;
do i=1 to countw(a,'{},');
temp=scan(a,i,'{},');
vname=dequote(strip(scan(strip(temp),1,':[]')));
value=dequote(strip(scan(strip(temp),-1,':[]')));
if vname in ('given' 'family') then output;
end;
keep id vname value;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.