BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
monali
Obsidian | Level 7

I am trying to get the given name and family name from the following string:

 

[{"name": [{"given": ["TESTG "], "family": "TESTF"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]

 

The following code works for the above string: 

First_Name = compress(strip(scan(authors,5,':[]')),'"');
Last_Name = compress(strip(scan(authors,7,':[]}')),'"');

 

However, the characters are not consistent, so in some cases I have:

[{"name": [{"given": ["TESTG1 "], "given": ["TESTG2 "], "family": "TESTF1", "family": "TESTF2"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]

 

Is there any way I can extract the first and last names for the first person and then the first and last names for the second person? 

 

Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

My take:

data HAVE;
A='[{"name": [{"given": ["TESTG1 "], "given": ["TESTG2 "], "family": "TESTF1", "family": "TESTF2"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
A='[{"name": [{"given": ["TESTG "], "family": "TESTF"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
run;

data WANT;
  set HAVE;  
  B=compress(A,'"{}[] ');
  POS1=1;
  POS2=1;
  do until(POS1=0);
    POS1=find(B,'given:',POS1);
    if POS1 then do;
      FIRST=scan(substr(B,POS1),2,':,'); 
      POS2 =find(B,'family:',POS2);
      LAST =scan(substr(B,POS2),2,':,'); 
      output;
      POS1+1;
      POS2+1;
    end;
  end;
  keep FIRST LAST;
run;

 

 

View solution in original post

9 REPLIES 9
Reeza
Super User
Have you tried parsing your file with the XML or JSON libraries instead of manually?
monali
Obsidian | Level 7
Nope, this is already parsed and this is how I see it in my table.

jimbarbour
Meteorite | Level 14

I would key off of "given" and "family".  See code, below.  In the second type, there are two given names and two family names.  I'm just giving you the first, but you could modify the code if you need both.

 

I went step by step.  It's late in the day here, and I'm a bit tired.  Brain not working well.  You could nest the functions and make a much more compact program.  No doubt @ChrisNZ or someone who knows regular expressions far better than I do will come along and do it all in 2 lines or less.  🙂

%LET	Author1	=	%BQUOTE([{"name": [{"given": ["TESTG "], "family": "TESTF"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]);
%LET	Author2	=	%BQUOTE([{"name": [{"given": ["TESTG1 "], "given": ["TESTG2 "], "family": "TESTF1", "family": "TESTF2"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]);

DATA	Want;
	DROP	_:;

	_Authors="&Author1";
	_Temp		=	COMPRESS(_Authors,'[]{} :,');
	_Start		=	INDEX(LOWCASE(STRIP(_Temp)), 'given');
	_Start		=	_Start	+	7;
	_Temp2		=	SUBSTR(_Temp, _Start);
	_Length		=	INDEX(_Temp2, '"') - 1;
	First_Name	=	SUBSTR(STRIP(_Temp), _Start, _Length);
	_Start		=	INDEX(LOWCASE(STRIP(_Temp)), 'family');
	_Start		=	_Start	+	8;
	_Temp2		=	SUBSTR(_Temp, _Start);
	_Length		=	INDEX(_Temp2, '"') - 1;
	Last_Name	=	SUBSTR(STRIP(_Temp), _Start, _Length);
	PUTLOG	"NOTE:  1. "  First_Name=  Last_Name=;
	OUTPUT;

	_Authors="&Author2";
	_Temp		=	COMPRESS(_Authors,'[]{} :,');
	_Start		=	INDEX(LOWCASE(STRIP(_Temp)), 'given');
	_Start		=	_Start	+	7;
	_Temp2		=	SUBSTR(_Temp, _Start);
	_Length		=	INDEX(_Temp2, '"') - 1;
	First_Name	=	SUBSTR(STRIP(_Temp), _Start, _Length);
	_Start		=	INDEX(LOWCASE(STRIP(_Temp)), 'family');
	_Start		=	_Start	+	8;
	_Temp2		=	SUBSTR(_Temp, _Start);
	_Length		=	INDEX(_Temp2, '"') - 1;
	Last_Name	=	SUBSTR(STRIP(_Temp), _Start, _Length);
	PUTLOG	"NOTE:  2. "  First_Name=  Last_Name=;
	PUTLOG	"NOTE-  ";
	OUTPUT;
RUN;

Results:

jimbarbour_0-1628294574719.png

Jim

ChrisNZ
Tourmaline | Level 20

My take:

data HAVE;
A='[{"name": [{"given": ["TESTG1 "], "given": ["TESTG2 "], "family": "TESTF1", "family": "TESTF2"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
A='[{"name": [{"given": ["TESTG "], "family": "TESTF"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
run;

data WANT;
  set HAVE;  
  B=compress(A,'"{}[] ');
  POS1=1;
  POS2=1;
  do until(POS1=0);
    POS1=find(B,'given:',POS1);
    if POS1 then do;
      FIRST=scan(substr(B,POS1),2,':,'); 
      POS2 =find(B,'family:',POS2);
      LAST =scan(substr(B,POS2),2,':,'); 
      output;
      POS1+1;
      POS2+1;
    end;
  end;
  keep FIRST LAST;
run;

 

 

monali
Obsidian | Level 7
thank you! this worked perfectly!
ChrisNZ
Tourmaline | Level 20

Another way I remember reading about, that would be appropriate for parsing this string: Using the @'string' syntax of the input statement. It can probably be coded better.

data T;
   infile cards truncover;
   input @;
   do until(LASTOBS);
     set SASHELP.CLASS end=LASTOBS;
     _INFILE_=catt('age:', AGE, ',name:', upcase(NAME));
     input @1 @'name:' NM $ @;
     output;
   end;
   keep NAME NM;
cards;
dummy
run;

The input statement is doing all the positioning and extracting work here.

 

jimbarbour
Meteorite | Level 14

@ChrisNZ,

 

Interesting technique using a "dummy" datalines (cards) to generate a test file.  Kind of handy.  I had not seen that before.

Also an interesting technique using the INPUT with @'literal'.  I didn't even think of that although I have (rarely) seen it used before.

 

I noticed that you used a comma as a separator.  I believe that if there were additional data beyond the name that you would pull in that data.  For example:

 

jimbarbour_0-1628548115772.png

 

I modified it to a space, and it worked as coded:

data T;
   infile cards truncover;
   input @;
   do until(LASTOBS);
     set SASHELP.CLASS end=LASTOBS;
	 LENGTH	NM	$32;
     _INFILE_=catt('age:', AGE, ' name:', upcase(NAME), ' sex:', UPCASE(SEX));
	 PUTLOG	"NOTE:  "  _INFILE_;
     input @1 @'name:' NM $ @;
     output;
   end;
   keep NAME NM;
   LENGTH	NM	$32;
cards;
dummy
run;

Alternatively, one could specify a comma delimiter.

INFILE	Cards	DSD	DLM=',';

Yes, sorry.  Fussing over small details, I know, but I like to know how things work.  I was curious about the comma and wondered if I understood the syntax of

input @1 @'name:' NM $ @;

so I just had to play with it.  🙂

 

Jim

 

 

monali
Obsidian | Level 7
This solution works too!! thank you so much.
Ksharp
Super User
data HAVE;
A='[{"name": [{"given": ["TESTG1 "], "given": ["TESTG2 "], "family": "TESTF1", "family": "TESTF2"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
A='[{"name": [{"given": ["TESTG "], "family": "TESTF"}], "identifier": [{"value": "TESTNO", "system": "TESTSTRING"}]}]';
output;
run;
data want;
 set have;
 id+1;
 do i=1 to countw(a,'{},');
  temp=scan(a,i,'{},');
  vname=dequote(strip(scan(strip(temp),1,':[]')));
  value=dequote(strip(scan(strip(temp),-1,':[]')));
  if vname in ('given' 'family') then output;
 end;
 keep id vname value;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2572 views
  • 10 likes
  • 5 in conversation