DATA Step, Macro, Functions and more

Pulling number from complex string and creating new numeric field containing stripped out number

Reply
Super Contributor
Posts: 259

Pulling number from complex string and creating new numeric field containing stripped out number

[ Edited ]

 

I have a dataset with 100000 rows and each row contains an extremely long string with lots of data. I'm want to create a new field with the "total payout value". This is the first number after the string "total_payout_value". The problem is the length of the string between the "total_payout_value" and the number can vary and also the number length can vary. Here is a sample from my dataset:

 

"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" : 

"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" : 

"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" : 

 

The actual sting is 1000s of characters long but each string will contain the string "total_payout_value" only once and the first number after this string is what I'm trying to extract. You will also see that sometimes the order of words between "total_payout_value" and the number can change so unfortuntately this length is not fixed. 

 

I tried this code but I don't know how to allow for the varying lengths. Really, what I need is is something that find the first number after the string "total_payout_value" 

 

data test1(keep = start end payout );
set data;
format payout $8.;
start = find(dataset,' "total_payout_value" : { "asset" : "SBD", "amount" : ')+54;
end = find(dataset,', "asset" : "SBD" }, "author"');
payout = substr(dataset,start,end-start));
run;

 

 

Super User
Super User
Posts: 7,997

Re: Pulling number from complex string and creating new numeric field containing stripped out number

What you have there is a JSon file, its a particular type of file structure.  Rather than trying to parse it yourself, use one of the methods given in your previous post on the subject:

https://communities.sas.com/t5/ODS-and-Base-Reporting/importing-JSON-data/m-p/346679/highlight/true#...

 

Or one of the many other posts on the subject:

https://communities.sas.com/t5/Base-SAS-Programming/Confusion-errors-in-libname-Json/m-p/319716/high...

https://communities.sas.com/t5/Base-SAS-Programming/Need-to-convert-a-JSON-file-to-SAS-dataset/m-p/3...

 

There are also several free tools out there, for example:

http://csvkit.readthedocs.io/en/1.0.2/

 

Super User
Posts: 7,868

Re: Pulling number from complex string and creating new numeric field containing stripped out number

How to retrieve data using name-value pairs, manually:

%let id1=total_payout_value;
%let id2=amount;

data test;
infile cards truncover;
input o_string $100.;
string = o_string;
i = index(string,"&id1");
string = substr(string,i);
string = scan(string,2,'{');
string = scan(string,1,'}');
i = index(string,"&id2");
string = substr(string,i+length("&id2"));
string = scan(string,2,':');
string = scan(string,1,',');
&id2 = input(string,best.);
drop i string;
cards;
"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" : 
"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" : 
"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" : 
;
run;

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 129

Re: Pulling number from complex string and creating new numeric field containing stripped out number

Posted in reply to KurtBremser

Hi,

 

what about regex?

data test;
infile cards truncover;
input string $100.;
cards;
"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" : 
"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" : 
"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" : 
;
run;

data test1;
   length var $200;
   set test;
   var=prxchange('s/.*total_payout_value\D*(\d*\.?\d*).*/$1/',-1,string);
run;

 

 

________________________

- Cheers -

Super User
Posts: 5,518

Re: Pulling number from complex string and creating new numeric field containing stripped out number

I think you are making this overly complex by searching for the wrong string.  The payout amount seems to follow the first instance of "amount" : so search for that instead.

 

This would create a character variable:

 

data want;

set have;

payout = scan( substr(dataset, index(dataset, '"amount" :' ) + 10), 1);

run;

 

Ask a Question
Discussion stats
  • 4 replies
  • 121 views
  • 2 likes
  • 5 in conversation