Pulling number from complex string and creating new numeric field cont...

brophymj · Posted 06-14-2017 06:46 AM

I have a dataset with 100000 rows and each row contains an extremely long string with lots of data. I'm want to create a new field with the "total payout value". This is the first number after the string "total_payout_value". The problem is the length of the string between the "total_payout_value" and the number can vary and also the number length can vary. Here is a sample from my dataset:

"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" :

"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" :

"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" :

The actual sting is 1000s of characters long but each string will contain the string "total_payout_value" only once and the first number after this string is what I'm trying to extract. You will also see that sometimes the order of words between "total_payout_value" and the number can change so unfortuntately this length is not fixed.

I tried this code but I don't know how to allow for the varying lengths. Really, what I need is is something that find the first number after the string "total_payout_value"

data test1(keep = start end payout );
set data;
format payout $8.;
start = find(dataset,' "total_payout_value" : { "asset" : "SBD", "amount" : ')+54;
end = find(dataset,', "asset" : "SBD" }, "author"');
payout = substr(dataset,start,end-start));
run;

RW9 · Posted 06-14-2017 07:14 AM

What you have there is a JSon file, its a particular type of file structure. Rather than trying to parse it yourself, use one of the methods given in your previous post on the subject:

https://communities.sas.com/t5/ODS-and-Base-Reporting/importing-JSON-data/m-p/346679/highlight/true#...

Or one of the many other posts on the subject:

https://communities.sas.com/t5/Base-SAS-Programming/Confusion-errors-in-libname-Json/m-p/319716/high...

https://communities.sas.com/t5/Base-SAS-Programming/Need-to-convert-a-JSON-file-to-SAS-dataset/m-p/3...

There are also several free tools out there, for example:

http://csvkit.readthedocs.io/en/1.0.2/

Kurt_Bremser · Posted 06-14-2017 07:56 AM

How to retrieve data using name-value pairs, manually:

%let id1=total_payout_value;
%let id2=amount;

data test;
infile cards truncover;
input o_string $100.;
string = o_string;
i = index(string,"&id1");
string = substr(string,i);
string = scan(string,2,'{');
string = scan(string,1,'}');
i = index(string,"&id2");
string = substr(string,i+length("&id2"));
string = scan(string,2,':');
string = scan(string,1,',');
&id2 = input(string,best.);
drop i string;
cards;
"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" : 
"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" : 
"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" : 
;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Oligolas · Posted 06-14-2017 08:52 AM

Hi,

what about regex?

data test;
infile cards truncover;
input string $100.;
cards;
"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" : 
"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" : 
"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" : 
;
run;

data test1;
   length var $200;
   set test;
   var=prxchange('s/.*total_payout_value\D*(\d*\.?\d*).*/$1/',-1,string);
run;

________________________
- Cheers -

Astounding · Posted 06-14-2017 09:36 AM

I think you are making this overly complex by searching for the wrong string. The payout amount seems to follow the first instance of "amount" : so search for that instead.

This would create a character variable:

data want;

set have;

payout = scan( substr(dataset, index(dataset, '"amount" :' ) + 10), 1);

run;

Pulling number from complex string and creating new numeric field containing stripped out number

Re: Pulling number from complex string and creating new numeric field containing stripped out number

Re: Pulling number from complex string and creating new numeric field containing stripped out number

Re: Pulling number from complex string and creating new numeric field containing stripped out number

Re: Pulling number from complex string and creating new numeric field containing stripped out number

Catch up on SAS Innovate 2026

Pulling number from complex string and creating new numeric field containing stripped out number

Re: Pulling number from complex string and creating new numeric field containing stripped out number

Re: Pulling number from complex string and creating new numeric field containing stripped out number

Re: Pulling number from complex string and creating new numeric field containing stripped out number

Re: Pulling number from complex string and creating new numeric field containing stripped out number

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away