I have a dataset with 100000 rows and each row contains an extremely long string with lots of data. I'm want to create a new field with the "total payout value". This is the first number after the string "total_payout_value". The problem is the length of the string between the "total_payout_value" and the number can vary and also the number length can vary. Here is a sample from my dataset:
"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
The actual sting is 1000s of characters long but each string will contain the string "total_payout_value" only once and the first number after this string is what I'm trying to extract. You will also see that sometimes the order of words between "total_payout_value" and the number can change so unfortuntately this length is not fixed.
I tried this code but I don't know how to allow for the varying lengths. Really, what I need is is something that find the first number after the string "total_payout_value"
data test1(keep = start end payout );
set data;
format payout $8.;
start = find(dataset,' "total_payout_value" : { "asset" : "SBD", "amount" : ')+54;
end = find(dataset,', "asset" : "SBD" }, "author"');
payout = substr(dataset,start,end-start));
run;
What you have there is a JSon file, its a particular type of file structure. Rather than trying to parse it yourself, use one of the methods given in your previous post on the subject:
Or one of the many other posts on the subject:
There are also several free tools out there, for example:
http://csvkit.readthedocs.io/en/1.0.2/
How to retrieve data using name-value pairs, manually:
%let id1=total_payout_value;
%let id2=amount;
data test;
infile cards truncover;
input o_string $100.;
string = o_string;
i = index(string,"&id1");
string = substr(string,i);
string = scan(string,2,'{');
string = scan(string,1,'}');
i = index(string,"&id2");
string = substr(string,i+length("&id2"));
string = scan(string,2,':');
string = scan(string,1,',');
&id2 = input(string,best.);
drop i string;
cards;
"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
;
run;
Hi,
what about regex?
data test;
infile cards truncover;
input string $100.;
cards;
"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
;
run;
data test1;
length var $200;
set test;
var=prxchange('s/.*total_payout_value\D*(\d*\.?\d*).*/$1/',-1,string);
run;
- Cheers -
I think you are making this overly complex by searching for the wrong string. The payout amount seems to follow the first instance of "amount" : so search for that instead.
This would create a character variable:
data want;
set have;
payout = scan( substr(dataset, index(dataset, '"amount" :' ) + 10), 1);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.