I have a dataset with 100000 rows and each row contains an extremely long string with lots of data. I'm want to create a new field with the "total payout value". This is the first number after the string "total_payout_value". The problem is the length of the string between the "total_payout_value" and the number can vary and also the number length can vary. Here is a sample from my dataset:
"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
The actual sting is 1000s of characters long but each string will contain the string "total_payout_value" only once and the first number after this string is what I'm trying to extract. You will also see that sometimes the order of words between "total_payout_value" and the number can change so unfortuntately this length is not fixed.
I tried this code but I don't know how to allow for the varying lengths. Really, what I need is is something that find the first number after the string "total_payout_value"
data test1(keep = start end payout );
set data;
format payout $8.;
start = find(dataset,' "total_payout_value" : { "asset" : "SBD", "amount" : ')+54;
end = find(dataset,', "asset" : "SBD" }, "author"');
payout = substr(dataset,start,end-start));
run;
What you have there is a JSon file, its a particular type of file structure. Rather than trying to parse it yourself, use one of the methods given in your previous post on the subject:
Or one of the many other posts on the subject:
There are also several free tools out there, for example:
http://csvkit.readthedocs.io/en/1.0.2/
How to retrieve data using name-value pairs, manually:
%let id1=total_payout_value;
%let id2=amount;
data test;
infile cards truncover;
input o_string $100.;
string = o_string;
i = index(string,"&id1");
string = substr(string,i);
string = scan(string,2,'{');
string = scan(string,1,'}');
i = index(string,"&id2");
string = substr(string,i+length("&id2"));
string = scan(string,2,':');
string = scan(string,1,',');
&id2 = input(string,best.);
drop i string;
cards;
"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
;
run;
Hi,
what about regex?
data test;
infile cards truncover;
input string $100.;
cards;
"total_payout_value" : { "asset" : "SBD", "amount" : 37.13 }, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 37.13, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
"total_payout_value" : {"amount" : 0, "asset" : "SBD"}, "pending_payout_value" : { "asset" :
;
run;
data test1;
length var $200;
set test;
var=prxchange('s/.*total_payout_value\D*(\d*\.?\d*).*/$1/',-1,string);
run;
- Cheers -
I think you are making this overly complex by searching for the wrong string. The payout amount seems to follow the first instance of "amount" : so search for that instead.
This would create a character variable:
data want;
set have;
payout = scan( substr(dataset, index(dataset, '"amount" :' ) + 10), 1);
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.