Hello All
I would like to sum up the numbers from the text below:
Kwota=237 + Kwota=30.87 + Kwota=292.74 + Kwota=292.74 + Kwota=300.87 that is:
Kwota=1154,22
In the example there are 5 phrases but their number may vary.
text:
[Forma sprzedazy=leasing, auto: Audi A 6, Kwota=237, Cena =123],[Forma sprzedazy=leasing, auto: Volvo, Kwota=30.87, Cena =10],[Forma sprzedazy=leasing, auto: VW Golf, Kwota=292.74, Cena =134],[Forma sprzedazy=leasing, auto: VW Golf, Kwota=292.74, Cena =134],[Forma sprzedazy=leasing, auto: Porche, Kwota=300.87, Cena =152]
Hello @Jacek2 and welcome to the SAS Support Communities!
This should get you started:
data have;
length text $1000;
text=cat('[Forma sprzedazy=leasing, auto: Audi A 6, Kwota=237, Cena =123],',
'[Forma sprzedazy=leasing, auto: Volvo, Kwota=30.87, Cena =10],',
'[Forma sprzedazy=leasing, auto: VW Golf, Kwota=292.74, Cena =134],',
'[Forma sprzedazy=leasing, auto: VW Golf, Kwota=292.74, Cena =134],',
'[Forma sprzedazy=leasing, auto: Porche, Kwota=300.87, Cena =152]');
output;
text=cat('[Forma sprzedazy=leasing, auto: Audi A 6, Kwota = 237, Cena =123],',
'[Forma sprzedazy=leasing, auto: Volvo, Kwota 30.87, Cena =10],',
'[Forma sprzedazy=leasing, auto: VW Golf, Kwota:292.74, Cena =134],',
'[Forma sprzedazy=leasing, auto: VW Golf Kwota=292.74 Cena =134],',
'[Forma sprzedazy=leasing, auto: Porsche, kwota=2,300.87, Cena =852],',
'[Forma sprzedazy=leasing, auto: Horch, Kwota=60,000],',
'[Forma sprzedazy=leasing, auto: N/A, Kwota=N/A, Cena =N/A],',
'[Forma sprzedazy=leasing, auto: N/A, Kwota=, Cena =]');
output;
run;
data want(keep=text total);
set have;
if ~id then id+prxparse('/Kwota[=: ]*(\d+,?\d*\.?\d*)/i');
start=1;
stop=length(text);
link add;
do while(pos);
link add;
end;
add:
call prxnext(id, start, stop, text, pos, len);
total=sum(total,input(prxposn(id, 1, text),?? comma32.));
return;
run;
I have added a second observation with eight "phrases" involving various deviations from the standard layout to demonstrate the robustness of the algorithm:
The Perl regular expression '/Kwota[=: ]*(\d+,?\d*\.?\d*)/i' describes a pattern consisting of the word "Kwota" (case-insensitive), followed by any number of equal signs, colons or blanks, at least one digit and possibly: a comma, more digts, a decimal point and decimals. The CALL PRXNEXT routine performs the search repeatedly as long as the pattern is found (condition pos>0). The relevant part of the string TEXT matching the pattern (i.e., the digits with thousands separator and decimal point, if any) is extracted by the PRXPOSN function, converted to a numeric value (if possible) by the INPUT function and finally added to variable TOTAL by the SUM function.
Note that a number format like 1.234,56 in the data would require the COMMAX. informat instead of the COMMA. informat. A mixture of different formats (e.g., 123.45 in some places and 123,45 in others) is not supported by the suggested code. Numbers with two thousands separators (like 1,199,000) are not supported either.
Do you have a SAS data set?
If so show us the results of Proc Contents on the data set.
If you do not have a SAS data set the first thing will be to create one and should show use some of the source file so we can help create one.
Almost certainly you have to do something to separate the text into meaningful variables possibly such as Model, Kwota and Cena (what ever those are). Then the sum would be easy.
Thank you for the answer, the problem has already been solved by colleagues below.
Alternatively:
data want; length kwota_total 8; set have; rxid = prxparse ('/kwota=(\d+\.?\d+)/i'); start = 1; stop = length(text); do while (_n_); call prxnext (rxid, start, stop, text, position, length); if position < 1 then leave; kwota_total = sum(kwota_total, input (prxposn(rxid,1,text), best32.)); end; drop rxid start stop position length; run;
If the text is pretty simple it is not hard to parse it in SAS code.
Looks like you have values in brackets (square brackets if you speak British) separated by commas. Within that you have name value pairs separated by commas. Strangely some of the name value pairs are separated by = and others by ':'.
You could build a regular expression to parse, but for your simple example you can just use SCAN(), especially if all you want is the KWOTA (?quota?) values.
Let's start by creating a dataset from the information in your post so we have something to code with.
data have;
text='[Forma sprzedazy=leasing, auto: Audi A 6, Kwota=237, Cena =123]'
||',[Forma sprzedazy=leasing, auto: Volvo, Kwota=30.87, Cena =10]'
||',[Forma sprzedazy=leasing, auto: VW Golf, Kwota=292.74, Cena =134]'
||',[Forma sprzedazy=leasing, auto: VW Golf, Kwota=292.74, Cena =134]'
||',[Forma sprzedazy=leasing, auto: Porche, Kwota=300.87, Cena =152]'
;
run;
Now we can loop over the name/value pairs in TEXT and when the name is KWOTA convert the value to a number and accumulate the amount.
data want;
set have;
dlm='[],=:';
do word = 1 to countw(text,dlm) by 2;
length name value $50 ;
name=left(scan(text,word,dlm));
value=left(scan(text,word+1,dlm));
if name='Kwota' then kwota=sum(kwota,input(value,??32.));
end;
drop word name value dlm ;
run;
Result
Obs kwota 1 1154.22
Hello @Jacek2 and welcome to the SAS Support Communities!
This should get you started:
data have;
length text $1000;
text=cat('[Forma sprzedazy=leasing, auto: Audi A 6, Kwota=237, Cena =123],',
'[Forma sprzedazy=leasing, auto: Volvo, Kwota=30.87, Cena =10],',
'[Forma sprzedazy=leasing, auto: VW Golf, Kwota=292.74, Cena =134],',
'[Forma sprzedazy=leasing, auto: VW Golf, Kwota=292.74, Cena =134],',
'[Forma sprzedazy=leasing, auto: Porche, Kwota=300.87, Cena =152]');
output;
text=cat('[Forma sprzedazy=leasing, auto: Audi A 6, Kwota = 237, Cena =123],',
'[Forma sprzedazy=leasing, auto: Volvo, Kwota 30.87, Cena =10],',
'[Forma sprzedazy=leasing, auto: VW Golf, Kwota:292.74, Cena =134],',
'[Forma sprzedazy=leasing, auto: VW Golf Kwota=292.74 Cena =134],',
'[Forma sprzedazy=leasing, auto: Porsche, kwota=2,300.87, Cena =852],',
'[Forma sprzedazy=leasing, auto: Horch, Kwota=60,000],',
'[Forma sprzedazy=leasing, auto: N/A, Kwota=N/A, Cena =N/A],',
'[Forma sprzedazy=leasing, auto: N/A, Kwota=, Cena =]');
output;
run;
data want(keep=text total);
set have;
if ~id then id+prxparse('/Kwota[=: ]*(\d+,?\d*\.?\d*)/i');
start=1;
stop=length(text);
link add;
do while(pos);
link add;
end;
add:
call prxnext(id, start, stop, text, pos, len);
total=sum(total,input(prxposn(id, 1, text),?? comma32.));
return;
run;
I have added a second observation with eight "phrases" involving various deviations from the standard layout to demonstrate the robustness of the algorithm:
The Perl regular expression '/Kwota[=: ]*(\d+,?\d*\.?\d*)/i' describes a pattern consisting of the word "Kwota" (case-insensitive), followed by any number of equal signs, colons or blanks, at least one digit and possibly: a comma, more digts, a decimal point and decimals. The CALL PRXNEXT routine performs the search repeatedly as long as the pattern is found (condition pos>0). The relevant part of the string TEXT matching the pattern (i.e., the digits with thousands separator and decimal point, if any) is extracted by the PRXPOSN function, converted to a numeric value (if possible) by the INPUT function and finally added to variable TOTAL by the SUM function.
Note that a number format like 1.234,56 in the data would require the COMMAX. informat instead of the COMMA. informat. A mixture of different formats (e.g., 123.45 in some places and 123,45 in others) is not supported by the suggested code. Numbers with two thousands separators (like 1,199,000) are not supported either.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.