Solved: Re: how to read numbers of the form '00000-34.56' ?

mediaeval · Posted 08-09-2013 05:29 AM

What informat should I use to read in these numbers? The number in the example is -34.56.

Tks

Joe

Vince28_Statcan · Posted 08-09-2013 09:05 AM

The PRX solution I posted will actually read 00005238.99 appropriately as well. All of the above work. If you want to test it against your entire dataset and find anomalies/different ways that your numbers are written, I can modify the PRX for you if it's a one-time job.

An alternative since there appears to be no case of 0000.dd.dd in your data (period twice) without PRX would be as follow:

data have;

input var1 $11.;

format var2 best12.;

var2=input(substr(var1, (findc(var1, "123456789.-")), 12-(findc(var1, "123456789.-"))),8.);

datalines;

00000000.00

00000142.31

00000-23.86

00005238.99

;

run;

Vince

View solution in original post

UrvishShah · Posted 08-09-2013 06:22 AM

INFORMAT 6.2 you can apply...

mediaeval · Posted 08-09-2013 06:46 AM

When I run the data step,

Data negs;

set maindata;

informat premium 11.2;

newpremium = Put(premium,11.2);

Run;

It generates the error, "Invalid numeric data '0000.-34.56' at line..." for the record in question.

The number is in a string, 11 characters in length, hence the "11.2"

LinusH · Posted 08-09-2013 08:00 AM

Please tell us more about the form.

Is always a negative number? If not, what does that look like then?

Is the number always prefixed with a constant of 00000? If not, which other appearances is there?

It seems that some kind of string manipulation is needed, such substr() and/or scan().

Data never sleeps

Vince28_Statcan · Posted 08-09-2013 08:01 AM

you can't input them directly as numeric if the - sign is in the middle.

read as character and then do a clever use of substr/if or PRX and use input on the produced substring to create a new numeric variable of the desired format.

example:

data have;

input var1 $12.;

format var2 best12.;

var2=input(prxchange("s/(0*\.?)(-?\[0-9.]\d*)/$2/",1,var1), 8.);

datalines;

0000.-34.56

00000-34.56

;

run;

Since regular expressions are not of common knowledge, I'll explain what this prxchange does that could be replicated with a mix of substrings/if conditions with some text functions

s indicates that PRX can/will do a substitution (hence prxchange and not prxmatch)

the 3 /// are actual regular expression delimiters. The first two delimitate start/end of the expression that is to be matched. The last 2 delimitate the substitution.

()() delimiter 2 distinct capturing groups that can be refererenced by $1$2 respectively in the substitution segment

(0*\.?)

captures as many 0s as possible at the start of the string, (zero or more 0s), the \.? captures 0 or 1 periods. The \ is used because . is a regex construct and needs to be escaped to represent the period character itself.

(-?\[0-9.]\d*)

-? captures 0 or one hyphen (or negative sign). [] represent a group of character, in this case, 0-9 as well as period within the group. Inside the group, the period does not need to be escaped with a backslash. There are no quantifiers constructs following the [] meaning it will match exactly one of these characters. \d represents any "digits" characters (effectively 0-9 period and possibly hyphen, I can't remember) * means match as many as possible of those digits characters.

the $2 means that the value returned by the prxchange function will be that of the 2nd capturing group in the regular expression, in this case (-?\[0-9.]\d*) which is then input as a 8. number. I've used format best12. statement as this is the default SAS number format.

Hope this helps

Vince

mediaeval · Posted 08-09-2013 08:53 AM

Thank you all. The input field is 11 characters long, with two decimal places. Some sample data:

00000000.00

00000142.31

00000-23.86

00005238.99

Note that for positive numbers, the plus sign is not there. If the minus sign were not appearing, with leading zeroes, the informat would simply be 11.2.

Vince28_Statcan · Posted 08-09-2013 09:05 AM

The PRX solution I posted will actually read 00005238.99 appropriately as well. All of the above work. If you want to test it against your entire dataset and find anomalies/different ways that your numbers are written, I can modify the PRX for you if it's a one-time job.

An alternative since there appears to be no case of 0000.dd.dd in your data (period twice) without PRX would be as follow:

data have;

input var1 $11.;

format var2 best12.;

var2=input(substr(var1, (findc(var1, "123456789.-")), 12-(findc(var1, "123456789.-"))),8.);

datalines;

00000000.00

00000142.31

00000-23.86

00005238.99

;

run;

Vince

Tom · Posted 08-09-2013 09:52 AM

In code SAS would be happy to resolve 00000-23.86 as (00000 - 23.86).

So here is one trick using RESOLVE() function to call %SYSEVALF() for every value.

data have;

input var1 $11.;

format var2 best12.;

var2=input(resolve('%sysevalf('||var1||')'),11.);

put var1= var2=;

cards;

00000000.00

00000142.31

00000-23.86

00005238.99

;

var1=00000000.00 var2=0

var1=00000142.31 var2=142.31

var1=00000-23.86 var2=-23.86

var1=00005238.99 var2=5238.99

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away