BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mediaeval
Calcite | Level 5

What informat should I use to read in these numbers? The number in the example is -34.56.

Tks

Joe

1 ACCEPTED SOLUTION

Accepted Solutions
Vince28_Statcan
Quartz | Level 8

The PRX solution I posted will actually read 00005238.99 appropriately as well. All of the above work. If you want to test it against your entire dataset and find anomalies/different ways that your numbers are written, I can modify the PRX for you if it's a one-time job.

An alternative since there appears to be no case of 0000.dd.dd in your data (period twice) without PRX would be as follow:

data have;

     input var1 $11.;

     format var2 best12.;

     var2=input(substr(var1, (findc(var1, "123456789.-")), 12-(findc(var1, "123456789.-"))),8.);

     datalines;

00000000.00

00000142.31

00000-23.86

00005238.99

;

run;

Vince

View solution in original post

7 REPLIES 7
UrvishShah
Fluorite | Level 6

INFORMAT 6.2 you can apply...

mediaeval
Calcite | Level 5

When I run the data step,

Data negs;

  set maindata;

  informat premium 11.2;

  newpremium = Put(premium,11.2);

Run;

It generates the error, "Invalid numeric data '0000.-34.56' at line..." for the record in question.

The number is in a string, 11 characters in length, hence the "11.2"

LinusH
Tourmaline | Level 20

Please tell us more about the form.

Is always a negative number? If not, what does that look like then?

Is the number always prefixed with a constant of 00000? If not, which other appearances is there?

It seems that some kind of string manipulation is needed, such substr() and/or scan().

Data never sleeps
Vince28_Statcan
Quartz | Level 8

you can't input them directly as numeric if the - sign is in the middle.

read as character and then do a clever use of substr/if or PRX and use input on the produced substring to create a new numeric variable of the desired format.

example:

data have;

     input var1 $12.;

     format var2 best12.;

     var2=input(prxchange("s/(0*\.?)(-?\[0-9.]\d*)/$2/",1,var1), 8.);

     datalines;

0000.-34.56

00000-34.56

;

run;

Since regular expressions are not of common knowledge, I'll explain what this prxchange does that could be replicated with a mix of substrings/if conditions with some text functions

s indicates that PRX can/will do a substitution (hence prxchange and not prxmatch)

the 3 /// are actual regular expression delimiters. The first two delimitate start/end of the expression that is to be matched. The last 2 delimitate the substitution.

()() delimiter 2 distinct capturing groups that can be refererenced by $1$2 respectively in the substitution segment

(0*\.?)

captures as many 0s as possible at the start of the string, (zero or more 0s), the \.? captures 0 or 1 periods. The \ is used because . is a regex construct and needs to be escaped to represent the period character itself.

(-?\[0-9.]\d*)

-? captures 0 or one hyphen (or negative sign). [] represent a group of character, in this case, 0-9 as well as period within the group. Inside the group, the period does not need to be escaped with a backslash. There are no quantifiers constructs following the [] meaning it will match exactly one of these characters. \d represents any "digits" characters (effectively 0-9 period and possibly hyphen, I can't remember) * means match as many as possible of those digits characters.

the $2 means that the value returned by the prxchange function will be that of the 2nd capturing group in the regular expression, in this case (-?\[0-9.]\d*) which is then input as a 8. number. I've used format best12. statement as this is the default SAS number format.

Hope this helps

Vince

mediaeval
Calcite | Level 5

Thank you all. The input field is 11 characters long, with two decimal places. Some sample data:

00000000.00

00000142.31

00000-23.86

00005238.99

Note that for positive numbers, the plus sign is not there. If the minus sign were not appearing, with leading zeroes, the informat would simply be 11.2.

Vince28_Statcan
Quartz | Level 8

The PRX solution I posted will actually read 00005238.99 appropriately as well. All of the above work. If you want to test it against your entire dataset and find anomalies/different ways that your numbers are written, I can modify the PRX for you if it's a one-time job.

An alternative since there appears to be no case of 0000.dd.dd in your data (period twice) without PRX would be as follow:

data have;

     input var1 $11.;

     format var2 best12.;

     var2=input(substr(var1, (findc(var1, "123456789.-")), 12-(findc(var1, "123456789.-"))),8.);

     datalines;

00000000.00

00000142.31

00000-23.86

00005238.99

;

run;

Vince

Tom
Super User Tom
Super User

In code SAS would be happy to resolve 00000-23.86 as (00000 - 23.86).

So here is one trick using RESOLVE() function to call %SYSEVALF() for every value.

data have;

  input var1 $11.;

  format var2 best12.;

  var2=input(resolve('%sysevalf('||var1||')'),11.);

  put var1= var2=;

cards;

00000000.00

00000142.31

00000-23.86

00005238.99

;


var1=00000000.00 var2=0

var1=00000142.31 var2=142.31

var1=00000-23.86 var2=-23.86

var1=00005238.99 var2=5238.99

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1268 views
  • 3 likes
  • 5 in conversation