Solved: Re: reading string using scan

epr · Posted 04-12-2013 01:19 PM

Hello All,

Need some help reading data using scan

data a;

input info $1-50;

cards;

101 pencils 35

102 parker pensl21

103 dellstudioipod 04

104 dellstudioipod 01

run;

Here I separated columns using scan function.

data cd;

set j;

prid=input(scan(info,1,' '),3.);

prname=put(scan(info,2,' '),$20.);

prqty=input(scan(info,3,' '),3.);

drop info;

run;

But for this data set scan function is not woking as product name it self has sepated with spaces.

data a;

input info $1-50;

cards;

101 pencils 35

102 parker pens 21

103 apple ipod touch n shuffle 04

104 dell studio ipod 01

run;

Thank you

Tom · Posted 04-12-2013 01:36 PM

If you have any control over the generation of the list you should request that they use a delimiter other than space. Or if they can insure that there are not double spaces embedded in the PRNAME field and that are two spaces before the PRQTY field. Or change the order of the fields so that the PRNAME field is at the end of the line. Then the INPUT statement could read it.

In your case it looks possible, as long as you can insure that all records have that field at the end.

data want ;

set have;

length prid 8 prname $20 prqty 8 ;

length x1 x2 $8 ;

x1=scan(info,1,' ');

x2=scan(info,-1,' ');

prid=input(X1,3.);

prqty=input(X2,3.);

prname=left(substr(info,length(X1)+1,length(info) - length(x1) - length(x2) -2 ));

drop info x1 x2 ;

run;

View solution in original post

Tom · Posted 04-12-2013 01:36 PM

If you have any control over the generation of the list you should request that they use a delimiter other than space. Or if they can insure that there are not double spaces embedded in the PRNAME field and that are two spaces before the PRQTY field. Or change the order of the fields so that the PRNAME field is at the end of the line. Then the INPUT statement could read it.

In your case it looks possible, as long as you can insure that all records have that field at the end.

data want ;

set have;

length prid 8 prname $20 prqty 8 ;

length x1 x2 $8 ;

x1=scan(info,1,' ');

x2=scan(info,-1,' ');

prid=input(X1,3.);

prqty=input(X2,3.);

prname=left(substr(info,length(X1)+1,length(info) - length(x1) - length(x2) -2 ));

drop info x1 x2 ;

run;

data_null__ · Posted 04-12-2013 01:55 PM

Like Tom's only different and fixed now.

17         data _null_;
18            input info $1-50; 
19            call scan(info,1 ,p1,l1,' ');
20            call scan(info,-1,p2,l2,' ');
21            w1 = substrn(info,p1,l1);
22            w2 = substrn(info,l1+2,p2-l1-2);
23            w3 = substrn(info,p2,l2);
24            put (w:)(=);
25            cards;

w1=101 w2=pencils w3=35
w1=101 w2=parker pens w3=214444444
w1=103 w2=apple ipod touch n shuffle w3=0444
w1=104 w2=dell studio ipod w3=01

Message was edited by: data _null_

Tom · Posted 04-12-2013 02:06 PM

Length of second field does not depend on length of the last field.

It worked for the example because they were all length of 2.

w2 = substrn(info,l1+2,p2-l1-2);

data_null__ · Posted 04-12-2013 02:07 PM

OOPS

epr · Posted 04-12-2013 02:05 PM

Thank you Tom and ballardw,

I Appreciate your help

Tom your code worked , I got right output.

ballardw · Posted 04-12-2013 01:40 PM

If I had this data, depending on the source, I would seriously consider rearring the input model. If data were entered in prid prqty and prname order then you wouldn't need to parse the string but just read the variables with suitable options.

With your example data I would input prid directly and then try to parse the remainder.

If none of your product names contain digits then you could search for the first digit that appears using the ANYDIGIT function and then use SUBSTR

input prid $ info $varying50. ;

then something like

prname = substr(info,anyalpha(info),anydigit(info)-1);

prqty = input((substr(info,anydigit(info)),3.);

ScottBass · Posted 04-13-2013 05:42 AM

I'm a regex fan, so here's another approach:

* your first example ;

data a;

input info $1-50;

cards;

101 pencils 35

102 parker pensl21

103 dellstudioipod 04

104 dellstudioipod 01

run;

* your second example ;

data b;

input info $1-50;

cards;

101 pencils 35

102 parker pens 21

103 apple ipod touch n shuffle 04

104 dell studio ipod 01

run;

data want;

set a; * or set b ;

rx=prxparse("/^(\d+)(\D*)(\d+)$/o");

rc=prxmatch(rx,trim(info));

prid=input(prxposn(rx,1,info),best.);

prname=prxposn(rx,2,info);

prqty=input(prxposn(rx,3,info),best.);

run;

This will assign:

1) prid = beginning of line, all digits up to

2) prname = all non-digits, up to

3) prqty = all digits at the end of line

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

Registration is open

SAS Training: Just a Click Away