DATA Step, Macro, Functions and more

Input data with embedded delimiter

Accepted Solution Solved
Reply
Contributor
Posts: 30
Accepted Solution

Input data with embedded delimiter

Hello,

 

I have been trying the following program to create the dataset avgcelcius by inputting the records mentioned in datalines section. But, since it contains embedded spaces I tried using "&" format modifier of list input but it doesn't work. 

 

When I manually adjusted the records in fixed columns, it worked! But, I want to know how can I do it by using any other way ?

 

Thank you!

 

 

DATA avgcelsius;
input City &$17. jan feb mar apr may jun
jul aug sep oct nov dec;
DATALINES;
State College, PA -2 -2 2 8 14 19 21 20 16 10 4 -1
Miami, FL 20 20 22 23 26 27 28 28 27 26 23 20
St. Louis, MO -1 1 6 13 18 23 26 25 21 15 7 1
New Orleans, LA 11 13 16 20 23 27 27 27 26 21 16 12
Madison, WI -8 -5 0 7 14 19 22 20 16 10 2 -5
Houston, TX 10 12 16 20 23 27 28 28 26 21 16 12
Phoenix, AZ 12 14 16 21 26 31 33 32 30 23 16 12
Seattle, WA 5 6 7 10 13 16 18 18 16 12 8 6
San Francisco, CA 10 12 12 13 14 15 15 16 17 16 14 11
San Diego, CA 13 14 15 16 17 19 21 22 21 19 16 14
;
RUN;

 


Accepted Solutions
Solution
‎01-17-2018 11:18 AM
PROC Star
Posts: 8,112

Re: Input data with embedded delimiter

[ Edited ]

The & will only do what you want if there are at least two spaces separating the fields. You can force an extra space as follows:

data avgcelsius (drop=x);
  informat City $17.;
  input @;
  x=anyalpha(_infile_,-50);
  _infile_=substr(_infile_,1,x)||' '||substr(_infile_,x+1);
  input City & jan feb mar apr may jun jul aug sep oct nov dec;
  DATALINES;
State College, PA -2 -2 2 8 14 19 21 20 16 10 4 -1
Miami, FL 20 20 22 23 26 27 28 28 27 26 23 20
St. Louis, MO -1 1 6 13 18 23 26 25 21 15 7 1
New Orleans, LA 11 13 16 20 23 27 27 27 26 21 16 12
Madison, WI -8 -5 0 7 14 19 22 20 16 10 2 -5
Houston, TX 10 12 16 20 23 27 28 28 26 21 16 12
Phoenix, AZ 12 14 16 21 26 31 33 32 30 23 16 12
Seattle, WA 5 6 7 10 13 16 18 18 16 12 8 6
San Francisco, CA 10 12 12 13 14 15 15 16 17 16 14 11
San Diego, CA 13 14 15 16 17 19 21 22 21 19 16 14
;
RUN;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
PROC Star
Posts: 1,321

Re: Input data with embedded delimiter

fun stuff, just play:

 

DATA avgcelsius;

infile datalines ;

length city $30;

input @;

city=strip(substr(_infile_,1,anydigit(_infile_)-2)) ;

input @(length(city)+1) jan feb mar apr may jun

jul aug sep oct nov dec;

DATALINES;

State College, PA -2 -2 2 8 14 19 21 20 16 10 4 -1

Miami, FL 20 20 22 23 26 27 28 28 27 26 23 20

St. Louis, MO -1 1 6 13 18 23 26 25 21 15 7 1

New Orleans, LA 11 13 16 20 23 27 27 27 26 21 16 12

Madison, WI -8 -5 0 7 14 19 22 20 16 10 2 -5

Houston, TX 10 12 16 20 23 27 28 28 26 21 16 12

Phoenix, AZ 12 14 16 21 26 31 33 32 30 23 16 12

Seattle, WA 5 6 7 10 13 16 18 18 16 12 8 6

San Francisco, CA 10 12 12 13 14 15 15 16 17 16 14 11

San Diego, CA 13 14 15 16 17 19 21 22 21 19 16 14

;

RUN;

Solution
‎01-17-2018 11:18 AM
PROC Star
Posts: 8,112

Re: Input data with embedded delimiter

[ Edited ]

The & will only do what you want if there are at least two spaces separating the fields. You can force an extra space as follows:

data avgcelsius (drop=x);
  informat City $17.;
  input @;
  x=anyalpha(_infile_,-50);
  _infile_=substr(_infile_,1,x)||' '||substr(_infile_,x+1);
  input City & jan feb mar apr may jun jul aug sep oct nov dec;
  DATALINES;
State College, PA -2 -2 2 8 14 19 21 20 16 10 4 -1
Miami, FL 20 20 22 23 26 27 28 28 27 26 23 20
St. Louis, MO -1 1 6 13 18 23 26 25 21 15 7 1
New Orleans, LA 11 13 16 20 23 27 27 27 26 21 16 12
Madison, WI -8 -5 0 7 14 19 22 20 16 10 2 -5
Houston, TX 10 12 16 20 23 27 28 28 26 21 16 12
Phoenix, AZ 12 14 16 21 26 31 33 32 30 23 16 12
Seattle, WA 5 6 7 10 13 16 18 18 16 12 8 6
San Francisco, CA 10 12 12 13 14 15 15 16 17 16 14 11
San Diego, CA 13 14 15 16 17 19 21 22 21 19 16 14
;
RUN;

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 30

Re: Input data with embedded delimiter

Thank you so much for your help. It works fine now.
Contributor
Posts: 30

Re: Input data with embedded delimiter

I am facing same difficulty again -

I want to input following data into a dataset. The values have embedded spaces but in this case I am unable to apply a solution similar to what you suggested. Please help.

data multiple;
input city $30. state $2. zip $7.;
datalines;
Flemington NJ 08822
North City NY 11518
Atlantic city NJ 08823
Rancho Santa Margarita CA 78556
;
run;
PROC Star
Posts: 8,112

Re: Input data with embedded delimiter

Same suggestion, but just change where you're adding the extra space. e.g.:

data multiple (drop=x);
  input @;
  x=anyalpha(_infile_,-50)-2;
  _infile_=substr(_infile_,1,x)||' '||substr(_infile_,x+1);
  input city & $30. state $2. zip $7.;
datalines;
Flemington NJ 08822
North City NY 11518
Atlantic city NJ 08823
Rancho Santa Margarita CA 78556
;
run;

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 30

Re: Input data with embedded delimiter

Thank you sir!
With the help of what you told I was trying the following -

 

data multiple(drop=x);
input #1 @1 name $50.
#2 @1 address $50.;
input @;
x=anyalpha(_infile_,-50)-2;
_infile_=substr(_infile_,1,x)||' '||substr(_infile_,x+1);
input #3 @1 city $50.
state $3.
zip $7.;
name=compbl(name);
address=compbl(address);
city=compbl(city);
datalines;
Ron Cody
89 Lazy Brook Road
Flemington NJ 08822
Tim Brown
35 Peak Dr
New York City NY 19454
;
run;

 

The city, state, zip variables are not coming up correctly.

Could you guide me.

Thank you!

PROC Star
Posts: 8,112

Re: Input data with embedded delimiter

Here is one way:

data multiple(drop=x rawin);
  infile cards truncover;
  informat name address city rawin $50.;
  informat state $2.;
  informat zip $5.;
  input name & /
      address & /
      rawin & @;
  x=anyalpha(_infile_,-50)-2;
  _infile_=substr(_infile_,1,x)||' '||substr(_infile_,x+1);
  input @1 city & state zip;
  datalines;
Ron Cody
89 Lazy Brook Road
Flemington NJ 08822
Tim Brown
35 Peak Dr
New York City NY 19454
;
run;

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 30

Re: Input data with embedded delimiter

Sorry, I forgot to mention that there were additional spaces in between values as below -

data multiple(drop=x);
input #1 @1 name $50.
#2 @1 address $50.;
input @;
x=anyalpha(_infile_,-50)-2;
_infile_=substr(_infile_,1,x)||' '||substr(_infile_,x+1);
input #3 @1 city $50.
state $3.
zip $7.;
name=compbl(name);
address=compbl(address);
city=compbl(city);
datalines;
Ron Cody
89 Lazy Brook Road
Flemington NJ 08822
Tim Brown
35 Peak Dr
New York City NY 19454
;
run;

How to deal with these extra blanks in city value?
PROC Star
Posts: 8,112

Re: Input data with embedded delimiter

Your extra spaces were and again are lost in your post because you didn't past them in a code box (i.e., the {i} icon).

 

Regardless, here is one way:

data multiple(drop=x);
  infile cards truncover;
  informat name address city $50.;
  informat state $2.;
  informat zip $5.;
  retain name address;
  input @;
  _infile_=compbl(_infile_);
  input name &;
  input @;
  _infile_=compbl(_infile_);
  input address &;
  input @;
  _infile_=compbl(_infile_);
  x=anyalpha(_infile_,-50)-2;
  _infile_=substr(_infile_,1,x)||' '||substr(_infile_,x+1);
  input @1 city & state zip;
  output;
  datalines;
Ron     Cody
89   Lazy   Brook   Road
Flemington     NJ    08822
Tim   Brown
35     Peak Dr
New   York    City   NY    19454
;
run;

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 30

Re: Input data with embedded delimiter

Thank you so much sir!
Trusted Advisor
Posts: 1,318

Re: Input data with embedded delimiter

DATA avgcelsius;
length city $ 17;
infile cards col=col;
input @'2c'x + 4 @;
city = substr(_infile_,1,col-2);
input jan feb mar apr may jun jul aug sep oct nov dec;
DATALINES;
State College, PA -2 -2 2 8 14 19 21 20 16 10 4 -1
Miami, FL 20 20 22 23 26 27 28 28 27 26 23 20
St. Louis, MO -1 1 6 13 18 23 26 25 21 15 7 1
New Orleans, LA 11 13 16 20 23 27 27 27 26 21 16 12
Madison, WI -8 -5 0 7 14 19 22 20 16 10 2 -5
Houston, TX 10 12 16 20 23 27 28 28 26 21 16 12
Phoenix, AZ 12 14 16 21 26 31 33 32 30 23 16 12
Seattle, WA 5 6 7 10 13 16 18 18 16 12 8 6
San Francisco, CA 10 12 12 13 14 15 15 16 17 16 14 11
San Diego, CA 13 14 15 16 17 19 21 22 21 19 16 14
;
RUN;
Super User
Posts: 10,615

Re: Input data with embedded delimiter

data avgcelsius ;
infile datalines dlm=' ' truncover;
  input @;
  _infile_=prxchange('s/(\s[a-z][a-z]\s)/$1 /i',1,_infile_);
  input City & $20. jan feb mar apr may jun jul aug sep oct nov dec;
DATALINES;
State College, PA -2 -2 2 8 14 19 21 20 16 10 4 -1
Miami, FL 20 20 22 23 26 27 28 28 27 26 23 20
St. Louis, MO -1 1 6 13 18 23 26 25 21 15 7 1
New Orleans, LA 11 13 16 20 23 27 27 27 26 21 16 12
Madison, WI -8 -5 0 7 14 19 22 20 16 10 2 -5
Houston, TX 10 12 16 20 23 27 28 28 26 21 16 12
Phoenix, AZ 12 14 16 21 26 31 33 32 30 23 16 12
Seattle, WA 5 6 7 10 13 16 18 18 16 12 8 6
San Francisco, CA 10 12 12 13 14 15 15 16 17 16 14 11
San Diego, CA 13 14 15 16 17 19 21 22 21 19 16 14
;
RUN;
Contributor
Posts: 30

Re: Input data with embedded delimiter

Hi,

Thank you for your help. I have never used this function prxchange. I learnt about it a bit, but I am not able to understand what you have done here. Could you please explain this ?

Thank you!
PROC Star
Posts: 8,112

Re: Input data with embedded delimiter

Regular expressions are good to learn, but a bit complex (at first) to understand. @Ksharp's for example:

  _infile_=prxchange('s/(\s[a-z][a-z]\s)/$1 /i',1,_infile_);

reading from left to right, the function uses s/ to indicate that it is a search/replace instance

(\s[a-z][a-z]\s)

is looking for a string that begins with a space, followed by a character between a and z, followed by another character between a and z, followed by a space

/$1 

says that, when found add one character, namely a space

/i'

says to ignore case (thus search for either a thru z and/or A thru Z)

 

Finally, the 1 instructs the function to only do the operation once, and then the _infile_ says to do it on the _infile_ variable.

 

Art, CEO, AnalystFinder.com

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 217 views
  • 2 likes
  • 5 in conversation