BookmarkSubscribeRSS Feed
rossfo
Calcite | Level 5

I so far have the first three variables. I am struggling however to get the city and zip code only from the address. Anyone know how to? I'm pretty new to SAS as a student so I've been struggling.

4 REPLIES 4
ChrisBrooks
Ammonite | Level 13

Hi @rossfo and welcome to the community.

 

There are several ways of doing this so here's one - read the input into the automatically created variable _infile_ then parse that using a combination of the SCAN and SUBSTR functions (the inclusion of both spaces and commas in the address field is what causes the issues. Here's the code

 

data want(drop=address);
	length shipid $5 received 8 shipped 8 city $20 zipcode $5 address $30;
	format received yymmdd10. shipped yymmdd10.;
	infile datalines;
	input;
	shipid=scan(_infile_,1," ");
	received=input(scan(_infile_,2," "),yymmdd10.);
	shipped=input(scan(_infile_,3," "),yymmdd10.);
	zipcode=scan(_infile_,-1," ");
	address=substr(_infile_,29);
	city=scan(address,1,",");
	zipcode=scan(address,-1,",");
datalines;
X8742 2018/03/14 2018/03/17 Little River, KS, 67457
R9028 2018/06/12 2018/07/10 Bartlett, KS, 67332
O1800 2018/03/28 2018/03/29 Leawood, KS, 66211
V7235 2018/05/19 2018/05/31 Manter, KS, 67862
H5688 2018/02/05 2018/02/21 Goodland, KS, 67735
G2665 2018/06/16 2018/06/20 Randall, KS, 66963
N2219 2018/11/28 2018/12/24 Neal, KS, 66863
L9312 2018/12/04 2018/12/12 Wichita, KS, 67211
P4220 2018/04/25 2018/05/03 Miltonvale, KS, 67466
W8174 2018/10/10 2018/10/13 Burns, KS, 66840
L6161 2018/08/02 2018/08/26 Milford, KS, 66514
J1873 2018/01/16 2018/01/27 Lindsborg, KS, 67456
G9188 2018/09/13 2018/10/02 Olsburg, KS, 66520
P7794 2018/12/23 2019/01/08 Salina, KS, 67402
T2556 2018/09/03 2018/09/09 Wichita, KS, 67211
H2552 2018/12/23 2019/01/04 Burlingame, KS, 66413
H7170 2018/02/22 2018/03/10 Syracuse, KS, 67878
B5975 2018/03/13 2018/04/06 Wetmore, KS, 66550
I1771 2018/08/10 2018/08/21 Bird City, KS, 67731
Z3670 2018/03/02 2018/03/05 Overland Park, KS, 66225
Q3244 2018/05/10 2018/05/12 Clifton, KS, 66937
Q1765 2018/02/08 2018/02/16 Hazelton, KS, 67061
N9743 2018/11/27 2018/12/01 Geuda Springs, KS, 67051
G3305 2018/12/13 2019/01/10 Norton, KS, 67654
T3424 2018/06/21 2018/07/15 Kansas City, KS, 66101
;
run;
mkeintz
PROC Star

If you didn't have some cities with two words ("Little River") it would be very basic:

 

data want ;
	infile datalines dlm=', ';
	input shipid :$5. received :yymmdd10. shipped :yymmdd10.
          city :$20. st :$2.             zipcode :$5.;
	format received yymmdd10. shipped yymmdd10.;
datalines;
.....
.....
run;

The dlm=', ' option tells SAS to treat both commas and blanks as field separators which would address the use of commas for some separators and blanks for others.  However this would fail to detect city names with blanks in the middle, and would mess up subsequent variables as well.

 

 

But there is an informat modifier (the &) that tells SAS to ignore a blank internal to a field.  So you can use the   :$&20. informat for the city name:

data want ;
  infile datalines dlm=', ';
  input shipid :$5. received :yymmdd10. shipped :yymmdd10.
        city :&$20. st :$2.             zipcode :$5.;
  format received yymmdd10. shipped yymmdd10.;
datalines;
X8742 2018/03/14 2018/03/17 Little River, KS, 67457
R9028 2018/06/12 2018/07/10 Bartlett, KS, 67332
O1800 2018/03/28 2018/03/29 Leawood, KS, 66211
V7235 2018/05/19 2018/05/31 Manter, KS, 67862
H5688 2018/02/05 2018/02/21 Goodland, KS, 67735
G2665 2018/06/16 2018/06/20 Randall, KS, 66963
N2219 2018/11/28 2018/12/24 Neal, KS, 66863
L9312 2018/12/04 2018/12/12 Wichita, KS, 67211
P4220 2018/04/25 2018/05/03 Miltonvale, KS, 67466
W8174 2018/10/10 2018/10/13 Burns, KS, 66840
L6161 2018/08/02 2018/08/26 Milford, KS, 66514
J1873 2018/01/16 2018/01/27 Lindsborg, KS, 67456
G9188 2018/09/13 2018/10/02 Olsburg, KS, 66520
P7794 2018/12/23 2019/01/08 Salina, KS, 67402
T2556 2018/09/03 2018/09/09 Wichita, KS, 67211
H2552 2018/12/23 2019/01/04 Burlingame, KS, 66413
H7170 2018/02/22 2018/03/10 Syracuse, KS, 67878
B5975 2018/03/13 2018/04/06 Wetmore, KS, 66550
I1771 2018/08/10 2018/08/21 Bird City, KS, 67731
Z3670 2018/03/02 2018/03/05 Overland Park, KS, 66225
Q3244 2018/05/10 2018/05/12 Clifton, KS, 66937
Q1765 2018/02/08 2018/02/16 Hazelton, KS, 67061
N9743 2018/11/27 2018/12/01 Geuda Springs, KS, 67051
G3305 2018/12/13 2019/01/10 Norton, KS, 67654
T3424 2018/06/21 2018/07/15 Kansas City, KS, 66101
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

Mike,

Good Idea. 

What if there are more than two blank in "Little   River" .

I would suggest:

 

data want ;
  infile datalines truncover;
  input shipid :$5. received :yymmdd10. shipped :yymmdd10. temp $100.;
  city=scan(temp,1,','); st=scan(temp,2,','); zipcode=scan(temp,3,',');
  drop temp;
  format received yymmdd10. shipped yymmdd10.;
datalines;
X8742 2018/03/14 2018/03/17 Little River, KS, 67457
R9028 2018/06/12 2018/07/10 Bartlett, KS, 67332
O1800 2018/03/28 2018/03/29 Leawood, KS, 66211
V7235 2018/05/19 2018/05/31 Manter, KS, 67862
H5688 2018/02/05 2018/02/21 Goodland, KS, 67735
G2665 2018/06/16 2018/06/20 Randall, KS, 66963
N2219 2018/11/28 2018/12/24 Neal, KS, 66863
L9312 2018/12/04 2018/12/12 Wichita, KS, 67211
P4220 2018/04/25 2018/05/03 Miltonvale, KS, 67466
W8174 2018/10/10 2018/10/13 Burns, KS, 66840
L6161 2018/08/02 2018/08/26 Milford, KS, 66514
J1873 2018/01/16 2018/01/27 Lindsborg, KS, 67456
G9188 2018/09/13 2018/10/02 Olsburg, KS, 66520
P7794 2018/12/23 2019/01/08 Salina, KS, 67402
T2556 2018/09/03 2018/09/09 Wichita, KS, 67211
H2552 2018/12/23 2019/01/04 Burlingame, KS, 66413
H7170 2018/02/22 2018/03/10 Syracuse, KS, 67878
B5975 2018/03/13 2018/04/06 Wetmore, KS, 66550
I1771 2018/08/10 2018/08/21 Bird City, KS, 67731
Z3670 2018/03/02 2018/03/05 Overland Park, KS, 66225
Q3244 2018/05/10 2018/05/12 Clifton, KS, 66937
Q1765 2018/02/08 2018/02/16 Hazelton, KS, 67061
N9743 2018/11/27 2018/12/01 Geuda Springs, KS, 67051
G3305 2018/12/13 2019/01/10 Norton, KS, 67654
T3424 2018/06/21 2018/07/15 Kansas City, KS, 66101
;
run;
mkeintz
PROC Star

@Ksharp :

 

If I expected double blanks in a character field (thereby defeating the benefit of the ampersand in the informat for city) , then I would use DLM=','  dropping the blank as a field delimiter.  This prevents consecutive internal blanks being interpreted as delimiters.  However it would then force me to read in the first three fields as consistently formatted:

 

data want ;
  infile datalines dlm=',';
  input (shipid received shipped) ($5. +1 yymmdd10. +1 yymmdd10.)
        city :$20.  st :$2.  zipcode $5.;
  format received yymmdd10. shipped yymmdd10.;
datalines;
X8742 2018/03/14 2018/03/17 Little River City, KS, 67457
X8742 2018/03/14 2018/03/17 Little  River City, KS, 67457
R9028 2018/06/12 2018/07/10 Bartlett, KS, 67332
run;

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1778 views
  • 3 likes
  • 4 in conversation