DATA Step, Macro, Functions and more

Reading a "complex" data file

Accepted Solution Solved
Reply
Contributor
Posts: 40
Accepted Solution

Reading a "complex" data file

Hi all,

I have a problem in reading the following data file in SAS. I have tried ... like for 2 hours but no luck. Any help will be appreciated.

https://www.census.gov/population/metro/files/lists/historical/cencty.txt

The data file and its layout are attached.

Bill

Attachment
Attachment

Accepted Solutions
Solution
‎04-28-2014 09:08 PM
Super User
Super User
Posts: 6,499

Re: Reading a "complex" data file

Does not look that hard. Just retain the MSA/CMSA level variables onto the records for the CITY/ENTITY level variables.

data msa_cty ;

  infile 'c:\downloads\cencty.txt' expandtabs  truncover;

  input @ ;

  if _infile_ ^=: ' ' then input MSANAME $1-48 MSA $49-52 CMSA $65-66 ;

  else input CCNAME $1-48 PMSA $57-60 state $73-74 entity $81-85 ;

  retain msaname msa cmsa ;

run;

View solution in original post


All Replies
Super User
Posts: 5,256

Re: Reading a "complex" data file

Why don't you attach what you achieved so far, with comments/log?

You can't expect the community to create a whole program for you from scratch...

Data never sleeps
Super User
Posts: 6,931

Re: Reading a "complex" data file

I'm working on it.

But if you are a US citizen, do something for your country: get ahold of the idiot who created that file and use a LART on him/her.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 7,395

Re: Reading a "complex" data file

Hi,

Seems pretty straight-forward, the file structure is given at the end of the document.  So read the whole thing in as $2000.  Then post process the string into the separate parts as given by (drop everything after File layout and before Metropolitan Area/Central City            :

File Layout:

Character  Length  Field

1-48       48      Metropolitan Area/Central City

49-52       4      4-Digit FIPS MSA/CMSA code

53-56       4      Blank

57-60       4      4-Digit FIPS PMSA code (blank for MSAs)

61-64       4      Blank

65-66       2      2-Digit FIPS Alternative CMSA code

67-72       6      Blank

73-74       2      2-Digit FIPS State code (blank for MSAs and CMSAs)

75-80       6      Blank

81-85       5      5-Digit FIPS Entity code (blank for MSAs and CMSAs)

Super User
Posts: 6,931

Re: Reading a "complex" data file

His problem seem to be the tabstops in the file. They need to be expanded to the next column where mod(column,8) = 0.

At least I don't know of a simple SAS mechanism to do that.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 7,395

Re: Reading a "complex" data file

Yep, can't think of anything myself right at the moment.  You could find replace in Word then save that.  So go into Word, show special characters, highlight the '->' and replace that with a pipe.  Then you could also remove the '.' at the same time, both special charcters.  Save it and it should look something like:

Abilene,TXMSA0040||

Abilene(part),TX||||48|01000

Aguadilla,PRMSA|0060||||

Aguadillazonaurbana,PR||||72|00745

Albany,GAMSA|0120||||

Albany,GA||||13|01052

Albany-Schenectady-Troy,NYMSA|0160

Albany,NY||||36|01000

Schenectady,NY||||36|65508

Troy,NY||||36|75484

SaratogaSprings,NY||||36|65255

Albuquerque,NMMSA|0200||||

Albuquerque,NM||||35|02000

Alexandria,LAMSA|0220||||

Alexandria,LA||||22|00975

Allentown-Bethlehem-Easton,PAMSA|0240||||

Allentown,PA||||42|02000

Bethlehem,PA||||42|06088

Altoona,PAMSA|0280||||

Super User
Posts: 6,931

Re: Reading a "complex" data file

Try this:

data cencty;

infile '$HOME/sascommunity/cencty.txt' truncover lrecl=100;

length inp_line $ 100;

input;

inp_line = _infile_; *have to do this, or the first 2 blanks are eaten;

*expand the tabs;

i = indexc(inp_line,byte(9));

do while (i ne 0);

  x = mod(i-1,8);

  inp_line = substr(inp_line,1,i-1) !! repeat(' ',7-x) !! substr(inp_line,i+1);

  i = indexc(inp_line,byte(9));

end;

area = substr(inp_line,1,48);

cmsa = substr(inp_line,49,4);

pmsa = substr(inp_line,57,4);

cm = substr(inp_line,65,2);

ss = substr(inp_line,73,2);

eeeee = substr(inp_line,81,5);

drop i x inp_line;

run;

You may want do some grouping along the cmsa-level cities

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Respected Advisor
Posts: 3,777

Re: Reading a "complex" data file

RTM.  Why don't you try the INFILE statement option EXPANDTABS.

Super User
Posts: 6,931

Re: Reading a "complex" data file

Because google didn't find it for me Smiley Wink

OTOH, my method can be adapted for different tab lengths. EXPANDTABS only does 8-character tabs.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Respected Advisor
Posts: 3,777

Re: Reading a "complex" data file

KurtBremser wrote:

Because google didn't find it for me

I reckon that's the new, "the dog ate my homework".

Super User
Posts: 6,931

Re: Reading a "complex" data file

Yeah right. But have you tried to search for "expand tabstops" in the SAS Product Documentation?

The result is revealing (empty).

Most of the time, it is easier to find the right SAS documents through google than through the search function on the SAS website.

On top of that, I entered the wrong phrase in google, searching for "replace tabs" instead of "expanding tabs" (I'm not a native English speaker).

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 117

Re: Reading a "complex" data file

Hi,

You can say this file as an unorganized or formatted instead of complex file. You can ask the data provider to give the data with a proper delimiter (tab, ^,* etc..). OR else you have to do the data manipulation on formatting this data and then you can use infile or import statements.

Regards,

Vish

Super User
Posts: 9,671

Re: Reading a "complex" data file

Sorry . I just realize you should use COLUMN input method.

data x;
infile 'c:\temp\cencty.txt' truncover expandtabs;
input v1 $ 1-48   
 v2 $ 49-52      
 v3 $ 57-60   
 v4 $ 65-66     
 v5 $ 73-74   
 v6 $ 81-85   ;
run;

Xia Keshan

Message was edited by: xia keshan

Message was edited by: xia keshan Really not easy.

Message was edited by: xia keshan

Super User
Posts: 6,931

Re: Reading a "complex" data file

Hmm. That puts a lot of values in the wrong columns. v3 and v6 are always empty.

Edit: saw you made some changes.

cm(v4)=07 is missing in the line for Boston_Worcester or Chicago-Gary...

pmsa (v3) is read into v2

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 10,483

Re: Reading a "complex" data file

My $.02: Blaim a webmaster. The file description is fixed column text but since the data doesn't match the descriptor someone has jiggered the data part of the file and I suspect a web publishing system.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 19 replies
  • 549 views
  • 10 likes
  • 9 in conversation