Solved: Re: Reading a "complex" data file

tesu · Posted 04-28-2014 01:13 AM

Hi all,

I have a problem in reading the following data file in SAS. I have tried ... like for 2 hours but no luck. Any help will be appreciated.

https://www.census.gov/population/metro/files/lists/historical/cencty.txt

The data file and its layout are attached.

Bill

Tom · Posted 04-28-2014 09:08 PM

Does not look that hard. Just retain the MSA/CMSA level variables onto the records for the CITY/ENTITY level variables.

data msa_cty ;

infile 'c:\downloads\cencty.txt' expandtabs truncover;

input @ ;

if _infile_ ^=: ' ' then input MSANAME $1-48 MSA $49-52 CMSA $65-66 ;

else input CCNAME $1-48 PMSA $57-60 state $73-74 entity $81-85 ;

retain msaname msa cmsa ;

run;

View solution in original post

LinusH · Posted 04-28-2014 02:45 AM

Why don't you attach what you achieved so far, with comments/log?

You can't expect the community to create a whole program for you from scratch...

Data never sleeps

Kurt_Bremser · Posted 04-28-2014 03:20 AM

I'm working on it.

But if you are a US citizen, do something for your country: get ahold of the idiot who created that file and use a LART on him/her.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RW9 · Posted 04-28-2014 04:50 AM

Hi,

Seems pretty straight-forward, the file structure is given at the end of the document. So read the whole thing in as $2000. Then post process the string into the separate parts as given by (drop everything after File layout and before Metropolitan Area/Central City :

File Layout:

Character Length Field

1-48 48 Metropolitan Area/Central City

49-52 4 4-Digit FIPS MSA/CMSA code

53-56 4 Blank

57-60 4 4-Digit FIPS PMSA code (blank for MSAs)

61-64 4 Blank

65-66 2 2-Digit FIPS Alternative CMSA code

67-72 6 Blank

73-74 2 2-Digit FIPS State code (blank for MSAs and CMSAs)

75-80 6 Blank

81-85 5 5-Digit FIPS Entity code (blank for MSAs and CMSAs)

Kurt_Bremser · Posted 04-28-2014 05:04 AM

His problem seem to be the tabstops in the file. They need to be expanded to the next column where mod(column,8) = 0.

At least I don't know of a simple SAS mechanism to do that.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RW9 · Posted 04-28-2014 06:04 AM

Yep, can't think of anything myself right at the moment. You could find replace in Word then save that. So go into Word, show special characters, highlight the '->' and replace that with a pipe. Then you could also remove the '.' at the same time, both special charcters. Save it and it should look something like:

Abilene,TXMSA0040||

Abilene(part),TX||||48|01000

Aguadilla,PRMSA|0060||||

Aguadillazonaurbana,PR||||72|00745

Albany,GAMSA|0120||||

Albany,GA||||13|01052

Albany-Schenectady-Troy,NYMSA|0160

Albany,NY||||36|01000

Schenectady,NY||||36|65508

Troy,NY||||36|75484

SaratogaSprings,NY||||36|65255

Albuquerque,NMMSA|0200||||

Albuquerque,NM||||35|02000

Alexandria,LAMSA|0220||||

Alexandria,LA||||22|00975

Allentown-Bethlehem-Easton,PAMSA|0240||||

Allentown,PA||||42|02000

Bethlehem,PA||||42|06088

Altoona,PAMSA|0280||||

Kurt_Bremser · Posted 04-28-2014 06:47 AM

Try this:

data cencty;

infile '$HOME/sascommunity/cencty.txt' truncover lrecl=100;

length inp_line $ 100;

input;

inp_line = _infile_; *have to do this, or the first 2 blanks are eaten;

*expand the tabs;

i = indexc(inp_line,byte(9));

do while (i ne 0);

x = mod(i-1,8);

inp_line = substr(inp_line,1,i-1) !! repeat(' ',7-x) !! substr(inp_line,i+1);

i = indexc(inp_line,byte(9));

end;

area = substr(inp_line,1,48);

cmsa = substr(inp_line,49,4);

pmsa = substr(inp_line,57,4);

cm = substr(inp_line,65,2);

ss = substr(inp_line,73,2);

eeeee = substr(inp_line,81,5);

drop i x inp_line;

run;

You may want do some grouping along the cmsa-level cities

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

data_null__ · Posted 04-28-2014 07:15 AM

RTM. Why don't you try the INFILE statement option EXPANDTABS.

Kurt_Bremser · Posted 04-28-2014 07:32 AM

Because google didn't find it for me

OTOH, my method can be adapted for different tab lengths. EXPANDTABS only does 8-character tabs.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

data_null__ · Posted 04-28-2014 08:46 AM

KurtBremser wrote:

Because google didn't find it for me

I reckon that's the new, "the dog ate my homework".

Kurt_Bremser · Posted 04-28-2014 09:00 AM

Yeah right. But have you tried to search for "expand tabstops" in the SAS Product Documentation?

The result is revealing (empty).

Most of the time, it is easier to find the right SAS documents through google than through the search function on the SAS website.

On top of that, I entered the wrong phrase in google, searching for "replace tabs" instead of "expanding tabs" (I'm not a native English speaker).

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Vish33 · Posted 04-28-2014 06:50 AM

Hi,

You can say this file as an unorganized or formatted instead of complex file. You can ask the data provider to give the data with a proper delimiter (tab, ^,* etc..). OR else you have to do the data manipulation on formatting this data and then you can use infile or import statements.

Regards,

Vish

Ksharp · Posted 04-28-2014 09:15 AM

Sorry . I just realize you should use COLUMN input method.

data x;
infile 'c:\temp\cencty.txt' truncover expandtabs;
input v1 $ 1-48   
 v2 $ 49-52      
 v3 $ 57-60   
 v4 $ 65-66     
 v5 $ 73-74   
 v6 $ 81-85   ;
run;

Xia Keshan

Message was edited by: xia keshan

Message was edited by: xia keshan Really not easy.

Message was edited by: xia keshan

Kurt_Bremser · Posted 04-28-2014 09:29 AM

Hmm. That puts a lot of values in the wrong columns. v3 and v6 are always empty.

Edit: saw you made some changes.

cm(v4)=07 is missing in the line for Boston_Worcester or Chicago-Gary...

pmsa (v3) is read into v2

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 04-28-2014 11:40 AM

My $.02: Blaim a webmaster. The file description is fixed column text but since the data doesn't match the descriptor someone has jiggered the data part of the file and I suspect a web publishing system.

Registration is open

SAS Training: Just a Click Away