Hi all,
I have a problem in reading the following data file in SAS. I have tried ... like for 2 hours but no luck. Any help will be appreciated.
https://www.census.gov/population/metro/files/lists/historical/cencty.txt
The data file and its layout are attached.
Bill
Does not look that hard. Just retain the MSA/CMSA level variables onto the records for the CITY/ENTITY level variables.
data msa_cty ;
infile 'c:\downloads\cencty.txt' expandtabs truncover;
input @ ;
if _infile_ ^=: ' ' then input MSANAME $1-48 MSA $49-52 CMSA $65-66 ;
else input CCNAME $1-48 PMSA $57-60 state $73-74 entity $81-85 ;
retain msaname msa cmsa ;
run;
Why don't you attach what you achieved so far, with comments/log?
You can't expect the community to create a whole program for you from scratch...
I'm working on it.
But if you are a US citizen, do something for your country: get ahold of the idiot who created that file and use a LART on him/her.
Hi,
Seems pretty straight-forward, the file structure is given at the end of the document. So read the whole thing in as $2000. Then post process the string into the separate parts as given by (drop everything after File layout and before Metropolitan Area/Central City :
File Layout:
Character Length Field
1-48 48 Metropolitan Area/Central City
49-52 4 4-Digit FIPS MSA/CMSA code
53-56 4 Blank
57-60 4 4-Digit FIPS PMSA code (blank for MSAs)
61-64 4 Blank
65-66 2 2-Digit FIPS Alternative CMSA code
67-72 6 Blank
73-74 2 2-Digit FIPS State code (blank for MSAs and CMSAs)
75-80 6 Blank
81-85 5 5-Digit FIPS Entity code (blank for MSAs and CMSAs)
His problem seem to be the tabstops in the file. They need to be expanded to the next column where mod(column,8) = 0.
At least I don't know of a simple SAS mechanism to do that.
Yep, can't think of anything myself right at the moment. You could find replace in Word then save that. So go into Word, show special characters, highlight the '->' and replace that with a pipe. Then you could also remove the '.' at the same time, both special charcters. Save it and it should look something like:
Abilene,TXMSA0040||
Abilene(part),TX||||48|01000
Aguadilla,PRMSA|0060||||
Aguadillazonaurbana,PR||||72|00745
Albany,GAMSA|0120||||
Albany,GA||||13|01052
Albany-Schenectady-Troy,NYMSA|0160
Albany,NY||||36|01000
Schenectady,NY||||36|65508
Troy,NY||||36|75484
SaratogaSprings,NY||||36|65255
Albuquerque,NMMSA|0200||||
Albuquerque,NM||||35|02000
Alexandria,LAMSA|0220||||
Alexandria,LA||||22|00975
Allentown-Bethlehem-Easton,PAMSA|0240||||
Allentown,PA||||42|02000
Bethlehem,PA||||42|06088
Altoona,PAMSA|0280||||
Try this:
data cencty;
infile '$HOME/sascommunity/cencty.txt' truncover lrecl=100;
length inp_line $ 100;
input;
inp_line = _infile_; *have to do this, or the first 2 blanks are eaten;
*expand the tabs;
i = indexc(inp_line,byte(9));
do while (i ne 0);
x = mod(i-1,8);
inp_line = substr(inp_line,1,i-1) !! repeat(' ',7-x) !! substr(inp_line,i+1);
i = indexc(inp_line,byte(9));
end;
area = substr(inp_line,1,48);
cmsa = substr(inp_line,49,4);
pmsa = substr(inp_line,57,4);
cm = substr(inp_line,65,2);
ss = substr(inp_line,73,2);
eeeee = substr(inp_line,81,5);
drop i x inp_line;
run;
You may want do some grouping along the cmsa-level cities
RTM. Why don't you try the INFILE statement option EXPANDTABS.
Because google didn't find it for me
OTOH, my method can be adapted for different tab lengths. EXPANDTABS only does 8-character tabs.
KurtBremser wrote:
Because google didn't find it for me
I reckon that's the new, "the dog ate my homework".
Yeah right. But have you tried to search for "expand tabstops" in the SAS Product Documentation?
The result is revealing (empty).
Most of the time, it is easier to find the right SAS documents through google than through the search function on the SAS website.
On top of that, I entered the wrong phrase in google, searching for "replace tabs" instead of "expanding tabs" (I'm not a native English speaker).
Hi,
You can say this file as an unorganized or formatted instead of complex file. You can ask the data provider to give the data with a proper delimiter (tab, ^,* etc..). OR else you have to do the data manipulation on formatting this data and then you can use infile or import statements.
Regards,
Vish
Sorry . I just realize you should use COLUMN input method.
data x; infile 'c:\temp\cencty.txt' truncover expandtabs; input v1 $ 1-48 v2 $ 49-52 v3 $ 57-60 v4 $ 65-66 v5 $ 73-74 v6 $ 81-85 ; run;
Xia Keshan
Message was edited by: xia keshan
Message was edited by: xia keshan Really not easy.
Message was edited by: xia keshan
Hmm. That puts a lot of values in the wrong columns. v3 and v6 are always empty.
Edit: saw you made some changes.
cm(v4)=07 is missing in the line for Boston_Worcester or Chicago-Gary...
pmsa (v3) is read into v2
My $.02: Blaim a webmaster. The file description is fixed column text but since the data doesn't match the descriptor someone has jiggered the data part of the file and I suspect a web publishing system.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.