Help using Base SAS procedures

Help Reading in Data

Reply
N/A
Posts: 0

Help Reading in Data

Seems like there should be a simple solution here, but I'm struggling to find it.

Trying to read in data from a .dat file. Here's an example of the first 2 entries:


Generic College
1234 Main St
New York, NY 12345
718 555-1234
One Star Liberal Arts
Another Generic College
567 Main St
Boston, MA 23456
617 555-5678
Three Star Engineering


Each entry has School Name on the first line, followed by street address then city, st & zip code followed by area code & phone # & then a generic rating scale (one-five star) & the focus of the school

How I tried to read in the data:
input name $ ;
input address $ ;
input city $ st $ zip $ ;
input areacode $ phone $ ;
input star $1-10 focus $ ;

I'm using $ for all since I'm not actually manipulating the area or zip codes as numerical factors. I'm also using input in this fashion as I thought it would input in each line just as I wrote it and then I do do some kind of do loop to repeat throughout the rest of the data (a few thousand entries). However, it's not quite outputting it as planned. Iit's only reading info on each line up to the first space (ie - 1234 as the address instead of 1234 main st).


I'm at a loss of how to read in the data when each entry is separated on multiple lines. Each address & name and city is differing in length, so it's not like I can just use "input name $1-34." I think if I can get over this small hurdle that I can use a do loop to read in successive entries. Every dataset I've used/seen with SAS has had info posted in a very vanilla fashion:
name address city state zip phone stars focus
Where all info for each entry is on the same line, making it easier to read in


Any suggestions/advice for me to follow?
Super Contributor
Posts: 266

Re: Help Reading in Data

Try using a line pointer (/);
input name $ / address $ / city $ / etc.

the slash skips it to the next line of data.
Super Contributor
Posts: 260

Re: Help Reading in Data

Gregg is right : use slashes to go and read the next line.
To avoid stopping on the first space, specify the informats this way (not just $ but $ followed by the maximum length you would expect). The : before the $ tells SAS to stop when it encounters the delimiter (specify one in your INFILE statement, because default is blank !) or the end of the line.
[pre]
input name :$100. / address :$20. / city :$50. st $2. +1 zip $5.
/ areacode $3. +1 phone $8. / star $10. focus :$50. ;[/pre]
In this code, I use : when I don't know how long the values will be, but if I know the exact length (as for State, Zipcode of phone numbers) I don't mention any so that SAS reads exactly that number of characters.

Hope it helps.
Olivier
N/A
Posts: 0

Re: Help Reading in Data

Thanks for the quick responses!

Quick question on the deilimiter. What should I specify it as? On the first & second line (name & address), it's just the end of the line that tells me it's time for the next entry, On the other 3rd line I'd have a comma to tell me the city is done and the last line has only a single space separating the star level with the school focus (on the final line,I'm thinking of counting the word "star" as it's own category and just dropping it so only the number shows in the output)
N/A
Posts: 0

Re: Help Reading in Data

Update

Code I've used to get to this point:
data school ;
infile 'school.dat' DLM=' ' dsd;
input name & :$100. / address & :$30. / city & :$20. st $ zip $ 31-35
/ areacode $3. phone $8. / star $ star2 $focus & :$50. ;
drop star2;
run ;
proc print data=school (obs=10);
title 'Generic Title' ;
run ;


This gives me everything from zip code to the focus without issue. It also gives me the address without any problems

However, the state is always connected to the city in the output (and so nothing shows up under state). Some of the schools listed also have nicknames attached (such as "The" Ohio State University). On these such schools, the only part of the name that shows up is The (without the quotation marks)

Any more ideas on how to handle these 2 smaller issues?
Super User
Posts: 10,466

Re: Help Reading in Data

You can have multiple delimiters specified. So DLM=' ,' should treat the comma between city and state as desired.
N/A
Posts: 0

Re: Help Reading in Data

Thanks ballardw, that did fix the issue with the states
N/A
Posts: 0

Re: Help Reading in Data

1 last question

I'm trying to turn the alphabetical spelling of the stars (one, two, etc) into numeric characters

If I don't input the star rating as a character variable, it doesn't show up in the output (specifically in proc print, the star column is all .'s)

If I try to do if thens (if star=one then stars =1, etc), then I'm told that variable one (all the way through five) are uninitialized


Seems like excel's find & replace function would be easier than using SAS here. What am I missing?
Regular Contributor
Posts: 165

Re: Help Reading in Data

You are missing quotes around one i.e. star="one". Keep in mind that matching in SAS is case sensitive. You might want to do if upcase(star)="ONE". There might also be a SAS informat to read character versions of numbers but I'm not as sure about that.
N/A
Posts: 0

Re: Help Reading in Data

Thanks RickM! Evidently I've been staring at these 10,000 entries far too long today
Super User
Posts: 9,662

Re: Help Reading in Data

If you want very five rows to be a observation, then would be easy.


[pre]
data temp(drop=_: position length);
infile datalines truncover;
input #1 name $100.
#2 address $100.
#3 _city $100.
#4 areacode : $10. phone : $20.
#5 _star $100.;
city=scan(_city,1,',');
st=scan(_city,-2,', ');
zip=scan(_city,-1,', ');
star=catx(' ',scan(_star,1),scan(_star,2));
call scan(_star,3,position,length);
focus=substr(_star,position);
datalines;
Generic College
1234 Main St
New York, NY 12345
718 555-1234
One Star Liberal Arts
Another Generic College
567 Main St
Boston, MA 23456
617 555-5678
Three Star Engineering
;
run;
[/pre]




Ksharp
Ask a Question
Discussion stats
  • 10 replies
  • 172 views
  • 0 likes
  • 6 in conversation