Leon27607
Fluorite | Level 6

For the most part I have been reading in data horizontally but I now have a data set that has data written vertically. I am having trouble reading this data correctly. This data file has 8000+ observations but I will simply list out the first few so you know what I'm talking about.

In the file, there are names, addresses, and telephone numbers of most of the child care centers in North Carolina (total ~ 8000).  Each center takes six records in the file.  The first record is the eight digit license number; the second record is the name; the third is the street address; the fourth record is the town, state (NC), and zip code in columns 31-35.  In the fifth record is the telephone number in the form (xxx) xxx-xxxx where the first field is the area code.  The sixth record gives what class of center (e.g. 'Five Star') and what type of license.  On this last record, we only need to read in the class, which is the first field in that record.  Most of the values of class are 'One' to 'Five' while other values include 'Temporary' and a statute designation. I need to write a SAS program to read in this information, where each center corresponds to an observation with (at least) variables for the license number, zip code, area code, and class of center (character variable). 

I have tried a good amount of things but still have not been able to get this to work correctly. I am mostly dealing with simply

data child;
infile 'file location' ;
input ______;
run;

I am allowed to use @@ or a do loop and also keep, drop commands. Some other things that may be mentioned here on the forums, I may not have learned yet.

EDIT: I have now attached the file to my question. Also, if you're interested in helping me: this is actually coursework for a class, found at http://www.stat.ncsu.edu/people/monahan/courses/st445/ if you want to see the things we've done, so that I don't step outside my bounds.

I know that I am supposed to "figure it out by myself", but for these kinds of things I learn best by example. I did my previous homework for this class by using the examples he showed us in class. However, for this assignment (Homework 3) I have spent a few hours on it and still cannot figure it out. I would appreciate it if someone could at least get me started. I understand that inputting the file is the hardest part; I can probably do the other parts by myself just fine.

Accepted Solutions
art297
Opal | Level 21

Can't you just read those columns as extra variables?  Just because you already read them doesn't mean you can't re-read them.

P.S.  Sorry to everyone for using up so much of our bandwidth, but I'd really like to see the OP solve this on his own and I only know how to accomplish that by providing answers to questions.


39 REPLIES
Leon27607
Fluorite | Level 6

I just tried doing multiple input statements but my log still gives me errors. I tried this just to see what would happen.

data child;
infile 'E:\School Stuff\ST445\child4.txt' obs=2 pad;
input centerid;
input name $ 1-36;
input address $ 1-36;
input townstatezip $ 1-36 ;
input phone $ 1-36 ;
input class $ 1-36;
put _all_;
run;

Also, the log tells me that I have a "LOST CARD". What does this mean exactly? Log:

NOTE: LOST CARD.
centerid=68000482 name=CHILDREN'S CAMPUS OF CHAPEL HILL II address=  townstatezip=  phone=  class=
_ERROR_=1 _N_=1
NOTE: 2 records were read from the infile 'E:\School Stuff\ST445\child4.txt'.
      The minimum record length was 8.
      The maximum record length was 35.

PGStats
Opal | Level 21

Start with Tom's suggestion below and build from there. It's the wise thing to do.

PG

art297
Opal | Level 21

I disagree.  If his data are as clean as his example, the code I suggested will do everything he needs in one pass:

data want;
  informat id $10.;
  informat org address1 citystate $60.;
  informat areacode $3.;
  informat phone $10.;
  informat license $50.;
  infile "c:\art\havedata.txt";
  input id
       /org &
       /address1 &
       /citystate & zipcode
       / @2 areacode  @7 phone
       /license;
run;

art297
Opal | Level 21

If you really have exactly 6 records per file, and each captures the same fields, you could use something like:

data have;
  informat stuff $80.;
  input stuff &;
  cards;
68000482
CHILDREN'S CAMPUS OF CHAPEL HILL II
1620 MLK JR BLVD
CHAPEL HILL, NC               27514
(919) 967-5020
Temporary License
92002687
"HAPPY FACES" CHILD CARE CENTER
4500 EMMIT DRIVE
RALEIGH, NC                   27604
(919) 231-0783
One Star Family CC Home License
60002518
'A' IS FOR APPLE
11919 PLANTERS ESTATES DRIVE
CHARLOTTE, NC                 28278
(704) 504-8977
Five Star Center License
32001484
'A' PRECIOUS ANGEL DAYCARE
916 SOUTH MINERAL SPRINGS ROAD
DURHAM, NC                    27703
(919) 957-8496
Three Star Family CC Home License
64000406
'AGAPE' CENTER OF LOVE'
203 N WALNUT STREET
SPRING HOPE, NC               27882
(252) 885-6249
GS 110-106
;

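data want (drop=stuff);
  set have;
  /* Define the variables first; a prefix list such as add: cannot match variables that do not exist yet */
  length id org address1 address2 phone license $80;
  /* Keep values from records 1-5 until the sixth record triggers the output */
  retain id org address1 address2 phone;
  if mod(_n_,6) eq 1 then id=strip(stuff);
  else if mod(_n_,6) eq 2 then org=strip(stuff);
  else if mod(_n_,6) eq 3 then address1=strip(stuff);
  else if mod(_n_,6) eq 4 then address2=strip(stuff);
  else if mod(_n_,6) eq 5 then phone=strip(stuff);
  else do;
    license=stuff;
    output;
  end;
run;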

Leon27607
Fluorite | Level 6

Well, it's not exactly like that. It's one big file with 8000+ records; I just gave an example of what the file looks like. The data are in that format all the way down, with a total of ~8000 "observations" (each with all 6 variables). I also have to somehow split the zip from the city and state, and split the area code from the phone number. I have to use an infile statement instead of cards because the data file is so large. I don't believe I've learned what "informat" does, or what the & does in the input statement. I've mostly used either character variables (denoted by $) or numeric ones (no format, i.e. the default). But the code at the bottom does seem helpful, and I think I might have to do something along those lines.
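
For readers with the same question about "informat" and "&", here is a minimal sketch. The data set name demo, the variable names, and the two sample lines are only illustrative (the lines are taken from the example data in this thread). The INFORMAT statement tells SAS how to read each variable, and the & modifier lets a character value contain single blanks, so the value ends only at two or more consecutive blanks.

```sas
/* Sketch only: data set and variable names are made up for illustration. */
data demo;
  /* $40. = character, up to 40 characters; $5. = character, up to 5 */
  informat town $40. zip $5.;
  /* town may contain single blanks because of &; it stops at two or more blanks */
  input town & zip;
  datalines;
CHAPEL HILL, NC               27514
SPRING HOPE, NC               27882
;

proc print data=demo;
run;
```

Without the &, list input would stop at the first blank, so town would hold only the first word of each town name.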

art297
Opal | Level 21

If your entire file is as clean as your example data, then you might be able to get away with code as simple as:

data want;
  informat id $10.;
  informat org address1 citystate $60.;
  informat areacode $3.;
  informat phone $10.;
  informat license $50.;
  infile "c:\art\havedata.txt";
  input id
       /org &
       /address1 &
       /citystate & zipcode
       / @2 areacode  @7 phone
       /license;
run;

Leon27607
Fluorite | Level 6

We haven't done anything with "informat" yet so I don't think we can use that, also we can't use formatting with the "/" command. By the way I tested your code to see if it would work but it still doesn't. I can attach the actual file to my original post if that would help.

Tom
Super User

Start simple.  Read the 6 lines into 6 character variables so that you can make sure the format is consistent.

data want ;
   infile 'myfile' truncover ;
   input #1 line1 $200.
         #2 line2 $200.
         #3 line3 $200.
         #4 line4 $200.
         #5 line5 $200.
         #6 line6 $200.
   ;
run;

Leon27607
Fluorite | Level 6

I just added/edited some details to my original post, I don't know if doing so will let others know that I did so, so I'm making a post about it.

art297
Opal | Level 21

Using #1 etc., like Tom suggested, is the same thing as using / to change records.  Formally defining informats with informat statements is the same thing as putting them on the input line.

Try the following.  I believe it covers all of your exceptions:

data want;
  infile "c:\art\child4.txt" truncover;
  input #1 id $10.
        #2 org & : $60.
        #3 address1 & : $60.
        #4 citystate & : $60. zipcode $10.
        #5 @2 areacode $3. @7 phone $8.
        #6 license & : $50.;
run;
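
The point about #n and / being interchangeable can be checked with a small sketch. The data set names slash and pound are hypothetical, and the records are cut down to two lines per center (license number and phone) to keep it short. Both steps should build the same two observations, which PROC COMPARE can confirm.

```sas
/* Sketch only: two-line records and data set names are made up for illustration. */
data slash;
  /* "/" moves the pointer to the next record */
  input id $8.
      / phone $14.;
  datalines;
68000482
(919) 967-5020
92002687
(919) 231-0783
;

data pound;
  /* "#n" names the record within the group explicitly */
  input #1 id $8.
        #2 phone $14.;
  datalines;
68000482
(919) 967-5020
92002687
(919) 231-0783
;

proc compare base=slash compare=pound;
run;
```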

Leon27607
Fluorite | Level 6

Well this "works" but it uses things we haven't learned so I'm pretty sure the teacher would be suspicious about what I did lol...

Patrick
Opal | Level 21

The code you've previously posted would have worked.

The issue you had there was "obs=2" in the infile statement combined with 6 input statements (that is, trying to read 6 lines in one iteration of the data step) while only 2 lines were available. That's where the "LOST CARD" note came from.

The code below will run. I prefer "truncover" over "pad", but both are OK in your case.

data want;
/*  infile 'C:\temp\child4.txt' truncover lrecl=100;*/
  infile 'C:\temp\child4.txt' /*obs=6*/ pad;
  input centerid 15.;
  input name $ 1-36;
  input address $ 1-36;
  input townstatezip $ 1-36 ;
  input phone $ 1-36 ;
  input class $ 1-36;
run;

proc print data=want;
run;
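
If you still want obs= for testing on a small sample, a multiple of six keeps whole centers together. The sketch below uses in-stream lines copied from this thread instead of the real file, and the data set name firstcenter is made up. With obs=6 it should stop after one complete center and produce a single observation with no LOST CARD note.

```sas
/* Sketch only: in-stream data stands in for child4.txt; the data set name is illustrative. */
data firstcenter;
  /* obs=6 stops after six input lines, i.e. exactly one complete center */
  infile datalines obs=6 truncover;
  input centerid 8.;
  input name $ 1-36;
  input address $ 1-36;
  input townstatezip $ 1-36;
  input phone $ 1-36;
  input class $ 1-36;
  datalines;
68000482
CHILDREN'S CAMPUS OF CHAPEL HILL II
1620 MLK JR BLVD
CHAPEL HILL, NC               27514
(919) 967-5020
Temporary License
92002687
"HAPPY FACES" CHILD CARE CENTER
4500 EMMIT DRIVE
RALEIGH, NC                   27604
(919) 231-0783
One Star Family CC Home License
;

proc print data=firstcenter;
run;
```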


Leon27607
Fluorite | Level 6

Thanks. Now I just have to figure out how to get the 4 variables he really wants. I know I need to use "drop" for some, but I still have to somehow separate the zip code from the city and state, and the area code from the full phone number.
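
One possible way to do that split, following the accepted suggestion of reading specific columns, is sketched below. It is not necessarily what the course expects. The assumptions are that the zip code sits in columns 31-35 of the fourth record (as the question states), that the phone number starts in column 1 so the area code falls in columns 2-4, and that 12 characters is long enough for class. The variable names and the use of in-stream data are illustrative only.

```sas
/* Sketch only: sample lines copied from this thread; names and lengths are assumptions. */
data child;
  infile datalines truncover;   /* swap in your own file path for the real data */
  length class $ 12;            /* list input defaults to 8 characters, too short for 'Temporary' */
  input license $ 1-8;          /* record 1: license number */
  input;                        /* record 2: name, read and discarded */
  input;                        /* record 3: street address, read and discarded */
  input zip $ 31-35;            /* record 4: zip code only */
  input areacode $ 2-4;         /* record 5: digits inside the parentheses */
  input class $;                /* record 6: first word only */
  datalines;
68000482
CHILDREN'S CAMPUS OF CHAPEL HILL II
1620 MLK JR BLVD
CHAPEL HILL, NC               27514
(919) 967-5020
Temporary License
92002687
"HAPPY FACES" CHILD CARE CENTER
4500 EMMIT DRIVE
RALEIGH, NC                   27604
(919) 231-0783
One Star Family CC Home License
;

proc print data=child;
run;
```

Because only the needed columns are read, no drop or keep is required here. Reading the full town or phone fields as extra variables and dropping them afterwards would work just as well.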
