DATA Step, Macro, Functions and more

Reading in multiple data lines

Reply
Occasional Contributor nrb
Occasional Contributor
Posts: 11

Reading in multiple data lines

[ Edited ]

Hello, 

 

so I need help reading and formatting the output into the following. 

For example if i have these datalines

01 Alabama e 6 112 -27 3893888 77 35.600 60.035 4801
346 
02 Alaska w 9 100 -80 401851 1
0.400 64.344 6410 858 
04 Arizona w 8 127
-40 2718215 24 53.300 83.832 7614 257  

* THE DATA LINES ARE SUPPOSED TO BE READ IN LIKE ABOVE IN MULTIPLE LINES.

I want to assign it variable names and format the output so that it looks like this. 

stcod stname   east/west CBcode Hightemp Lowtemp Pop80        popsqm plandag purbpop crimerate BTU 

  01     Alabama   e              6            112          -27           3893888    77           35.600    60.035    4801      346           

02     Alaska      w              9           100           -80          401851      1              0.400       64.344    6410     858           

04    Arizona     w               8          127          -40            2718215    24            53.300     83.832    7614     257           

 

Please help! 

Valued Guide
Posts: 797

Re: Reading in multiple data lines

Assign an informat to the character variables.  Then just use an INPUT statement listing the vars in order.

 

Caveat: you MUST have complete raw data.  I.e., every variable must have a value in the raw data.  For numeric vars, this means you'd have to have a dot (.) as a missing value placeholder.  But using a blank as a missing value placeholder for the character vars would cause program failure.

 

Regards,

Mark

data want;
  informat stname $15.   ew $1.;
  input stcod stname   ew CBcode Hightemp Lowtemp Pop80        popsqm plandag purbpop crimerate BTU ;
datalines;
01 Alabama e 6 112 -27 3893888 77 35.600 60.035 4801
346 
02 Alaska w 9 100 -80 401851 1
0.400 64.344 6410 858 
04 Arizona w 8 127
-40 2718215 24 53.300 83.832 7614 257  
run;
Valued Guide
Posts: 797

Re: Reading in multiple data lines

nrb sent me this followup question:  How to deal with multi-part state names

 

so you just commented on one of the questions I posted, one issue I have is that i have other data lines who have spaces in the statesnames for example north carolina. So the stnames i provided were just single so how would I go about that? The problem now is the for the dataline below i'm only getting New and it's excluding the Hampshire part. Please let me know. THANK YOU SO MUCH!

For example if the dataline is 

 

33 New Hampshire e 1 106 -46 920610 102
9.400 52.175 4235 200

 

My suggestion is to read the STCOD and the next two text fields, to be named STNAME and _DUMMYTXT, and append a trailing "@" on the input statement, which tells SAS to hold the column pointer.  The result is that the next INPUT statement picks up on the same line at the column pointer.  Without the trailing "@" the next INPUT statement would read the following LINE, not the following FIELD.

 

Now _DUMMYTXT is either going to be the second word in a state name (e.g. "Hampshire" in the example above) or single letter ("e" or "w").  If it's the earlier, concatenate the txt to the state name, and read in a single text field again with the trailing "@", which will pick up the "e" or "w".  But if _DUMMYTXT is already an "e" or "w" assign it to variable EW.

 

At this point read the remaining vars as before.

 

regards,

Mark

Sample program:

data want (drop=_dummytxt);
  
  informat stcod 2. stname _dummytxt $15.   ew $1.;

  /* Input the first 3 fields and hold the colum pointer. */
  /* pointing right after field 3 ... ready for next INPUT*/
  input stcod stname _dummytxt @;

  /* If two-part statename, concatenate them and input ew */
  if not(_dummytxt in ('e','w')) then do;
    stname=catx(' ',stname,_dummytxt);
    input ew @;
  end;
  /* Otherwise copy ew*/
  else ew=_dummytxt;

  /* Read the rest of the variables */
  input CBcode Hightemp Lowtemp Pop80 popsqm plandag purbpop crimerate BTU ;
datalines;
01 Alabama  e 6 112 -27 3893888 77 35.600 60.035 
4801 346 
02 Alaska  w 9 100 -80 401851   1  0.400 
64.344 6410 858 
04 Arizona  w 8 127 -40 2718215 24 
53.300 83.832 7614 257
33 New Hampshire e 1 106 -46 920610 102
9.400 52.175 4235 200 
run;

 

Frequent Contributor
Posts: 127

Re: Reading in multiple data lines

Since your data has length less than 8 you can simply run this code.

 

data one;

infile datalines;

input stcod stname $ east_west $ CBcode Hightemp Lowtemp Pop80 popsqm plandag purbpop crimerate BTU ;

datalines;

01 Alabama e 6 112 -27 3893888 77 35.600 60.035 4801

346

02 Alaska w 9 100 -80 401851 1

0.400 64.344 6410 858

04 Arizona w 8 127

-40 2718215 24 53.300 83.832 7614 257

run;    

 

Input statement looks for the data in 1st row. Since the row dosen't have all the values it jumps to next line.

In the log you can see:

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

 

 

Thanks,
Suryakiran
Super User
Posts: 10,521

Re: Reading in multiple data lines

I suspect that we are missing part of the puzzle hear. Is the original text file supposed to be used for data interchange or was it something that was printed or formatted in a different manner and possibly pasted into another application (Excel perhaps) that has now removed some of the formatting information? Almost every data interchange format would have some way of identifying the end of a field especially one that has embedded blanks. Typically the methods would be a delimiter such as tab or comma, fixed column widths and or quotes around the text that contains spaces.

 

The irregular number of variables appearing on the example lines is another indicator of something "odd" having possibly been done with this file somewhere in its history.

Ask a Question
Discussion stats
  • 4 replies
  • 184 views
  • 0 likes
  • 4 in conversation