Re: Reading in multiple data lines

nrb · Posted 11-14-2016 01:22 PM

Hello,

so I need help reading and formatting the output into the following.

For example if i have these datalines

01 Alabama e 6 112 -27 3893888 77 35.600 60.035 4801
346
02 Alaska w 9 100 -80 401851 1
0.400 64.344 6410 858
04 Arizona w 8 127
-40 2718215 24 53.300 83.832 7614 257

* THE DATA LINES ARE SUPPOSED TO BE READ IN LIKE ABOVE IN MULTIPLE LINES.

I want to assign it variable names and format the output so that it looks like this.

stcod stname east/west CBcode Hightemp Lowtemp Pop80 popsqm plandag purbpop crimerate BTU

01 Alabama e 6 112 -27 3893888 77 35.600 60.035 4801 346

02 Alaska w 9 100 -80 401851 1 0.400 64.344 6410 858

04 Arizona w 8 127 -40 2718215 24 53.300 83.832 7614 257

Please help!

mkeintz · Posted 11-14-2016 01:38 PM

Assign an informat to the character variables. Then just use an INPUT statement listing the vars in order.

Caveat: you MUST have complete raw data. I.e., every variable must have a value in the raw data. For numeric vars, this means you'd have to have a dot (.) as a missing value placeholder. But using a blank as a missing value placeholder for the character vars would cause program failure.

Regards,

Mark

data want;
  informat stname $15.   ew $1.;
  input stcod stname   ew CBcode Hightemp Lowtemp Pop80        popsqm plandag purbpop crimerate BTU ;
datalines;
01 Alabama e 6 112 -27 3893888 77 35.600 60.035 4801
346 
02 Alaska w 9 100 -80 401851 1
0.400 64.344 6410 858 
04 Arizona w 8 127
-40 2718215 24 53.300 83.832 7614 257  
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 11-14-2016 06:12 PM

nrb sent me this followup question: How to deal with multi-part state names

so you just commented on one of the questions I posted, one issue I have is that i have other data lines who have spaces in the statesnames for example north carolina. So the stnames i provided were just single so how would I go about that? The problem now is the for the dataline below i'm only getting New and it's excluding the Hampshire part. Please let me know. THANK YOU SO MUCH!

For example if the dataline is

33 New Hampshire e 1 106 -46 920610 102
9.400 52.175 4235 200

My suggestion is to read the STCOD and the next two text fields, to be named STNAME and _DUMMYTXT, and append a trailing "@" on the input statement, which tells SAS to hold the column pointer. The result is that the next INPUT statement picks up on the same line at the column pointer. Without the trailing "@" the next INPUT statement would read the following LINE, not the following FIELD.

Now _DUMMYTXT is either going to be the second word in a state name (e.g. "Hampshire" in the example above) or single letter ("e" or "w"). If it's the earlier, concatenate the txt to the state name, and read in a single text field again with the trailing "@", which will pick up the "e" or "w". But if _DUMMYTXT is already an "e" or "w" assign it to variable EW.

At this point read the remaining vars as before.

regards,

Mark

Sample program:

data want (drop=_dummytxt);
  
  informat stcod 2. stname _dummytxt $15.   ew $1.;

  /* Input the first 3 fields and hold the colum pointer. */
  /* pointing right after field 3 ... ready for next INPUT*/
  input stcod stname _dummytxt @;

  /* If two-part statename, concatenate them and input ew */
  if not(_dummytxt in ('e','w')) then do;
    stname=catx(' ',stname,_dummytxt);
    input ew @;
  end;
  /* Otherwise copy ew*/
  else ew=_dummytxt;

  /* Read the rest of the variables */
  input CBcode Hightemp Lowtemp Pop80 popsqm plandag purbpop crimerate BTU ;
datalines;
01 Alabama  e 6 112 -27 3893888 77 35.600 60.035 
4801 346 
02 Alaska  w 9 100 -80 401851   1  0.400 
64.344 6410 858 
04 Arizona  w 8 127 -40 2718215 24 
53.300 83.832 7614 257
33 New Hampshire e 1 106 -46 920610 102
9.400 52.175 4235 200 
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SuryaKiran · Posted 11-14-2016 01:58 PM

Since your data has length less than 8 you can simply run this code.

data one;

infile datalines;

input stcod stname $ east_west $ CBcode Hightemp Lowtemp Pop80 popsqm plandag purbpop crimerate BTU ;

datalines;

01 Alabama e 6 112 -27 3893888 77 35.600 60.035 4801

346

02 Alaska w 9 100 -80 401851 1

0.400 64.344 6410 858

04 Arizona w 8 127

-40 2718215 24 53.300 83.832 7614 257

run;

Input statement looks for the data in 1st row. Since the row dosen't have all the values it jumps to next line.

In the log you can see:

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

Thanks,
Suryakiran

ballardw · Posted 11-16-2016 04:40 PM

I suspect that we are missing part of the puzzle hear. Is the original text file supposed to be used for data interchange or was it something that was printed or formatted in a different manner and possibly pasted into another application (Excel perhaps) that has now removed some of the formatting information? Almost every data interchange format would have some way of identifying the end of a field especially one that has embedded blanks. Typically the methods would be a delimiter such as tab or comma, fixed column widths and or quotes around the text that contains spaces.

The irregular number of variables appearing on the example lines is another indicator of something "odd" having possibly been done with this file somewhere in its history.

Reading in multiple data lines