Hello,
so I need help reading and formatting the output into the following.
For example if i have these datalines
01 Alabama e 6 112 -27 3893888 77 35.600 60.035 4801
346
02 Alaska w 9 100 -80 401851 1
0.400 64.344 6410 858
04 Arizona w 8 127
-40 2718215 24 53.300 83.832 7614 257
* THE DATA LINES ARE SUPPOSED TO BE READ IN LIKE ABOVE IN MULTIPLE LINES.
I want to assign it variable names and format the output so that it looks like this.
stcod stname east/west CBcode Hightemp Lowtemp Pop80 popsqm plandag purbpop crimerate BTU
01 Alabama e 6 112 -27 3893888 77 35.600 60.035 4801 346
02 Alaska w 9 100 -80 401851 1 0.400 64.344 6410 858
04 Arizona w 8 127 -40 2718215 24 53.300 83.832 7614 257
Please help!
Assign an informat to the character variables. Then just use an INPUT statement listing the vars in order.
Caveat: you MUST have complete raw data. I.e., every variable must have a value in the raw data. For numeric vars, this means you'd have to have a dot (.) as a missing value placeholder. But using a blank as a missing value placeholder for the character vars would cause program failure.
Regards,
Mark
data want;
informat stname $15. ew $1.;
input stcod stname ew CBcode Hightemp Lowtemp Pop80 popsqm plandag purbpop crimerate BTU ;
datalines;
01 Alabama e 6 112 -27 3893888 77 35.600 60.035 4801
346
02 Alaska w 9 100 -80 401851 1
0.400 64.344 6410 858
04 Arizona w 8 127
-40 2718215 24 53.300 83.832 7614 257
run;
nrb sent me this followup question: How to deal with multi-part state names
so you just commented on one of the questions I posted, one issue I have is that i have other data lines who have spaces in the statesnames for example north carolina. So the stnames i provided were just single so how would I go about that? The problem now is the for the dataline below i'm only getting New and it's excluding the Hampshire part. Please let me know. THANK YOU SO MUCH!
For example if the dataline is
33 New Hampshire e 1 106 -46 920610 102
9.400 52.175 4235 200
My suggestion is to read the STCOD and the next two text fields, to be named STNAME and _DUMMYTXT, and append a trailing "@" on the input statement, which tells SAS to hold the column pointer. The result is that the next INPUT statement picks up on the same line at the column pointer. Without the trailing "@" the next INPUT statement would read the following LINE, not the following FIELD.
Now _DUMMYTXT is either going to be the second word in a state name (e.g. "Hampshire" in the example above) or single letter ("e" or "w"). If it's the earlier, concatenate the txt to the state name, and read in a single text field again with the trailing "@", which will pick up the "e" or "w". But if _DUMMYTXT is already an "e" or "w" assign it to variable EW.
At this point read the remaining vars as before.
regards,
Mark
Sample program:
data want (drop=_dummytxt);
informat stcod 2. stname _dummytxt $15. ew $1.;
/* Input the first 3 fields and hold the colum pointer. */
/* pointing right after field 3 ... ready for next INPUT*/
input stcod stname _dummytxt @;
/* If two-part statename, concatenate them and input ew */
if not(_dummytxt in ('e','w')) then do;
stname=catx(' ',stname,_dummytxt);
input ew @;
end;
/* Otherwise copy ew*/
else ew=_dummytxt;
/* Read the rest of the variables */
input CBcode Hightemp Lowtemp Pop80 popsqm plandag purbpop crimerate BTU ;
datalines;
01 Alabama e 6 112 -27 3893888 77 35.600 60.035
4801 346
02 Alaska w 9 100 -80 401851 1 0.400
64.344 6410 858
04 Arizona w 8 127 -40 2718215 24
53.300 83.832 7614 257
33 New Hampshire e 1 106 -46 920610 102
9.400 52.175 4235 200
run;
Since your data has length less than 8 you can simply run this code.
data one;
infile datalines;
input stcod stname $ east_west $ CBcode Hightemp Lowtemp Pop80 popsqm plandag purbpop crimerate BTU ;
datalines;
01 Alabama e 6 112 -27 3893888 77 35.600 60.035 4801
346
02 Alaska w 9 100 -80 401851 1
0.400 64.344 6410 858
04 Arizona w 8 127
-40 2718215 24 53.300 83.832 7614 257
run;
Input statement looks for the data in 1st row. Since the row dosen't have all the values it jumps to next line.
In the log you can see:
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
I suspect that we are missing part of the puzzle hear. Is the original text file supposed to be used for data interchange or was it something that was printed or formatted in a different manner and possibly pasted into another application (Excel perhaps) that has now removed some of the formatting information? Almost every data interchange format would have some way of identifying the end of a field especially one that has embedded blanks. Typically the methods would be a delimiter such as tab or comma, fixed column widths and or quotes around the text that contains spaces.
The irregular number of variables appearing on the example lines is another indicator of something "odd" having possibly been done with this file somewhere in its history.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.