BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
maroulator
Obsidian | Level 7
I have data that looks like this (the bold font is for effect and ease of reading):

AK 1.0021709654943634 1.004141858960591 1.0059412003301735 1.0075022638405597 1.0089031671832394 1.010313115448636 1.011885064154461 1.0136625876807632 1.0156754189935162 1.0177783819813138 1.0197049657091506 1.0214077118629967 1.023113899728484 1.024961812812696 1.0270842976676144AL 1.0020384552635933 1.004089941400261 1.0060428591630741 1.0078924898011783 1.0097674543713253 1.0116008228367985 1.0134279114567353 1.0152999141462276 1.0173619485116154 1.0194078638786026 1.0213379906824893 1.023262316198122 1.0251279177514947 1.0270029541971306 1.0289116457759255 1.030797300045038 1.0326897516732034AR 1.0018708370386706 1.003678770371349 1.0053816820500132 1.0069865858906413 1.0086739002665075 1.0103784241907938 1.0121787224179806 1.0140531846329055 1.015902586876032 1.0177222600477227 1.019508177957654 1.0212688572482178 1.0229772574375544 1.0247276546883293 1.026411549887851 1.0281958089836294 1.0301388665899276 And I am trying to make it look like this, instead. AK 1.0021709654943634 1.004141858960591 1.0059412003301735 1.0075022638405597 1.0089031671832394 1.010313115448636 1.011885064154461 1.0136625876807632 1.0156754189935162 1.0177783819813138 1.0197049657091506 1.0214077118629967 1.023113899728484 1.024961812812696 1.0270842976676144 AL 1.0020384552635933 1.004089941400261 1.0060428591630741 1.0078924898011783 1.0097674543713253 1.0116008228367985 1.0134279114567353 1.0152999141462276 1.0173619485116154 1.0194078638786026 1.0213379906824893 1.023262316198122 1.0251279177514947 1.0270029541971306 1.0289116457759255 1.030797300045038 1.0326897516732034 AR 1.0018708370386706 1.003678770371349 1.0053816820500132 1.0069865858906413 1.0086739002665075 1.0103784241907938 1.0121787224179806 1.0140531846329055 1.015902586876032 1.0177222600477227 1.019508177957654 1.0212688572482178 1.0229772574375544 1.0247276546883293 1.026411549887851 1.0281958089836294 1.0301388665899276

 Can someone please propose code to accomplish what I am looking for above? I am at a loss.

 

Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Cynthia_sas
SAS Super FREQ

Hi:

  Interesting that the OP program did not work with termstr=LF, but my program DID work with termstr=LF. When I tried my simple program, to capture the length of each data line and count the number of decimals:

data explore;
  infile 'c:\temp\HPA.txt' length=lg lrecl=32767 missover termstr=LF;
  ** read state and hold the data line;
  input state $ @;
     numdec = count(_infile_,'.');
     putlog _n_= lg= state= numdec=;
run;

here are the line lengths I got for each data line:

read_hpa.png

At least one of my data lines (for state=AR had a length of 13493, which agrees with Roger's length of one of his datalines.

 

I used termstr=LF in my code. So I think that is delimiting the records. I tried other values for termstr as shown below and none of these worked as well as termstr=LF:

 

--tried termstr=CRLF and termstr=CR neither of them did as well as termstr=LF

other_termstr.png

also tried termstr='0D0A'x and that did not work either:

term_0D0A.png

 

But the OP has not come back to answer Tom's question about whether there are always 18 numbers after every STATE value or to comment on the number of decimal points that I found on every data line -- it seems unusual to me that there are exactly 720 periods for every data line -- that seems too regular a pattern.

 

And my final program

data explore;
  infile 'c:\temp\HPA.txt' length=lg lrecl=32767 missover termstr=LF;
  ** read state and hold the data line;
  input state $ @;
  numdec = count(_infile_,'.');
  if state gt ' ' and lg gt 0 and numdec gt 0 then do;
     input val1-val720;
     putlog _n_= lg= state= val1= val2= val3= numdec=;
	 output;
  end;
run;

seemed to produce the desired SAS dataset....but required some exploration to come up with the right logic and variable counts.

 

cynthia

View solution in original post

16 REPLIES 16
ballardw
Super User

Is that first posted data in a SAS data set? If so lets see the results of proc contents.

Or is that the result of sending the data from SAS to text? If so show the code or describe the process used.. That result appears as if you are looking at a file formatted for one operating system with a different operating sytem viewer.

maroulator
Obsidian | Level 7

This might help clarify matters; what I posted earlier was a representation of a larger dataset. I attach this larger dataset below; I am trying to convert the attached .txt file into a SAS dataset that has the format I describe, in order to for it better feed into some other downstream processes that I am dealing with.

Tom
Super User Tom
Super User

You text file is easy to read with PC SAS. There is nothing wrong with the end of lines or the format (other than three blank lines at the end of the file).   There is no need to use PROC IMPORT or other tools to ask SAS to guess at what is in the file.  Just write a data step to read it.

 

Since you didn't specify what names you wanted for the variables I just made some up. Also your data file has three blank lines at the end so I added code to delete those.

%let path= C:\Downloads ;

data want;
  infile "&path\HPA.txt" truncover;
  length state $2 value1-value720 8 ;
  input state value1-value720 ;
  if state=' ' then delete ;
run;
Cynthia_sas
SAS Super FREQ
Hi:
Is your data in a SAS dataset? In an ASCII text file? In a CSV file? It looks like your data might not have LINE FEEDS in it.

What code have you tried? Where does this data come from? What do you need to do with it (other than make it look like there's only 1 state per row)?

cynthia
maroulator
Obsidian | Level 7

Cynthia,

 

Thanks for reaching out. See the response that I have to your colleague above; I will add that this dataset was generated by an internal system and I am trying to get it into the format I described in my original post to make some downstream processes run more smoothly. Hope this clarifies matters.

ballardw
Super User

Proc contents on the data set used and please describe how you generated the output text file that looks like one row, Code would be best.

Options might include:

Data step with PUT statement (s)

Proc Export (though I would expect some column header)

Proc Print, again would expect column headers

 

Or is that one row the result of the internal process and you have not done anything in SAS with it?

To reformat a file with SAS it has to be a data set at one point for most purposes.

 

If you attempted to read the file into SAS and didn't get the result you need, show the code with the log.

Also, what operating system did the process that generated that file use? Which operating system are you running SAS on?

Cynthia_sas
SAS Super FREQ

I am not sure that the file is just one row. When I open the HPA.TXT file with NoteTab Light, it looks like this: look_data_notetab.png

Note how AK, AL, and AR all start at the beginning of a new line. So I wonder if this is an issue of no CR or LF at the end of a row of data or whether this was generated on some system and then FTP'd incorrectly so the CR/LF did not get translated correctly. Or something.

 

Anyway, I -could- read the file. There are a LOT of numbers. I just did a simple

to count the number of decimals in the INFILE buffer and each "row" for each "state" had 720 decimals, which is how I came up with the ending number for the INPUT statement.

 

But there is still strangeness in the file. I used this to start:


data explore;
  infile 'c:\temp\HPA.txt' length=lg lrecl=32767 missover termstr=LF;
  ** read state and hold the data line;
  input state $ @;
     numdec = count(_infile_,'.');
     putlog _n_= lg= state= numdec=;
run;

just to get the number of decimal places (in order to read the variables after STATE) and this is what I got in the log:

show_log.png

So it might be that there is something weird at the end of the file, but given that STATE has the correct value for _n_ 1, 2, and 3, I think that it should be possible to read the file and just skip over the lg=1 or any lg=0 data lines that might be coming from odd control characters in the file.

 

This is what I tried

read_big_lines.png

and the proc print looks OK, but I did not even try to show all 720 numbers.

 

And, of course, this is only a sample of 3, but I would be tempted to go back to the folks who designed this process and have them check what is being put at the bottom of the file. And, possibly sweet talk them into using normal CR/LF. Bottom line, is that I don't think this ASCII text file is ever going to be "readable" in Notepad in any meaningful way.

 

cynthia

 

cynthia

maroulator
Obsidian | Level 7

Hi, sorry I was out of pocket for most of the weekend, so I got to this only just now.

 

I generated the output file that looks like one row (a.k.a "Output File") from an internal process; SAS has nothing to do with it, all I can see from this internal process is the Output File. A new file that I attach below is the log file generated by SAS after I attempted to import the Output File into PC SAS by using the import wizard. I am running PC SAS on Windows 7.

 

Tom
Super User Tom
Super User

Did you generate the file on your PC?  If not then how did you transfer the file to your PC?

Perhaps your file transfer method is what is removing the end of lines.  Or inserting extra end of lines after about 13K characters.  Perhaps your original file was generated without any characters between lines.  Of if you make it on an IBM mainframe then perhaps it used a file format that does not store the end of line characters in the data and your transfer method did not insert them in the right places.

 

There probably is no advantage to trying to use the guessing tool to import the data, but if you do use it you should check the box that says there is not a line with variable names at the top of the file.

 

Also it looks like the PROC IMPORT did not see any split between the state name and the first value so it tried to use them both as the name for the first column.  This means you probably told it to use the wrong character as the delimiter.

rogerjdeangelis
Barite | Level 11
SAS Forum: Splitting a long single record into 50 state lines

inspired by
https://goo.gl/yxQTBh
https://communities.sas.com/t5/General-SAS-Programming/Formating-txt-files-to-specification/m-p/342127

There are several places in SAS where you need a 'null' like
dataset or flatfile.
This is epecially true when you want to create
a dataset with 'proc sql' but you do not need or have a dataset.
or you are using 'input @@'.

I expect this type of problem to be a one liner in Perl or Python?

I copied your long record to flatfile

HAVE ( Your one line file and my 'null' flatfile )
=================================================

 'null' file d:\txt\empty.txt

      d:/txt/fylfat.txt  (one reord 3 bytes 'EOF')

   INPUTS

      Filename=d:\txt\empty.txt,
      File List=('d:\txt\fylfat.txt' 'd:\txt\empty.txt'),
      RECFM=V,LRECL=4096

      Filename=d:\txt\fylfat.txt,
      File List=('d:\txt\fylfat.txt' 'd:\txt\empty.txt'),
      RECFM=V,LRECL=4096

NOTE: 1 record was read from the infile FYLFAT.
      The minimum record length was 926.
      The maximum record length was 926.


AK 1.0021709654943634 1.004141858960591 ... 1.005941200330173592AL ... 1.0059412003301735
==                                                              =

WANT
====

Up to 40 obs from sasnrm total obs=3

Obs                       LYN

 1  AK 1.0021709654943634 1.004141858960591 ... 1.0059412003301735
 2  AL 1.0020384552635933 1.004089941400261 ... 1.0060428591630741
 3  AR 1.0018708370386706 1.003678770371349 ... 1.0053816820500132


Middle Observation(1 ) of sasnrm - Total Obs 3

 -- CHARACTER --
LYN             C    512     AK 1.00217096549

WORKING CODE
============

   filename fylfat ("d:/txt/fylfat.txt","d:/txt/empty.txt" ) lrecl=4096;
   input num @@;
   lyn=catx(' ',lyn,num);

FULL SOLUTION
=============

* create 'null' like dataset;
data _null_;
 file "d:/txt/empty.txt";
 put 'EOF';
run;quit;


filename fylfat ("d:/txt/fylfat.txt","d:/txt/empty.txt" ) lrecl=4096;

* create the input fat file;
data sasnrm;
length lyn $512;
retain lyn;
informat num $21.;
infile fylfat;
input num @@;
if lengthn(compress(num,'.','d'))>0 and _n_ > 1 then do;
   output;
   lyn='';
   num=compress(num,'.','d');
   lyn=catx(' ',lyn,num);
   num=compress(num,'.','kd');
   lyn=catx(' ',lyn,num);
end;
else do;
   lyn=catx(' ',lyn,num);
end;
if num=:'EOF' then output;
drop num;
run;quit;

Patrick
Opal | Level 21

Just opening the sample text file you've attached with Notepad++ reveals that your lines with numbers end with a UNIX style LF

Capture.PNG

 

Could it be that your downstream process expects a Windows style CRLF as end of record indicator? If this process is in SAS then use TERMSTR as part of your Infile statement. http://support.sas.com/kb/14/178.html 

 

Comment:

I'm referring to sample data HPA.txt. The larger sample file others refer to hasn't been available anymore in this thread.

 

 

 

 

 

 

 

 

rogerjdeangelis
Barite | Level 11

The larger file seems to have several issues

 


I adjusted my code to handle the full file. It seems to work.

filename fylfat ("d:/txt/txtbig.txt","d:/txt/empty.txt" ) lrecl=32756 recfm=v;
length lyn $32756;


I can simplify the code if I can be sure the CRLF are between all states.

Here is a proc contents showing 3 lines with more numbers.

R, Python and Perl allow entire files to to loaded into one variable(myfile) and
strsplit(myfile) would create a vector of 'words' for the entire file which you
could resphape into lines of x numbers and output or convert to numeric arrays of rows.
WPS does not limit the size of the SAS dataset coming out of R (or Python?) Observations 3 Alphabetic List of Variables and Attributes # Variable Type Len 1 LYN Char 13493 Note the 0D0A(crlf) in the 168th 80 byte chunk Another one for AL at 340th 80 byte chunk.

FYI this is output from by utlrulr macro, which I have posted several times.
A better way ti view hes id FSLIST, but you need full base SAS for that (not EG) --- Record Number --- 169 --- Record Length ---- 80 56006 2.2640534503322765 2.2666845677593237..AL 1.0020384552635933 1.00408994140 1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 33333232333333333333333323233333333333333330044232333333333333333323233333333333 5600602E264053450332276502E2666845677593237DA1C01E002038455263593301E00408994140
Tom
Super User Tom
Super User

If the lines are really that regular then it might be simple to fix.

If the lines actually do have LF at the ends of each record (which I doubt because your original program should have worked) then just read it and re-write it using CR-LF at the ends of the lines.

data _null_;
   infile 'oldfile' termstr=lf ;
   file 'newfile' termstr=crlf ;
   input;
   put _infile_;
run;

The first few numbers make it looks like the numbers are always 18 characters long.  Are there always the same numbers after each state? So if there are 18 numbers for each state then perhaps you can just read it like this?

data _null_;
   infile 'oldfile' recfm=n ;
   file 'newfile' termstr=crlf ;
   input state  ;
   put state $2. @ ;
   do i=1 to 15 ;
     input number ;
     put  +1 number best18. @;
  end;
  put;
run;

Otherwise read the thing character by character and when you see a letter then spit out an end of line.

data _null_;
   infile 'oldfile' recfm=f lrecl=1 end=eof;
   file 'newfile' termstr=crlf ;
   input ch $char1.  ;
   retain numch 0;
   numch = mod(numch + anyalpha(ch),3);
   if numch=1 and _n_> 1 then put ;
   if ch not in ('0A'x,'0D'x) then put ch $char1. @;
   if eof then put ;
run;
Cynthia_sas
SAS Super FREQ

Hi:

  Interesting that the OP program did not work with termstr=LF, but my program DID work with termstr=LF. When I tried my simple program, to capture the length of each data line and count the number of decimals:

data explore;
  infile 'c:\temp\HPA.txt' length=lg lrecl=32767 missover termstr=LF;
  ** read state and hold the data line;
  input state $ @;
     numdec = count(_infile_,'.');
     putlog _n_= lg= state= numdec=;
run;

here are the line lengths I got for each data line:

read_hpa.png

At least one of my data lines (for state=AR had a length of 13493, which agrees with Roger's length of one of his datalines.

 

I used termstr=LF in my code. So I think that is delimiting the records. I tried other values for termstr as shown below and none of these worked as well as termstr=LF:

 

--tried termstr=CRLF and termstr=CR neither of them did as well as termstr=LF

other_termstr.png

also tried termstr='0D0A'x and that did not work either:

term_0D0A.png

 

But the OP has not come back to answer Tom's question about whether there are always 18 numbers after every STATE value or to comment on the number of decimal points that I found on every data line -- it seems unusual to me that there are exactly 720 periods for every data line -- that seems too regular a pattern.

 

And my final program

data explore;
  infile 'c:\temp\HPA.txt' length=lg lrecl=32767 missover termstr=LF;
  ** read state and hold the data line;
  input state $ @;
  numdec = count(_infile_,'.');
  if state gt ' ' and lg gt 0 and numdec gt 0 then do;
     input val1-val720;
     putlog _n_= lg= state= val1= val2= val3= numdec=;
	 output;
  end;
run;

seemed to produce the desired SAS dataset....but required some exploration to come up with the right logic and variable counts.

 

cynthia

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 4900 views
  • 2 likes
  • 6 in conversation