I'm parsing data from a txt file and formatting into a csv. Part of that, is to combine multiple observations into a single row. There is no common identifier between the two rows, so I'm trying to use the difference in the number of fields present(https://communities.sas.com/t5/General-SAS-Programming/Merging-observations-into-a-single-one/m-p/41... )
Some of the lines I'm looking to output, have some of their fields missing(those prefixed with NODE or INT in my code). Which is fine, as I don't need them for the final csv file, but I can't seem to work around how to handle that missing data.
If both NODE and the INT fields are missing, it returns data, however if just NODE is missing, that line doesn't get returned.
Its also truncating the leading spaces caused by these missing fields, throwing my other fields out of whack. I've tried using $CHAR132; in my length statement, but get his error –
ERROR 391-185: Expecting a variable length specification.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
Ultimately my output csv should look like –
Dataset, HeldNAME, HeldNUM, HeldDate, HeldTIME, HeldJESID, HeldFROM, HeldFOR, HolderNAME, HolderJESID, HolderAT
With the Dataset value being the third value(i.e. the ??? in this example) on the line that begins with "DATASET NAME: ???"
This is where I'm at thus far -
data File1;
retain dataset;
infile IFILE missover;
length NODE IntDATE IntFROM IntFOR
HeldNAME HeldNUM HeldDate HeldTIME HeldJESID
HeldFROM HeldFOR HeldF HeldS HeldC
HolderNAME HolderT HolderJESID HolderH HolderAT
$132;
input NODE -- HolderAT;
if find(IntDATE,':','i') ge 1 OR
NODE = '--NODE--' OR
NODE = 'NAME ' OR
NODE = '--------' then delete;
if missing(IntFOR) then do;
dataset = IntFROM;
end;
else output;
proc print;
Attached is a sample of my input file.
I think you're going about this in a difficult way. There should be no need to assign a length or informat of $132 to every variable. In the example below, I've assigned an INFORMAT specific to each variable.
And you can use a naked INPUT statement to populate the _INFILE_ automatic variable. It has a trailing @, for re-reading purposes. Items 1 and 2 below is a bit like your logic, but the rest basically does conditional reading of subgroups of variables.
data want;
infile 'c:\temp\t.txt' firstobs=2 truncover;
informat dsname $40. node $8. intdate $5. intfrom hhmmss8. intfor stimer6.
HeldNAME $8. HeldNUM 5.0 HeldDate $5. HeldTIME time8.0 HeldJESID $8.
HeldFROM time8.0 HeldFOR stimer6. HeldF HeldS HeldC $1.
HolderNAME $8. HolderT $1. HolderJESID $8. HolderH $1. HolderAT time8.0
;
retain dsname -- intfor;
input @;
if _infile_=' ' then delete;
else if substr(_infile_,2,8) in ('--NODE--',' NAME ','--------') then delete;
else if _infile_=: ' DATASET NAME:' then do;
call missing (of dsname--intfor);
input @'DATASET NAME:' dsname $40.;
delete;
end;
if not (_infile_=:' ') then input @2 node @;
if not (_infile_=:' ') then input intdate intfrom intfor @;
input heldname--holderat;
format intfrom time8.0 intfor mmss5.0
heldtime time8.0 heldfrom time8.0 heldfor mmss5.0
holderat time8.0;
run;
Notes:
If you really want a CSV file, then set up a FILE statement and a PUT statement at the bottom of the data step. Also you can change DATA WANT to DATA _NULL_.
Considering your note:
I've tried using $CHAR132; in my length statement ...
pay attention, you shall define:
length var_name $132;
format var_name $132. ;
/* informat var_name $132. ; ??? */
As to your code, you can read just first 8 charachers to test it and continue according to result:
data File1;
retain dataset;
infile IFILE missover;
length NODE IntDATE IntFROM IntFOR
HeldNAME HeldNUM HeldDate HeldTIME HeldJESID
HeldFROM HeldFOR HeldF HeldS HeldC
HolderNAME HolderT HolderJESID HolderH HolderAT
$132;
input @2 test $8. @; /* read first 8 characters and keep stay on line */
if test in ('--NODE--' 'NAME ' '--------') then input; /* skip line */
if test = 'DATASET ' then
input @14 dataset $132. ;
else do;
input @1 test $1. @2 NODE -- HolderAT;
output;
end;
drop test;
run;
You may need to refine my code, as I have not tested it.
Some progress, thanks. Though its is picking up lines it shouldn't and I can't see why.
This statement seems to not work for the ' NAME '. Its not skipping this line.
if test in ('--NODE--' ' NAME ' '--------') then input; /* skip line */
I've tried playing around with different variations of the spaces within it, and yet it always gets included when it shouldn't.
Thanks
Maybe you need change line to:
input @1 test $8. @; /* instead @2 */
The text will be left justified.
and change line:
if test in ('--NODE--' 'NAME ' '--------') then input; /* skip line */
where 'NAME '
- is left justified, up to 8 characters.
You need also retain NODE INTDATE INTFROM INTFOR variables as they exist only on first data line.
I think you're going about this in a difficult way. There should be no need to assign a length or informat of $132 to every variable. In the example below, I've assigned an INFORMAT specific to each variable.
And you can use a naked INPUT statement to populate the _INFILE_ automatic variable. It has a trailing @, for re-reading purposes. Items 1 and 2 below is a bit like your logic, but the rest basically does conditional reading of subgroups of variables.
data want;
infile 'c:\temp\t.txt' firstobs=2 truncover;
informat dsname $40. node $8. intdate $5. intfrom hhmmss8. intfor stimer6.
HeldNAME $8. HeldNUM 5.0 HeldDate $5. HeldTIME time8.0 HeldJESID $8.
HeldFROM time8.0 HeldFOR stimer6. HeldF HeldS HeldC $1.
HolderNAME $8. HolderT $1. HolderJESID $8. HolderH $1. HolderAT time8.0
;
retain dsname -- intfor;
input @;
if _infile_=' ' then delete;
else if substr(_infile_,2,8) in ('--NODE--',' NAME ','--------') then delete;
else if _infile_=: ' DATASET NAME:' then do;
call missing (of dsname--intfor);
input @'DATASET NAME:' dsname $40.;
delete;
end;
if not (_infile_=:' ') then input @2 node @;
if not (_infile_=:' ') then input intdate intfrom intfor @;
input heldname--holderat;
format intfrom time8.0 intfor mmss5.0
heldtime time8.0 heldfrom time8.0 heldfor mmss5.0
holderat time8.0;
run;
Notes:
If you really want a CSV file, then set up a FILE statement and a PUT statement at the bottom of the data step. Also you can change DATA WANT to DATA _NULL_.
This is great. It pretty much gives me what I'm after. Just one thing I didn't account for in the example I provided. There are instances where only the HOLDER values are present(i.e. there could be multiple HOLDER values for a single HELD value).
These are getting displayed with incorrect values. Not sure how I can account for that?
I've attached a new sample input file which reflects this(Contentions02.txt). Doing a find on JAST0064 takes you to a line I'm referring to.
Thanks….
Apply the same principles I described earlier.
Great. This is exactly what I was after. Appreciate all your assistance.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.