My situation is similar but the output needs to be rows. All my data is in one giant line where each value is separated by comma, and I should only have one column of data, where each of these are rows. Also my one line is extremely long greater than 32K+ in length.
["T_283_20220605_73_283A0000006757_830010", "T_283_20220605_75_283A0000012029_830010", "T_283_20220605_77_283A0000017945_Ach_OPull", "T_283_20220605_79_283A0000011229_Ach_OPull"] |
How to get it to transpose when i import it:
ColumnA |
T_283_20220605_73_283A0000006757_830010 |
T_283_20220605_75_283A0000012029_830010 |
T_283_20220605_77_283A0000017945_Ach_OPull |
T_283_20220605_79_283A0000011229_Ach_OPull |
Attached is full sample.
You cannot use the END= option with RECFM=N. (You also don't need LRECL= option with RECFM=N).
So you will need another test to see if the you have reached the end of the file.
You can try just stopping when you get a missing value, but that might not really stop in time.
do while (missing(TRANSACTION_ID));
input TRANSACTION_ID @@;
if not missing(TRANSACTION_ID) then output;
end;
Otherwise you might need to try the EOF= option instead. That wants a LABEL to jump to and not a VARIABLE as the value. You can probably just jump past the end of the DO loop. Now the DO loop is really an infinite loop. But perhaps instead you should put some upper limit on the looping just in case.
So perhaps something like:
data dsn;
length location $1000;
set new_files;
location=cats("&path/", file_name);
infile dummy filevar=location dlm=',[]' dsd recfm=n eof=done;
length transaction_id $50;
do _n_=1 to 1E8;
input transaction_id @@;
if not missing(transaction_id) then output;
end;
done:
drop file_date file_name;
run;
PS There is no need to attach an INFORMAT to a simple character variable. If you want to tell SAS how the variable should be DEFINED then just do that with the LENGTH statement. The same as you did for the LOCATION variable.
Otherwise since you don't seem to care which file the transaction id came from you can just use your list of files to define ONE fileref and then use that in the data step.
filename code temp;
data _null_;
file code ;
length location $1000;
set new_files end=eof;
location=cats("&path/", file_name);
if _n_=1 then put 'filename allfiles (' ;
put location :$quote. ;
if eof then put ');' ;
run;
data dsn;
infile allfiles dlm=',[]' dsd recfm=n ;
length transaction_id $50;
input transaction_id @@;
if not missing(transaction_id) then output;
run;
First thing; it is a good idea to start your own thread. If you think think that your question is related to another then post a link to the related thread(s). As the starter of a thread you have the ability to mark responses as an accepted solution.
You do not explain how the [ or ] characters in your example are to be treated.
This example treats the [ and ] as delimiters. The IF is because the structure of that line means you get blank values as read. You would replace "infile datalines" with "infile "yourfilenamegoes here".
data example; infile datalines dlm=',[]' dsd; informat value $50.; input value @@; if not missing(value) then output; datalines; ["T_283_20220605_73_283A0000006757_830010", "T_283_20220605_75_283A0000012029_830010", "T_283_20220605_77_283A0000017945_Ach_OPull", "T_283_20220605_79_283A0000011229_Ach_OPull"] ;
The @@ on the input statement says hold the line and keep reading until you run out of information. You will likely see a note in the Log about reading to next line. That is normal.
Thanks @ballardw each file has a open and close bracket, one in the beginning and one at the end, I was just going to replace these if I found them in output. So that I understand the code better how or where in the code does it know these are to be rows of data and not 4 different columns? Is it the double at sign?
Also how do I deal with a file that has length greater than 32,768?, when I apply your logic to a file it stops at 750 records, where the last one is partial, and I've been told the file should have 10,000 records in it.
Increase the LRECL=. On UNIX, the maximum value for this is 1G.
You need to provide a better example of the file you are reading. Make sure to use the Insert Code button to get a pop-up window to paste in the example text.
Is the file all one line or are there line breaks that indicate when a "ROW" ends? Do you want to treat the closing bracket as marking the end of a "ROW"?
You can use the LRECL= option on the INFILE statement to tell SAS that the lines are longer then 32767 bytes. If the lines are larger than the LRECL= option supports then use RECFM=N instead and treat the file as one long line. In that case you might need to remove any LF and/or CR characters that you end up reading in.
So something like this will read all of the "words" out of the file.
data want;
infile 'myfile.txt' recfm=n dlm=' ,"' ;
wordno+1;
input word :$80. @@ ;
run;
If you want to remember what LINE the WORD was read from then read the file as variable length instead you can use the LENGTH= and COLUMN= option to tell when you reach the end of a line.
data want;
infile 'myfile.txt' recfm=n dlm=' ,"' truncover lrecl=10000000 length=ll column=cc;
row+1;
do wordno=1 by 1 until(cc > ll);
input word :$80. @ ;
output;
end;
run;
If you need check which of the different [ ] blocks the words are in then perhaps add something like:
data want;
infile 'myfile.txt' recfm=n dlm=' ,"' truncover lrecl=10000000 length=ll column=cc;
row+1;
do block=1 by 1 until(cc>ll);
do wordno=1 by 1 until(cc>ll or word=']');
input word :$80. @ ;
output;
end;
end;
run;
I've attached one of the text file, does it look like a xml or json kurt? @Kurt_Bremser, if so is there another way to import this?
lrecl=10000000 recfm=n truncover adding this seems to work @Tom @Kurt_Bremser @ballardw
I wanted to loop this through multiple files so I took some existing code I had that read in multiple files, however it stops after reading just one file, looking at the log I see this NOTE: Unexpected end of file for binary input. Which I'm thinking is why it stops. Is there an option I'm missing to keep reading thru my file list
DATA dsn;
length location $1000;
set NEW_FILES;
location=cats("&path/", FILE_NAME);
infile dummy filevar=location end=done DLM=',[]' DSD lrecl=10000000 recfm=n;
informat TRANSACTION_ID $50.;
Do while (not done);
input TRANSACTION_ID @@;
if not missing(TRANSACTION_ID) then output;
end;
drop file_date file_name;
RUN;
Thanks any help is appreciated
You cannot use the END= option with RECFM=N. (You also don't need LRECL= option with RECFM=N).
So you will need another test to see if the you have reached the end of the file.
You can try just stopping when you get a missing value, but that might not really stop in time.
do while (missing(TRANSACTION_ID));
input TRANSACTION_ID @@;
if not missing(TRANSACTION_ID) then output;
end;
Otherwise you might need to try the EOF= option instead. That wants a LABEL to jump to and not a VARIABLE as the value. You can probably just jump past the end of the DO loop. Now the DO loop is really an infinite loop. But perhaps instead you should put some upper limit on the looping just in case.
So perhaps something like:
data dsn;
length location $1000;
set new_files;
location=cats("&path/", file_name);
infile dummy filevar=location dlm=',[]' dsd recfm=n eof=done;
length transaction_id $50;
do _n_=1 to 1E8;
input transaction_id @@;
if not missing(transaction_id) then output;
end;
done:
drop file_date file_name;
run;
PS There is no need to attach an INFORMAT to a simple character variable. If you want to tell SAS how the variable should be DEFINED then just do that with the LENGTH statement. The same as you did for the LOCATION variable.
Otherwise since you don't seem to care which file the transaction id came from you can just use your list of files to define ONE fileref and then use that in the data step.
filename code temp;
data _null_;
file code ;
length location $1000;
set new_files end=eof;
location=cats("&path/", file_name);
if _n_=1 then put 'filename allfiles (' ;
put location :$quote. ;
if eof then put ');' ;
run;
data dsn;
infile allfiles dlm=',[]' dsd recfm=n ;
length transaction_id $50;
input transaction_id @@;
if not missing(transaction_id) then output;
run;
Awesome Thanks @Tom that works. Appreciate your help
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.