BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
jimbobob
Quartz | Level 8

My situation is similar but the output needs to be rows. All my data is in one giant line where each value is separated by comma, and I should only have one column of data, where each of these are rows. Also my one line is extremely long greater than 32K+ in length. 

 

["T_283_20220605_73_283A0000006757_830010", "T_283_20220605_75_283A0000012029_830010", "T_283_20220605_77_283A0000017945_Ach_OPull", "T_283_20220605_79_283A0000011229_Ach_OPull"]

 

How to get it to transpose when i import it:

 

ColumnA
T_283_20220605_73_283A0000006757_830010
T_283_20220605_75_283A0000012029_830010
T_283_20220605_77_283A0000017945_Ach_OPull
T_283_20220605_79_283A0000011229_Ach_OPull

 

Attached is full sample.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You cannot use the END= option with RECFM=N.  (You also don't need LRECL= option with RECFM=N).

 

So you will need another test to see if the you have reached the end of the file.

 

You can try just stopping when you get a missing value, but that might not really stop in time.

do while (missing(TRANSACTION_ID));
  input TRANSACTION_ID @@;
  if not missing(TRANSACTION_ID) then output;
end;

Otherwise you might need to try the EOF= option instead.  That wants a LABEL to jump to and not a VARIABLE as the value.  You can probably just jump past the end of the DO loop.  Now the DO loop is really an infinite loop.  But perhaps instead you should put some upper limit on the looping just in case.

So perhaps something like:

data dsn;
  length location $1000;
  set new_files;
  location=cats("&path/", file_name);
  infile dummy filevar=location dlm=',[]' dsd recfm=n eof=done;
  length transaction_id  $50;
  do _n_=1 to 1E8;
    input transaction_id @@;
    if not missing(transaction_id) then output;
  end;
done:
  drop file_date file_name;
run;

PS There is no need to attach an INFORMAT to a simple character variable. If you want to tell SAS how the variable should be DEFINED then just do that with the LENGTH statement. The same as you did for the LOCATION variable.

 

Otherwise since you don't seem to care which file the transaction id came from you can just use your list of files to define ONE fileref and then use that in the data step.

filename code temp;
data _null_;
  file code ;
  length location $1000;
  set new_files end=eof;
  location=cats("&path/", file_name);
  if _n_=1 then put 'filename allfiles (' ;
  put location :$quote. ;
  if eof then put ');' ;
run;

data dsn;
  infile allfiles dlm=',[]' dsd recfm=n ;
  length transaction_id  $50;
  input transaction_id @@;
  if not missing(transaction_id) then output;
run;

View solution in original post

11 REPLIES 11
ballardw
Super User

First thing; it is a good idea to start your own thread. If you think think that your question is related to another then post a link to the related thread(s). As the starter of a thread you have the ability to mark responses as an accepted solution.

 

You do not explain how the [ or ] characters in your example are to be treated.

This example treats the [ and ] as delimiters. The IF is because the structure of that line means you get blank values as read. You would replace "infile datalines" with "infile "yourfilenamegoes here".

data example;
   infile datalines dlm=',[]' dsd;
   informat value $50.;
   input value @@;
   if not missing(value) then output;
datalines;
["T_283_20220605_73_283A0000006757_830010", "T_283_20220605_75_283A0000012029_830010", "T_283_20220605_77_283A0000017945_Ach_OPull", "T_283_20220605_79_283A0000011229_Ach_OPull"]
;

The @@ on the input statement says hold the line and keep reading until you run out of information. You will likely see a note in the Log about reading to next line. That is normal.

jimbobob
Quartz | Level 8

Thanks @ballardw each file has a open and close bracket, one in the beginning and one at the end, I was just going to replace these if I found them in output. So that I understand the code better how or where in the code does it know these are to be rows of data and not 4 different columns? Is it the double at sign?

jimbobob
Quartz | Level 8

Also how do I deal with a file that has length greater than 32,768?, when I apply your logic to a file it stops at 750 records, where the last one is partial, and I've been told the file should have 10,000 records in it.  

jimbobob_0-1680032619438.png

 

Reeza
Super User
Do you have a JSON or XML file you're attempting to parse here?

As mentioned, please start your own thread.
Tom
Super User Tom
Super User

You need to provide a better example of the file you are reading.  Make sure to use the Insert Code button to get a pop-up window to paste in the example text.

 

Is the file all one line or are there line breaks that indicate when a "ROW" ends?  Do you want to treat the closing bracket as marking the end of a "ROW"?

 

You can use the LRECL= option on the INFILE statement to tell SAS that the lines are longer then 32767 bytes.  If the lines are larger than the LRECL= option supports then use RECFM=N instead and treat the file as one long line.  In that case you might need to remove any LF and/or CR characters that you end up reading in.

 

So something like this will read all of the "words" out of the file.

data want;
  infile 'myfile.txt' recfm=n dlm=' ,"' ;
  wordno+1;
  input word :$80. @@ ;
run;

If you want to remember what LINE the WORD was read from then read the file as variable length instead you can use the LENGTH= and COLUMN= option to tell when you reach the end of a line.

data want;
  infile 'myfile.txt' recfm=n dlm=' ,"' truncover lrecl=10000000 length=ll column=cc;
  row+1;
  do wordno=1 by 1 until(cc > ll);
    input word :$80. @ ;
    output;
  end;
run;

If you need check which of the different [ ] blocks the words are in then perhaps add something like:

data want;
  infile 'myfile.txt' recfm=n dlm=' ,"' truncover lrecl=10000000 length=ll column=cc;
  row+1;
  do block=1 by 1 until(cc>ll);
    do wordno=1 by 1 until(cc>ll or word=']');
      input word :$80. @ ;
      output;
   end;
  end;
run;

 

jimbobob
Quartz | Level 8

I've attached one of the text file, does it look like a xml or json kurt?  @Kurt_Bremser, if so is there another way to import this?

jimbobob
Quartz | Level 8

lrecl=10000000 recfm=n truncover  adding this seems to work @Tom  @Kurt_Bremser  @ballardw 

jimbobob
Quartz | Level 8

I wanted to loop this through multiple files so I took some existing code I had that read in multiple files, however it stops after reading just one file, looking at the log I see this NOTE: Unexpected end of file for binary input. Which I'm thinking is why it stops. Is there an option I'm missing to keep reading thru my file list

 

	DATA dsn;
		length location $1000;
		set NEW_FILES;
		location=cats("&path/", FILE_NAME);
		infile dummy filevar=location end=done DLM=',[]' DSD lrecl=10000000 recfm=n;
	    informat TRANSACTION_ID  $50.;
		Do while (not done);
			input TRANSACTION_ID @@;
   			if not missing(TRANSACTION_ID) then output;
		end;
		drop file_date file_name;
	RUN;

Thanks any help is appreciated

Tom
Super User Tom
Super User

You cannot use the END= option with RECFM=N.  (You also don't need LRECL= option with RECFM=N).

 

So you will need another test to see if the you have reached the end of the file.

 

You can try just stopping when you get a missing value, but that might not really stop in time.

do while (missing(TRANSACTION_ID));
  input TRANSACTION_ID @@;
  if not missing(TRANSACTION_ID) then output;
end;

Otherwise you might need to try the EOF= option instead.  That wants a LABEL to jump to and not a VARIABLE as the value.  You can probably just jump past the end of the DO loop.  Now the DO loop is really an infinite loop.  But perhaps instead you should put some upper limit on the looping just in case.

So perhaps something like:

data dsn;
  length location $1000;
  set new_files;
  location=cats("&path/", file_name);
  infile dummy filevar=location dlm=',[]' dsd recfm=n eof=done;
  length transaction_id  $50;
  do _n_=1 to 1E8;
    input transaction_id @@;
    if not missing(transaction_id) then output;
  end;
done:
  drop file_date file_name;
run;

PS There is no need to attach an INFORMAT to a simple character variable. If you want to tell SAS how the variable should be DEFINED then just do that with the LENGTH statement. The same as you did for the LOCATION variable.

 

Otherwise since you don't seem to care which file the transaction id came from you can just use your list of files to define ONE fileref and then use that in the data step.

filename code temp;
data _null_;
  file code ;
  length location $1000;
  set new_files end=eof;
  location=cats("&path/", file_name);
  if _n_=1 then put 'filename allfiles (' ;
  put location :$quote. ;
  if eof then put ');' ;
run;

data dsn;
  infile allfiles dlm=',[]' dsd recfm=n ;
  length transaction_id  $50;
  input transaction_id @@;
  if not missing(transaction_id) then output;
run;
jimbobob
Quartz | Level 8

Awesome Thanks @Tom that works. Appreciate your help

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1306 views
  • 4 likes
  • 5 in conversation