I am trying to concatenate multiple JSON files and then converting the resulting file into SAS format. How can I include the source filename as a new variable in the concatenated file?
You could try adding the DSD option, that will allow the quotes to protect the delimiters.
data tall ;
length row col 8 name $32 value $256 ;
length filen filename $256 ;
infile "c:\downloads\MyDrug*.json" dlm='[{,:}]' dsd truncover length=ll column=cc filename=filen;
input @;
if filen ne lag(filen) then row=0;
filename=filen;
row+1;
do col=1 by 1 until(cc>ll);
input name value @;
name=dequote(name);
if value='null' then value=' ';
else value=dequote(value);
if name ne ' ' then output;
end;
run;
The [{ or ,{ at the start of each line will cause there to be two extra empty values, but the IF statement that ignores the pairs without a name will cause those to be ignored.
Can you show sample code you are using to read and concatenate two JSON files? And perhaps attach two small sample JSON files (or include code to generate the two sample JSON files).
data _null_;
/* Iterate over all the files input */
infile '\json_test\MyDrug*.json' end=eof;
/* Open the file to write the output to */
file '\MyDrug_all.json';
/* Read a line of input */
input;
/* If this is the first line in the file print it to the output file verbatim */
if _n_ = 1 then
put _infile_;
/* If this is the last line print it out verbatim */
else if eof then
put _infile_;
/* Otherwise we are in the middle somewhere */
else do
/* Check what the first character of the line is */
/* Its either [, ], or , */
first_character = substr(_infile_,1,1);
select (first_character);
/* If its [ we must remove it */
when ("[") do;
line_without_first_character = substr(_infile_,2);
put line_without_first_character;
end;
/* If its ] we must swap it to a , to allow more objects in the array */
when ("]") put "," @@;
otherwise put _infile_;
end;
end;
run;
Your JSON is sooooooo simple you could just read it as name/value pairs directly.
data tall ;
length row col 8 name $32 value $256 ;
length filen filename $256 ;
infile "c:\downloads\MyDrug*.json" dlm='[{,:}]' truncover length=ll column=cc filename=filen;
input @;
if filen ne lag(filen) then row=0;
filename=filen;
row+1;
do col=1 by 1 until(cc>ll);
input name value @ ;
name=dequote(name);
if value='null' then value=' ';
else value=dequote(value);
if name ne ' ' then output;
end;
run;
proc transpose data=tall out=wide(drop=_name_);
by filename row ;
id name;
var value;
run;
proc print data=wide ;
where row=1;
run;
Result
Hi Tom,
That's excellent. I am almost there. With my complete data pool (>83k JSON files into 21 groups) the tall is being created without any issue but the transpose proc is not working. It's returning the following error message.
NOTE: The above message was for the following BY group:
filename=C:\...ts\JSON_1st_0203
ERROR: The ID value "null" occurs twice in the same BY group.
ERROR: The ID value "false" occurs twice in the same BY group.
Hi Tom,
I found the reason for the error message. You are using comma as a delimiter but it also appears in some of the values. Then the problem happens.
Is there a way to get around this problem easily?
You could try adding the DSD option, that will allow the quotes to protect the delimiters.
data tall ;
length row col 8 name $32 value $256 ;
length filen filename $256 ;
infile "c:\downloads\MyDrug*.json" dlm='[{,:}]' dsd truncover length=ll column=cc filename=filen;
input @;
if filen ne lag(filen) then row=0;
filename=filen;
row+1;
do col=1 by 1 until(cc>ll);
input name value @;
name=dequote(name);
if value='null' then value=' ';
else value=dequote(value);
if name ne ' ' then output;
end;
run;
The [{ or ,{ at the start of each line will cause there to be two extra empty values, but the IF statement that ignores the pairs without a name will cause those to be ignored.
Hi Tom,
That's fantastic. It worked.
Hi Tom,
With one of the data files i am recieving the following error message.
NOTE: 29 records were read from the infile "O:\All Staff\BESTMED\Backup\final\Resident*.json".
The minimum record length was 194.
The maximum record length was 866.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TALL may be incomplete. When this step was stopped there were
35926321 observations and 5 variables.
I changed the length of the value to 900 but still didn't work.
The Note doesn't give exact file name. I have thousands of files as resident*.json.
Check your log, there should be an ERROR message, and probably an informative dump of data.
You might try adding something like:
if _error_ then putlog "Uh-oh " filename= ;
after the input statement, if it looks like that is what is causing the problem.
I put the statement as below. But it didn't create any text document.
input @;
if _error_ then putlog "Uh-oh " filename="O:\All Staff\BESTMED\Backup\final\error.txt" ;
The putlog statement won't write a file, it writes a message to the log. I was thinking place:
if _error_ then putlog "Uh-oh " filename= ;
After second input statement, to see if it ever detects that SAS has set the _error_ variable to true.
But I agree with Tom, since you're reading values as character, I don't see anything that could be causing an error.
Can you run the step and save the log, and attach the full log. There should be a more informative error message somewhere.
Tom's idea about a file lock/permissions issue is a good one. If the json files are sitting on some remote drive, maybe try copying them to a local drive that only you can access, and see if the program works then. If it does, that would suggest a permissions issue, or some other network issue. If it still errors locally, the next question would be does it always error at the same point (same record count in the output file)?
Another trouble-shooting step: take Tom's code, and turn it into a macro which will accept the name of a single JSON file as a parameter, and process a single file. Then use CALL EXECUTE or whatever to call the macro once per file, and see which (if any) file causes it to error.
The problem is with the delimiters. Some of the delimiters are also in some values.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.