BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
bayzid
Obsidian | Level 7

I am trying to concatenate multiple JSON files and then converting the resulting file into SAS format. How can I include the source filename as a new variable in the concatenated file?

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You could try adding the DSD option, that will allow the quotes to protect the delimiters.

data tall ;
  length row col 8 name $32 value $256 ;
  length filen filename $256 ;
  infile "c:\downloads\MyDrug*.json" dlm='[{,:}]' dsd truncover length=ll column=cc filename=filen;
  input @;
  if filen ne lag(filen) then row=0;
  filename=filen;
  row+1;
  do col=1 by 1 until(cc>ll);
    input name value @;
    name=dequote(name);
    if value='null' then value=' ';
    else value=dequote(value);
    if name ne ' ' then output;
  end;
run;

The [{ or ,{ at the start of each line will cause there to be two extra empty values, but the IF statement that ignores the pairs without a name will cause those to be ignored.

 

View solution in original post

56 REPLIES 56
Quentin
Super User

Can you show sample code you are using to read and concatenate two JSON files?  And perhaps attach two small sample JSON files (or include code to generate the two sample JSON files).

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
bayzid
Obsidian | Level 7

data _null_;
/* Iterate over all the files input */
infile '\json_test\MyDrug*.json' end=eof;
/* Open the file to write the output to */
file '\MyDrug_all.json';

/* Read a line of input */
input;

/* If this is the first line in the file print it to the output file verbatim */
if _n_ = 1 then
put _infile_;
/* If this is the last line print it out verbatim */
else if eof then
put _infile_;
/* Otherwise we are in the middle somewhere */
else do
/* Check what the first character of the line is */
/* Its either [, ], or , */
first_character = substr(_infile_,1,1);
select (first_character);
/* If its [ we must remove it */
when ("[") do;
line_without_first_character = substr(_infile_,2);
put line_without_first_character;
end;
/* If its ] we must swap it to a , to allow more objects in the array */
when ("]") put "," @@;
otherwise put _infile_;
end;
end;
run;

bayzid
Obsidian | Level 7
 
Tom
Super User Tom
Super User

Your JSON is sooooooo simple you could just read it as name/value pairs directly.

data tall ;
  length row col 8 name $32 value $256 ;
  length filen filename $256 ;
  infile "c:\downloads\MyDrug*.json" dlm='[{,:}]' truncover length=ll column=cc filename=filen;
  input @;
  if filen ne lag(filen) then row=0;
  filename=filen;
  row+1;
  do col=1 by 1 until(cc>ll);
    input name value @ ;
    name=dequote(name);
    if value='null' then value=' ';
    else value=dequote(value);
    if name ne ' ' then output;
  end;
run;

proc transpose data=tall out=wide(drop=_name_);
  by filename row ;
  id name;
  var value;
run;

proc print data=wide ;
 where row=1;
run;

Result

Tom_0-1681942571760.png

 

bayzid
Obsidian | Level 7

Hi Tom,
That's excellent. I am almost there. With my complete data pool (>83k JSON files into 21 groups) the tall is being created without any issue but the transpose proc is not working. It's returning the following error message.
NOTE: The above message was for the following BY group:
filename=C:\...ts\JSON_1st_0203
ERROR: The ID value "null" occurs twice in the same BY group.
ERROR: The ID value "false" occurs twice in the same BY group.

bayzid
Obsidian | Level 7

Hi Tom,
I found the reason for the error message. You are using comma as a delimiter but it also appears in some of the values. Then the problem happens.

bayzid_0-1681949526977.pngbayzid_1-1681949603832.png

Is there a way to get around this problem easily?

Tom
Super User Tom
Super User

You could try adding the DSD option, that will allow the quotes to protect the delimiters.

data tall ;
  length row col 8 name $32 value $256 ;
  length filen filename $256 ;
  infile "c:\downloads\MyDrug*.json" dlm='[{,:}]' dsd truncover length=ll column=cc filename=filen;
  input @;
  if filen ne lag(filen) then row=0;
  filename=filen;
  row+1;
  do col=1 by 1 until(cc>ll);
    input name value @;
    name=dequote(name);
    if value='null' then value=' ';
    else value=dequote(value);
    if name ne ' ' then output;
  end;
run;

The [{ or ,{ at the start of each line will cause there to be two extra empty values, but the IF statement that ignores the pairs without a name will cause those to be ignored.

 

bayzid
Obsidian | Level 7

Hi Tom,

That's fantastic. It worked.

bayzid
Obsidian | Level 7

Hi Tom,
With one of the data files i am recieving the following error message.
NOTE: 29 records were read from the infile "O:\All Staff\BESTMED\Backup\final\Resident*.json".
The minimum record length was 194.
The maximum record length was 866.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TALL may be incomplete. When this step was stopped there were
35926321 observations and 5 variables.
I changed the length of the value to 900 but still didn't work.
The Note doesn't give exact file name. I have thousands of files as resident*.json.

Quentin
Super User

Check your log, there should be an ERROR message, and probably an informative dump of data.

 

You might try adding something like:

if _error_ then putlog "Uh-oh " filename= ;

after the input statement, if it looks like that is what is causing the problem.

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
bayzid
Obsidian | Level 7

I put the statement as below. But it didn't create any text document.

input @;
if _error_ then putlog "Uh-oh " filename="O:\All Staff\BESTMED\Backup\final\error.txt" ;

 

Quentin
Super User

The putlog statement won't write a file, it writes a message to the log.  I was thinking place:

if _error_ then putlog "Uh-oh " filename= ;

After second input statement, to see if it ever detects that SAS has set the _error_ variable to true.  

 

But I agree with Tom, since you're reading values as character, I don't see anything that could be causing an error.

 

Can you run the step and save the log, and attach the full log.  There should be a more informative error message somewhere.

 

Tom's idea about a file lock/permissions issue is a good one.  If the json files are sitting on some remote drive, maybe try copying them to a local drive that only you can access, and see if the program works then.  If it does, that would suggest a permissions issue, or some other network issue.  If it still errors locally, the next question would be does it always error at the same point (same record count in the output file)?  

 

Another trouble-shooting step: take Tom's code, and turn it into a macro which will accept the name of a single JSON file as a parameter, and process a single file.  Then use CALL EXECUTE or whatever to call the macro once per file, and see which (if any) file causes it to error.

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
BayzidurRahman
Obsidian | Level 7

The problem is with the delimiters. Some of the delimiters are also in some values.

BayzidurRahman_0-1682219987480.pngBayzidurRahman_1-1682220034345.png

 

Reeza
Super User
Try reading the file with the JSON libname instead then.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 56 replies
  • 1986 views
  • 7 likes
  • 6 in conversation