I am trying to loop over all the files in an Azure Data Lake and want to read specific .json files so that I may do something with the data in those files. I have access to the data lake and proved I can see the files stored there from within SAS. I only want to read files with a specific name (in this example, “process.json”).
I can access the ADLS location and see the files. I can filter out the files I don’t want. I cannot successfully use the “filename” statement to read the files. I am unsure how to get the actual name of the file into the filename statement within the loop. How do I accomplish this? I can't seem to get the filename statement to incorporate the actual value of "name". Code and log output is below.
Note that I have to read json files. I cannot convert to any other file type prior to reading into SAS.
options azuretenantid = "###-###-###-"; %let appId="###-###-###-"; %let acctName="###"; %let filesys="###"; filename d adls "/" applicationId=&appId accountname=&acctName filesystem=&filesys; data _null_; folder_id = dopen("d"); put folder_id=; num_files = dnum(folder_id); put num_files=; do i = 1 to num_files; name=dread(folder_id,i); if prxmatch('/(process)\.(json)/',name) then do; put name=; filename myjson adls "&name" applicationId=&appId accountname=&acctName filesystem=&filesys; libname temp json fileref=myjson; *do something with data in temp; end; end; closerc = dclose(folder_id); put closerc=; run;
Log output:
data _null_;
80
81 folder_id = dopen("d");
82 put folder_id=;
83
84 num_files = dnum(folder_id);
85 put num_files=;
86
87 do i = 1 to num_files;
88 name=dread(folder_id,i);
89
90 if prxmatch('/(process)\.(json)/',name) then do;
91 put name=;
92
93 filename myjson adls "&name"
WARNING: Apparent symbolic reference NAME not resolved.
94 applicationId=&appId
95 accountname=&acctName
96 filesystem=&filesys;
97
98 libname temp json fileref=myjson;
NOTE: JSON data is only read once. To read the JSON again, reassign the JSON LIBNAME.
ERROR: The AZURE file &name was not found.
ERROR: The endpoint within the Credential Service, azureDeviceCode, could not be found
ERROR: The endpoint within the Credential Service, azureDeviceCode, could not be found
ERROR: The specified path does not exist.
ERROR: Error in the LIBNAME statement.
99 *do something with data in temp;
100
101 end;
102 end;
103
104 closerc = dclose(folder_id);
105 put closerc=;
106 run;
folder_id=1
num_files=6
name=process.json
closerc=0
NOTE: DATA statement used (Total process time):
real time 1.78 seconds
cpu time 0.12 seconds
107
108 %studio_hide_wrapper;
118
119
You appear to have two issues.
First it does not look like your attempt to use the DREAD() function to get the list of names worked. You have PUT statements in the code, but no names where written to the log. I would recommend skipping the filtering (at least until you know if you can get the names) so you can see what the names retrieved look like.
Second you have logic issue in the code. You have embedded the global FILENAME and LIBNAME statements into the middle of your DATA step. That will not really work right as the global statements will run while the data step compiler is figuring out how to run the code and before it actually starts running the code that is going to try to retrieve the names. Plus you are referencing a macro variable NAME that you never assigned any value to.
I recommend solving the first problem first and see if you can actually get the list of files.
data files;
folder_id = dopen("d");
put folder_id=;
num_files = dnum(folder_id);
put num_files=;
do i = 1 to num_files;
length name $256 ;
name=dread(folder_id,i);
output;
end;
closerc = dclose(folder_id);
put closerc=;
run;
Then figure out whether what you retrieved can be used to generate valid FILENAME and LIBNAME statements. Then figure out if you can actually retrieve data from the JSON file using the libref created by the LIBNAME statement.
Once you have it working for one file convert it into code you call with different input file names. For example as a macro.
%readone(name);
filename myjson adls "&name"
applicationId=&appId
accountname=&acctName
filesystem=&filesys
;
libname myjson json ;
proc append base=all data=myjson.mydata force;
run;
%mend readone;
Then use the list of files to generate calls to the macro.
data _null_;
set files;
if lowcase(scan(name,-1,'.'))='json' then
call execute(cats('%nrstr(%mymacro)(',name,')'))
;
run;
You appear to have two issues.
First it does not look like your attempt to use the DREAD() function to get the list of names worked. You have PUT statements in the code, but no names where written to the log. I would recommend skipping the filtering (at least until you know if you can get the names) so you can see what the names retrieved look like.
Second you have logic issue in the code. You have embedded the global FILENAME and LIBNAME statements into the middle of your DATA step. That will not really work right as the global statements will run while the data step compiler is figuring out how to run the code and before it actually starts running the code that is going to try to retrieve the names. Plus you are referencing a macro variable NAME that you never assigned any value to.
I recommend solving the first problem first and see if you can actually get the list of files.
data files;
folder_id = dopen("d");
put folder_id=;
num_files = dnum(folder_id);
put num_files=;
do i = 1 to num_files;
length name $256 ;
name=dread(folder_id,i);
output;
end;
closerc = dclose(folder_id);
put closerc=;
run;
Then figure out whether what you retrieved can be used to generate valid FILENAME and LIBNAME statements. Then figure out if you can actually retrieve data from the JSON file using the libref created by the LIBNAME statement.
Once you have it working for one file convert it into code you call with different input file names. For example as a macro.
%readone(name);
filename myjson adls "&name"
applicationId=&appId
accountname=&acctName
filesystem=&filesys
;
libname myjson json ;
proc append base=all data=myjson.mydata force;
run;
%mend readone;
Then use the list of files to generate calls to the macro.
data _null_;
set files;
if lowcase(scan(name,-1,'.'))='json' then
call execute(cats('%nrstr(%mymacro)(',name,')'))
;
run;
You gave me the answer I needed! I am getting the filtering correct (see after line 106 - one file name gets printed "name=process.json").
What I wasn't taking into account that FILENAME and LIBNAME are global. The code below is what I am using and it works. Thanks for the help!
filename d adls "/"
applicationId=&appId
accountname=&acctname
filesystem=&filesys;
data files;
folder_id = dopen("d");
put folder_id=;
num_files = dnum(folder_id);
put num_files=;
do i = 1 to num_files;
length name $256 ;
name=dread(folder_id,i);
if prxmatch('/(process)\.(json)/',name) then do;
output;
end;
end;
closerc = dclose(folder_id);
put closerc=;
run;
%macro readone(name);
filename myjson adls "&name"
applicationId=&appId
accountname=&acctName
filesystem=&filesys
;
libname myjson json ;
proc append base=all data=myjson.mydata force;
run;
%mend;
data _null_;
set files;
call execute(cats('%nrstr(%readone)(',name,')'))
;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.