92568466
Fluorite | Level 6

Hi,

I'm using the following program to download XML files from a list of URLs stored in a dataset called test. 'file' is the destination (a folder path plus a unique file name), and 'url' is the address of the XML file I want to download.


filename code temp;
data _null_;
  set test;
  file code;
  put 'filename out ' file :$quote. ';'
    / ' proc http method="get"'
    / ' url=' url :$quote.
    / ' out=out'
    / '; run;'
  ;
run;

%include code / source2;

The code works great. However, it downloads every file, whether it is valid XML or not. For example, URL A below returns a valid XML file, whereas URL B does not. Is there a way to download only the valid ones and skip the invalid ones (like URL B)? I may need to download about a million files, and unnecessary or invalid ones would make further data processing cumbersome.

 

A: https://s3.amazonaws.com/irs-form-990/201313169349300441_public.xml

B: https://s3.amazonaws.com/irs-form-990/201103159349302715_public.xml

 

 

Thank you.

 

5 REPLIES
Tom
Super User

If you cannot figure out if PROC HTTP sets some status flag you could always just check the beginning of the file.

For example, this code will create a macro variable named STATUS with a value of either 0 or 1 (1 when the first line of the file does not start with <Error).

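%let status=0;  /* assume failure until proven otherwise */
data _null_;
  infile out ;
  input;        /* read only the first line of the file */
  /* =: is a "starts with" comparison, so ^=: means "does not start with" */
  if _infile_^=:'<Error' then call symputx('status','1');
  stop;
run;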
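
One way to act on the flag afterwards, as a rough sketch only: it assumes the fileref OUT still points at the file that was just downloaded and that STATUS was set by the step above. The macro name drop_if_invalid is made up for illustration.

```sas
/* Sketch: delete the downloaded file when STATUS is still 0.       */
/* drop_if_invalid is a hypothetical helper name, not part of SAS.  */
%macro drop_if_invalid;
  %local rc;
  %if &status = 0 %then %do;
    /* FDELETE removes the file behind the fileref; 0 means success */
    %let rc = %sysfunc(fdelete(out));
    %put NOTE: Removed invalid download, FDELETE rc=&rc;
  %end;
%mend drop_if_invalid;

%drop_if_invalid
```

The %IF is wrapped in a macro so that the sketch does not depend on a SAS release that allows %IF in open code.
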
92568466
Fluorite | Level 6

Hi Tom,

I'm not sure where I should place your code. I tried wrapping it inside my code, and running after my code. Looks like it reads only the first file. Can you help?

Tom
Super User (accepted solution)

Figure out what code you want to run.  Then convert it to a macro that takes as input the URL and target filename. Then use the data step from the other answer to generate one macro call per observation in your data.

 

PROC HTTP does set macro variables (SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE) that you can test after it runs. https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p0mwmz1upde0tqn1ptt5rnlly0tc.htm

 

So the basic structure of your macro might be something like:

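%macro getxml(url,file);
/* URL and FILE are expected to arrive already quoted */
filename out &file ;
proc http method='get' out=out url=&url;
run;
/* PROC HTTP sets SYS_PROCHTTP_STATUS_PHRASE; anything but OK is treated as a failure */
%if "&SYS_PROCHTTP_STATUS_PHRASE" ne "OK" %then %do;
  %put ERROR: Unable to retrieve &=url.;
  %put ERROR: &=SYS_PROCHTTP_STATUS_PHRASE;
  /* FDELETE removes the file behind the fileref; its return code is written to the log */
  %put NOTE: Removing &=file.  %sysfunc(fdelete(out));
%end;
%mend;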
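
A possible variant, sketched under the assumption that your SAS release sets SYS_PROCHTTP_STATUS_CODE, tests the numeric status code instead of the phrase. The macro name getxml_code and the file path in the sample call are hypothetical.

```sas
/* Sketch: same structure as the macro above, but keyed on the      */
/* numeric HTTP status code. getxml_code is a hypothetical name.    */
%macro getxml_code(url,file);
  %local rc;
  filename out &file;
  proc http method='get' out=out url=&url;
  run;
  /* Quoted comparison so an empty value does not break the %IF */
  %if "&SYS_PROCHTTP_STATUS_CODE" ne "200" %then %do;
    %put WARNING: &=url returned status &SYS_PROCHTTP_STATUS_CODE;
    %let rc = %sysfunc(fdelete(out));  /* remove the unwanted file */
  %end;
  filename out clear;
%mend getxml_code;

/* Example call; the destination path is made up for illustration */
%getxml_code("https://s3.amazonaws.com/irs-form-990/201313169349300441_public.xml",
             "/tmp/201313169349300441_public.xml")
```

Testing the code rather than the phrase avoids depending on the exact wording a server sends back, but either check follows the same pattern.
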

 

92568466
Fluorite | Level 6

Hi,

This is very helpful. I tried to add this to my existing code, but unfortunately I wasn't able to do it properly and didn't get anything out of it. Here's the code I'm using. I have a list of URLs in my data (in a column named 'url'), and each URL returns one XML file. The destinations are in a column named 'file'. Each XML file is saved under a unique name in the same folder. Both columns (url and file) are in data set test1.

 

 

filename code temp;
data _null_;
  set test1;
  file code;
  put 'filename out ' file :$quote. ';'
    / ' proc http method="get"'
    / ' url=' url :$quote.
    / ' out=out'
    / '; run;'
  ;
run;

%include code / source2;

Tom
Super User

Once you get the macro to work, you can use a similar process to generate one macro call per observation instead of the PROC HTTP code it is generating now.

filename code temp;
data _null_;
  set test1;
  file code;
  put  '%getxml('  url= :$quote. ',' file= :$quote. ')';
run;

%include code / source2;
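
An alternative to writing the calls to a temporary file, sketched here under the same assumptions (data set test1 with character variables url and file, and the getxml macro already compiled), is CALL EXECUTE. This is a different technique from the %INCLUDE approach above, not a replacement for it. The %NRSTR wrapper is there so that the macro runs after the data step finishes rather than while it is still executing, which matters because the macro's %IF test depends on PROC HTTP having actually run.

```sas
/* Sketch: queue one %getxml call per observation with CALL EXECUTE. */
/* Assumes test1 has character variables URL and FILE.               */
data _null_;
  set test1;
  /* %nrstr() delays macro execution until the data step has ended */
  call execute(cats('%nrstr(%getxml)(',
                    quote(trim(url)), ',',
                    quote(trim(file)), ')'));
run;
```

With a very large number of URLs, the generated-file approach has the advantage that you can inspect the calls before running them.
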
