92568466
Fluorite | Level 6

Hi,

I'm using the following program to download XML files from a list of URLs stored in a dataset called test. 'file' is the destination (a folder path plus a unique file name), and 'url' is the address of the XML file I intend to download.


filename code temp;
data _null_;
set test;
file code;
put '
filename out ' file :$quote. ';'
/ ' proc http method="get"'
/ ' url=' url :$quote.
/ ' out=out'
/ '; run;'
;
run;

%include code / source2;

 

 

The code works great. However, it downloads every file, whether or not the XML is valid. For example, URL A below returns a valid XML file, whereas URL B returns an invalid one. Is there a way to download only the valid ones and skip the invalid ones (like URL B)? I may need to download about a million files, and unnecessary or invalid ones make further data processing cumbersome.

 

A: https://s3.amazonaws.com/irs-form-990/201313169349300441_public.xml

B: https://s3.amazonaws.com/irs-form-990/201103159349302715_public.xml

 

 

Thank you.

 

5 REPLIES
Tom
Super User

If you cannot figure out if PROC HTTP sets some status flag you could always just check the beginning of the file.

For example, this code will create a macro variable named STATUS with a value of either 0 or 1.

%let status=0;
data _null_;
  infile out ;
  input;
  if _infile_^=:'<Error' then call symputx('status','1');
  stop;
run;  
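
One way to apply this check to every file, rather than only the last one assigned to the OUT fileref, is to wrap it in a macro that takes the file name as a parameter. The following is only a sketch under assumptions: the macro name checkxml is hypothetical, the file parameter is expected to arrive already quoted, and the test assumes that an error response starts with `<Error` on its first line, as in the code above.

```sas
/* Hypothetical helper; the name checkxml does not come from this thread. */
%macro checkxml(file);
  %local status rc;
  %let status=0;
  filename out &file;
  data _null_;
    infile out;
    input;
    /* Mark the file as valid unless its first line starts with <Error */
    if _infile_ ^=: '<Error' then call symputx('status','1');
    stop;
  run;
  /* STATUS stays 0 for an error response or an empty file */
  %if &status = 0 %then %do;
    %let rc=%sysfunc(fdelete(out));
    %put NOTE: Removed &=file (fdelete &=rc).;
  %end;
  filename out clear;
%mend checkxml;
```

It would be called once per downloaded file, for example `%checkxml("/some/folder/a.xml")`, where the path is a placeholder.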
92568466
Fluorite | Level 6

Hi Tom,

I'm not sure where I should place your code. I tried wrapping it inside my code, and running after my code. Looks like it reads only the first file. Can you help?

Tom
Super User

Figure out what code you want to run.  Then convert it to a macro that takes as input the URL and target filename. Then use the data step from the other answer to generate one macro call per observation in your data.

 

PROC HTTP does set macro variables. https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p0mwmz1upde0tqn1ptt5rnlly0tc.htm

 

So the basic structure of your macro might be something like:

%macro getxml(url,file);
filename out &file ;
proc http method='get' out=out url=&url;
run;
%if "&SYS_PROCHTTP_STATUS_PHRASE" ne "OK" %then %do;
  %put ERROR: Unable to retrieve &=url.;
  %put ERROR: &=SYS_PROCHTTP_STATUS_PHRASE;
  %put NOTE: Removing &=file.  %sysfunc(fdelete(out));
%end;
%mend;
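
A possible variant tests the numeric status code instead of the phrase. This is a sketch, not a tested implementation: the macro name getxml_code is hypothetical, and it assumes a SAS release in which PROC HTTP sets SYS_PROCHTTP_STATUS_CODE (the linked documentation describes both macro variables). It also assumes that only a 200 response counts as a successful download.

```sas
/* Hypothetical variant of getxml; the name getxml_code is illustrative. */
%macro getxml_code(url,file);
  %local rc failed;
  filename out &file;
  proc http method='get' out=out url=&url;
  run;
  /* Assume failure unless PROC HTTP reports status code 200 */
  %let failed=1;
  %if %symexist(SYS_PROCHTTP_STATUS_CODE) %then %do;
    %if &SYS_PROCHTTP_STATUS_CODE = 200 %then %let failed=0;
  %end;
  %if &failed %then %do;
    %put WARNING: Unable to retrieve &=url.;
    %let rc=%sysfunc(fdelete(out));
    %put NOTE: Removing &=file (fdelete &=rc).;
  %end;
  filename out clear;
%mend getxml_code;
```

The %SYMEXIST guard covers the case where no response was received and the status variable was never created.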

 

92568466
Fluorite | Level 6

Hi,

This is very helpful. I tried to add this to my existing code, but unfortunately I wasn't able to do it properly and couldn't get anything out of it. Here's the code I'm using. I have a list of URLs in a column named 'url', and each URL returns one XML file. The destinations are in a column named 'file'. Each XML file is saved under a unique name in the same folder. Both columns (url and file) are in data set test1.

 

 

filename code temp;
data _null_;
  set test1;
  file code;
  put  '
    filename out ' file :$quote. ';'
     /  ' proc http method="get"'
     /  ' url='  url :$quote.
     /  ' out=out'
     /  '; run;'
  ;
run;

%include code / source2;

 

Tom
Super User

Once you get the macro to work you can then use a similar process to generate the macro calls instead of generating the code it is generating now.

filename code temp;
data _null_;
  set test1;
  file code;
  put  '%getxml('  url= :$quote. ',' file= :$quote. ')';
run;

%include code / source2;
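
An alternative sketch, assuming the getxml macro is already compiled, is to skip the temporary file and push the macro calls with CALL EXECUTE. Wrapping the macro name in %NRSTR delays execution of each call until after the data step finishes, so the status check inside the macro runs after its own PROC HTTP step. With around a million rows, the generated-file approach may be easier to inspect and rerun.

```sas
data _null_;
  set test1;
  /* %nrstr keeps the macro from executing while this data step is still running */
  call execute(cats('%nrstr(%getxml)(url=', quote(trim(url)),
                    ',file=', quote(trim(file)), ')'));
run;
```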
