This code used to work and no longer does. I'm not familiar with parsing or accessing URL results, so I can't even figure out how to debug it. Can anyone offer any insight into how to see what is being returned, so that I can debug and fix it? Thank you.
filename y url "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retsta...";
data pubmedinfo;
length countfound count loc locend 8. countchar $30.;
retain count countfound 0 countchar "TTTT";
infile y lrecl=32000 pad;
input;
if countfound = 0 then do;
loc = find(_infile_,'<Count>');
locend = find(_infile_,'</Count>');
end;
if loc not in (0 .) then do;
countchar = substr(_infile_,loc+7,locend - (loc+7));
count = countchar;
countfound = 1;
output;
end;
run;
Your statement that it used to work and no longer does is not clear. How do you know that it was working if you are unfamiliar with parsing?
What does working mean?
If I open that URL in a browser, I get this XML displayed. Notice that it has multiple <Count></Count> tags.
<eSearchResult>
  <Count>2651</Count>
  <RetMax>6</RetMax>
  <RetStart>6</RetStart>
  <IdList>
    <Id>11121077</Id>
    <Id>11121076</Id>
    <Id>11121075</Id>
    <Id>11121074</Id>
    <Id>11121073</Id>
    <Id>11121072</Id>
  </IdList>
  <TranslationSet>
    <Translation>
      <From>PNAS[ta]</From>
      <To>"Proc Natl Acad Sci U S A"[Journal]</To>
    </Translation>
  </TranslationSet>
  <TranslationStack>
    <TermSet>
      <Term>"Proc Natl Acad Sci U S A"[Journal]</Term>
      <Field>Journal</Field>
      <Count>137446</Count>
      <Explode>N</Explode>
    </TermSet>
    <TermSet>
      <Term>97[vi]</Term>
      <Field>vi</Field>
      <Count>93303</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>AND</OP>
  </TranslationStack>
  <QueryTranslation>"Proc Natl Acad Sci U S A"[Journal] AND 97[vi]</QueryTranslation>
</eSearchResult>
If I read it with a SAS data step, I see that it is actually only 10 lines of text. The browser was just breaking it into more lines to make it easier for humans to read.
543  data _null_;
544    infile y ;
545    input ;
546    put _infile_;
547  run;

NOTE: The infile Y is:
      Filename=https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retstart=6&retmax=6&tool=biomed3,
      Local Host Name=AMRL20L6F1E4992,
      Local Host IP addr=fe80::c856:15a5:8ebd:8fdc%7,
      Service Hostname Name=www.ncbi.nlm.nih.gov,
      Service IP addr=130.14.29.110,Service Name=N/A,
      Service Portno=443,Lrecl=32767,Recfm=Variable

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD esearch 20060628//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20060628/esearch.dtd">
<eSearchResult><Count>2651</Count><RetMax>6</RetMax><RetStart>6</RetStart><IdList>
<Id>11121077</Id>
<Id>11121076</Id>
<Id>11121075</Id>
<Id>11121074</Id>
<Id>11121073</Id>
<Id>11121072</Id>
</IdList><TranslationSet><Translation> <From>PNAS[ta]</From> <To>"Proc Natl Acad Sci U S A"[Journal]</To> </Translation></TranslationSet><TranslationStack> <TermSet> <Term>"Proc Natl Acad Sci U S A"[Journal]</Term> <Field>Journal</Field> <Count>137446</Count> <Explode>N</Explode> </TermSet> <TermSet> <Term>97[vi]</Term> <Field>vi</Field> <Count>93303</Count> <Explode>N</Explode> </TermSet> <OP>AND</OP> </TranslationStack><QueryTranslation>"Proc Natl Acad Sci U S A"[Journal] AND 97[vi]</QueryTranslation></eSearchResult>
NOTE: 10 records were read from the infile Y.
      The minimum record length was 17.
      The maximum record length was 570.
And if I run your program it "works" and finds the first COUNT value.
596  data pubmedinfo;
597    length countfound count loc locend 8. countchar $30.;
598    retain count countfound 0 countchar "TTTT";
599    infile y lrecl=32000 pad;
600    input;
601    if countfound = 0 then do;
602      loc = find(_infile_,'<Count>');
603      locend = find(_infile_,'</Count>');
604    end;
605    if loc not in (0 .) then do;
606      countchar = substr(_infile_,loc+7,locend - (loc+7));
607      count = countchar;
608      countfound = 1;
609      output;
610    end;
611  run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      607:19
NOTE: The infile Y is:
      Filename=https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retstart=6&retmax=6&tool=biomed3,
      Local Host Name=AMRL20L6F1E4992,
      Local Host IP addr=fe80::c856:15a5:8ebd:8fdc%7,
      Service Hostname Name=www.ncbi.nlm.nih.gov,
      Service IP addr=130.14.29.110,Service Name=N/A,
      Service Portno=443,Lrecl=32000,Recfm=Variable

NOTE: 10 records were read from the infile Y.
      The minimum record length was 17.
      The maximum record length was 570.
NOTE: The data set WORK.PUBMEDINFO has 1 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.19 seconds
      cpu time            0.04 seconds

612
613  data _null_;
614    set pubmedinfo;
615    put (_all_) (=/);
616  run;

countfound=1
count=2651
loc=16
locend=27
countchar=2651
NOTE: There were 1 observations read from the data set WORK.PUBMEDINFO.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
What do you want to do differently?
Try this instead.
data pubmedinfo;
infile y dlm='<';
input @'<Count>' @;
input count @@ ;
run;
data _null_;
set pubmedinfo;
put (_all_) (=/);
run;
count=2651
count=137446
count=93303
NOTE: There were 3 observations read from the data set WORK.PUBMEDINFO.
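If only the overall hit count is wanted, a minimal sketch along the same lines is to stop after the first match. It assumes the fileref y is already assigned, and that the total is always the first <Count> element in the response. That matches the XML shown earlier in this thread, but it is an assumption about the service rather than something confirmed here.

```sas
data pubmedinfo;
  /* Treat '<' as the delimiter so list input stops at the closing tag */
  infile y dlm='<';
  /* Scan forward (across records if needed) to just past the first <Count>,
     then read the number that follows it */
  input @'<Count>' count;
  output;
  /* Assumption: the first <Count> is the total, so stop reading here */
  stop;
run;
```

The later <Count> elements belong to the individual search terms inside <TranslationStack>, which is why the unrestricted version returns three observations.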
Note:
Don't put URLs like that, with &'s in them, inside double quotes. The SAS macro processor will try to resolve anything that looks like a macro variable reference, such as &term, into the value of that macro variable. Use single quotes instead and the macro processor will ignore the &'s and %'s in the string.
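As a sketch of the single-quoted form, here is the FILENAME statement using the full URL that appears in the log earlier in this thread (the URL in the original question is truncated, so this value is taken from the log, not from that post):

```sas
/* Single quotes: the macro processor does not scan the string,
   so &term, &retstart, &retmax and &tool are passed through as typed */
filename y url 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retstart=6&retmax=6&tool=biomed3';
```

With double quotes, SAS would typically write a warning such as "Apparent symbolic reference TERM not resolved" to the log, and if a macro variable named term happened to exist, its value would be substituted into the URL silently.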
If you need to build the URL from pieces, do so in a data step. You can then use the FILENAME() function to assign your fileref.
data _null_;
url='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retstart=6&retmax=6&tool=biomed3';
rc=filename('Y',url,'url');
run;
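A minimal sketch of building the URL from pieces before calling FILENAME() might look like the following. The variable names (base, term, retstart, retmax, msg) are hypothetical and chosen only for illustration; the values mirror the URL used in this thread.

```sas
data _null_;
  length url $500 msg $256;
  /* Hypothetical pieces of the query; adjust to your own search */
  base     = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
  term     = 'PNAS[ta]+AND+97[vi]';
  retstart = 6;
  retmax   = 6;
  /* CATS strips blanks and converts the numeric pieces to text;
     the single-quoted literals keep the & characters away from the macro processor */
  url = cats(base, '?db=pubmed', '&term=', term,
             '&retstart=', retstart, '&retmax=', retmax, '&tool=biomed3');
  /* Assign fileref Y with the URL access method; 0 means success */
  rc = filename('Y', url, 'url');
  if rc ne 0 then do;
    msg = sysmsg();
    put 'ERROR: could not assign fileref Y: ' msg;
  end;
  /* Write the assembled URL to the log so it can be checked */
  put url=;
run;
```

Writing the assembled URL to the log makes it easy to paste into a browser and compare the response with what the data step reads.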