SAS Programming

DATA Step, Macro, Functions and more
bhallissey
Calcite | Level 5

 

This code used to work and no longer does. I'm not familiar with parsing or accessing URL results, so I can't even figure out how to debug it. Can anyone offer insight into how to see what is being returned, so I can debug and fix it? Thank you.

 

filename y url "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retsta...";


data pubmedinfo;
length countfound count loc locend 8. countchar $30.;
retain count countfound 0 countchar "TTTT";
infile y lrecl=32000 pad;
input;
     if countfound = 0 then do;
          loc = find(_infile_,'<Count>');
          locend = find(_infile_,'</Count>');
     end;
     if loc not in (0 .) then do;
          countchar = substr(_infile_,loc+7,locend - (loc+7));
          count = countchar;
          countfound = 1;
          output;
     end;
run;

VDD
Ammonite | Level 13

Your statement that it used to work and no longer does is not clear. How do you know that it was working if you are unfamiliar with parsing?

Tom
Super User

What does working mean?

If I try to open that URL I get this XML displayed. Notice that it has multiple <Count></Count> tags.

<eSearchResult>
<Count>2651</Count>
<RetMax>6</RetMax>
<RetStart>6</RetStart>
<IdList>
<Id>11121077</Id>
<Id>11121076</Id>
<Id>11121075</Id>
<Id>11121074</Id>
<Id>11121073</Id>
<Id>11121072</Id>
</IdList>
<TranslationSet>
<Translation>
<From>PNAS[ta]</From>
<To>"Proc Natl Acad Sci U S A"[Journal]</To>
</Translation>
</TranslationSet>
<TranslationStack>
<TermSet>
<Term>"Proc Natl Acad Sci U S A"[Journal]</Term>
<Field>Journal</Field>
<Count>137446</Count>
<Explode>N</Explode>
</TermSet>
<TermSet>
<Term>97[vi]</Term>
<Field>vi</Field>
<Count>93303</Count>
<Explode>N</Explode>
</TermSet>
<OP>AND</OP>
</TranslationStack>
<QueryTranslation>"Proc Natl Acad Sci U S A"[Journal] AND 97[vi]</QueryTranslation>
</eSearchResult>

If I read it with a SAS data step I see that it is actually only 10 lines of text. The browser was just breaking it into multiple lines to make it easier for humans to read.

543   data _null_;
544     infile y ;
545     input ;
546     put _infile_;
547   run;

NOTE: The infile Y is:

      Filename=https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi
      ]&retstart=6&retmax=6&tool=biomed3,
      Local Host Name=AMRL20L6F1E4992,
      Local Host IP addr=fe80::c856:15a5:8ebd:8fdc%7,
      Service Hostname Name=www.ncbi.nlm.nih.gov,
      Service IP addr=130.14.29.110,Service Name=N/A,
      Service Portno=443,Lrecl=32767,Recfm=Variable

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD esearch 20060628//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/d
td/20060628/esearch.dtd">
<eSearchResult><Count>2651</Count><RetMax>6</RetMax><RetStart>6</RetStart><IdList>
<Id>11121077</Id>
<Id>11121076</Id>
<Id>11121075</Id>
<Id>11121074</Id>
<Id>11121073</Id>
<Id>11121072</Id>
</IdList><TranslationSet><Translation>     <From>PNAS[ta]</From>     <To>"Proc Natl Acad Sci U S A"[Journal
]</To>    </Translation></TranslationSet><TranslationStack>   <TermSet>    <Term>"Proc Natl Acad Sci U S A"
[Journal]</Term>    <Field>Journal</Field>    <Count>137446</Count>    <Explode>N</Explode>   </TermSet>
<TermSet>    <Term>97[vi]</Term>    <Field>vi</Field>    <Count>93303</Count>    <Explode>N</Explode>   </T
ermSet>   <OP>AND</OP>  </TranslationStack><QueryTranslation>"Proc Natl Acad Sci U S A"[Journal] AND 97[vi]
</QueryTranslation></eSearchResult>
NOTE: 10 records were read from the infile Y.
      The minimum record length was 17.
      The maximum record length was 570.

And if I run your program it "works" and finds the first COUNT value.

596   data pubmedinfo;
597   length countfound count loc locend 8. countchar $30.;
598   retain count countfound 0 countchar "TTTT";
599   infile y lrecl=32000 pad;
600   input;
601        if countfound = 0 then do;
602             loc = find(_infile_,'<Count>');
603             locend = find(_infile_,'</Count>');
604        end;
605        if loc not in (0 .) then do;
606             countchar = substr(_infile_,loc+7,locend - (loc+7));
607             count = countchar;
608             countfound = 1;
609             output;
610        end;
611   run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      607:19
NOTE: The infile Y is:

      Filename=https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi
      ]&retstart=6&retmax=6&tool=biomed3,
      Local Host Name=AMRL20L6F1E4992,
      Local Host IP addr=fe80::c856:15a5:8ebd:8fdc%7,
      Service Hostname Name=www.ncbi.nlm.nih.gov,
      Service IP addr=130.14.29.110,Service Name=N/A,
      Service Portno=443,Lrecl=32000,Recfm=Variable

NOTE: 10 records were read from the infile Y.
      The minimum record length was 17.
      The maximum record length was 570.
NOTE: The data set WORK.PUBMEDINFO has 1 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.19 seconds
      cpu time            0.04 seconds


612
613   data _null_;
614    set pubmedinfo;
615    put (_all_) (=/);
616   run;


countfound=1
count=2651
loc=16
locend=27
countchar=2651
NOTE: There were 1 observations read from the data set WORK.PUBMEDINFO.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

What do you want to do differently?

Tom
Super User

Try this instead.

data pubmedinfo;
  infile y dlm='<';
  input @'<Count>' @;
  input count @@ ;
run;

data _null_;
 set pubmedinfo;
 put (_all_) (=/);
run;

 

count=2651

count=137446

count=93303
NOTE: There were 3 observations read from the data set WORK.PUBMEDINFO.
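
If only the overall hit count is wanted, one option is to keep just the first observation. This is a minimal sketch that assumes the first <Count> in the response is the overall result count, as in the XML shown earlier. The data set name pubmedtotal is made up for illustration.

```sas
/* Keep only the first <Count> value that was read.              */
/* Assumption: the first <Count> in the response is the total.   */
/* pubmedtotal is a hypothetical output data set name.           */
data pubmedtotal;
  set pubmedinfo(obs=1);
run;
```

The other two values (137446 and 93303) are the per-term counts from the TranslationStack section, so they stay in pubmedinfo if they are needed later.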

 

Tom
Super User

Note: Don't place URLs with &'s in them inside double quotes. The SAS macro processor will try to resolve anything that looks like a macro variable reference, like &term, into the value of that macro variable. Use single quotes instead and the macro processor will ignore the &'s and %'s in the string.
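
As a sketch, the FILENAME statement could be written with single quotes like this, using the full URL that appears in the log output earlier in the thread and the same fileref Y:

```sas
/* Single quotes keep the macro processor from touching &term, &retstart, etc. */
filename y url 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retstart=6&retmax=6&tool=biomed3';
```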

 

If you need to build the URL from pieces then do so in a data step.  You could then use the FILENAME() function to generate your fileref.

data _null_;
  url='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retstart=6&retmax=6&tool=biomed3';
  rc=filename('Y',url,'url');
run;
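
A rough sketch of building the URL from pieces might look like the step below. The variable names base, term, retstart and retmax are made up for illustration, and the values are just the ones from the URL above. CATS() strips blanks and converts the numeric values to text, and the single-quoted literals keep the &'s away from the macro processor.

```sas
/* Hypothetical example: assemble the query URL from parts in a data step. */
data _null_;
  length url $500;                      /* room for a long URL */
  base     = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
  term     = 'PNAS[ta]+AND+97[vi]';
  retstart = 6;
  retmax   = 6;
  url = cats(base, '?db=pubmed&term=', term,
             '&retstart=', retstart,
             '&retmax=', retmax,
             '&tool=biomed3');
  rc = filename('Y', url, 'url');       /* rc=0 means the fileref was assigned */
  put rc= url=;                         /* write both to the log for checking  */
run;
```

Writing rc and url to the log makes it easy to confirm the fileref was assigned and that the URL came out as intended before reading from Y.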
