BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mathias
Quartz | Level 8

Hello,

I need to import google trends data from an url

http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...

I found on the web a method using a data step :

filename mydata url 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...' ;

data XXXX;

     infile mydata scanover;

          input

               @'"v":' v $8.

               @'"f":"' f $3. ;

     run;

However, because the file has all the records on one line, the data step create only one record.

If I manually enter carriage returns in the json file, I get the intended result.

Does anyone has a solution for this particular problem ?

Or maybe another method to parse the date from this json text file ?

Many thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
Peter_C
Rhodochrosite | Level 12

filename mydata url 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...' ;

data XXXX;

     infile mydata ;

Length f $3 ;

Do cases =1 to 1000 until( f= ' ' ) ;

          input

               @'"v":' v $8.

               @'"f":"' f $3.  @;

         output ;

End ;

Stop ;

     run;

View solution in original post

15 REPLIES 15
data_null__
Jade | Level 19

I specified LRECL and use @@ and got something but I don't know if your input after the @'text" is reading what you need.


filename mydata url 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...' ;
data test;
   infile mydata lrecl=32767;* scanover;
  
input @'"v":' v $8. @'"f":"' f $3. @@;
   list;
  
run;

33        
34         filename mydata url
34       ! 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...' ;
35         data test;
36            infile mydata lrecl=32767;* scanover;
37            input @'"v":' v $8. @'"f":"' f $3. @@;
38            list;
39            run;

NOTE:
The infile MYDATA is:
      Filename=http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=
3&geo=BE&date=today%2024-m,
      Local Host Name=...,
      Local Host IP addr=
...,
      Service Hostname Name=lax17s02-in-
f19.1e100.net,
      Service IP addr=
74.125.224.83,
      Service Name=httpd,Service Portno=
80,
      Lrecl=
32767,Recfm=Variable

RULE:     ----+----
1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                     
2         google.visualization.Query.setResponse({"version":"0.6","status":"ok","sig":"976347108","table":{"co
     101  ls"
:[{"id":"date","label":"Date","type":"date","pattern":""},{"id":"query0","label":"sas","type":"nu
     201  mber"
,"pattern":""}],"rows":[{"c":[{"v":new Date(2012,8,30),"f":"Sep 30 \u2013 Oct 6, 2012"},{"v":82
    
301  .0,"f":"82"}]},{"c":[{"v":new Date(2012,9,7),"f":"Oct 7 \u2013 13, 2012"},{"v":78.0,"f":"78"}]},{"c"
    
401  :[{"v":new Date(2012,9,14),"f":"Oct 14 \u2013 20, 2012"},{"v":77.0,"f":"77"}]},{"c":[{"v":new Date(2
2                                                          The SAS System                          02:38 Tuesday, September 30, 2014

RULE:     ----+----
1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                     
    
501  012,9,21),"f":"Oct 21 \u2013 27, 2012"},{"v":75.0,"f":"75"}]},{"c":[{"v":new Date(2012,9,28),"f":"Oc
     601  t 28 \u2013 Nov 3, 2012"
},{"v":72.0,"f":"72"}]},{"c":[{"v":new Date(2012,10,4),"f":"Nov 4 \u2013 10,
     701   2012"
},{"v":65.0,"f":"65"}]},{"c":[{"v":new Date(2012,10,11),"f":"Nov 11 \u2013 17, 2012"},{"v":78.
    
801  0,"f":"78"}]},{"c":[{"v":new Date(2012,10,18),"f":"Nov 18 \u2013 24, 2012"},{"v":83.0,"f":"83"}]},{"
     901  c"
:[{"v":new Date(2012,10,25),"f":"Nov 25 \u2013 Dec 1, 2012"},{"v":72.0,"f":"72"}]},{"c":[{"v":new
   
1001  Date(2012,11,2),"f":"Dec 2 \u2013 8, 2012"},{"v":70.0,"f":"70"}]},{"c":[{"v":new Date(2012,11,9),"f"
   
1101  :"Dec 9 \u2013 15, 2012"},{"v":54.0,"f":"54"}]},{"c":[{"v":new Date(2012,11,16),"f":"Dec 16 \u2013 2
    1201  2, 2012"
},{"v":62.0,"f":"62"}]},{"c":[{"v":new Date(2012,11,23),"f":"Dec 23 \u2013 29, 2012"},{"v":4
   
1301  6.0,"f":"46"}]},{"c":[{"v":new Date(2012,11,30),"f":"Dec 30, 2012 \u2013 Jan 5, 2013"},{"v":59.0,"f"
   
1401  :"59"}]},{"c":[{"v":new Date(2013,0,6),"f":"Jan 6 \u2013 12, 2013"},{"v":77.0,"f":"77"}]},{"c":[{"v"
   
1501  :new Date(2013,0,13),"f":"Jan 13 \u2013 19, 2013"},{"v":66.0,"f":"66"}]},{"c":[{"v":new Date(2013,0,
   
1601  20),"f":"Jan 20 \u2013 26, 2013"},{"v":73.0,"f":"73"}]},{"c":[{"v":new Date(2013,0,27),"f":"Jan 27 \
    1701  u2013 Feb 2, 2013"
},{"v":78.0,"f":"78"}]},{"c":[{"v":new Date(2013,1,3),"f":"Feb 3 \u2013 9, 2013"},
   
1801  {"v":67.0,"f":"67"}]},{"c":[{"v":new Date(2013,1,10),"f":"Feb 10 \u2013 16, 2013"},{"v":70.0,"f":"70
    1901  "
}]},{"c":[{"v":new Date(2013,1,17),"f":"Feb 17 \u2013 23, 2013"},{"v":75.0,"f":"75"}]},{"c":[{"v":n
   
2001  ew Date(2013,1,24),"f":"Feb 24 \u2013 Mar 2, 2013"},{"v":78.0,"f":"78"}]},{"c":[{"v":new Date(2013,2
   
2101  ,3),"f":"Mar 3 \u2013 9, 2013"},{"v":83.0,"f":"83"}]},{"c":[{"v":new Date(2013,2,10),"f":"Mar 10 \u2
    2201  013 17, 2013"
},{"v":66.0,"f":"66"}]},{"c":[{"v":new Date(2013,2,17),"f":"Mar 17 \u2013 23, 2013"},{"
    2301  v"
:72.0,"f":"72"}]},{"c":[{"v":new Date(2013,2,24),"f":"Mar 24 \u2013 30, 2013"},{"v":82.0,"f":"82"}
   
2401  ]},{"c":[{"v":new Date(2013,2,31),"f":"Mar 31 \u2013 Apr 6, 2013"},{"v":65.0,"f":"65"}]},{"c":[{"v":
   
2501  new Date(2013,3,7),"f":"Apr 7 \u2013 13, 2013"},{"v":68.0,"f":"68"}]},{"c":[{"v":new Date(2013,3,14)
   
2601  ,"f":"Apr 14 \u2013 20, 2013"},{"v":94.0,"f":"94"}]},{"c":[{"v":new Date(2013,3,21),"f":"Apr 21 \u20
    2701  13 27, 2013"
},{"v":70.0,"f":"70"}]},{"c":[{"v":new Date(2013,3,28),"f":"Apr 28 \u2013 May 4, 2013"},
   
2801  {"v":72.0,"f":"72"}]},{"c":[{"v":new Date(2013,4,5),"f":"May 5 \u2013 11, 2013"},{"v":59.0,"f":"59"}
   
2901  ]},{"c":[{"v":new Date(2013,4,12),"f":"May 12 \u2013 18, 2013"},{"v":59.0,"f":"59"}]},{"c":[{"v":new
   
3001   Date(2013,4,19),"f":"May 19 \u2013 25, 2013"},{"v":61.0,"f":"61"}]},{"c":[{"v":new Date(2013,4,26),
   
3101  "f":"May 26 \u2013 Jun 1, 2013"},{"v":73.0,"f":"73"}]},{"c":[{"v":new Date(2013,5,2),"f":"Jun 2 \u20
    3201  13 8, 2013"
},{"v":72.0,"f":"72"}]},{"c":[{"v":new Date(2013,5,9),"f":"Jun 9 \u2013 15, 2013"},{"v":7
   
3301  8.0,"f":"78"}]},{"c":[{"v":new Date(2013,5,16),"f":"Jun 16 \u2013 22, 2013"},{"v":80.0,"f":"80"}]},{
   
3401  "c":[{"v":new Date(2013,5,23),"f":"Jun 23 \u2013 29, 2013"},{"v":70.0,"f":"70"}]},{"c":[{"v":new Dat
   
3501  e(2013,5,30),"f":"Jun 30 \u2013 Jul 6, 2013"},{"v":71.0,"f":"71"}]},{"c":[{"v":new Date(2013,6,7),"f
    3601  "
:"Jul 7 \u2013 13, 2013"},{"v":77.0,"f":"77"}]},{"c":[{"v":new Date(2013,6,14),"f":"Jul 14 \u2013 2
    3701  0, 2013"
},{"v":96.0,"f":"96"}]},{"c":[{"v":new Date(2013,6,21),"f":"Jul 21 \u2013 27, 2013"},{"v":85
   
3801  .0,"f":"85"}]},{"c":[{"v":new Date(2013,6,28),"f":"Jul 28 \u2013 Aug 3, 2013"},{"v":82.0,"f":"82"}]}
   
3901  ,{"c":[{"v":new Date(2013,7,4),"f":"Aug 4 \u2013 10, 2013"},{"v":83.0,"f":"83"}]},{"c":[{"v":new Dat
   
4001  e(2013,7,11),"f":"Aug 11 \u2013 17, 2013"},{"v":81.0,"f":"81"}]},{"c":[{"v":new Date(2013,7,18),"f":
   
4101  "Aug 18 \u2013 24, 2013"},{"v":77.0,"f":"77"}]},{"c":[{"v":new Date(2013,7,25),"f":"Aug 25 \u2013 31
    4201  , 2013"
},{"v":81.0,"f":"81"}]},{"c":[{"v":new Date(2013,8,1),"f":"Sep 1 \u2013 7, 2013"},{"v":87.0,"
    4301  f"
:"87"}]},{"c":[{"v":new Date(2013,8,8),"f":"Sep 8 \u2013 14, 2013"},{"v":79.0,"f":"79"}]},{"c":[{"
    4401  v"
:new Date(2013,8,15),"f":"Sep 15 \u2013 21, 2013"},{"v":81.0,"f":"81"}]},{"c":[{"v":new Date(2013,
   
4501  8,22),"f":"Sep 22 \u2013 28, 2013"},{"v":74.0,"f":"74"}]},{"c":[{"v":new Date(2013,8,29),"f":"Sep 29
    4601   \u2013 Oct 5, 2013"
},{"v":82.0,"f":"82"}]},{"c":[{"v":new Date(2013,9,6),"f":"Oct 6 \u2013 12, 2013
    4701  "
},{"v":68.0,"f":"68"}]},{"c":[{"v":new Date(2013,9,13),"f":"Oct 13 \u2013 19, 2013"},{"v":68.0,"f":
   
4801  "68"}]},{"c":[{"v":new Date(2013,9,20),"f":"Oct 20 \u2013 26, 2013"},{"v":92.0,"f":"92"}]},{"c":[{"v
    4901  "
:new Date(2013,9,27),"f":"Oct 27 \u2013 Nov 2, 2013"},{"v":77.0,"f":"77"}]},{"c":[{"v":new Date(201
   
5001  3,10,3),"f":"Nov 3 \u2013 9, 2013"},{"v":77.0,"f":"77"}]},{"c":[{"v":new Date(2013,10,10),"f":"Nov 1
    5101  0 \u2013 16, 2013"
},{"v":68.0,"f":"68"}]},{"c":[{"v":new Date(2013,10,17),"f":"Nov 17 \u2013 23, 201
    5201  3"
},{"v":65.0,"f":"65"}]},{"c":[{"v":new Date(2013,10,24),"f":"Nov 24 \u2013 30, 2013"},{"v":62.0,"f
    5301  "
:"62"}]},{"c":[{"v":new Date(2013,11,1),"f":"Dec 1 \u2013 7, 2013"},{"v":71.0,"f":"71"}]},{"c":[{"v
    5401  "
:new Date(2013,11,8),"f":"Dec 8 \u2013 14, 2013"},{"v":67.0,"f":"67"}]},{"c":[{"v":new Date(2013,11
   
5501  ,15),"f":"Dec 15 \u2013 21, 2013"},{"v":55.0,"f":"55"}]},{"c":[{"v":new Date(2013,11,22),"f":"Dec 22
    5601   \u2013 28, 2013"
},{"v":51.0,"f":"51"}]},{"c":[{"v":new Date(2013,11,29),"f":"Dec 29, 2013 \u2013 Ja
    5701  n 4, 2014"
},{"v":53.0,"f":"53"}]},{"c":[{"v":new Date(2014,0,5),"f":"Jan 5 \u2013 11, 2014"},{"v":63
   
5801  .0,"f":"63"}]},{"c":[{"v":new Date(2014,0,12),"f":"Jan 12 \u2013 18, 2014"},{"v":84.0,"f":"84"}]},{"
    5901  c"
:[{"v":new Date(2014,0,19),"f":"Jan 19 \u2013 25, 2014"},{"v":70.0,"f":"70"}]},{"c":[{"v":new Date
   
6001  (2014,0,26),"f":"Jan 26 \u2013 Feb 1, 2014"},{"v":69.0,"f":"69"}]},{"c":[{"v":new Date(2014,1,2),"f"
   
6101  :"Feb 2 \u2013 8, 2014"},{"v":68.0,"f":"68"}]},{"c":[{"v":new Date(2014,1,9),"f":"Feb 9 \u2013 15, 2
3                                                          The SAS System                          02:38 Tuesday, September 30, 2014

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                     
    6201  014"
},{"v":71.0,"f":"71"}]},{"c":[{"v":new Date(2014,1,16),"f":"Feb 16 \u2013 22, 2014"},{"v":73.0,"
    6301  f"
:"73"}]},{"c":[{"v":new Date(2014,1,23),"f":"Feb 23 \u2013 Mar 1, 2014"},{"v":72.0,"f":"72"}]},{"c
    6401  "
:[{"v":new Date(2014,2,2),"f":"Mar 2 \u2013 8, 2014"},{"v":72.0,"f":"72"}]},{"c":[{"v":new Date(201
   
6501  4,2,9),"f":"Mar 9 \u2013 16, 2014"},{"v":68.0,"f":"68"}]},{"c":[{"v":new Date(2014,2,16),"f":"Mar 16
    6601   \u2013 22, 2014"
},{"v":74.0,"f":"74"}]},{"c":[{"v":new Date(2014,2,23),"f":"Mar 23 \u2013 29, 2014"
   
6701  },{"v":71.0,"f":"71"}]},{"c":[{"v":new Date(2014,2,30),"f":"Mar 30 \u2013 Apr 5, 2014"},{"v":84.0,"f
    6801  "
:"84"}]},{"c":[{"v":new Date(2014,3,6),"f":"Apr 6 \u2013 12, 2014"},{"v":76.0,"f":"76"}]},{"c":[{"v
    6901  "
:new Date(2014,3,13),"f":"Apr 13 \u2013 19, 2014"},{"v":84.0,"f":"84"}]},{"c":[{"v":new Date(2014,3
   
7001  ,20),"f":"Apr 20 \u2013 26, 2014"},{"v":79.0,"f":"79"}]},{"c":[{"v":new Date(2014,3,27),"f":"Apr 27
    7101  \u2013 May 3, 2014"
},{"v":76.0,"f":"76"}]},{"c":[{"v":new Date(2014,4,4),"f":"May 4 \u2013 10, 2014"
   
7201  },{"v":71.0,"f":"71"}]},{"c":[{"v":new Date(2014,4,11),"f":"May 11 \u2013 17, 2014"},{"v":76.0,"f":"
    7301  76"
}]},{"c":[{"v":new Date(2014,4,18),"f":"May 18 \u2013 24, 2014"},{"v":84.0,"f":"84"}]},{"c":[{"v"
   
7401  :new Date(2014,4,25),"f":"May 25 \u2013 31, 2014"},{"v":83.0,"f":"83"}]},{"c":[{"v":new Date(2014,5,
   
7501  1),"f":"Jun 1 \u2013 7, 2014"},{"v":89.0,"f":"89"}]},{"c":[{"v":new Date(2014,5,8),"f":"Jun 8 \u2013
    7601   14, 2014"
},{"v":81.0,"f":"81"}]},{"c":[{"v":new Date(2014,5,15),"f":"Jun 15 \u2013 21, 2014"},{"v":
   
7701  69.0,"f":"69"}]},{"c":[{"v":new Date(2014,5,22),"f":"Jun 22 \u2013 28, 2014"},{"v":97.0,"f":"97"}]},
   
7801  {"c":[{"v":new Date(2014,5,29),"f":"Jun 29 \u2013 Jul 5, 2014"},{"v":86.0,"f":"86"}]},{"c":[{"v":new
   
7901   Date(2014,6,6),"f":"Jul 6 \u2013 12, 2014"},{"v":87.0,"f":"87"}]},{"c":[{"v":new Date(2014,6,13),"f
    8001  "
:"Jul 13 \u2013 19, 2014"},{"v":79.0,"f":"79"}]},{"c":[{"v":new Date(2014,6,20),"f":"Jul 20 \u2013
    8101  26, 2014"
},{"v":83.0,"f":"83"}]},{"c":[{"v":new Date(2014,6,27),"f":"Jul 27 \u2013 Aug 2, 2014"},{"v
    8201  "
:77.0,"f":"77"}]},{"c":[{"v":new Date(2014,7,3),"f":"Aug 3 \u2013 9, 2014"},{"v":90.0,"f":"90"}]},{
   
8301  "c":[{"v":new Date(2014,7,10),"f":"Aug 10 \u2013 16, 2014"},{"v":68.0,"f":"68"}]},{"c":[{"v":new Dat
   
8401  e(2014,7,17),"f":"Aug 17 \u2013 23, 2014"},{"v":76.0,"f":"76"}]},{"c":[{"v":new Date(2014,7,24),"f":
   
8501  "Aug 24 \u2013 30, 2014"},{"v":83.0,"f":"83"}]},{"c":[{"v":new Date(2014,7,31),"f":"Aug 31 \u2013 Se
    8601  p 6, 2014"
},{"v":68.0,"f":"68"}]},{"c":[{"v":new Date(2014,8,7),"f":"Sep 7 \u2013 13, 2014"},{"v":10
   
8701  0.0,"f":"100"}]},{"c":[{"v":new Date(2014,8,14),"f":"Sep 14 \u2013 20, 2014"},{"v":97.0,"f":"97"}]},
   
8801  {"c":[{"v":new Date(2014,8,21),"f":"Sep 21 \u2013 27, 2014"},{"v":85.0,"f":"85"}]},{"c":[{"v":new Da
   
8901  te(2014,8,28),"f":"Sep 28 \u2013 Oct 4, 2014"},{"v":72.0,"f":"72"}]}]}}); 8973
NOTE: LOST
CARD.
v=
72.0,"f" f=  _ERROR_=1 _N_=106
NOTE:
2 records were read from the infile MYDATA.
      The minimum record length was
22.
      The maximum record length was
8973.
NOTE: SAS went to a new line when INPUT @
'CHARACTER_STRING' scanned past the end of a line.
NOTE: The data set WORK.TEST has
105 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time          
0.75 seconds
      cpu time           
0.02 seconds
Peter_C
Rhodochrosite | Level 12

Have you heard about the OUTPUT  statement?

data_null__
Jade | Level 19

Peter Crawford wrote:

Have you heard about the OUTPUT  statement?

Heard about it but I try not to use it. :smileygrin:  I thought implied output would suffice.

Peter_C
Rhodochrosite | Level 12

Can't use implicit here because there is no natural end-if-line (so the recfm=f approach might split a required value) and the implied OUTPUT occurs at end of step where buffer was getting released.

Of course, the buffer could be retained through data step iteration if INPUT  uses double trailing @ symbols. In that way the implicit output would do, but the step would need something to terminate at "end of buffer" - like the condition in my UNTIL

Peter_C
Rhodochrosite | Level 12

filename mydata url 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...' ;

data XXXX;

     infile mydata ;

Length f $3 ;

Do cases =1 to 1000 until( f= ' ' ) ;

          input

               @'"v":' v $8.

               @'"f":"' f $3.  @;

         output ;

End ;

Stop ;

     run;

mathias
Quartz | Level 8

Thanks for your help

Is this code working for you ?

On my side, it just gives an empty dataset with 3 columns (f cases v).

filename mydata url 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...

data XXXX;

  infile mydata;

  Length f $3;

  Do cases =1 to 1000 until( f= ' ' );

  input

  @'"v":' v $8.

  @'"f":"' f $3.  @;

  output;

  End;

  Stop;

run;

Peter_C
Rhodochrosite | Level 12

I was trying an hour ago and it produced 105 obs. However just after, the server stopped providing the data.

Maybe it will come back

mathias
Quartz | Level 8

I played with the filename and it seems to work now, I need to use a local copy of the text.

Thank you very much !

mathias
Quartz | Level 8

Is it possible to test if the filename was incorrectly fetched ?

like if the text contains <!DOCTYPE html>, just don't use it

filename mydata url 'http://www.google.com/trends/fetchComponent?q=tekenbeet&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=...

/* if contains <!DOCTYPE html>

then do not copy mydata in gTrends.json

*/

data _null_;

  infile mydata ;

  file '...\gTrends.json';

    input;

    put _infile_;

run;

filename copydata '...\gTrends.json';

data_null__
Jade | Level 19

if find(_infile_,' <!DOCTYPE html>','I') then stop; *perhaps write a message to the log;

art297
Opal | Level 21

While this question has already been marked as having been answered, I have to wonder whether the following comes closer to capturing the data that you are actually wanting to retrieve:

filename mydata3 url 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...' ;

data test (drop=chardate);

   infile mydata lrecl=32767;

   format date date9.;

   input @'{"c":[{"v":new Date(' chardate : $10. @',{"v":' v 4. @@;

   date=mdy(input(scan(chardate,2,','),2.)+1,scan(compress(chardate,')'),3,','),

    scan(chardate,1,','));

run;

mathias
Quartz | Level 8

This was indeed what I intended.

But I couldn't ask for the whole solution without searching more by myself Smiley Happy.

And your solution is shorter than what I have so far, Thanks

    data scannedStrings;

        infile data3 lrecl=32760 scanover;

        do cases=1 to 1001 /*until(value='')*/;

            input

                @'new Date(' dateString $11. @;

            input

                @'},{"v":' valueString $4. @;

            output;

            *put '---' cases ' ---' valueString;

        end;

        drop cases;

        stop;

    run;

    data GoogleTrends;

        format date IS8601DA10. ;

        set scannedStrings;

        year=scan(dateString,1,',');

        month=scan(dateString,2,',')+1;

        day=scan(scan(dateString,3,','),1,')');

        date=mdy(month,day,year);

        value=input(scan(valueString,1,','),best5.);

        search="&search";

        diff=date-lag(date);

        keep date value search;

        run;

I still havn't found a solution to google's request limit. I could not run your code twice.

FriedEgg
SAS Employee

Let's combine the two threads you started about using Google Trends:

1. We want to parse the embedded JSON in the javascript executable response.

2. We don't want to be identified as robots and get quickly shut down for quota overflow

Using PROC GROOVY we can build a robust api interface to accomplish both things (you will still want to be fairly mindful not to run this often though as you will still probably get recognized as a bot)

/** GoogleTrends4SAS

*

* Description: An interface for collecting search results from the unofficial Google Trends API using PROC GROOVY

*

* Note: YOU MUST ALTER THE GROOVY CODE MANUALLY.

* Find the value of PREF cookie on the machine executing the code and enter it below.

* Note: As writted here for SAS 9.4 on Linux x64, available versions of Groovy for sasjar will be dependent on version and OS

*/

filename cp temp;

filename ivy "%sysfunc(pathname(work,l))/ivy.jar";

proc http

  method = 'get'

  url = 'http://central.maven.org/maven2/org/apache/ivy/ivy/2.3.0-rc1/ivy-2.3.0-rc1.jar'

  out = ivy

  ;

run;

proc groovy classpath=cp;

  add classpath=ivy;

  add sasjar="groovy_2.1.3" version="2.1.3.0_SAS_20130517000930";

  submit parseonly;

  import groovyx.net.http.HTTPBuilder

  import groovyx.net.http.URIBuilder

  import groovy.json.JsonSlurper

  import java.util.regex.Pattern

  import java.util.regex.Matcher

  import static groovyx.net.http.ContentType.TEXT

  import static groovyx.net.http.Method.GET

  import org.apache.http.impl.cookie.BasicClientCookie

  @Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.7')

  class GoogleTrendApi {

  def search(String q, String geo) throws Exception {

  def trends = []

  def url = new URIBuilder('http://www.google.com/').with {

  path = 'trends/fetchComponent'

  query = [q: q, geo: geo, cid: "TIMESERIES_GRAPH_0", export: 3]

  return it

  }.toString()

  def api = new HTTPBuilder(url)

  // lookup PREF cookie value and enter below

  def prefCookieVal="<<<YOUR_PREF_COOKIE_VALUE>>>"

  def cookie = new BasicClientCookie('PREF', prefCookieVal)

  cookie["domain"] = "google.com"

  cookie["path"] = "/"

  api.client.cookieStore.addCookie cookie

  api.request(GET, TEXT) { req ->

  headers.'User-Agent' = "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0"

  response.failure = { resp ->

  resp.headers.each { println "${it.name} : ${it.value}" }

  throw new Exception()

  }

 

  response.success = { resp, reader ->

  String text = reader.text

  Matcher m = Pattern.compile("^[^\\(]+\\((\\{.*\\})\\);\$").matcher(text.replaceAll("(new\\ Date\\((\\d{4}),(\\d{1,2}),(\\d{1,2})\\))", "{\"year\": \"\$2\", \"month\": \"\$3\", \"day\": \"\$4\"}"))

  if (m.find()) {

  def slurper = new JsonSlurper()

  def json = slurper.parseText(m.group(1))

  json.table.rows.each {

  trends << [

  date : new Date(Integer.parseInt(it.c[0].v.year)-1900, Integer.parseInt(it.c[0].v.month), Integer.parseInt(it.c[0].v.day)),

  val : Integer.parseInt(it.c[1].f)

  ]

  }

  }

  else {

  throw new Exception()

  }

  }

  }

  return trends

  }

  }

  endsubmit;

  submit parseonly;

  import java.text.SimpleDateFormat;

  import java.util.ArrayList;

  import java.util.Iterator;

  import java.util.LinkedHashMap;

  public class GoogleTrends4SAS {

  public String q = "";

  public String geo = "";

 

  public void main() throws Exception {

  GoogleTrendApi api = new GoogleTrendApi();

  trends = ((ArrayList) (api.search(q, geo)));

  iter = trends.iterator();

  }

 

  public boolean hasNext() {

  return iter.hasNext();

  }

 

  public void getNext() {

  trend = ((LinkedHashMap) (iter.next()));

  }

 

  public String getDateString() {

  SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");

  return sdf.format(trend.get("date"));

  }

 

  public int getVal() {

  return ((int) (trend.get("val")));

  }

 

  protected ArrayList trends;

  protected Iterator iter;

  protected LinkedHashMap trend;

  }

  endsubmit;

run;

options set=classpath "%sysfunc(pathname(cp,f))";

data google_trends;

  dcl javaobj GoogleTrends("GoogleTrends4SAS");

  GoogleTrends.setStringField("q", "sas");

  GoogleTrends.setStringField("geo", "US");

  rc = GoogleTrends.callVoidMethod("main");

 

  rc = GoogleTrends.ExceptionCheck(e); *usage exceeded;

  if (e) then do;

  put 'exception occurred';

  goto exit;

  end;

  GoogleTrends.callBooleanMethod("hasNext", rc);

  do _n_=1 by 1 while(rc);

  GoogleTrends.callVoidMethod("getNext");

  format date date9. DateString $10.;

  GoogleTrends.callStringMethod("getDateString", DateString);

  date = input(DateString, yymmdd10.);

  format val comma8.;

  GoogleTrends.callIntMethod("getVal", val);

  output;

  GoogleTrends.callBooleanMethod("hasNext", rc);

  end;

  keep date val;

  exit:

  stop;

run;

An interface for collecting search results from the unofficial Google Trends API using PROC GROOVY i...

It is also possible to add the additional header information to a PROC HTTP call through the HEADERIN argument

mathias
Quartz | Level 8

Thanks !

Where can I find <<<YOUR_PREF_COOKIE_VALUE>>> ?

The machine running the code is a SASBI server (windows server 2008 R2).

To which I'm connected through An enterprise Guide connection from my laptop.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 3012 views
  • 3 likes
  • 5 in conversation