Hello,
I need to import google trends data from an url
I found on the web a method using a data step :
filename mydata url 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...' ;
data XXXX;
infile mydata scanover;
input
@'"v":' v $8.
@'"f":"' f $3. ;
run;
However, because the file has all the records on one line, the data step create only one record.
If I manually enter carriage returns in the json file, I get the intended result.
Does anyone has a solution for this particular problem ?
Or maybe another method to parse the date from this json text file ?
Many thanks,
filename mydata url 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...' ;
data XXXX;
infile mydata ;
Length f $3 ;
Do cases =1 to 1000 until( f= ' ' ) ;
input
@'"v":' v $8.
@'"f":"' f $3. @;
output ;
End ;
Stop ;
run;
I specified LRECL and use @@ and got something but I don't know if your input after the @'text" is reading what you need.
Have you heard about the OUTPUT statement?
Peter Crawford wrote:
Have you heard about the OUTPUT statement?
Heard about it but I try not to use it. :smileygrin: I thought implied output would suffice.
Can't use implicit here because there is no natural end-if-line (so the recfm=f approach might split a required value) and the implied OUTPUT occurs at end of step where buffer was getting released.
Of course, the buffer could be retained through data step iteration if INPUT uses double trailing @ symbols. In that way the implicit output would do, but the step would need something to terminate at "end of buffer" - like the condition in my UNTIL
filename mydata url 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...' ;
data XXXX;
infile mydata ;
Length f $3 ;
Do cases =1 to 1000 until( f= ' ' ) ;
input
@'"v":' v $8.
@'"f":"' f $3. @;
output ;
End ;
Stop ;
run;
Thanks for your help
Is this code working for you ?
On my side, it just gives an empty dataset with 3 columns (f cases v).
filename mydata url 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...
data XXXX;
infile mydata;
Length f $3;
Do cases =1 to 1000 until( f= ' ' );
input
@'"v":' v $8.
@'"f":"' f $3. @;
output;
End;
Stop;
run;
I was trying an hour ago and it produced 105 obs. However just after, the server stopped providing the data.
Maybe it will come back
I played with the filename and it seems to work now, I need to use a local copy of the text.
Thank you very much !
Is it possible to test if the filename was incorrectly fetched ?
like if the text contains <!DOCTYPE html>, just don't use it
filename mydata url 'http://www.google.com/trends/fetchComponent?q=tekenbeet&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=...
/* if contains <!DOCTYPE html>
then do not copy mydata in gTrends.json
*/
data _null_;
infile mydata ;
file '...\gTrends.json';
input;
put _infile_;
run;
filename copydata '...\gTrends.json';
if find(_infile_,' <!DOCTYPE html>','I') then stop; *perhaps write a message to the log;
While this question has already been marked as having been answered, I have to wonder whether the following comes closer to capturing the data that you are actually wanting to retrieve:
filename mydata3 url 'http://www.google.com/trends/fetchComponent?q=sas&cid=TIMESERIES_GRAPH_0&export=3&geo=BE&date=today%...' ;
data test (drop=chardate);
infile mydata lrecl=32767;
format date date9.;
input @'{"c":[{"v":new Date(' chardate : $10. @',{"v":' v 4. @@;
date=mdy(input(scan(chardate,2,','),2.)+1,scan(compress(chardate,')'),3,','),
scan(chardate,1,','));
run;
This was indeed what I intended.
But I couldn't ask for the whole solution without searching more by myself .
And your solution is shorter than what I have so far, Thanks
data scannedStrings;
infile data3 lrecl=32760 scanover;
do cases=1 to 1001 /*until(value='')*/;
input
@'new Date(' dateString $11. @;
input
@'},{"v":' valueString $4. @;
output;
*put '---' cases ' ---' valueString;
end;
drop cases;
stop;
run;
data GoogleTrends;
format date IS8601DA10. ;
set scannedStrings;
year=scan(dateString,1,',');
month=scan(dateString,2,',')+1;
day=scan(scan(dateString,3,','),1,')');
date=mdy(month,day,year);
value=input(scan(valueString,1,','),best5.);
search="&search";
diff=date-lag(date);
keep date value search;
run;
I still havn't found a solution to google's request limit. I could not run your code twice.
Let's combine the two threads you started about using Google Trends:
1. We want to parse the embedded JSON in the javascript executable response.
2. We don't want to be identified as robots and get quickly shut down for quota overflow
Using PROC GROOVY we can build a robust api interface to accomplish both things (you will still want to be fairly mindful not to run this often though as you will still probably get recognized as a bot)
/** GoogleTrends4SAS
*
* Description: An interface for collecting search results from the unofficial Google Trends API using PROC GROOVY
*
* Note: YOU MUST ALTER THE GROOVY CODE MANUALLY.
* Find the value of PREF cookie on the machine executing the code and enter it below.
* Note: As writted here for SAS 9.4 on Linux x64, available versions of Groovy for sasjar will be dependent on version and OS
*/
filename cp temp;
filename ivy "%sysfunc(pathname(work,l))/ivy.jar";
proc http
method = 'get'
url = 'http://central.maven.org/maven2/org/apache/ivy/ivy/2.3.0-rc1/ivy-2.3.0-rc1.jar'
out = ivy
;
run;
proc groovy classpath=cp;
add classpath=ivy;
add sasjar="groovy_2.1.3" version="2.1.3.0_SAS_20130517000930";
submit parseonly;
import groovyx.net.http.HTTPBuilder
import groovyx.net.http.URIBuilder
import groovy.json.JsonSlurper
import java.util.regex.Pattern
import java.util.regex.Matcher
import static groovyx.net.http.ContentType.TEXT
import static groovyx.net.http.Method.GET
import org.apache.http.impl.cookie.BasicClientCookie
@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.7')
class GoogleTrendApi {
def search(String q, String geo) throws Exception {
def trends = []
def url = new URIBuilder('http://www.google.com/').with {
path = 'trends/fetchComponent'
query = [q: q, geo: geo, cid: "TIMESERIES_GRAPH_0", export: 3]
return it
}.toString()
def api = new HTTPBuilder(url)
// lookup PREF cookie value and enter below
def prefCookieVal="<<<YOUR_PREF_COOKIE_VALUE>>>"
def cookie = new BasicClientCookie('PREF', prefCookieVal)
cookie["domain"] = "google.com"
cookie["path"] = "/"
api.client.cookieStore.addCookie cookie
api.request(GET, TEXT) { req ->
headers.'User-Agent' = "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0"
response.failure = { resp ->
resp.headers.each { println "${it.name} : ${it.value}" }
throw new Exception()
}
response.success = { resp, reader ->
String text = reader.text
Matcher m = Pattern.compile("^[^\\(]+\\((\\{.*\\})\\);\$").matcher(text.replaceAll("(new\\ Date\\((\\d{4}),(\\d{1,2}),(\\d{1,2})\\))", "{\"year\": \"\$2\", \"month\": \"\$3\", \"day\": \"\$4\"}"))
if (m.find()) {
def slurper = new JsonSlurper()
def json = slurper.parseText(m.group(1))
json.table.rows.each {
trends << [
date : new Date(Integer.parseInt(it.c[0].v.year)-1900, Integer.parseInt(it.c[0].v.month), Integer.parseInt(it.c[0].v.day)),
val : Integer.parseInt(it.c[1].f)
]
}
}
else {
throw new Exception()
}
}
}
return trends
}
}
endsubmit;
submit parseonly;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
public class GoogleTrends4SAS {
public String q = "";
public String geo = "";
public void main() throws Exception {
GoogleTrendApi api = new GoogleTrendApi();
trends = ((ArrayList) (api.search(q, geo)));
iter = trends.iterator();
}
public boolean hasNext() {
return iter.hasNext();
}
public void getNext() {
trend = ((LinkedHashMap) (iter.next()));
}
public String getDateString() {
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
return sdf.format(trend.get("date"));
}
public int getVal() {
return ((int) (trend.get("val")));
}
protected ArrayList trends;
protected Iterator iter;
protected LinkedHashMap trend;
}
endsubmit;
run;
options set=classpath "%sysfunc(pathname(cp,f))";
data google_trends;
dcl javaobj GoogleTrends("GoogleTrends4SAS");
GoogleTrends.setStringField("q", "sas");
GoogleTrends.setStringField("geo", "US");
rc = GoogleTrends.callVoidMethod("main");
rc = GoogleTrends.ExceptionCheck(e); *usage exceeded;
if (e) then do;
put 'exception occurred';
goto exit;
end;
GoogleTrends.callBooleanMethod("hasNext", rc);
do _n_=1 by 1 while(rc);
GoogleTrends.callVoidMethod("getNext");
format date date9. DateString $10.;
GoogleTrends.callStringMethod("getDateString", DateString);
date = input(DateString, yymmdd10.);
format val comma8.;
GoogleTrends.callIntMethod("getVal", val);
output;
GoogleTrends.callBooleanMethod("hasNext", rc);
end;
keep date val;
exit:
stop;
run;
It is also possible to add the additional header information to a PROC HTTP call through the HEADERIN argument
Thanks !
Where can I find <<<YOUR_PREF_COOKIE_VALUE>>> ?
The machine running the code is a SASBI server (windows server 2008 R2).
To which I'm connected through An enterprise Guide connection from my laptop.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.