I've been trying to write code that scrapes the number of Google hits for a search term within a certain date range. I do this by inserting the date range into the Google search query. When I copy and paste the link the code produces, I get the correct, date-restricted results, but when the code requests that same link, I keep getting the number of hits for the search without the date range. I'm not sure what I'm doing wrong here.
from bs4 import BeautifulSoup
import requests
import re
from datetime import date, timedelta
day = date.today()
friday = day - timedelta(days=day.weekday() + 3) + timedelta(days=7)
word = "debt"
for n in range(0, 32, 7):
    date_end = friday - timedelta(days=n)
    date_beg = date_end - timedelta(days=4)
    link_beg = "https://www.google.com/search?q=%s&source=lnt&tbs=cdr%%3A1%%2Ccd_min%%3A" % (word)
    link_date = "%s%%2F%s%%2F%s%%2Ccd_max%%3A%s%%2F%s%%2F%s&tbm=&gws_rd=ssl" % (str(date_beg.month), str(date_beg.day), str(date_beg.year), str(date_end.month), str(date_end.day), str(date_end.year))
    url = link_beg + link_date
    print url,
    print "\t",
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    products = soup.findAll("div", id="resultStats")
    result = str(products[0])
    results = re.findall(r'\d+', result)
    number = ''.join([str(i) for i in results])
    print number
For example, one of the links that is produced is this:
Google Search for "debt" in date range "3/9/2015 to 3/13/2015"
The number of hits should be 39,700,000.
Instead, the code prints 293,000,000, which is what a generic search without the date range returns.
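To rule out a mistake in the hand-written percent-escapes, here is a minimal sketch of how the same weekly date ranges and URL could be built with `urlencode` instead of format strings. The helper names `week_ranges` and `build_url` are hypothetical (they are not in my script), and the query parameters are simply the ones copied from the link above. This only reproduces the URL; it does not by itself change what Google returns.

```python
from datetime import date, timedelta

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def week_ranges(today, weeks=5):
    """Return (monday, friday) pairs, newest first.

    The first pair is the Monday-Friday span of the week containing
    `today`; each later pair is one week earlier. Hypothetical helper.
    """
    # Same arithmetic as the script: Friday of the current week.
    friday = today - timedelta(days=today.weekday() + 3) + timedelta(days=7)
    ranges = []
    for n in range(0, 7 * weeks, 7):
        date_end = friday - timedelta(days=n)
        date_beg = date_end - timedelta(days=4)
        ranges.append((date_beg, date_end))
    return ranges


def build_url(word, date_beg, date_end):
    """Build the date-restricted search URL (hypothetical helper)."""
    def fmt(d):
        # Month/day/year without zero padding, as in the original link.
        return "%d/%d/%d" % (d.month, d.day, d.year)

    tbs = "cdr:1,cd_min:%s,cd_max:%s" % (fmt(date_beg), fmt(date_end))
    # A list of pairs keeps the parameter order fixed.
    params = [
        ("q", word),
        ("source", "lnt"),
        ("tbs", tbs),
        ("tbm", ""),
        ("gws_rd", "ssl"),
    ]
    # urlencode percent-escapes ':', ',' and '/' inside the tbs value.
    return "https://www.google.com/search?" + urlencode(params)


# Usage with a fixed date so the output is reproducible.
for beg, end in week_ranges(date(2015, 3, 11)):
    print(build_url("debt", beg, end))
```

For the week of 3/9/2015 to 3/13/2015 this yields the same string my format-string version prints, so the escaping itself does not appear to be the problem.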