
I've been trying to write code that scrapes the number of Google hits for a search limited to a certain date range. I do this by inserting the dates into the Google search URL. When I copy the link the code produces and paste it into a browser, I get the correct date-limited results. When the code itself requests that same link, though, I keep getting the hit count for the search without the date range. I'm not sure what I'm doing wrong.

from bs4 import BeautifulSoup
import requests
import re
from datetime import date, timedelta


day = date.today()
# Friday of the current (Monday-start) week: back to last week's Friday, then forward 7 days
friday = day - timedelta(days=day.weekday() + 3) + timedelta(days=7)

word = "debt"

# Five Monday-to-Friday windows: the current week and the four weeks before it
for n in range(0, 32, 7):
    date_end = friday - timedelta(days=n)
    date_beg = date_end - timedelta(days=4)

    # Custom date range via tbs=cdr:1,cd_min:M/D/YYYY,cd_max:M/D/YYYY, percent-encoded by hand
    link_beg = "https://www.google.com/search?q=%s&source=lnt&tbs=cdr%%3A1%%2Ccd_min%%3A" % (word)
    link_date = "%s%%2F%s%%2F%s%%2Ccd_max%%3A%s%%2F%s%%2F%s&tbm=&gws_rd=ssl" % (date_beg.month, date_beg.day, date_beg.year, date_end.month, date_end.day, date_end.year)

    url = link_beg + link_date

    print(url, end="\t")
    r = requests.get(url)

    soup = BeautifulSoup(r.content, "html.parser")

    # The hit count is in <div id="resultStats">, e.g. "About 293,000,000 results"
    products = soup.find_all("div", id="resultStats")

    result = str(products[0])
    results = re.findall(r'\d+', result)

    # Glue the digit groups back together: "293", "000", "000" -> "293000000"
    number = ''.join(results)

    print(number)
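For comparison, here is a minimal sketch of the same week windows and URL built with the standard library's `urllib.parse.urlencode` instead of hand-written `%%3A`/`%%2F` escapes. The helper names `weekly_windows` and `build_search_url` are hypothetical. The sketch keeps only the `q` and `tbs` parameters and drops `source`, `tbm` and `gws_rd` from the original URL (whether Google needs them is an assumption not tested here). It only tidies the URL construction and is not expected, by itself, to change what Google returns to a scripted request.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def weekly_windows(today, weeks=5):
    """Return (monday, friday) pairs for the current week and the weeks before it.

    Same arithmetic as the loop in the question: Friday of the current week,
    stepping back 7 days at a time; each window starts 4 days before its Friday.
    """
    friday = today - timedelta(days=today.weekday() + 3) + timedelta(days=7)
    windows = []
    for n in range(0, 7 * weeks, 7):
        end = friday - timedelta(days=n)
        windows.append((end - timedelta(days=4), end))
    return windows


def build_search_url(word, start, end):
    """Hypothetical helper: a Google search URL limited to [start, end] via tbs=cdr."""
    # Month/day/year without zero padding, matching the question's URLs
    tbs = "cdr:1,cd_min:{s.month}/{s.day}/{s.year},cd_max:{e.month}/{e.day}/{e.year}".format(
        s=start, e=end
    )
    # urlencode percent-encodes ':', '/' and ',' (as %3A, %2F, %2C) automatically
    return "https://www.google.com/search?" + urlencode({"q": word, "tbs": tbs})


if __name__ == "__main__":
    for beg, end in weekly_windows(date.today()):
        print(build_search_url("debt", beg, end))
```

For the week of 3/9/2015 this yields `...search?q=debt&tbs=cdr%3A1%2Ccd_min%3A3%2F9%2F2015%2Ccd_max%3A3%2F13%2F2015`, which carries the same `tbs` value as the hand-built link.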

For example, one of the links the code produces is:

Google Search for "debt" in date range "3/9/2015 to 3/13/2015"

The hit count should be 39,700,000.

Instead, the code reports 293,000,000, which is what a plain search without the date range returns.
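A side note on turning the stats text into a number: the question's code joins every digit run it finds in the `resultStats` div. If that div also contained a timing such as "(0.32 seconds)" (an assumption about the markup, not something shown above), those digits would be appended to the count. A minimal sketch that reads only the first number, assuming comma thousands separators; `parse_hit_count` is a hypothetical name:

```python
import re


def parse_hit_count(stats_text):
    """Return the first number in text like 'About 39,700,000 results', or None.

    Only the first run of digits and commas is used, so any trailing text
    such as '(0.32 seconds)' is not glued onto the count.
    """
    match = re.search(r"\d[\d,]*", stats_text)
    if match is None:
        return None
    return int(match.group(0).replace(",", ""))
```

This does not explain the 293,000,000 vs 39,700,000 discrepancy; it only avoids mis-reading the count once the right page is fetched.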

Edited by Community; asked by alphabeta.

1 Answer


Google's date-range-limited search relies on Julian dates, i.e. the start and end of the range must be given as Julian day numbers. Perhaps you realized this already.

For example: cute kitties daterange:[start Julian date]-[end Julian date] (without the brackets).

To convert dates to Julian, you can use one of the web pages that do the conversion, the jDate Python script, or the jday shell script.
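As an alternative to those converters, here is a minimal sketch in plain Python (standard library only, not the jDate script). It assumes, as the answer implies, that `daterange:` takes integer Julian Day Numbers. It relies on the fact that Python's `date.toordinal()` counts days from 0001-01-01 (= 1), so adding a fixed offset of 1721425 gives the Julian Day Number (2000-01-01 maps to 2451545). The helper names are hypothetical.

```python
from datetime import date

# Offset from Python's proleptic Gregorian ordinal (0001-01-01 == 1)
# to the integer Julian Day Number (2000-01-01 == 2451545)
JDN_OFFSET = 1721425


def julian_day(d):
    """Return the integer Julian Day Number for a datetime.date."""
    return d.toordinal() + JDN_OFFSET


def daterange_query(word, start, end):
    """Hypothetical helper: a query using the daterange: operator with Julian day numbers."""
    return "{} daterange:{}-{}".format(word, julian_day(start), julian_day(end))


if __name__ == "__main__":
    print(daterange_query("debt", date(2015, 3, 9), date(2015, 3, 13)))
```

For the question's example range, this prints `debt daterange:2457091-2457095`.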