
As you might already know, the official Google Places API is limited to 5 reviews per place. Therefore, developers turn to scraping to fetch all the reviews of any business on Google Maps.

Scraping Google, with all its protections and dynamic page rendering, can be a challenging task. Fortunately, there are many tools you can use to scrape reviews in Python or any other programming language. In this post, you will see the two most common tools for scraping Google reviews: browser emulation and Outscraper Platform. Either of them is enough to get all the reviews of any listing on Maps.

Scraping Google Reviews in Python Using a Browser to Render Dynamic Content

We will use Selenium to control the Chrome browser. The browser will render the dynamic pages of Google Reviews. To get started building the reviews scraper with Selenium, we'll need the following:

  1. Python 3+.
  2. Chrome browser installed.
  3. Selenium 4+ (Python package).
  4. ChromeDriver (for your operating system).
  5. Parsel or any other library for extracting data from HTML, such as Beautiful Soup.

Install Selenium and Other Packages

Install the Selenium and Parsel packages by running the following commands. We'll use Parsel later to parse the content of the HTML.

				
					pip install selenium
pip install parsel # to extract data from HTML using XPath or CSS selectors
				
			

Start the Browser

Before starting the driver, make sure you have completed the previous steps and know the path to the chromedriver file. Initialize the driver with the following code. You should see a new browser window open.

				
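					from selenium import webdriver
from selenium.webdriver.chrome.service import Service


chromedriver_path = './chromedriver'  # path to the driver you downloaded in the previous steps
driver = webdriver.Chrome(service=Service(chromedriver_path))  # Selenium 4 passes the driver path via Service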
				
			

You might see the following on macOS: “chromedriver cannot be opened because the developer cannot be verified.” To overcome this, Control-click chromedriver in Finder, choose Open from the menu, and then click Open in the dialog that appears. You should see “ChromeDriver was started successfully” in the terminal window that opens. Close it; after this, you will be able to start ChromeDriver from your code.

Download the All Reviews Page

Once you start the driver, you are ready to open pages. To open any page, use the “get” method.

				
					url = 'https://www.google.com/maps/place/Central+Park+Zoo/@40.7712318,-73.9674707,15z/data=!3m1!5s0x89c259a1e735d943:0xb63f84c661f84258!4m16!1m8!3m7!1s0x89c258faf553cfad:0x8e9cfc7444d8f876!2sTrump+Tower!8m2!3d40.7624284!4d-73.973794!9m1!1b1!3m6!1s0x89c258f1fcd66869:0x65d72e84d91a3f14!8m2!3d40.767778!4d-73.9718335!9m1!1b1?hl=en&hl=en'
driver.get(url)
				
			
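Google Maps typically renders only the first batch of reviews and loads more as you scroll the reviews panel, so the page source at this point may not contain every review. Below is a minimal sketch of one way to load the rest before reading the page source, assuming Selenium and that review blocks keep matching the `//div/div[@data-review-id]` XPath used in the parsing step of this post. The function name `load_all_reviews`, the scroll-into-view trick, and the default pause are illustrative assumptions, not documented Google behavior.

```python
import time

# "xpath" is the value of selenium.webdriver.common.by.By.XPATH
BY_XPATH = "xpath"
# Same review XPath the parsing step relies on (assumption: Google's markup may change)
REVIEW_XPATH = '//div/div[@data-review-id]'


def load_all_reviews(driver, max_rounds=100, pause=2.0):
    """Scroll the reviews panel until no new reviews appear; return how many are loaded."""
    seen = 0
    for _ in range(max_rounds):
        reviews = driver.find_elements(BY_XPATH, REVIEW_XPATH)
        if reviews and len(reviews) == seen:
            break  # the previous scroll loaded nothing new, assume we reached the end
        seen = len(reviews)
        if reviews:
            # Assumption: bringing the last loaded review into view triggers lazy loading
            driver.execute_script("arguments[0].scrollIntoView();", reviews[-1])
        time.sleep(pause)  # give the next batch time to render
    return seen
```

Call `load_all_reviews(driver)` right after `driver.get(url)` and read `driver.page_source` afterwards. If reviews load slowly, raise `pause`: a pause that is too short can end the loop before the next batch arrives. Using the string `"xpath"` keeps the helper free of Selenium imports; `By.XPATH` works the same.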

Parse Reviews

Once the page is open, the Chrome window will display the page controlled by your code. You can run the following code to get the HTML content of the page from the driver.

				
					page_content = driver.page_source
				
			

To view the HTML content conveniently, open Chrome's developer console: open the Chrome menu in the top-right corner of the browser window and select More Tools > Developer Tools. You should now be able to see the elements of your page.

Finding the XPath of the reviews we want to get with the Developer Console

You can parse the HTML content of the page with your preferred parsing tools. We'll use Parsel in this tutorial.

				
					from parsel import Selector

response = Selector(page_content)
				
			

Iterate over the reviews and extract the fields you need.

				
					results = []

for el in response.xpath('//div/div[@data-review-id]/div[contains(@class, "content")]'):
    results.append({
        'title': el.xpath('.//div[contains(@class, "title")]/span/text()').extract_first(''),
        'rating': el.xpath('.//span[contains(@aria-label, "star")]/@aria-label').re_first(r'\d+', default=''),  # matches both "1 star" and "5 stars"
        'body': el.xpath('.//span[contains(@class, "text")]/text()').extract_first(''),
    })
    
print(results)
				
			

Output from the Google Reviews Crawler (shortened).

				
					[
  {
    'title': 'Wanda Garrett',
    'rating': '5',
    'body': 'Beautiful ✨ park with a family-friendly atmosphere! I had a great time here; seeing all of the animals and learning all of the interesting facts was a fantastic way to spend the day. The zoo is beautifully landscaped and surrounded by …'
  },
  {
    'title': 'Bernadette Bennett',
    'rating': '4',
    'body': 'Worth going for the seals! They are the main attraction and located in the center of the zoo. We watched a live feeding and it was great. The kids loved it. The zoo is well manicured surrounded by gorgeous gardens. Lots of benches to rest …'
  },
  {
    'title': 'Mary Cutrufelli',
    'rating': '3',
    'body': "So not gonna lie... We came from PA. My kid expected to see lions and hippos and zebra from Madagascar. None of that which is there. It's clean it's a nice zoo. I wouldn't go again though."
  },
  ...
]
				
			
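The `rating` values above are strings, and the list only exists in memory. A minimal sketch for saving it to CSV with the standard library, assuming the `results` structure produced by the loop above (`title`, `rating`, `body` keys); `normalize_review` and `write_reviews_csv` are hypothetical helper names.

```python
import csv
import re

FIELDS = ['title', 'rating', 'body']


def normalize_review(review):
    """Return a copy of a scraped review with 'rating' as an int (None if no number is found)."""
    match = re.search(r'\d+', review.get('rating', ''))
    return {
        'title': review.get('title', ''),
        'rating': int(match.group()) if match else None,
        'body': review.get('body', ''),
    }


def write_reviews_csv(results, file):
    """Write the parsed reviews to an already opened text file as CSV."""
    writer = csv.DictWriter(file, fieldnames=FIELDS)
    writer.writeheader()
    for review in results:
        writer.writerow(normalize_review(review))  # None is written as an empty cell
```

For example: `with open('reviews.csv', 'w', newline='', encoding='utf-8') as f: write_reviews_csv(results, f)`. The `newline=''` argument is what the `csv` module documentation recommends for files passed to its writers.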

Stop the Browser

It’s important to start the driver before scraping and stop it afterwards, just as you open your browser before surfing the internet and close it when you are done. Stop the browser by running the following code.

				
					driver.quit()
				
			
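Wrapping the driver in a context manager guarantees that `driver.quit()` runs even when a scraping step raises an exception, so no orphaned Chrome processes are left behind. A minimal sketch with the standard library's `contextlib`; `browser_session` is a hypothetical name, and `make_driver` is any callable that returns a started driver.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver):
    """Start a driver with make_driver() and always quit it when the block ends."""
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()  # runs on normal exit and when an exception propagates
```

With the driver from “Start the Browser”, usage would look like `with browser_session(lambda: webdriver.Chrome(service=Service(chromedriver_path))) as driver:` followed by the `driver.get(url)` and `driver.page_source` steps. A plain `try`/`finally` around the scraping code achieves the same.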

Despite the tricky HTML structure of Google Reviews, with Selenium and a good knowledge of XPath and CSS selectors you can achieve good scraping results. Because a real browser renders the pages, this method reduces the risk of getting blocked. However, if you scale your application, consider using proxies to avoid unexpected problems.

Multiprocessing and Other Recommendations

You can run drivers in multiple processes (not threads), but each driver will consume a CPU core. Make sure you have enough of them.
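A minimal sketch of the process-based approach, assuming Selenium 4 and the `./chromedriver` path from earlier: each worker process starts its own Chrome, loads one URL, returns the HTML, and quits. `scrape_place` and `scrape_many` are hypothetical names; by default the pool size is capped at the number of CPU cores, in line with the one-core-per-driver estimate above, and the optional `mp_context` is passed straight to `ProcessPoolExecutor`.

```python
import os
from concurrent.futures import ProcessPoolExecutor

CHROMEDRIVER_PATH = './chromedriver'  # assumption: same driver path as in "Start the Browser"


def scrape_place(url):
    """Worker: start one Chrome in this process, load the URL, return the page HTML."""
    # Imported lazily: only the worker processes need Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH))
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # free the browser even if loading fails


def scrape_many(urls, worker=scrape_place, max_workers=None, mp_context=None):
    """Run worker(url) for every URL in separate processes; results keep the input order."""
    workers = max_workers or os.cpu_count() or 1  # one process (one Chrome) per core by default
    with ProcessPoolExecutor(max_workers=workers, mp_context=mp_context) as pool:
        return list(pool.map(worker, urls))
```

On platforms that start processes with spawn (Windows, and macOS by default), call `scrape_many` under an `if __name__ == "__main__":` guard. The returned HTML strings can then be parsed with Parsel exactly as shown earlier.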

The Easiest Way to Scrape Google Reviews in Python

Extracting data from Google with browsers has its pros and cons. Although you can develop the scraper yourself, scaling it up can mean large expenses for servers with a lot of CPU capacity to run the browser instances. In addition, someone needs to maintain the crawler and update it whenever Google changes its site.

Through the Outscraper Platform, API, or SDKs, Outscraper offers the easiest way for businesses and individuals to start scraping reviews from Google without managing proxies or browser emulation and without investing in development.

Scrape Reviews in Python Using the SDK

1. You need Python 3+ and the google-services-api Python package. Install the package by running the following command.

				
					pip install google-services-api
				
			

2. Get your API key from the Profile page.

3. Import the package and initialize it with the key.

4. Specify the place by providing a link, a place ID, or a name.

				
					from outscraper import ApiClient


api_client = ApiClient(api_key='KEY_FROM_OUTSCRAPER')
response = api_client.google_maps_reviews(
    'https://www.google.com/maps/place/Do+or+Dive+Bar/@40.6867831,-73.9570104,17z/data=!3m2!4b1!5s0x89c25b96a0b10eb9:0xfe4f81ff249e280d!4m5!3m4!1s0x89c25b96a0b30001:0x643d0464b3138078!8m2!3d40.6867791!4d-73.9548217',
    language='en',
    limit=100
)
				
			

5. Wait a few seconds until the reviews are fetched.

				
					[
    {
        "name": "Do or Dive Bar",
        "full_address": "1108 Bedford Ave, Brooklyn, NY 11216, United States",
        "borough": "Bedford-Stuyvesant",
        "street": "1108 Bedford Ave",
        "city": "Brooklyn",
        "postal_code": "11216",
        "country_code": "US",
        "country": "United States of America",
        "us_state": "New York",
        "state": "New York",
        "plus_code": null,
        "latitude": 40.686779099999995,
        "longitude": -73.9548217,
        "time_zone": "America/New_York",
        "site": "https://www.doordivebedstuy.com/",
        "phone": "+1 917-867-5309",
        "type": "Bar",
        "rating": 4.5,
        "reviews": 425,
        "reviews_data": [
            {
                "google_id": "0x89c25b96a0b30001:0x643d0464b3138078",
                "autor_link": "https://www.google.com/maps/contrib/115539085325450648866?hl=en-US",
                "autor_name": "Sam Grjaznovs",
                "autor_id": "115539085325450648866",
                "autor_image": "https://lh3.googleusercontent.com/a-/AOh14GgxmEH7a10v6Bo8AFb6OkbyxxfIBPXbMYVAxeSIRA=c0x00000000-cc-rp-ba3",
                "review_text": "Cozy shin dig with an assortment of drinks. They have a strong specialty for 10bucks and merch too. They have out side dining as well as back yard area. Ask for Brandon every other Saturday. He\u2019s hella cute!",
                "review_img_url": "https://lh5.googleusercontent.com/p/AF1QipPNs8QvvdkBonV5wuxdoylFjLY3k7L6muepbDq-",
                "owner_answer": null,
                "owner_answer_timestamp": null,
                "owner_answer_timestamp_datetime_utc": null,
                "review_link": "https://www.google.com/maps/reviews/data=!4m5!14m4!1m3!1m2!1s115539085325450648866!2s0x0:0x643d0464b3138078?hl=en-US",
                "review_rating": 5,
                "review_timestamp": 1603781021,
                "review_datetime_utc": "10/27/2020 06:43:41",
                "review_likes": 0,
                "reviews_id": "7222934207919784056"
            },
            {
                "google_id": "0x89c25b96a0b30001:0x643d0464b3138078",
                "autor_link": "https://www.google.com/maps/contrib/110571545135018844510?hl=en-US",
                "autor_name": "Arabella Stephens",
                "autor_id": "110571545135018844510",
                "autor_image": "https://lh3.googleusercontent.com/a-/AOh14GisqDfheDO0Aq0cu1Z7YBTbzLyvSEvM3IMDKg3q=c0x00000000-cc-rp",
                "review_text": "Great atmosphere, always fun vibe and good beers. I live in the area and this is a very reliable standby. Would recommend to anyone who is not pretentious and likes a bit of clutter in their watering hole.",
                "review_img_url": "https://lh3.googleusercontent.com/a-/AOh14GisqDfheDO0Aq0cu1Z7YBTbzLyvSEvM3IMDKg3q",
                "owner_answer": null,
                "owner_answer_timestamp": null,
                "owner_answer_timestamp_datetime_utc": null,
                "review_link": "https://www.google.com/maps/reviews/data=!4m5!14m4!1m3!1m2!1s110571545135018844510!2s0x0:0x643d0464b3138078?hl=en-US",
                "review_rating": 5,
                "review_timestamp": 1614111762,
                "review_datetime_utc": "02/23/2021 20:22:42",
                "review_likes": 0,
                "reviews_id": "7222934207919784056"
            },
            {
                "google_id": "0x89c25b96a0b30001:0x643d0464b3138078",
                "autor_link": "https://www.google.com/maps/contrib/101725757133547547783?hl=en-US",
                "autor_name": "Jack Parker",
                "autor_id": "101725757133547547783",
                "autor_image": "https://lh3.googleusercontent.com/a-/AOh14GjFK9CLb8__u5PtJzH1rGuX4DVgPvjaEeIkSJnCNw=c0x00000000-cc-rp",
                "review_text": "All the bartenders are rad. Cheap drinks, and a nice backyard. They have space heaters, but I would still recommend bundling up if you plan on spending a while there. Jeopardy night is always fun too. Can\u2019t wait to sit inside again!",
                "review_img_url": "https://lh3.googleusercontent.com/a-/AOh14GjFK9CLb8__u5PtJzH1rGuX4DVgPvjaEeIkSJnCNw",
                "owner_answer": null,
                "owner_answer_timestamp": null,
                "owner_answer_timestamp_datetime_utc": null,
                "review_link": "https://www.google.com/maps/reviews/data=!4m5!14m4!1m3!1m2!1s101725757133547547783!2s0x0:0x643d0464b3138078?hl=en-US",
                "review_rating": 5,
                "review_timestamp": 1611947492,
                "review_datetime_utc": "01/29/2021 19:11:32",
                "review_likes": 0,
                "reviews_id": "7222934207919784056"
            },
            ...
        ]
    }
]
				
			
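The SDK returns plain Python lists and dicts, so post-processing needs no extra tools. A minimal sketch, assuming `response` has the shape shown above (a list of places, each with a `reviews_data` list) and using the field names exactly as they appear there, including the `autor_` spelling; `flatten_reviews` is a hypothetical helper. It converts `review_timestamp`, which in the sample above matches `review_datetime_utc` as Unix seconds in UTC, into an ISO date.

```python
from datetime import datetime, timezone


def flatten_reviews(response):
    """Turn the list of places returned by google_maps_reviews into one row per review."""
    rows = []
    for place in response:
        for review in place.get('reviews_data', []):
            ts = review.get('review_timestamp')  # Unix timestamp in seconds (UTC)
            rows.append({
                'place': place.get('name'),
                'author': review.get('autor_name'),
                'rating': review.get('review_rating'),
                'text': review.get('review_text'),
                'date': datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat() if ts else None,
            })
    return rows
```

The resulting rows can be written with `csv.DictWriter` or loaded into any tabular tool, one review per row with the place name repeated.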

Python package ► https://pypi.org/project/google-services-api
API ► https://app.outscraper.com/api-docs


Frequently Asked Questions

Thanks to Outscraper’s Google Maps Reviews API, it is possible to scrape all Google Maps reviews. Outscraper API services allow you to scrape without any limits.

There is an API service for Google Maps reviews: Outscraper’s Google Maps Reviews API. With Outscraper services, you can export and download all Google Maps reviews.

Reviews can be scraped with Python and Selenium, as explained in detail in the article “Scraping All Google Reviews in Python”.


Vlad

Project Manager