Crawling the Google Play store

First of all, Google Play’s robots.txt does NOT disallow the pages with base “/store/apps”.

If you want to crawl Google Play you would need to develop your own web crawler, parse the HTML page and extract the app meta-data you need (e.g. title, descriptions, price, etc). This topic has been covered in this other question. There are libraries helping with that, for instance:

The harder part is to “find” the app-pages to crawl. You could use 1) the Google Play Sitemap or 2) follow the app-links you find in every page you crawl as explained in the Link Extractor documentation (in case you plan to use Scrapy).

Another option is to use an open-source library based on ProtoBuf to fetch meta-data about an app, here the link to the project: https://code.google.com/archive/p/android-market-api.
This library fetches app meta-data from Google Play on behalf of a valid Google account, but also in this case you need a crawler to “find” which apps are available and schedule their meta-data retrieval. This other open-source project can help you with that: https://code.google.com/archive/p/android-marketplace-crawler.

If you don’t want to implement all this by yourself, you could use a third-party managed service to access Android apps meta-data through a JSON-based API. For instance, 42matters.com (the company I work for) offers an API for both Android and iOS to retrieve apps’ meta-data, here more details:

https://42matters.com/app-market-data

In order to get the Title, Icon, Description, Downloads for an app you can use the “lookup” endpoint as documented here:

https://42matters.com/docs/app-market-data/android/apps/lookup

This is an example of the JSON response for the “Angry Birds Space Premium” app:

{
    "package_name": "com.rovio.angrybirdsspace.premium",
    "title": "Angry Birds Space Premium",
    "description": "Play over 300 interstellar levels across 10 planets...",
    "short_desc": "The #1 mobile game of all time blasts off into space!",
    "rating": 4.3046236038208,
    "category": "Arcade",
    "cat_key": "GAME_ARCADE",
    "cat_keys": [
        "GAME_ARCADE",
        "GAME",
        "FAMILY_EDUCATION",
        "FAMILY"
    ],
    "price": "$1.15",
    "downloads": "1,000,000 - 5,000,000",
    "version": "2.2.1",
    "content_rating": "Everyone",
    "promo_video": "https://www.youtube.com/embed/g6AL9YqRHaI?ps=play&vq=large&rel=0&autohide=1&showinfo=0&autoplay=1",
    "market_update": "2015-07-03T00:00:00+00:00",
    "screenshots": [
        "https://lh3.googleusercontent.com/ZmuBQzIy1G74coPrQ1R7fCeKdJmjTdpJhNrIHBOaFyM0N2EYdUPwZaQjnQUtiUDGmac=h310",
        "https://lh3.googleusercontent.com/Xg2Aq70ZH0SnNhtSKH7xg9jCfisWgmmq3C7xQbx6YMhTVAIRqlRJeH8GYtjxapb_qR4=h310",
        "https://lh3.googleusercontent.com/T4o5-2_UP82sj4fSSegbjrGmslNHlfvtEYuZacXMSOC55-7eyiKySw05lNF1QQGO2FeU=h310",
        "https://lh3.googleusercontent.com/f2ennaLdivFu5cQQaVPKsRcWxB8FS5T4Bkoy3l0iPW9-GDDnTVRhvR5kz6l4m8FL1c8=h310",
        "https://lh3.googleusercontent.com/H-9M03_-O9Df1nHr2-rUdjtk2aeBY3bAxnqSX3m2zh_aV8-K1t0qU1DxLXnK0GrDAw=h310"
    ],
    "created": "2012-03-22T08:24:00+00:00",
    "developer": "Rovio Entertainment Ltd.",
    "number_ratings": 20812,
    "price_currency": "$",
    "icon": "https://lh3.ggpht.com/aQaIEGrmba1ENSEgUtArdm3yhJUug7BRWlu_WaspoJusZyHv1rjlWtYqe_qRjE_Kmh1E=w300",
    "icon_72": "https://lh3.ggpht.com/aQaIEGrmba1ENSEgUtArdm3yhJUug7BRWlu_WaspoJusZyHv1rjlWtYqe_qRjE_Kmh1E=w72",
    "market_url": "https://play.google.com/store/apps/details?id=com.rovio.angrybirdsspace.premium&referrer=utm_source%3D42matters.com%26utm_medium%3Dapi"
}

I hope this helps, otherwise feel free to get in touch with me. I know this topic quite well and can point you in the right direction.

Regards,

Andrea

Leave a Comment