Web scraping

Web scraping is a technique for collecting data from websites. Common types of data collected this way include images, videos, text, product information, and reviews or customer sentiment. The collected data is mainly used for data analysis and marketing.

This project is divided into four parts:

- web scraping
- updating
- data analysis
- visualisation


PART I - web scraping

The first step is importing the libraries and assigning the request headers.
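
The text does not say which libraries or header values were used. Here is a minimal sketch that assumes BeautifulSoup (bs4) for parsing and the standard library's urllib.request for downloading. requests.get(url, headers=...) is a common alternative. The User-Agent string and the get_soup helper are illustrative and are not taken from the project.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Illustrative browser-like headers; some shops reject the default Python User-Agent.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-GB,en;q=0.9",
}


def get_soup(url: str, timeout: float = 10.0) -> BeautifulSoup:
    """Download url with HEADERS attached and parse it (hypothetical helper)."""
    request = Request(url, headers=HEADERS)
    with urlopen(request, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return BeautifulSoup(html, "html.parser")
```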

I started by inspecting the web page and finding the HTML code for the wanted items.

Next, using BeautifulSoup, I searched for all product elements ('li', class_='product-grid_item').
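
The search can be sketched on a small, made-up HTML snippet. Only the tag name and class come from the text. The single link inside each li is an assumption about the shop's markup.

```python
from bs4 import BeautifulSoup

# Made-up listing markup; only the li/class pattern comes from the project text.
SAMPLE_HTML = """
<ul>
  <li class="product-grid_item"><a href="/p/whiskey-one">Whiskey One</a></li>
  <li class="product-grid_item"><a href="/p/whiskey-two">Whiskey Two</a></li>
  <li class="promo-banner">Not a product</li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ (with a trailing underscore) avoids clashing with Python's `class` keyword.
items = soup.find_all("li", class_="product-grid_item")

# Keep the href of the first link inside each product tile, if it has one.
hrefs = []
for item in items:
    link = item.find("a", href=True)
    if link is not None:
        hrefs.append(link["href"])

print(hrefs)  # ['/p/whiskey-one', '/p/whiskey-two']
```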

After that, I put the whole code inside a loop and pulled the product links from pages 1 to 9. All links were collected into one list called 'productlinks'.
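
Here is a sketch of the page loop. It assumes a hypothetical listing URL with a ?page=N parameter, because the real shop URL is not given. Downloading is passed in as a fetch_html function so the loop can be exercised offline. Against the live site, that function would download each page with the custom headers. Relative hrefs are turned into absolute links with urljoin.

```python
from typing import Callable, List
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical listing URL and page parameter; the real shop is not named.
BASE_URL = "https://www.example-shop.com/irish-whiskey"


def collect_product_links(
    fetch_html: Callable[[str], str], first_page: int = 1, last_page: int = 9
) -> List[str]:
    """Return absolute product URLs found on listing pages first_page..last_page."""
    productlinks = []
    for page in range(first_page, last_page + 1):
        soup = BeautifulSoup(fetch_html(f"{BASE_URL}?page={page}"), "html.parser")
        for item in soup.find_all("li", class_="product-grid_item"):
            link = item.find("a", href=True)
            if link is not None:
                # Listing pages often use relative hrefs; make them absolute.
                productlinks.append(urljoin(BASE_URL, link["href"]))
    return productlinks
```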

Once I had a list of links for every Irish whiskey, I started to pull the needed data.

For each link, I opened the product page and pulled the item's name, price, rating, volume and alcohol percentage.

Name - stripped of extra spaces and of any characters trailing the name.
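
The text does not say which characters followed the names. This sketch assumes runs of whitespace plus trailing separators such as '-', '|' or '*'. That character set is only illustrative.

```python
def clean_name(raw: str) -> str:
    """Collapse repeated whitespace and drop trailing separator characters."""
    collapsed = " ".join(raw.split())  # also removes leading/trailing spaces and newlines
    # Illustrative set of trailing characters; adjust to what the shop actually appends.
    return collapsed.rstrip(" -|*")


print(clean_name("  Teeling   Small Batch \n - "))  # Teeling Small Batch
```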

Price - the '£' sign was removed so the value can be used as a number at a later stage.
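
Here is a sketch that turns the price text into a number. It assumes prices look like '£39.95'. Thousands separators are also dropped in case a bottle costs over £1,000.

```python
def parse_price(raw: str) -> float:
    """Convert price text such as '£39.95' or '£1,250.00' to a float."""
    return float(raw.replace("£", "").replace(",", "").strip())


print(parse_price(" £39.95 "))  # 39.95
```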

Ratings - not every item had a rating, so this field is extracted under a condition: if the product has a rating, it is recorded; otherwise the value is 'NaN'.
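
Here is one way to express the condition. It assumes the rating sits in a span with class 'rating', which is a hypothetical selector, and that the first number in its text is the score. Returning math.nan means pandas treats missing ratings as NaN.

```python
import math
import re

from bs4 import BeautifulSoup


def extract_rating(soup: BeautifulSoup) -> float:
    """Return the product's rating as a float, or NaN when it has none.

    span.rating is a hypothetical stand-in for the shop's real markup.
    """
    tag = soup.find("span", class_="rating")
    if tag is None:
        return math.nan  # no rating element on the page
    match = re.search(r"\d+(?:\.\d+)?", tag.get_text())
    return float(match.group()) if match else math.nan
```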

Volume and alcohol percentage - pulled together as one string and then split to extract each value.
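
The text does not show the exact combined string. This sketch assumes formats like '70cl / 40%'. It uses regular expressions rather than a fixed split, so neither the order nor the separator matters.

```python
import re
from typing import Optional, Tuple

# Assumed formats such as "70cl / 40%" or "0.7 l, 46%"; the real string is not shown.
VOLUME_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(cl|ml|l)\b", re.IGNORECASE)
ABV_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def split_volume_abv(raw: str) -> Tuple[Optional[str], Optional[float]]:
    """Split a combined volume/ABV string into (volume, abv_percent)."""
    volume = VOLUME_RE.search(raw)
    abv = ABV_RE.search(raw)
    return (
        volume.group(0).replace(" ", "") if volume else None,
        float(abv.group(1)) if abv else None,
    )
```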

The scrape date was added and all the information was put into a dictionary.

The dictionary was transformed into a DataFrame and saved to an Excel file.
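
Here is a sketch of this final step. It assumes one dictionary per product, collected in a list, which is only one of several possible layouts. The column names and the build_record and records_to_frame helpers are illustrative.

```python
from datetime import date
from typing import List, Optional

import pandas as pd


def build_record(name, price, rating, volume, abv, scraped_on: Optional[date] = None) -> dict:
    """Bundle one product's fields and the scrape date into a dictionary."""
    return {
        "name": name,
        "price": price,
        "rating": rating,
        "volume": volume,
        "abv": abv,
        "date": scraped_on or date.today(),
    }


def records_to_frame(records: List[dict]) -> pd.DataFrame:
    """One row per product dictionary; the keys become the column names."""
    return pd.DataFrame(records)


# Saving needs an Excel writer such as openpyxl; the file name is hypothetical:
# records_to_frame(records).to_excel("irish_whiskey.xlsx", index=False)
```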


PART II - updating

In this part I pulled the data I planned to analyse, in this case the price change over time. This is an update of the price values, and the data is added to the main Excel file.

After some time, I ran the loop over the product pages again to rebuild the product link list, then looped through that list once more. This time I pulled only the product name and price, along with the current date.

This information was saved to a new Excel file ('update') and appended to the original.
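
Here is a sketch of the append step with pandas.concat. It assumes the original and update tables share the name, price and date columns. The file names in the comments are hypothetical.

```python
import pandas as pd


def append_update(main: pd.DataFrame, update: pd.DataFrame) -> pd.DataFrame:
    """Stack the update's rows under the original table.

    Columns the update does not have (rating, volume, ...) become NaN in its rows.
    """
    return pd.concat([main, update], ignore_index=True)


# With files (names hypothetical; reading/writing .xlsx needs openpyxl):
# main = pd.read_excel("irish_whiskey.xlsx")
# update = pd.read_excel("update.xlsx")
# append_update(main, update).to_excel("irish_whiskey.xlsx", index=False)
```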

This allows me to analyse price change over time.


PART III - data analysis

Over the 3-month period, the prices of Irish whiskey did not change much. Out of 196 products, only 14 recorded a price shift: 10 items increased in price and 4 were reduced. Three of the ten products whose prices increased were from the Teeling brand. The largest increase was +13.71% for Midleton Barry Crockett Legacy, and the biggest price drop was -10.55% for Proper no. Twelve Blended Irish Whiskey.
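
Percentages like these are relative changes between a product's first and last recorded price: (last - first) / first × 100. Here is a sketch with pandas. It assumes the combined table has name, price and date columns, and it is not necessarily how the figures above were produced.

```python
import pandas as pd


def price_changes(history: pd.DataFrame) -> pd.DataFrame:
    """Compare each product's first and last recorded price.

    Expects name, price and date columns; date values must sort chronologically.
    """
    ordered = history.sort_values("date")
    summary = ordered.groupby("name")["price"].agg(first="first", last="last")
    summary["pct_change"] = (summary["last"] - summary["first"]) / summary["first"] * 100
    return summary


def count_shifts(summary: pd.DataFrame) -> dict:
    """Count how many products went up, went down or stayed flat."""
    pct = summary["pct_change"]
    return {
        "increased": int((pct > 0).sum()),
        "decreased": int((pct < 0).sum()),
        "unchanged": int((pct == 0).sum()),
    }
```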

Most changes happened in August, which may suggest that the online shop ended a mid-year promotion and prices went back to their original values.
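
To find the month in which each change happened, each price can be compared with the previous snapshot of the same product. Here is a sketch under the same column assumptions. Each change is dated by the snapshot where the new price first appears.

```python
import pandas as pd


def changes_by_month(history: pd.DataFrame) -> pd.Series:
    """Count price changes per calendar month."""
    ordered = history.assign(date=pd.to_datetime(history["date"])).sort_values(["name", "date"])
    # diff() compares each price with the previous snapshot of the same product;
    # the first snapshot of every product has no predecessor, so it yields NaN.
    step = ordered.groupby("name")["price"].diff()
    changed = step.notna() & (step != 0)
    return ordered.loc[changed, "date"].dt.month_name().value_counts()
```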

DISCLAIMER: The presented values were manipulated to demonstrate more interesting analytical results.


PART IV - visualisation

The final part of the project consists of visualising the results.
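
The dashboard tool is not named. As a tool-agnostic sketch, the appended history can be pivoted into one column per product. Excel charts, BI tools and pandas' DataFrame.plot (which needs matplotlib) all accept this layout.

```python
import pandas as pd


def price_table(history: pd.DataFrame) -> pd.DataFrame:
    """Reshape long (name, date, price) rows into one column per product.

    Dates run down the index and products across the columns, which suits
    line charts of price over time.
    """
    return history.pivot_table(index="date", columns="name", values="price", aggfunc="last")


# e.g. price_table(history).plot() draws one line per product (requires matplotlib).
```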

The full interactive dashboard is available