Google Play Store Statistical Analysis in Python

This project uses Python to explore summary statistics and distributions in a dataset of Google Play Store app reviews.

The code below displays highlights from the project. For more details, please view the GitHub Repository.

Link to GitHub Repository:

Click Here

Libraries and Data

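import pandas
import thinkplot    # plotting helpers that accompany Think Stats
import thinkstats2  # Hist, Pmf, and Cdf classes from Think Stats

# Load the Google Play Store dataset
df = pandas.read_csv('googleplaystore_2.csv')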

Histogram of the Ratings Variable

hist_rating = thinkstats2.Hist(df.Rating)
thinkplot.Hist(hist_rating)
thinkplot.Show(xlabel='Rating', ylabel='Count')
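
For intuition, a histogram in this sense is a mapping from each distinct rating to the number of times it occurs. The following is a minimal sketch using only the standard library; the `ratings` list is made up for illustration and is not taken from the dataset. It also drops missing values before counting, on the assumption that the Rating column may contain NaN entries.

```python
import math
from collections import Counter

# Hypothetical ratings for illustration only; not from googleplaystore_2.csv.
ratings = [4.1, 4.5, 4.1, float("nan"), 3.9, 4.5, 4.1]

# Drop missing values before counting.
clean = [r for r in ratings if not math.isnan(r)]

# Map each distinct rating to its frequency.
hist = Counter(clean)

for value in sorted(hist):
    print(value, hist[value])
```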

Calculate the mean, mode, median, variance, and standard deviation of the Ratings Variable

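# pandas skips missing (NaN) ratings in these reductions by default
mean_rating = df.Rating.mean()
# mode() returns a Series, since more than one value can tie for most frequent
mode_rating = df.Rating.mode()
median_rating = df.Rating.median()
# var() and std() default to the sample estimators (ddof=1)
var_rating = df.Rating.var()
std_rating = df.Rating.std()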
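
As a cross-check on what these quantities mean, the same five statistics can be sketched with Python's built-in `statistics` module. The `ratings` list here is a made-up example, not data from the project. `statistics.variance` and `statistics.stdev` are the sample versions (dividing by n - 1), which matches the pandas defaults.

```python
import statistics

# Hypothetical ratings for illustration only.
ratings = [4.0, 4.5, 4.5, 3.5, 5.0]

mean_rating = statistics.mean(ratings)
mode_rating = statistics.mode(ratings)        # most frequent value
median_rating = statistics.median(ratings)    # middle value when sorted
var_rating = statistics.variance(ratings)     # sample variance (n - 1)
std_rating = statistics.stdev(ratings)        # square root of the sample variance

print(mean_rating, mode_rating, median_rating, var_rating, std_rating)
```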

Plot the PMF of the Ratings of Free Apps

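# free_apps is not defined in these highlights; this assumes the dataset
# has a 'Type' column whose value is 'Free' for free apps
free_apps = df[df.Type == 'Free']
pmf_free = thinkstats2.Pmf(free_apps.Rating, label='Rating - Free')
thinkplot.Hist(pmf_free)
thinkplot.Config(xlabel='Rating', ylabel='PMF')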
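
As a rough illustration of what a PMF holds, each distinct rating maps to the fraction of apps with that rating, so the probabilities sum to 1. The sketch below builds one from a small made-up list with `collections.Counter`; the list and the `pmf` dictionary are illustrative assumptions, not output from the dataset.

```python
from collections import Counter

# Hypothetical ratings for illustration only.
ratings = [4.0, 4.5, 4.5, 3.5, 5.0, 4.5, 4.0, 4.5]

# Count each distinct rating, then divide by the total to normalize.
counts = Counter(ratings)
total = sum(counts.values())
pmf = {value: count / total for value, count in sorted(counts.items())}

for value, prob in pmf.items():
    print(value, prob)
```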

Two histograms comparing the PMF of the Ratings of Paid vs. Free apps

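# paid_apps is not defined in these highlights; this assumes the dataset
# has a 'Type' column whose value is 'Paid' for paid apps
paid_apps = df[df.Type == 'Paid']
# Create the PMF of the ratings of paid apps
pmf_paid = thinkstats2.Pmf(paid_apps.Rating, label='Rating - Paid')

# First panel: the two PMFs as side-by-side bars
thinkplot.PrePlot(2, cols=2)
thinkplot.Hist(pmf_free, align='right')
thinkplot.Hist(pmf_paid, align='left')
thinkplot.Config(xlabel='Rating', ylabel='PMF')

# Second panel: the same two PMFs as step functions
thinkplot.PrePlot(2)
thinkplot.SubPlot(2)
thinkplot.Pmfs([pmf_free, pmf_paid])
thinkplot.Config(xlabel='Rating')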
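
Comparing PMFs rather than raw counts keeps the comparison fair when the two groups differ in size, because each PMF is normalized to sum to 1. One way to make the comparison explicit is to subtract the two PMFs rating by rating. The sketch below does this for two small made-up samples; the lists, the `make_pmf` helper, and the `free_pmf` and `paid_pmf` names are hypothetical and not part of the project.

```python
from collections import Counter


def make_pmf(values):
    """Return a dict mapping each distinct value to its relative frequency."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Hypothetical ratings for illustration only.
free = [4.0, 4.5, 4.5, 4.0]
paid = [4.5, 4.5, 5.0, 4.5]

free_pmf = make_pmf(free)
paid_pmf = make_pmf(paid)

# Difference in percentage points (paid minus free) for every rating
# that appears in either group; a missing rating has probability 0.
diffs = {}
for rating in sorted(set(free_pmf) | set(paid_pmf)):
    diffs[rating] = 100 * (paid_pmf.get(rating, 0) - free_pmf.get(rating, 0))

for rating, diff in diffs.items():
    print(rating, diff)
```

A positive difference means that rating is relatively more common among paid apps in the sample, and a negative one means it is more common among free apps.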

CDF of Ratings

# Create the CDF of the Rating column
cdf_rating = thinkstats2.Cdf(df.Rating, label='Rating')
# Plot the CDF
thinkplot.Cdf(cdf_rating)
thinkplot.Show(xlabel='Rating', ylabel='CDF')
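
To read the CDF plot, recall that the CDF at a rating x is the fraction of apps whose rating is less than or equal to x. A minimal sketch of that definition, using a sorted made-up list and `bisect` from the standard library, is shown below; the `ratings` list and the `cdf` function are hypothetical and stand in for the real data.

```python
from bisect import bisect_right

# Hypothetical ratings for illustration only, sorted for binary search.
ratings = sorted([4.0, 4.5, 4.5, 3.5, 5.0, 4.5, 4.0, 4.5])


def cdf(x):
    """Fraction of ratings less than or equal to x."""
    return bisect_right(ratings, x) / len(ratings)


for x in (3.0, 4.0, 4.5, 5.0):
    print(x, cdf(x))
```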
