Tools Used: SQLite, Excel, Power BI
Download Power BI Dashboard (.pbix)
-
Business Problem: Amazon Prime Video wants to gather useful insights on their shows and movies for their subscribers. The problem is they are working with a large amount of data across two tables, nearly 10,000 titles and 124,000 cast and crew records, and need a way to effectively analyze and extract meaningful insights from it. They need a data analytics solution to uncover valuable patterns and trends around content quality, audience ratings, country of origin, and talent.
-
How I Plan On Solving the Problem: Using SQL in DB Browser for SQLite, I will query both the titles and credits datasets to answer key business questions. By leveraging SQL functions like AVG, COUNT, GROUP BY, JOIN, and HAVING, I can uncover metrics such as IMDB ratings, content volume by country and decade, top performing actors and directors, and age certification trends. Once the data has been extracted, I will use Power BI to present the findings through an interactive dashboard.
1. Which movies and shows on Amazon Prime ranked in the top 10 and bottom 10 based on their IMDB scores?
- Top 10 Movies
SELECT title, type, imdb_score
FROM titles
WHERE type = 'MOVIE'
AND imdb_score IS NOT NULL
ORDER BY imdb_score DESC
LIMIT 10;Result:
- Bottom 10 Movies
SELECT title, type, imdb_score
FROM titles
WHERE type = 'MOVIE'
AND imdb_score IS NOT NULL
ORDER BY imdb_score ASC
LIMIT 10;Result:
- Top 10 Shows
SELECT title, type, imdb_score
FROM titles
WHERE type = 'SHOW'
AND imdb_score IS NOT NULL
ORDER BY imdb_score DESC
LIMIT 10;Result:
- Bottom 10 Shows
SELECT title, type, imdb_score
FROM titles
WHERE type = 'SHOW'
AND imdb_score IS NOT NULL
ORDER BY imdb_score ASC
LIMIT 10;Result:
The top 10 movies and shows had some impressive IMDB scores, with Pawankhind leading movies at 9.9 and Water Helps the Blood Run leading shows at 9.7. One thing that stood out is how much Indian cinema dominates the top movie rankings, which shows Amazon Prime has made a strong push into international content. On the flip side, the bottom 10 titles scored as low as 1.1 for movies and 1.3 for shows, highlighting just how wide the quality range is across the platform.
SELECT type, ROUND(AVG(imdb_score), 2)
FROM titles
GROUP BY type;Result:
Shows came in with a noticeably higher average IMDB score at 7.12 compared to movies at 5.8. That is a pretty significant gap and suggests that Amazon Prime's TV catalog tends to be better received by audiences. It could mean that their original and licensed shows are higher quality productions, or simply that the movie library has a lot more low-budget titles pulling the average down.
SELECT production_countries, COUNT(*)
FROM titles
GROUP BY production_countries
ORDER BY COUNT(*) DESC
LIMIT 10;Result:
The United States leads by a wide margin with 4,810 titles, followed by India with 1,048 and the UK with 667. This makes sense given Amazon's roots, but the strong showing from India is notable and lines up with what we saw in the top IMDB scores. Like genres, production countries are also stored as lists in this dataset, which is another noted data limitation.
SELECT name, COUNT(*) as appearances
FROM credits
WHERE role = 'ACTOR'
GROUP BY name
ORDER BY appearances DESC
LIMIT 10;Result:
George 'Gabby' Hayes leads with 49 appearances, followed by Roy Rogers at 45 and Bess Flowers at 44. The fact that the top actors are all classic Hollywood era names tells you something interesting about Amazon Prime's catalog. They clearly have a deep library of older films, particularly westerns from the 1930s and 1940s.
SELECT name, ROUND(AVG(t.imdb_score), 2) as avg_score, COUNT(*) as titles
FROM credits c
JOIN titles t ON c.id = t.id
WHERE c.role = 'DIRECTOR'
AND t.imdb_score IS NOT NULL
GROUP BY name
HAVING COUNT(*) >= 2
ORDER BY avg_score DESC
LIMIT 10;Result:
Jeethu Joseph tops the list with an average IMDB score of 8.5 across 2 titles. Seeing James Cameron in the top 10 helps validate that the metric is working correctly. The minimum of 2 titles requirement was important here to make sure the results reflect consistent quality rather than just one great film inflating a director's average.
SELECT CONCAT(FLOOR(release_year / 10) * 10, 's') AS decade,
COUNT(*) AS count
FROM titles
GROUP BY decade
ORDER BY decade;Result:
Amazon Prime's catalog goes all the way back to the 1910s which was surprising. Content volume stays relatively modest through most of the 20th century before taking off in the 2000s. The 2010s saw a massive spike with 4,315 titles, which lines up with the global streaming boom and Amazon aggressively building out their library during that period.
SELECT age_certification, ROUND(AVG(imdb_score), 2) as avg_score, COUNT(*) as titles
FROM titles
WHERE age_certification IS NOT NULL
AND imdb_score IS NOT NULL
GROUP BY age_certification
ORDER BY avg_score DESC;Result:
TV-PG content scores the highest on average at 7.43, with TV-MA close behind at 7.37. R-rated movies actually score the lowest among all certified content at 5.67. This pattern suggests that TV content across the board tends to be better received than movies, which ties back to what we found in question 2.
This analysis of Amazon Prime Video's content library turned up some genuinely interesting findings. The platform has an extremely wide quality range, from near-perfect 9.9 rated films to titles scoring below 2.0. Shows consistently outperform movies in average IMDB score, pointing to stronger quality in Amazon's TV catalog. The US dominates content volume but India is a strong second, and that shows up in the top IMDB scores as well. The most frequently appearing actors being classic Hollywood names reveals just how deep Amazon's older film catalog runs. Content production exploded in the 2010s, and TV-PG and TV-MA certifications tend to produce the best rated content on the platform overall.











