Netflix Database Management

Overall, the final report should be 7-10 pages long, but the assumption is many pages will includes graphs, diagrams, and app screenshots, SQL snippets, and a reasonable amount of bullet points. Papers should be at least 1500 words long (Arial font, 12 point, 1.5 spacing).  The papers and presentations must expand on the weekly work, and reflect and include any feedback or suggestions for improvements from fellow students, TA’s, or the Professor.   In other words, the final paper is not just a “cut and paste” – it requires that the student produce professional, thorough, and effective work that clearly reflects understanding of course and lecture material. 

Note that the paper MUST use proper English grammar, be CLEAR and CONCISE.  The goal here is to prepare the students not just from a technical perspective (i.e. the ability to use database tools, SQL, etc) but develop skills in effective writing and communication.  You can be the most technical person in the room, but if you cannot describe your thoughts, goals, and challenges effectively then you will not be successful in the “real world.”  This course is designed to make you a WELL-ROUNDED Analytics and Informatics professional, with the expectation that you will these skills will make you an “agent of change” in whatever business environment you may find yourself in the future.  As such, your final papers will be graded not only on the technical accuracy of the solution, but equally on the ability for the professor to read and clearly understand your challenges.  Northeastern University has many resources to help you develop the latter.  

The students MUST also submit their database is SQLite .DB format for review.

Note that this is NOT a typical “more is better” essay.  Your future bosses, co-workers, and other business stakeholders do NOT want to read a dozen pages of ‘fluff’, nor do they need an exhaustive history or business analysi.  This paper is NOT about designing a new innovative solution, writing long-winded market analysis or the like.  Be clear, be precise, and get your points across.  Less is more!



Describe the use-case, scenario, business challenge, application solution, etc.  The goal here is to provide the reader with a concise yet effective overview of the topic you’ve selected.  Explain how the app or service is used, who uses it and why, it’s cost model (it is truly free, “freeium” (free but you give something up to use it, i.e. watching ads), or paid (one time vs subscription), etc.  Briefly describe your personal connection to this app or service, why you chose it, etc.  This should be no more than 2-3 well-written paragraphs.  

Business Analysis

Explain the end-users or “personna’s” of the people who use the app.  (See related course material) The goal here is to ensure that you, as the database designer, can show that you truly understand WHO uses the application, and WHY and HOW they use it.  There are typically two to four primary users of the app.  Each personna should get about one paragraph.  

Explain the Business Rules or Logic that the app uses or enforces.  Once you enter the real world environment, you (as the Database Developer) MUST understand the expected business functionality, and be able to reflect that understanding back.  The point here is that before you start building tables to store data, you MUST understand all of the ways that data will be created, stored, and retrieved.  Use well-written bullet-points – as many as you need.  Points will be deducted if there are major or obvious gaps in understanding, or if the business logic doesn’t accurately reflect the most-common ways the app will be used.


Business Rules

1. One account can have multiple users, but only one user can visit Netflix at the same time.

2. Each account can subscribe to one plan for permission of visiting.

3. Each user can watch videos, browse, add videos to the list.

4. One user can have multiple visits at different times.

5. Within a visit, a user can watch multiple videos.

6. Each video will only have one genre and one film rating.

7. Each video has multiple actors.

8. Each video can memorize the time on progress bar.



Table Design and Analysis

Describe a reasonable number of data entities (tables) for your topic.  This should be at least 5, but no more than 10 or so, tables that will comprise your simulated database.    Note: this is NOT about Data Types (integer, string, etc).  There was some confusion on this topic in previous classes.  Any papers submitted that focus exclusively Data Types will be flagged for plagiarism and will not be accepted.

Provide accurate ER diagrams for the above.  Those diagrams MUST follow standard Naming conventions (i.e. table names, Primary and Foreign Keys defined as such), must show accurate one-to-many and one-to-one relationships with the lines connecting to the actual related PK or FK field.  This is one of the most critical skills covered in the course.   Points deducted for sloppy, inaccurate, or unclear diagrams.  

Database Implementation

Do not provide SQL Create statements. 

Provide SQL commands to retrieve and update your tables.  These should be representative of major Business Rules.  For example, if your business logic is “A LinkedIn user likes a post.” the corresponding SQL showing how that data activity is conducted should be provided as part of the report.  This is the a critical section. The goal here is that these SQL commands would be given to application developers and analytics teams to store and retrieve data into your tables in a consistent manner.  You are ‘kick-starting’ their application development efforts with the complex, multi-table INSERTs and SELECTs.  Be thorough!  You SHOULD include screenshots from the application you are modeling, to show the real-world example.  You should also “tell a story” with with your examples that follows a most common use-case for the app.  For many, this will be “User registers for the site”.  Start there, then continue along the 3 or 4 steps of a user story.


Netflix SQL query


— 1. Find all movies that Chris Hemsworth acts in.

SELECT v.Title


JOIN VIDEO v ON v.Video_ID = c.Video_ID

JOIN ACTOR a ON a.Actor_ID = c.Actor_ID

WHERE a.Actor_Name = ‘Chris Hemsworth’;



–2. Calcualte Subscription Plan revenue by country

SELECT a.Location,

SUM(s.Subscription_Charge) AS SubscriptionRevenue


JOIN ACCOUNT a ON s.Subscription_ID = a.FK_Subscription_ID

GROUP BY a.Location



–3. Find comedy film(s) with their release years

SELECT v.Title,




JOIN GENRE g ON v.FK_Genre_ID = g.Genre_ID

WHERE g.Genre_Name = ‘Comedies’



–4.Find shows with IMDb_rating > 6 that are rated R

SELECT v.Title


JOIN FILMRATING f ON v.FK_FlimRating_ID = f.Film_Rating_ID

WHERE v.IMDb_Rating > 6.0

AND f.Film_Rating_Name = ‘R’;



–5. Find users has most visits in Feburary 2021, along with the videos they watched

SELECT u.User_Name,




JOIN VISIT vt ON u.User_ID = vt.FK_User_ID

JOIN VISIT_VIDEO vd ON vt.Visit_ID = vd.Visit_ID

JOIN VIDEO vo ON vd.Video_ID = vo.Video_ID

WHERE SUBSTR(vt.Visit_Start_Time,1,7) = ‘2021-02’;



Analytics, Reports, and Metrics

Provide examples of analytics which could be conducted on your database.  Describe reports that end-users or system administrators would want to see.  Provide SQL statements, sample data and/or graphs.  Be specific, and leverage the Goal Question Metric methodology discussed during Week 5.

Security Concerns

Describe any security or privacy concerns around your data.  Are you storing or processing sensitive information?   Do NOT discuss areas that were not covered in the course. The goal here is to highlight what you, the data expert, is aware of, so that the other stakeholders in your organization (i.e. the security team, Legal, Privacy, etc) are informed of data that may be sensitive.  Do not include “Firewalls and Anti-virus” type information – leave that to the folks who are experts on it.  


Briefly describe your suggested solution architecture.  For example, will this be hosted internally on a small client/server solution for a small group of business users, or cloud-based and scaled to handle millions of transactions per day?  Use publicly available resources to provide “just enough” detail and information, but remember to always cite references in APA format.  


Project: Netflix


My topic for the project is Netflix, which is a world-leading online streaming service. According to the research, Netflix has exceeded over 200 million global subscribers in 2020, and over 165 million hours of watching time daily. Since the platform requires a high availability and scalability to serve the customers around the world, and we are storing customer visit-level data in our database, it should be able to handle and process multi-terabytes data.


In terms of the architecture, an n-Tier cloud database will serve the massive storage need and be elastic enough to support the increase demands in Web Servers as necessary. For example, Netflix would benefit from cloud-based platform during the Covid era, as people are spending more time at home, watching videos. With the rapid growth in traffic/subscribers, cloud solution will adapt to the data surge easily and continue to provide reliable database performance. On the other hand, Netflix could also take advantage of other could services like Machine Learning service that comes along with could database to help the site better monitor customer behavior, spot issue promptly, and improve resiliency for better customer streaming experience.


n-Tier cloud database architecture:

Presentation tier (clients send data request from this layer)

Application tier (business logic)

Database tier (internal layer that store the database)



Access to cloud database is provided as-a-service (DBaaS), and the users are charged according to their usage.



Sensitive data should be protected: user account information (email, phone, saved payment, etc.)


Solsman, J. (2021, January 20). Netflix subscriber GROWTH Soars past 200 million members. Retrieved March 04, 2021, from



Project Wrap-up and Future Considerations