The Movie Database (TMDb) is an online crowdsourced database for movie and television information. The metadata on movies and TV shows is contributed by its 1.1-million-strong community. As of this writing, the site has indexed 393,000+ movies and 73,000+ TV shows across 39 languages. This dataset is a collection of data on around 10,000 movies.

The primary goal of this project is to perform a detailed analysis and visualization of the data to answer the questions brainstormed below. We use NumPy and Pandas for analysis, and Matplotlib and Seaborn for visualization. The work consisted of:
- Assessing the data and brainstorming questions that could be answered using it.
- Performing the necessary cleaning steps to unify formats, deal with missing data and prepare the dataset for analysis.
- Wrangling and exploring the data using Pandas and NumPy to gather insights about the relationships between different aspects, creating visualizations using Matplotlib and drawing inferences to answer the research questions.

Firstly, we import all the libraries we will use for this analysis. Inspecting the loaded data shows that the dataset contains 10,866 rows and 21 columns, and that it also contains null and missing values. Now let's look at the data to decide which questions can be asked and answered with this data set.

Questions
Based on the given attributes, the following questions can be asked and answered:
- What is the most successful genre?
- Which movies have the highest and lowest profits?
- Which movies have the longest runtime?
- Which movies have the highest and lowest budgets?

No unit of currency is mentioned in the dataset. For this analysis we will assume US dollars, as it is the most widely used international currency.

Below are the tasks performed to clean the dataset:
- Remove unused columns such as id, imdb_id, vote_count, production_company, keywords, homepage etc.
- Discard movies with zero budget or zero revenue, since a zero means the value was not recorded.
- Convert the release date column into a date format.
- Replace zeros in the runtime column with NaN.
- Change the format of the budget and revenue columns.

vote_count differs from movie to movie, so we cannot directly judge a movie's popularity from its average vote.

I created a copy of the original data frame so that the data can be manipulated without changing the original file.

The movies with a profit greater than 60 million dollars have an average revenue of 275 million dollars.

Conclusions
Being a movie buff myself, I enjoyed working on this data set. We derived some interesting conclusions from the analysis. The most commonly successful movies have the following characteristics:
- Average budget: around 63 million dollars.
- Average duration: 114 minutes.
- Genre: Comedy, Drama, Action, Thriller.

These are some of the many characteristics that can make a movie profitable. One key point to note is that, for the most part, all numerical results come from the analysis performed on the data frame of movies with a profit of 60 million dollars or more.

Limitations
Part of the analysis considered only movies with a significant profit of 60 million dollars and above. The results might not be completely error-free, but by following these suggestions one can increase the probability of a movie becoming a hit.
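The cleaning tasks, the 60-million-dollar profit filter and the budget and genre questions discussed in this post can be sketched in pandas as follows. This is a minimal sketch, not the author's original notebook code: the column names (budget, revenue, runtime, release_date, genres, original_title, production_companies) are assumed from the Kaggle TMDb movies CSV, the pipe-separated genre format ("Action|Thriller") is an assumption, and the helper names clean_movies, profitable_movies, genre_counts and extremes are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed column names, based on the Kaggle TMDb movies CSV layout.
UNUSED_COLUMNS = ["id", "imdb_id", "vote_count", "production_companies",
                  "keywords", "homepage"]


def clean_movies(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the cleaning tasks to a copy of the data."""
    df = raw.copy()  # work on a copy so the original frame stays untouched
    # Drop columns not used in the analysis; ignore any that are absent.
    df = df.drop(columns=UNUSED_COLUMNS, errors="ignore")
    # A zero budget or revenue means "not recorded", so discard those rows.
    df = df[(df["budget"] != 0) & (df["revenue"] != 0)].copy()
    # Parse release dates into datetime values.
    df["release_date"] = pd.to_datetime(df["release_date"])
    # A runtime of zero is treated as a missing value.
    df["runtime"] = df["runtime"].replace(0, np.nan)
    # Store budget and revenue as integers (amounts assumed to be US dollars).
    df[["budget", "revenue"]] = df[["budget", "revenue"]].astype("int64")
    return df


def profitable_movies(df: pd.DataFrame, threshold: float = 60_000_000) -> pd.DataFrame:
    """Rows whose profit (revenue minus budget) is at least `threshold`."""
    out = df.assign(profit=df["revenue"] - df["budget"])
    return out[out["profit"] >= threshold]


def genre_counts(df: pd.DataFrame) -> pd.Series:
    """Count how often each genre appears, assuming pipe-separated genres."""
    return df["genres"].str.split("|").explode().value_counts()


def extremes(df: pd.DataFrame, column: str, label: str = "original_title"):
    """Return the titles with the highest and lowest value in `column`."""
    return df.loc[df[column].idxmax(), label], df.loc[df[column].idxmin(), label]
```

Assuming the data is stored in a file named tmdb-movies.csv (a hypothetical path), `hits = profitable_movies(clean_movies(pd.read_csv("tmdb-movies.csv")))` would give the profit-filtered frame. From there, `hits["revenue"].mean()`, `hits["budget"].mean()` and `hits["runtime"].mean()` compute averages of the kind reported in the conclusions, `genre_counts(hits).head(4)` lists the most frequent genres, and `extremes(movies, "budget")` answers the highest/lowest budget question. Comparing against the threshold with `>=` matches the "60 million and above" filter described in the limitations.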