import pandas as pd
url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-23/raw_anime.csv"
df = pd.read_csv(url)
df.head()
| | animeID | name | title_english | title_japanese | title_synonyms | type | source | producers | genre | studio | ... | scored_by | rank | popularity | members | favorites | synopsis | background | premiered | broadcast | related |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Cowboy Bebop | Cowboy Bebop | カウボーイビバップ | [] | TV | Original | ['Bandai Visual'] | ['Action', 'Adventure', 'Comedy', 'Drama', 'Sc... | ['Sunrise'] | ... | 405664.0 | 26.0 | 39.0 | 795733.0 | 43460.0 | In the year 2071, humanity has colonized sever... | When Cowboy Bebop first aired in spring of 199... | Spring 1998 | Saturdays at 01:00 (JST) | {'Adaptation': [{'mal_id': 173, 'type': 'manga... |
| 1 | 5 | Cowboy Bebop: Tengoku no Tobira | Cowboy Bebop: The Movie | カウボーイビバップ 天国の扉 | ["Cowboy Bebop: Knockin' on Heaven's Door"] | Movie | Original | ['Sunrise', 'Bandai Visual'] | ['Action', 'Drama', 'Mystery', 'Sci-Fi', 'Space'] | ['Bones'] | ... | 120243.0 | 164.0 | 449.0 | 197791.0 | 776.0 | Another day, another bounty—such is the life o... | NaN | NaN | NaN | {'Parent story': [{'mal_id': 1, 'type': 'anime... |
| 2 | 6 | Trigun | Trigun | トライガン | [] | TV | Manga | ['Victor Entertainment'] | ['Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D... | ['Madhouse'] | ... | 212537.0 | 255.0 | 146.0 | 408548.0 | 10432.0 | Vash the Stampede is the man with a $$60,000,0... | The Japanese release by Victor Entertainment h... | Spring 1998 | Thursdays at 01:15 (JST) | {'Adaptation': [{'mal_id': 703, 'type': 'manga... |
| 3 | 7 | Witch Hunter Robin | Witch Hunter Robin | Witch Hunter ROBIN | ['WHR'] | TV | Original | ['Bandai Visual'] | ['Action', 'Magic', 'Police', 'Supernatural', ... | ['Sunrise'] | ... | 32837.0 | 2371.0 | 1171.0 | 79397.0 | 537.0 | Witches are individuals with special powers li... | NaN | Summer 2002 | Tuesdays at Unknown | {} |
| 4 | 8 | Bouken Ou Beet | Beet the Vandel Buster | 冒険王ビィト | ['Adventure King Beet'] | TV | Manga | ['TV Tokyo', 'Dentsu'] | ['Adventure', 'Fantasy', 'Shounen', 'Supernatu... | ['Toei Animation'] | ... | 4894.0 | 3544.0 | 3704.0 | 11708.0 | 14.0 | It is the dark century and the people are suff... | NaN | Fall 2004 | Thursdays at 18:30 (JST) | {'Adaptation': [{'mal_id': 1348, 'type': 'mang... |
5 rows × 27 columns
df.columns
Index(['animeID', 'name', 'title_english', 'title_japanese', 'title_synonyms',
'type', 'source', 'producers', 'genre', 'studio', 'episodes', 'status',
'airing', 'aired', 'duration', 'rating', 'score', 'scored_by', 'rank',
'popularity', 'members', 'favorites', 'synopsis', 'background',
'premiered', 'broadcast', 'related'],
dtype='object')
df.dtypes
animeID           int64
name              object
title_english     object
title_japanese    object
title_synonyms    object
type              object
source            object
producers         object
genre             object
studio            object
episodes          float64
status            object
airing            object
aired             object
duration          object
rating            object
score             float64
scored_by         float64
rank              float64
popularity        float64
members           float64
favorites         float64
synopsis          object
background        object
premiered         object
broadcast         object
related           object
dtype: object
df.shape[0]
15278
df.isnull().sum()
animeID           0
name              0
title_english     9156
title_japanese    48
title_synonyms    5
type              5
source            5
producers         5
genre             5
studio            5
episodes          546
status            5
airing            5
aired             5
duration          5
rating            5
score             500
scored_by         5
rank              1609
popularity        5
members           5
favorites         5
synopsis          713
background        14160
premiered         11099
broadcast         10876
related           5
dtype: int64
The columns title_synonyms, type, source, producers, genre, studio, status, airing, aired, duration, rating, scored_by, popularity, members, favorites, and related each have 5 rows with null values, so let's drop those rows.
# drop rows with missing values only in specific columns
df.dropna(subset=['title_synonyms', 'type', 'source', 'producers', 'genre', 'studio', 'status', 'airing', 'aired', 'duration', 'rating', 'scored_by', 'popularity', 'members', 'favorites', 'related'], inplace=True)
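As a small illustration of what the subset argument does, here is a sketch on a made-up three-row frame (the toy values are assumptions, not rows from the anime data). Only nulls in the listed columns cause a row to be dropped; nulls elsewhere are left alone.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the anime data (values are made up).
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "type": ["TV", np.nan, "Movie"],
    "title_english": [np.nan, "B-en", "C-en"],
})

# Only rows with a missing 'type' are dropped; the missing
# 'title_english' in the first row is ignored because that
# column is not listed in `subset`.
cleaned = toy.dropna(subset=["type"])
print(cleaned)
```

This is why title_english can still contain missing values after the dropna call above.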
We have 15278 rows in total, and about 14160 of them have a null value in background, so let's drop this column.
df.drop('background', axis=1, inplace=True)
df.columns
Index(['animeID', 'name', 'title_english', 'title_japanese', 'title_synonyms',
'type', 'source', 'producers', 'genre', 'studio', 'episodes', 'status',
'airing', 'aired', 'duration', 'rating', 'score', 'scored_by', 'rank',
'popularity', 'members', 'favorites', 'synopsis', 'premiered',
'broadcast', 'related'],
dtype='object')
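The decision to drop background rests on how much of the column is missing (14160 of 15278 rows is roughly 93%). One way to make that check explicit is to compute the share of missing values per column and compare it with a cutoff. The sketch below uses a toy frame, and the 0.5 threshold is an assumption chosen for illustration, not a value used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with one mostly-empty column (values are made up).
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "background": [np.nan, np.nan, np.nan, "note"],
    "score": [8.1, np.nan, 7.0, 6.5],
})

# isnull() gives booleans, so the column mean is the share of missing values.
missing_share = toy.isnull().mean()

# Hypothetical cutoff: drop columns where more than half the values are missing.
threshold = 0.5
mostly_empty = missing_share[missing_share > threshold].index.tolist()

trimmed = toy.drop(columns=mostly_empty)
print(missing_share)
print(list(trimmed.columns))
```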
df.isnull().sum()
animeID           0
name              0
title_english     9151
title_japanese    43
title_synonyms    0
type              0
source            0
producers         0
genre             0
studio            0
episodes          541
status            0
airing            0
aired             0
duration          0
rating            0
score             495
scored_by         0
rank              1604
popularity        0
members           0
favorites         0
synopsis          708
premiered         11094
broadcast         10871
related           0
dtype: int64
We can see that the main columns still have a lot of missing values. For example:
title_english has 9151 missing values, but we need these names for visualisation, so let's try to translate title_japanese to English for these rows.
!pip install googletrans==4.0.0-rc1
from googletrans import Translator
translator = Translator()
for i, row in df.iterrows():
    if pd.isna(row['title_english']):
        try:
            # translate 'title_japanese' to english
            translated = translator.translate(row['title_japanese'], dest='en').text
            # fill missing values in 'title_english' column
            df.at[i, 'title_english'] = translated
        except Exception:
            # report the failure and leave the title missing
            print(f"Translation failed for row {i}")
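The loop above makes one translation request per row. A possible variation, sketched below, translates each distinct Japanese title only once and reuses the result. The helper name fill_missing_titles and the translate_fn parameter are hypothetical; the demo uses a fake lookup-based translator so that it runs without network access.

```python
import pandas as pd

def fill_missing_titles(frame, translate_fn):
    """Return a copy with missing 'title_english' filled from 'title_japanese'.

    translate_fn is any callable mapping a Japanese string to an English one
    (hypothetical parameter; pass a real translator wrapper in practice).
    """
    result = frame.copy()
    needs_title = result["title_english"].isna() & result["title_japanese"].notna()

    # Translate each distinct title once and remember the answer.
    cache = {}
    for japanese in result.loc[needs_title, "title_japanese"].unique():
        try:
            cache[japanese] = translate_fn(japanese)
        except Exception:
            # Leave the title missing if translation fails.
            continue

    # Titles without a cached translation map to NaN and stay missing.
    translated = result.loc[needs_title, "title_japanese"].map(cache)
    result.loc[needs_title, "title_english"] = translated
    return result

def fake_translate(text):
    # Stand-in translator: raises KeyError for unknown titles,
    # which simulates a failed request.
    lookup = {"トライガン": "Trigun"}
    return lookup[text]

toy = pd.DataFrame({
    "title_english": ["Cowboy Bebop", None, None, None],
    "title_japanese": ["カウボーイビバップ", "トライガン", "トライガン", "冒険王ビィト"],
})
filled = fill_missing_titles(toy, fake_translate)
print(filled)
```

With googletrans, translate_fn could be a small wrapper such as lambda t: translator.translate(t, dest='en').text, which is the same call the loop above uses.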
df.isnull().sum()
animeID           0
name              0
title_english     4214
title_japanese    43
title_synonyms    0
type              0
source            0
producers         0
genre             0
studio            0
episodes          541
status            0
airing            0
aired             0
duration          0
rating            0
score             495
scored_by         0
rank              1604
popularity        0
members           0
favorites         0
synopsis          708
premiered         11094
broadcast         10871
related           0
dtype: int64
df.to_csv('preprocessed.csv')
path = "/content/preprocessed.csv"
translated_anime = pd.read_csv(path)
translated_anime.head(3)
| | Unnamed: 0 | animeID | name | title_english | title_japanese | title_synonyms | type | source | producers | genre | ... | score | scored_by | rank | popularity | members | favorites | synopsis | premiered | broadcast | related |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | Cowboy Bebop | Cowboy Bebop | カウボーイビバップ | [] | TV | Original | ['Bandai Visual'] | ['Action', 'Adventure', 'Comedy', 'Drama', 'Sc... | ... | 8.81 | 405664.0 | 26.0 | 39.0 | 795733.0 | 43460.0 | In the year 2071, humanity has colonized sever... | Spring 1998 | Saturdays at 01:00 (JST) | {'Adaptation': [{'mal_id': 173, 'type': 'manga... |
| 1 | 1 | 5 | Cowboy Bebop: Tengoku no Tobira | Cowboy Bebop: The Movie | カウボーイビバップ 天国の扉 | ["Cowboy Bebop: Knockin' on Heaven's Door"] | Movie | Original | ['Sunrise', 'Bandai Visual'] | ['Action', 'Drama', 'Mystery', 'Sci-Fi', 'Space'] | ... | 8.41 | 120243.0 | 164.0 | 449.0 | 197791.0 | 776.0 | Another day, another bounty—such is the life o... | NaN | NaN | {'Parent story': [{'mal_id': 1, 'type': 'anime... |
| 2 | 2 | 6 | Trigun | Trigun | トライガン | [] | TV | Manga | ['Victor Entertainment'] | ['Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D... | ... | 8.30 | 212537.0 | 255.0 | 146.0 | 408548.0 | 10432.0 | Vash the Stampede is the man with a $$60,000,0... | Spring 1998 | Thursdays at 01:15 (JST) | {'Adaptation': [{'mal_id': 703, 'type': 'manga... |
3 rows × 27 columns
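The Unnamed: 0 column in the table above appears because to_csv writes the DataFrame index as an unnamed first column by default, and read_csv then loads it back as ordinary data. The sketch below shows the effect on a one-row toy frame, using in-memory buffers instead of files (an assumption made so the example is self-contained).

```python
import io
import pandas as pd

toy = pd.DataFrame({"name": ["Trigun"], "score": [8.3]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False skips the index, so no extra column comes back.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
```

Passing index=False when saving would avoid having to drop Unnamed: 0 later.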
translated_anime.isnull().sum()
Unnamed: 0        0
animeID           0
name              0
title_english     4214
title_japanese    43
title_synonyms    0
type              0
source            0
producers         0
genre             0
studio            0
episodes          541
status            0
airing            0
aired             0
duration          0
rating            0
score             495
scored_by         0
rank              1604
popularity        0
members           0
favorites         0
synopsis          708
premiered         11094
broadcast         10871
related           0
dtype: int64
translated_anime.shape[0]
15273
# drop rows with missing values only in specific columns
translated_anime.dropna(subset=['title_english', 'title_japanese', 'episodes','score','rank','synopsis','premiered','broadcast'], inplace=True)
translated_anime.columns
Index(['Unnamed: 0', 'animeID', 'name', 'title_english', 'title_japanese',
'title_synonyms', 'type', 'source', 'producers', 'genre', 'studio',
'episodes', 'status', 'airing', 'aired', 'duration', 'rating', 'score',
'scored_by', 'rank', 'popularity', 'members', 'favorites', 'synopsis',
'premiered', 'broadcast', 'related'],
dtype='object')
translated_anime.head(3)
| | Unnamed: 0 | animeID | name | title_english | title_japanese | title_synonyms | type | source | producers | genre | ... | score | scored_by | rank | popularity | members | favorites | synopsis | premiered | broadcast | related |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | Cowboy Bebop | Cowboy Bebop | カウボーイビバップ | [] | TV | Original | ['Bandai Visual'] | ['Action', 'Adventure', 'Comedy', 'Drama', 'Sc... | ... | 8.81 | 405664.0 | 26.0 | 39.0 | 795733.0 | 43460.0 | In the year 2071, humanity has colonized sever... | Spring 1998 | Saturdays at 01:00 (JST) | {'Adaptation': [{'mal_id': 173, 'type': 'manga... |
| 2 | 2 | 6 | Trigun | Trigun | トライガン | [] | TV | Manga | ['Victor Entertainment'] | ['Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D... | ... | 8.30 | 212537.0 | 255.0 | 146.0 | 408548.0 | 10432.0 | Vash the Stampede is the man with a $$60,000,0... | Spring 1998 | Thursdays at 01:15 (JST) | {'Adaptation': [{'mal_id': 703, 'type': 'manga... |
| 3 | 3 | 7 | Witch Hunter Robin | Witch Hunter Robin | Witch Hunter ROBIN | ['WHR'] | TV | Original | ['Bandai Visual'] | ['Action', 'Magic', 'Police', 'Supernatural', ... | ... | 7.33 | 32837.0 | 2371.0 | 1171.0 | 79397.0 | 537.0 | Witches are individuals with special powers li... | Summer 2002 | Tuesdays at Unknown | {} |
3 rows × 27 columns
Drop the Unnamed: 0, animeID, title_synonyms, members, synopsis, and related columns:
translated_anime.drop(['Unnamed: 0','animeID','title_synonyms','members','synopsis','related'],axis = 1, inplace = True)
translated_anime.head(3)
| | name | title_english | title_japanese | type | source | producers | genre | studio | episodes | status | ... | aired | duration | rating | score | scored_by | rank | popularity | favorites | premiered | broadcast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cowboy Bebop | Cowboy Bebop | カウボーイビバップ | TV | Original | ['Bandai Visual'] | ['Action', 'Adventure', 'Comedy', 'Drama', 'Sc... | ['Sunrise'] | 26.0 | Finished Airing | ... | {'from': '1998-04-03T00:00:00+00:00', 'to': '1... | 24 min per ep | R - 17+ (violence & profanity) | 8.81 | 405664.0 | 26.0 | 39.0 | 43460.0 | Spring 1998 | Saturdays at 01:00 (JST) |
| 2 | Trigun | Trigun | トライガン | TV | Manga | ['Victor Entertainment'] | ['Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D... | ['Madhouse'] | 26.0 | Finished Airing | ... | {'from': '1998-04-01T00:00:00+00:00', 'to': '1... | 24 min per ep | PG-13 - Teens 13 or older | 8.30 | 212537.0 | 255.0 | 146.0 | 10432.0 | Spring 1998 | Thursdays at 01:15 (JST) |
| 3 | Witch Hunter Robin | Witch Hunter Robin | Witch Hunter ROBIN | TV | Original | ['Bandai Visual'] | ['Action', 'Magic', 'Police', 'Supernatural', ... | ['Sunrise'] | 26.0 | Finished Airing | ... | {'from': '2002-07-02T00:00:00+00:00', 'to': '2... | 25 min per ep | PG-13 - Teens 13 or older | 7.33 | 32837.0 | 2371.0 | 1171.0 | 537.0 | Summer 2002 | Tuesdays at Unknown |
3 rows × 21 columns
translated_anime.columns
Index(['name', 'title_english', 'title_japanese', 'type', 'source',
'producers', 'genre', 'studio', 'episodes', 'status', 'airing', 'aired',
'duration', 'rating', 'score', 'scored_by', 'rank', 'popularity',
'favorites', 'premiered', 'broadcast'],
dtype='object')
translated_anime.drop(['status','aired','airing'], axis=1, inplace=True)
translated_anime.head(3)
| | name | title_english | title_japanese | type | source | producers | genre | studio | episodes | duration | rating | score | scored_by | rank | popularity | favorites | premiered | broadcast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cowboy Bebop | Cowboy Bebop | カウボーイビバップ | TV | Original | ['Bandai Visual'] | ['Action', 'Adventure', 'Comedy', 'Drama', 'Sc... | ['Sunrise'] | 26.0 | 24 min per ep | R - 17+ (violence & profanity) | 8.81 | 405664.0 | 26.0 | 39.0 | 43460.0 | Spring 1998 | Saturdays at 01:00 (JST) |
| 2 | Trigun | Trigun | トライガン | TV | Manga | ['Victor Entertainment'] | ['Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D... | ['Madhouse'] | 26.0 | 24 min per ep | PG-13 - Teens 13 or older | 8.30 | 212537.0 | 255.0 | 146.0 | 10432.0 | Spring 1998 | Thursdays at 01:15 (JST) |
| 3 | Witch Hunter Robin | Witch Hunter Robin | Witch Hunter ROBIN | TV | Original | ['Bandai Visual'] | ['Action', 'Magic', 'Police', 'Supernatural', ... | ['Sunrise'] | 26.0 | 25 min per ep | PG-13 - Teens 13 or older | 7.33 | 32837.0 | 2371.0 | 1171.0 | 537.0 | Summer 2002 | Tuesdays at Unknown |
translated_anime.columns
Index(['name', 'title_english', 'title_japanese', 'type', 'source',
'producers', 'genre', 'studio', 'episodes', 'duration', 'rating',
'score', 'scored_by', 'rank', 'popularity', 'favorites', 'premiered',
'broadcast'],
dtype='object')
Split the premiered column into premiered_season and premiered_year for better understanding
translated_anime[['premiered_season', 'premiered_year']] = translated_anime['premiered'].str.split(' ', n=1, expand=True)
translated_anime.drop(['premiered'], axis=1, inplace=True)
translated_anime.head(3)
| | name | title_english | title_japanese | type | source | producers | genre | studio | episodes | duration | rating | score | scored_by | rank | popularity | favorites | broadcast | premiered_season | premiered_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cowboy Bebop | Cowboy Bebop | カウボーイビバップ | TV | Original | ['Bandai Visual'] | ['Action', 'Adventure', 'Comedy', 'Drama', 'Sc... | ['Sunrise'] | 26.0 | 24 min per ep | R - 17+ (violence & profanity) | 8.81 | 405664.0 | 26.0 | 39.0 | 43460.0 | Saturdays at 01:00 (JST) | Spring | 1998 |
| 2 | Trigun | Trigun | トライガン | TV | Manga | ['Victor Entertainment'] | ['Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D... | ['Madhouse'] | 26.0 | 24 min per ep | PG-13 - Teens 13 or older | 8.30 | 212537.0 | 255.0 | 146.0 | 10432.0 | Thursdays at 01:15 (JST) | Spring | 1998 |
| 3 | Witch Hunter Robin | Witch Hunter Robin | Witch Hunter ROBIN | TV | Original | ['Bandai Visual'] | ['Action', 'Magic', 'Police', 'Supernatural', ... | ['Sunrise'] | 26.0 | 25 min per ep | PG-13 - Teens 13 or older | 7.33 | 32837.0 | 2371.0 | 1171.0 | 537.0 | Tuesdays at Unknown | Summer | 2002 |
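To make the split concrete, here is a sketch on three premiered values taken from the tables above. str.split with n=1 and expand=True returns a two-column frame: the text before the first space and the text after it. The conversion of the year to an integer at the end is an optional extra step assumed here for illustration; the notebook itself leaves the split result as strings.

```python
import pandas as pd

toy = pd.DataFrame({"premiered": ["Spring 1998", "Summer 2002", "Fall 2004"]})

# Split once on the first space: column 0 is the season, column 1 the year.
parts = toy["premiered"].str.split(" ", n=1, expand=True)
toy["premiered_season"] = parts[0]

# Optional: store the year as an integer so it sorts and plots numerically.
toy["premiered_year"] = parts[1].astype(int)

print(toy)
```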
translated_anime.dtypes
name              object
title_english     object
title_japanese    object
type              object
source            object
producers         object
genre             object
studio            object
episodes          float64
duration          object
rating            object
score             float64
scored_by         float64
rank              float64
popularity        float64
favorites         float64
broadcast         object
premiered_season  object
premiered_year    object
dtype: object
translated_anime.isnull().sum()
name              0
title_english     0
title_japanese    0
type              0
source            0
producers         0
genre             0
studio            0
episodes          0
duration          0
rating            0
score             0
scored_by         0
rank              0
popularity        0
favorites         0
broadcast         0
premiered_season  0
premiered_year    0
dtype: int64
translated_anime.shape[0]
3295
translated_anime['producers'].unique()
array(["['Bandai Visual']", "['Victor Entertainment']",
"['TV Tokyo', 'Dentsu']", ...,
"['Studio Mausu', 'Namu Animation']",
"['DAX Production', 'Twin Planet']", "['Polygon Magic']"],
dtype=object)
Simplify the producers column by extracting a value from each list
We first use the ast.literal_eval() function to convert each string representation of a list into an actual list of strings. Then we use the .str accessor to get the first element of each list.
import ast
# Convert 'producers' column to list data type
translated_anime['producers'] = translated_anime['producers'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
# Extract the first element from each list in 'producers' column
translated_anime['producers'] = translated_anime['producers'].str[0]
translated_anime.head(3)
| | name | title_english | title_japanese | type | source | producers | genre | studio | episodes | duration | rating | score | scored_by | rank | popularity | favorites | broadcast | premiered_season | premiered_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cowboy Bebop | Cowboy Bebop | カウボーイビバップ | TV | Original | Bandai Visual | ['Action', 'Adventure', 'Comedy', 'Drama', 'Sc... | ['Sunrise'] | 26.0 | 24 min per ep | R - 17+ (violence & profanity) | 8.81 | 405664.0 | 26.0 | 39.0 | 43460.0 | Saturdays at 01:00 (JST) | Spring | 1998 |
| 2 | Trigun | Trigun | トライガン | TV | Manga | Victor Entertainment | ['Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D... | ['Madhouse'] | 26.0 | 24 min per ep | PG-13 - Teens 13 or older | 8.30 | 212537.0 | 255.0 | 146.0 | 10432.0 | Thursdays at 01:15 (JST) | Spring | 1998 |
| 3 | Witch Hunter Robin | Witch Hunter Robin | Witch Hunter ROBIN | TV | Original | Bandai Visual | ['Action', 'Magic', 'Police', 'Supernatural', ... | ['Sunrise'] | 26.0 | 25 min per ep | PG-13 - Teens 13 or older | 7.33 | 32837.0 | 2371.0 | 1171.0 | 537.0 | Tuesdays at Unknown | Summer | 2002 |
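The two steps can be seen in isolation on a toy column (the third value, an empty list, is an assumption added to show an edge case). When a parsed list is empty, .str[0] has nothing to return and yields NaN, which is one way missing values can show up in the column after this step.

```python
import ast
import pandas as pd

toy = pd.DataFrame({
    "producers": ["['Bandai Visual']", "['TV Tokyo', 'Dentsu']", "[]"],
})

# Step 1: turn each string such as "['TV Tokyo', 'Dentsu']" into a real list.
parsed = toy["producers"].apply(
    lambda x: ast.literal_eval(x) if isinstance(x, str) else x
)

# Step 2: take the first element of each list; empty lists give NaN.
first = parsed.str[0]
print(first)
```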
Each anime has a list of genres, which might lead to confusion, so we use the random library to assign a single, randomly chosen genre to each anime.
We first use the ast.literal_eval() function to convert each string representation of a list into an actual list of strings. Then we use the random.randint() function to generate a random integer between 0 and the length of the genre list minus one (since indexing starts at zero). Finally, we use this integer as an index to pick a genre from the list and assign it to the 'genre' column.
Note that the if x check handles empty genre lists: for those rows we assign an empty string as the 'genre' value. Because the choice is random, rerunning the cell can assign a different genre to the same anime.
import random
# Convert 'genre' column to list data type
translated_anime['genre'] = translated_anime['genre'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
# Randomly select a genre from each list in 'genre' column
translated_anime['genre'] = translated_anime['genre'].apply(lambda x: x[random.randint(0, len(x)-1)] if x else '')
translated_anime.head(3)
| | name | title_english | title_japanese | type | source | producers | genre | studio | episodes | duration | rating | score | scored_by | rank | popularity | favorites | broadcast | premiered_season | premiered_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cowboy Bebop | Cowboy Bebop | カウボーイビバップ | TV | Original | Bandai Visual | Sci-Fi | ['Sunrise'] | 26.0 | 24 min per ep | R - 17+ (violence & profanity) | 8.81 | 405664.0 | 26.0 | 39.0 | 43460.0 | Saturdays at 01:00 (JST) | Spring | 1998 |
| 2 | Trigun | Trigun | トライガン | TV | Manga | Victor Entertainment | Sci-Fi | ['Madhouse'] | 26.0 | 24 min per ep | PG-13 - Teens 13 or older | 8.30 | 212537.0 | 255.0 | 146.0 | 10432.0 | Thursdays at 01:15 (JST) | Spring | 1998 |
| 3 | Witch Hunter Robin | Witch Hunter Robin | Witch Hunter ROBIN | TV | Original | Bandai Visual | Magic | ['Sunrise'] | 26.0 | 25 min per ep | PG-13 - Teens 13 or older | 7.33 | 32837.0 | 2371.0 | 1171.0 | 537.0 | Tuesdays at Unknown | Summer | 2002 |
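Since the genre is picked at random, the table above may differ from run to run. If repeatable output is wanted, one option is a seeded random.Random instance together with choice(), sketched below on a toy column. The helper name pick_genres and the seed value are assumptions for illustration; this is a variation on the cell above, not the notebook's own code.

```python
import ast
import random
import pandas as pd

toy = pd.DataFrame({
    "genre": ["['Action', 'Sci-Fi', 'Space']", "['Magic']", "[]"],
})
genre_lists = toy["genre"].apply(ast.literal_eval)

def pick_genres(lists, seed):
    """Pick one genre per list using a dedicated, seeded generator."""
    rng = random.Random(seed)
    # Empty lists get an empty string, matching the `if x` check above.
    return lists.apply(lambda g: rng.choice(g) if g else "")

first_run = pick_genres(genre_lists, seed=42)
second_run = pick_genres(genre_lists, seed=42)

# The same seed gives the same picks on every run.
print(first_run.tolist() == second_run.tolist())
```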
Studio
# Convert 'studio' column to list data type
translated_anime['studio'] = translated_anime['studio'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
# Extract the first element from each list in 'studio' column
translated_anime['studio'] = translated_anime['studio'].str[0]
translated_anime.head(3)
| | name | title_english | title_japanese | type | source | producers | genre | studio | episodes | duration | rating | score | scored_by | rank | popularity | favorites | broadcast | premiered_season | premiered_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cowboy Bebop | Cowboy Bebop | カウボーイビバップ | TV | Original | Bandai Visual | Sci-Fi | Sunrise | 26.0 | 24 min per ep | R - 17+ (violence & profanity) | 8.81 | 405664.0 | 26.0 | 39.0 | 43460.0 | Saturdays at 01:00 (JST) | Spring | 1998 |
| 2 | Trigun | Trigun | トライガン | TV | Manga | Victor Entertainment | Sci-Fi | Madhouse | 26.0 | 24 min per ep | PG-13 - Teens 13 or older | 8.30 | 212537.0 | 255.0 | 146.0 | 10432.0 | Thursdays at 01:15 (JST) | Spring | 1998 |
| 3 | Witch Hunter Robin | Witch Hunter Robin | Witch Hunter ROBIN | TV | Original | Bandai Visual | Magic | Sunrise | 26.0 | 25 min per ep | PG-13 - Teens 13 or older | 7.33 | 32837.0 | 2371.0 | 1171.0 | 537.0 | Tuesdays at Unknown | Summer | 2002 |
translated_anime.isnull().sum()
Unnamed: 0        0
name              0
title_english     0
title_japanese    0
type              0
source            0
producers         683
genre             0
studio            405
episodes          0
duration          0
rating            0
score             0
scored_by         0
rank              0
popularity        0
favorites         0
broadcast         0
premiered_season  0
premiered_year    0
dtype: int64
There are a few missing values in the producers and studio columns, so let's drop those rows.
# drop rows with missing values only in specific columns
translated_anime.dropna(subset=['producers','studio'], inplace=True)
df_anime_preprocessed = translated_anime.copy()
df_anime_preprocessed.to_csv('anime_dataset.csv')
anime_dataset.csv is available on our GitHub repo at https://github.com/vaibhavhariramani/Anime_Dataset_Visualisation
import pandas as pd
url = "https://raw.githubusercontent.com/vaibhavhariramani/Anime_Dataset_Visualisation/main/anime_dataset.csv"
df_anime_preprocessed = pd.read_csv(url)
translated_anime.dtypes
Unnamed: 0        int64
name              object
title_english     object
title_japanese    object
type              object
source            object
producers         object
genre             object
studio            object
episodes          float64
duration          object
rating            object
score             float64
scored_by         float64
rank              float64
popularity        float64
favorites         float64
broadcast         object
premiered_season  object
premiered_year    int64
dtype: object
translated_anime.isnull().sum()
Unnamed: 0        0
name              0
title_english     0
title_japanese    0
type              0
source            0
producers         683
genre             0
studio            405
episodes          0
duration          0
rating            0
score             0
scored_by         0
rank              0
popularity        0
favorites         0
broadcast         0
premiered_season  0
premiered_year    0
dtype: int64