K-POP Fandom Data Analysis with networkX
- 1. Introduction
- 2. Supporting Activities
- 3. Correlation between supporting and supported activities
- 4. Hyphothesis test by independent t-test
- 5. Graph Analysis
- 6. Comparison between G1 and G2
- 7. Effect of the fandom activities on the monthly chart
1. Introduction
-
In K-POP scene, a fandom plays a critical role in the success of their idol. They even cooperate with other fandoms to achieve their goal. For example, they do “supports”; steam albums or encourage to vote in awards for another idol and get helped in the same way when their idol release an album or be nominated for an award. With these data collected in 2018, we will find out the effect of the fandom activities.
- fandom information data: {fandom_id, fandom_name, nov_post, dec_post, nov_support, dec_support, type(gender)}
- fandom support activities data: {source, target, nov_support, dec_support}
- monthly chart data: {album, artist, rank, title, start, end}
- fandoms metadata: {fandom_name, artist, agent, year(debut), #articles}
import pandas as pd
from pandas import DataFrame, Series
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import networkx as nx
2. Supporting Activities
2.1. Fandom Supporting Ratio (Total Supporting activities in Total activities)
nodes = pd.read_csv('./dataset/fandom_nodes_2018.csv')
nodes['total_support'] = nodes.loc[:,['nov_support','dec_support']].sum(axis=1)
nodes['total_activity'] = nodes.loc[:,['nov_post','dec_post','nov_support','dec_support']].sum(axis=1)
nodes['support_ratio'] = nodes['total_support'] / nodes['total_activity']
nodes['support_ratio'] = nodes['support_ratio'].fillna(0)
nodes.loc[:,['fandom_id','total_support','total_activity','support_ratio']].sort_values(by=['support_ratio'],ascending=False).head(10)
fandom_id | total_support | total_activity | support_ratio | |
---|---|---|---|---|
213 | oh_soul | 14 | 28 | 0.500000 |
283 | tbz1206 | 1 | 2 | 0.500000 |
196 | myname | 1 | 2 | 0.500000 |
3 | 2PM | 1 | 2 | 0.500000 |
98 | imcenter | 1 | 2 | 0.500000 |
259 | shinjihoon | 921 | 1848 | 0.498377 |
180 | livematilda0317 | 1458 | 2932 | 0.497271 |
315 | xtracam | 2527 | 5083 | 0.497147 |
303 | we100 | 2848 | 5759 | 0.494530 |
201 | namyujin | 516 | 1048 | 0.492366 |
- 5 fandoms with very few activities seems to be not quite active in the period.
nodes.describe()
nov_post | dec_post | nov_support | dec_support | type | total_support | total_activity | support_ratio | |
---|---|---|---|---|---|---|---|---|
count | 330.000000 | 330.000000 | 330.000000 | 330.000000 | 330.000000 | 330.000000 | 330.000000 | 330.000000 |
mean | 5854.921212 | 5249.675758 | 1455.627273 | 859.090909 | 0.493939 | 2314.718182 | 13419.315152 | 0.226563 |
std | 17573.678298 | 16329.999791 | 2329.251725 | 1166.459438 | 0.500723 | 3310.463410 | 35436.387630 | 0.174810 |
min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 250.000000 | 127.500000 | 19.750000 | 7.250000 | 0.000000 | 53.750000 | 656.250000 | 0.043461 |
50% | 1739.000000 | 1472.000000 | 535.500000 | 403.500000 | 0.000000 | 984.500000 | 4776.000000 | 0.221352 |
75% | 5047.000000 | 4258.000000 | 1736.750000 | 1187.500000 | 1.000000 | 3077.750000 | 13687.000000 | 0.397054 |
max | 261517.000000 | 249246.000000 | 12462.000000 | 6989.000000 | 1.000000 | 16917.000000 | 525546.000000 | 0.500000 |
2.2. Case by Gender Type
edges = pd.read_csv('./dataset/fandom_edges_2018.csv')
edges['total_support'] = edges.loc[:,['nov_support','dec_support']].sum(axis=1)
df_supported = edges.groupby(['target']).sum()
df_supported = df_supported.reset_index()
df_supported.columns = ['fandom_id','nov_supported','dec_supported','total_supported']
df = pd.merge(nodes,df_supported,on='fandom_id',how='outer')
df = df.fillna(0)
# Supporting Top 10 fandoms for both genders
df.loc[:,['fandom_id','total_support']].sort_values(by=['total_support'],ascending=False).head(10)
fandom_id | total_support | |
---|---|---|
308 | winner | 16917 |
301 | wannaonego | 16036 |
31 | btob | 15739 |
181 | lovelyz | 15328 |
209 | nuest | 14895 |
292 | twice | 14783 |
183 | mamamoo | 14472 |
32 | bts | 14006 |
243 | roykim | 13149 |
81 | gx9 | 11954 |
# Supporting Top 10 fandoms for boy groups
df.loc[df['type']==1,['fandom_id','total_support']].sort_values(by=['total_support'],ascending=False).head(10)
fandom_id | total_support | |
---|---|---|
308 | winner | 16917 |
301 | wannaonego | 16036 |
31 | btob | 15739 |
209 | nuest | 14895 |
32 | bts | 14006 |
243 | roykim | 13149 |
298 | vikon | 10589 |
88 | highlight | 10575 |
124 | jsh | 9673 |
288 | tj3579 | 9314 |
# Supporting Top 10 fandoms for girl groups
df.loc[df['type']==0,['fandom_id','total_support']].sort_values(by=['total_support'],ascending=False).head(10)
fandom_id | total_support | |
---|---|---|
181 | lovelyz | 15328 |
292 | twice | 14783 |
183 | mamamoo | 14472 |
81 | gx9 | 11954 |
67 | gf | 11564 |
234 | real__izo | 11104 |
26 | blackpink | 10678 |
214 | ohmygirl | 9091 |
236 | redvelvetreveluv | 8155 |
64 | fromis | 8003 |
# Supported Top 10 fandoms for both genders
df.loc[:,['fandom_id','total_supported']].sort_values(by=['total_supported'],ascending=False).head(10)
fandom_id | total_supported | |
---|---|---|
301 | wannaonego | 33609.0 |
181 | lovelyz | 29083.0 |
234 | real__izo | 25459.0 |
32 | bts | 24865.0 |
308 | winner | 21896.0 |
209 | nuest | 19312.0 |
183 | mamamoo | 18795.0 |
67 | gf | 14899.0 |
124 | jsh | 14437.0 |
243 | roykim | 14267.0 |
# Supported Top 10 fandoms for boyt groups
df.loc[df['type']==1,['fandom_id','total_supported']].sort_values(by=['total_supported'],ascending=False).head(10)
fandom_id | total_supported | |
---|---|---|
301 | wannaonego | 33609.0 |
32 | bts | 24865.0 |
308 | winner | 21896.0 |
209 | nuest | 19312.0 |
124 | jsh | 14437.0 |
243 | roykim | 14267.0 |
298 | vikon | 14245.0 |
137 | kim | 13884.0 |
4 | 6kies | 13565.0 |
31 | btob | 11493.0 |
# Supported Top 10 fandoms for girl groups
df.loc[df['type']==0,['fandom_id','total_supported']].sort_values(by=['total_supported'],ascending=False).head(10)
fandom_id | total_supported | |
---|---|---|
181 | lovelyz | 29083.0 |
234 | real__izo | 25459.0 |
183 | mamamoo | 18795.0 |
67 | gf | 14899.0 |
64 | fromis | 13957.0 |
292 | twice | 12942.0 |
214 | ohmygirl | 12335.0 |
26 | blackpink | 9632.0 |
262 | shitaomiu | 8563.0 |
125 | jungchaeyeon | 6820.0 |
2.3. Case by Agents (Sum of artists data in a company)
meta = pd.read_json('./dataset/fandom_meta_2018.json')
df_meta = meta.copy()
df_meta.rename(columns={'fandom_name':'fandom_id'},inplace=True)
df_meta
df = pd.merge(df,df_meta,on='fandom_id')
# Supporting Top 10 Agents
df.groupby(['agent'])[['total_support','total_supported']].sum().sort_values(by='total_support',ascending=False).head(10)
total_support | total_supported | |
---|---|---|
agent | ||
YG Entertainment | 67790 | 81735.0 |
JYP Entertainment | 54025 | 41464.0 |
Off The Record | 47300 | 75508.0 |
Swing Entertainment | 41180 | 73716.0 |
SM Entertainment | 38862 | 23736.0 |
PLEDIS Entertainment | 38839 | 43115.0 |
Cube Entertainment | 30991 | 20990.0 |
Woollim Entertainment | 22780 | 35219.0 |
Jellyfish Entertainment | 17606 | 7583.0 |
Fantagio Music | 15853 | 8202.0 |
# Supported Top 10 Agents
df.groupby(['agent'])[['total_support','total_supported']].sum().sort_values(by='total_supported',ascending=False).head(10)
total_support | total_supported | |
---|---|---|
agent | ||
YG Entertainment | 67790 | 81735.0 |
Off The Record | 47300 | 75508.0 |
Swing Entertainment | 41180 | 73716.0 |
PLEDIS Entertainment | 38839 | 43115.0 |
JYP Entertainment | 54025 | 41464.0 |
Woollim Entertainment | 22780 | 35219.0 |
Big Hit Entertainment | 14006 | 24865.0 |
SM Entertainment | 38862 | 23736.0 |
Cube Entertainment | 30991 | 20990.0 |
AKS | 8554 | 19646.0 |
3. Correlation between supporting and supported activities
3.1. by Pearson correlation
from scipy import stats
print(stats.pearsonr(df['total_activity'],df['total_support']))
print(stats.pearsonr(df['total_activity'],df['total_supported']))
(0.538005581735917, 3.725917667518625e-26)
(0.5114446175904475, 2.1548883750451727e-23)
3.2. by Spearman rank correlation
print(stats.spearmanr(df['total_activity'],df['total_support']))
print(stats.spearmanr(df['total_activity'],df['total_supported']))
SpearmanrResult(correlation=0.8461473515842707, pvalue=1.1711298194140583e-91)
SpearmanrResult(correlation=0.8244727920052708, pvalue=4.2257040130501645e-83)
- Spearman rank method shows better correlation and confidence level in average.
4. Hyphothesis test by independent t-test
4.1. Supported activities and gender types
- H_0: There is a difference in supported activities between girl and boy groups.
- H_1: There is no difference in supported activities between girl and boy groups.
boy = df.loc[df['type']==1,['fandom_id','total_supported','total_support']]
girl = df.loc[df['type']==0,['fandom_id','total_supported','total_support']]
len(boy),len(girl)
(163, 167)
# t-test
stats.ttest_ind(boy['total_supported'], girl['total_supported'])
Ttest_indResult(statistic=1.494285307111671, pvalue=0.1360624474957505)
4.2. Supporting activities and gender types
- H_0: There is a difference in supporting activities between girl and boy groups.
- H_1: There is no difference in supporting activities between girl and boy groups.
# t-test
stats.ttest_ind(boy['total_support'], girl['total_support'])
Ttest_indResult(statistic=1.33823756824641, pvalue=0.1817459557355244)
- both H_0’s would not be rejected
5. Graph Analysis
5.1. Generating graphs with different time periods
- G0; graph with data in entire period
G1; grpah with data in november
G2; grpah with data in december
total_edges = edges[['source','target','total_support']].sort_values(by='total_support',ascending=False)
total_edges = total_edges.iloc[:int(len(total_edges)*0.01),]
G0 = nx.DiGraph(total_edges.loc[:,('source','target')].values.tolist())
nx.set_edge_attributes(G0, total_edges.set_index(['source', 'target'])['total_support'], 'weight')
nx.set_node_attributes(G0,df.set_index(['fandom_id'])['agent'],'agent')
nx.set_node_attributes(G0,df.set_index(['fandom_id'])['year(debut)'],'debut_year')
nx.set_node_attributes(G0,df.set_index(['fandom_id'])['type'],'gender_type')
nov_edges = edges[['source','target','nov_support']].sort_values(by='nov_support',ascending=False)
nov_edges = nov_edges.iloc[:int(len(nov_edges)*0.01),]
G1 = nx.DiGraph(nov_edges.loc[:,('source','target')].values.tolist())
nx.set_edge_attributes(G1, nov_edges.set_index(['source', 'target'])['nov_support'], 'weight')
nx.set_node_attributes(G1,df.set_index(['fandom_id'])['agent'],'agent')
nx.set_node_attributes(G1,df.set_index(['fandom_id'])['year(debut)'],'debut_year')
nx.set_node_attributes(G1,df.set_index(['fandom_id'])['type'],'gender_type')
dec_edges = edges[['source','target','dec_support']].sort_values(by='dec_support',ascending=False)
dec_edges = dec_edges.iloc[:int(len(dec_edges)*0.01),]
G2 = nx.DiGraph(dec_edges.loc[:,('source','target')].values.tolist())
nx.set_edge_attributes(G2, dec_edges.set_index(['source', 'target'])['dec_support'], 'weight')
nx.set_node_attributes(G2,df.set_index(['fandom_id'])['agent'],'agent')
nx.set_node_attributes(G2,df.set_index(['fandom_id'])['year(debut)'],'debut_year')
nx.set_node_attributes(G2,df.set_index(['fandom_id'])['type'],'gender_type')
print(len(G0.nodes),len(G0.edges))
print(len(G1.nodes),len(G1.edges))
print(len(G2.nodes),len(G2.edges))
59 236
62 236
90 236
5.2. Cumulative Distribution Function(CDF) with G0’s degree
G0_degree = dict(nx.degree(G0))
h = plt.hist(G0_degree.values())
cdf = stats.norm().cdf(sorted(G0_degree.values()))
plt.plot(sorted(G0_degree.values()), cdf)
plt.show()
5.3. Top 5 fandoms in G0 sorted by Pagerank and Degree Centrality
- in both results, top fandoms shows very high scores in features - supporting, supported, total activites, while the average of supporting/supported activities for all data is 2,314 and the average of total activities for all data is about 13,419.
def viewer(_dict_data, col_name):
result = pd.DataFrame().from_dict(_dict_data, orient='index', columns=[col_name])
return df.set_index('fandom_id').join(result, how='right').sort_values(col_name, ascending=False)
pagerank = nx.pagerank(G0)
pagerank = viewer(pagerank, 'pagerank')
pagerank.head(5)[['type','total_support','total_activity','total_supported','agent','year(debut)','pagerank']]
type | total_support | total_activity | total_supported | agent | year(debut) | pagerank | |
---|---|---|---|---|---|---|---|
wannaonego | 1 | 16036 | 77608 | 33609.0 | Swing Entertainment | 2017 | 0.131164 |
real__izo | 0 | 11104 | 90757 | 25459.0 | Off The Record | 2018 | 0.081406 |
lovelyz | 0 | 15328 | 171752 | 29083.0 | Woollim Entertainment | 2014 | 0.076592 |
jsh | 1 | 9673 | 20872 | 14437.0 | Antenna | 2016 | 0.065368 |
nuest | 1 | 14895 | 56741 | 19312.0 | PLEDIS Entertainment | 2012 | 0.064216 |
degree_centrality = nx.degree_centrality(G0)
degree_centrality = viewer(degree_centrality, 'degree_centrality')
degree_centrality.head(5)[['type','total_support','total_activity','total_supported','agent','year(debut)','degree_centrality']]
type | total_support | total_activity | total_supported | agent | year(debut) | degree_centrality | |
---|---|---|---|---|---|---|---|
wannaonego | 1 | 16036 | 77608 | 33609.0 | Swing Entertainment | 2017 | 0.827586 |
lovelyz | 0 | 15328 | 171752 | 29083.0 | Woollim Entertainment | 2014 | 0.482759 |
winner | 1 | 16917 | 41928 | 21896.0 | YG Entertainment | 2014 | 0.465517 |
nuest | 1 | 14895 | 56741 | 19312.0 | PLEDIS Entertainment | 2012 | 0.448276 |
bts | 1 | 14006 | 125849 | 24865.0 | Big Hit Entertainment | 2013 | 0.431034 |
df[['total_support','total_activity','total_supported']].describe()
total_support | total_activity | total_supported | |
---|---|---|---|
count | 330.000000 | 330.000000 | 330.000000 |
mean | 2314.718182 | 13419.315152 | 2314.718182 |
std | 3310.463410 | 35436.387630 | 4464.948830 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 53.750000 | 656.250000 | 2.000000 |
50% | 984.500000 | 4776.000000 | 511.500000 |
75% | 3077.750000 | 13687.000000 | 2923.500000 |
max | 16917.000000 | 525546.000000 | 33609.000000 |
5.4. Supporting-supported cases by gender types in G0
- Generally, boy group fandoms are more supporting.
G0_support = pd.DataFrame().from_dict({e for e in G0.edges})
G0_support.columns = ['source','target']
tmp = pd.DataFrame().from_dict({n for n in G0.nodes.data('gender_type')})
tmp.columns = ['source','source_gender']
G0_support = pd.merge(G0_support,tmp,how='left')
tmp.columns = ['target','target_gender']
G0_support = pd.merge(G0_support,tmp,how='left')
G0_support
source | target | source_gender | target_gender | |
---|---|---|---|---|
0 | winner | day6 | 1 | 1 |
1 | btob | mamamoo | 1 | 0 |
2 | mmld | ohmygirl | 0 | 0 |
3 | nuest | got7vlive | 1 | 1 |
4 | nuest | ch_freemonth | 1 | 0 |
... | ... | ... | ... | ... |
231 | sanarang | twice | 0 | 0 |
232 | 6kies | wannaonego | 1 | 1 |
233 | redvelvetreveluv | wannaonego | 0 | 1 |
234 | mamamoo | twice | 0 | 0 |
235 | gx9 | vikon | 0 | 1 |
236 rows × 4 columns
# boy -> boy
print(len(G0_support[(G0_support.source_gender==1) & (G0_support.target_gender==1)]) / len(G0_support))
# boy -> girl
print(len(G0_support[(G0_support.source_gender==1) & (G0_support.target_gender==0)]) / len(G0_support))
# girl -> boy
print(len(G0_support[(G0_support.source_gender==0) & (G0_support.target_gender==1)]) / len(G0_support))
# girl -> girl
print(len(G0_support[(G0_support.source_gender==0) & (G0_support.target_gender==0)]) / len(G0_support))
0.3220338983050847
0.2627118644067797
0.2542372881355932
0.16101694915254236
6. Comparison between G1 and G2
6.1. Persistence of edges-supporting and gender types
G1_support = set(e for e in G1.edges)
G2_support = set(e for e in G2.edges)
G1_only = pd.DataFrame.from_dict(G1_support - G2_support)
G1_only.columns = ['source','target']
tmp = pd.DataFrame().from_dict({n for n in G1.nodes.data('gender_type')})
tmp.columns = ['source','source_gender']
G1_only = pd.merge(G1_only,tmp,how='left')
tmp.columns = ['target','target_gender']
G1_only = pd.merge(G1_only,tmp,how='left')
# Supporting in november only
G1_only
source | target | source_gender | target_gender | |
---|---|---|---|---|
0 | winner | day6 | 1 | 1 |
1 | mamamoo | wannaonego | 0 | 1 |
2 | btob | mamamoo | 1 | 0 |
3 | mmld | ohmygirl | 0 | 0 |
4 | nuest | got7vlive | 1 | 1 |
... | ... | ... | ... | ... |
139 | anyujin | wannaonego | 0 | 1 |
140 | buzz | mamamoo | 1 | 0 |
141 | redvelvetreveluv | wannaonego | 0 | 1 |
142 | idlesong | lovelyz | 0 | 0 |
143 | gx9 | vikon | 0 | 1 |
144 rows × 4 columns
G2_only = pd.DataFrame.from_dict(G2_support-G1_support)
G2_only.columns = ['source','target']
tmp = pd.DataFrame().from_dict({n for n in G2.nodes.data('gender_type')})
tmp.columns = ['source','source_gender']
G2_only = pd.merge(G2_only,tmp,how='left')
tmp.columns = ['target','target_gender']
G2_only = pd.merge(G2_only,tmp,how='left')
# Supporting in december only
G2_only
source | target | source_gender | target_gender | |
---|---|---|---|---|
0 | wheesung | bts | 1 | 1 |
1 | mkyunghoon | lovelyz | 1 | 0 |
2 | bts | nuest | 1 | 1 |
3 | god | bts | 1 | 1 |
4 | onairpril | lovelyz | 0 | 0 |
... | ... | ... | ... | ... |
139 | jjy | vikon | 1 | 1 |
140 | sanarang | twice | 0 | 0 |
141 | onf | lovelyz | 1 | 0 |
142 | got7vlive | fromis | 1 | 0 |
143 | idlesong | blackpink | 0 | 0 |
144 rows × 4 columns
G1_n_G2 = pd.DataFrame.from_dict(G1_support & G2_support)
G1_n_G2.columns = ['source','target']
tmp = pd.DataFrame().from_dict({n for n in G1.nodes.data('gender_type')})
tmp.columns = ['source','source_gender']
G1_n_G2 = pd.merge(G1_n_G2,tmp,how='left')
tmp.columns = ['target','target_gender']
G1_n_G2 = pd.merge(G1_n_G2,tmp,how='left')
# Supporting in both months
G1_n_G2
source | target | source_gender | target_gender | |
---|---|---|---|---|
0 | twice | roykim | 0 | 1 |
1 | guckkasten | wannaonego | 1 | 1 |
2 | bigbang | wannaonego | 1 | 1 |
3 | btob | bts | 1 | 1 |
4 | ohmygirl | fromis | 0 | 0 |
... | ... | ... | ... | ... |
87 | real__izo | wannaonego | 0 | 1 |
88 | tj3579 | jsh | 1 | 1 |
89 | nuest | jsh | 1 | 1 |
90 | 6kies | wannaonego | 1 | 1 |
91 | mamamoo | twice | 0 | 0 |
92 rows × 4 columns
# boy -> boy
print(len(G1_n_G2[(G1_n_G2.source_gender==1) & (G1_n_G2.target_gender==1)]) / len(G1_n_G2))
# boy -> girl
print(len(G1_n_G2[(G1_n_G2.source_gender==1) & (G1_n_G2.target_gender==0)]) / len(G1_n_G2))
# girl -> boy
print(len(G1_n_G2[(G1_n_G2.source_gender==0) & (G1_n_G2.target_gender==1)]) / len(G1_n_G2))
# girl -> girl
print(len(G1_n_G2[(G1_n_G2.source_gender==0) & (G1_n_G2.target_gender==0)]) / len(G1_n_G2))
0.40217391304347827
0.21739130434782608
0.25
0.13043478260869565
6.2. Graph Role Extraction by RolX algorithm
from pprint import pprint
import seaborn as sns
from graphrole import RecursiveFeatureExtractor, RoleExtractor
feature_extractor = RecursiveFeatureExtractor(G1)
features = feature_extractor.extract_features()
role_extractor = RoleExtractor(n_roles=None)
role_extractor.extract_role_factors(features)
G1_roles = role_extractor.roles
G1_role_groups = {}
for n,r in G1_roles.items():
if r in G1_role_groups:
G1_role_groups[r].append(n)
else:
G1_role_groups[r] = [n]
for r,g in G1_role_groups.items():
print(r,g)
role_2 ['6kies', 'ahnhyungsub', 'anyujin', 'apink', 'boa', 'dmlwlsska', 'girlsgeneration_new', 'girlsofthemonth', 'haonkim', 'highfiveofteenager', 'idlesong', 'jjy', 'jungsewoon', 'jungyeon', 'kgkg', 'kim', 'kimsejeong', 'mkyunghoon', 'mmld', 'nth0510', 'onairpril', 'paka', 'pjb', 'sakura0319', 'samkim', 'shinhwa', 'taeyeon_new1', 'unitb', 'wheesung', 'yuseonho']
role_4 ['bigbang', 'blackpink', 'ch_freemonth', 'chungha', 'fromis', 'guckkasten', 'iu_tv', 'redvelvetreveluv', 'tj3579', 'vikon']
role_1 ['btob', 'bts', 'gx9', 'jsh', 'mamamoo', 'nuest', 'real__izo', 'roykim', 'straykids', 'wannaonego', 'winner', 'zico']
role_3 ['buzz', 'got7vlive', 'kimdonghan', 'madewg', 'twice']
role_0 ['day6', 'gf', 'lovelyz', 'miyazakimiho', 'ohmygirl']
feature_extractor = RecursiveFeatureExtractor(G2)
features = feature_extractor.extract_features()
role_extractor = RoleExtractor(n_roles=5) # set 'n_roles' same as G1`s
role_extractor.extract_role_factors(features)
G2_roles = role_extractor.roles
G2_role_groups = {}
for n,r in G2_roles.items():
if r in G2_role_groups:
G2_role_groups[r].append(n)
else:
G2_role_groups[r] = [n]
for r,g in G2_role_groups.items():
print(r,g)
role_2 ['6kies', 'ahnhyungsub', 'berrygood', 'blackpink', 'buzz', 'dickpunks', 'fromis', 'highfiveofteenager', 'jjy', 'jungchaeyeon', 'kgkg', 'kimsejeong', 'lovelyz', 'mmld', 'pjb', 'real__izo', 'sanarang', 'wheesung', 'wjsnvlive', 'woojinyoung', 'zico']
role_4 ['astro', 'b1a4', 'bigbang', 'doakim', 'doitamazing7', 'dongbang_new', 'fortediquattro', 'girllaboum', 'girlsgeneration_new', 'god', 'godgayoung', 'goldenchild', 'guckkasten', 'gx9', 'highlight', 'hotshot', 'in2it', 'infinite', 'ioi', 'iu_tv', 'jsh', 'kanghyewon', 'kim', 'kimchaewon', 'kimdonghan', 'ksy', 'ladiescode', 'mamamoo', 'mino0330', 'mkyunghoon', 'nojisun', 'nth0510', 'nuest', 'ohmygirl', 'onairpril', 'onf', 'paka', 'pentagon', 'pledis', 'pushkang', 'redvelvetreveluv', 'sonamoo', 'taewookim', 'tj3579', 'vikon', 'wekimeki']
role_1 ['btob', 'bts', 'day6', 'gf', 'got7vlive', 'idlesong', 'madewg', 'roykim', 'straykids', 'twice', 'wannaonego', 'winner']
role_0 ['dmlwlsska', 'eunjiwon', 'jdh', 'jeonsomi', 'jungsewoon', 'kdani', 'leechaeyeon', 'miyazakimiho', 'yabukinako']
role_3 ['shitaomiu', 'take_miyu']
6.3. Community Detection by Girvan-Newman algorithm
from networkx.algorithms import community
comp = community.girvan_newman(G1.to_undirected())
communities = tuple(sorted(c) for c in next(comp))
print([len(c) for c in communities])
for c in communities:
print(c)
[39, 23]
['6kies', 'apink', 'bigbang', 'blackpink', 'btob', 'bts', 'ch_freemonth', 'chungha', 'day6', 'fromis', 'gf', 'girlsgeneration_new', 'girlsofthemonth', 'got7vlive', 'guckkasten', 'gx9', 'haonkim', 'idlesong', 'iu_tv', 'jsh', 'jungyeon', 'kimdonghan', 'lovelyz', 'madewg', 'mamamoo', 'mkyunghoon', 'mmld', 'nuest', 'ohmygirl', 'pjb', 'real__izo', 'redvelvetreveluv', 'roykim', 'straykids', 'tj3579', 'twice', 'vikon', 'winner', 'zico']
['ahnhyungsub', 'anyujin', 'boa', 'buzz', 'dmlwlsska', 'highfiveofteenager', 'jjy', 'jungsewoon', 'kgkg', 'kim', 'kimsejeong', 'miyazakimiho', 'nth0510', 'onairpril', 'paka', 'sakura0319', 'samkim', 'shinhwa', 'taeyeon_new1', 'unitb', 'wannaonego', 'wheesung', 'yuseonho']
comp = community.girvan_newman(G2.to_undirected())
communities = tuple(sorted(c) for c in next(comp))
print([len(c) for c in communities])
for c in communities:
print(c)
[84, 6]
['6kies', 'ahnhyungsub', 'astro', 'b1a4', 'berrygood', 'bigbang', 'blackpink', 'btob', 'bts', 'buzz', 'day6', 'dickpunks', 'doakim', 'doitamazing7', 'dongbang_new', 'eunjiwon', 'fortediquattro', 'fromis', 'gf', 'girllaboum', 'girlsgeneration_new', 'god', 'godgayoung', 'goldenchild', 'got7vlive', 'guckkasten', 'gx9', 'highfiveofteenager', 'highlight', 'hotshot', 'in2it', 'infinite', 'ioi', 'iu_tv', 'jdh', 'jjy', 'jsh', 'jungchaeyeon', 'jungsewoon', 'kanghyewon', 'kdani', 'kgkg', 'kim', 'kimchaewon', 'kimdonghan', 'kimsejeong', 'ksy', 'ladiescode', 'leechaeyeon', 'lovelyz', 'madewg', 'mamamoo', 'mino0330', 'mkyunghoon', 'mmld', 'nojisun', 'nth0510', 'nuest', 'ohmygirl', 'onairpril', 'onf', 'paka', 'pentagon', 'pjb', 'pledis', 'pushkang', 'real__izo', 'redvelvetreveluv', 'roykim', 'sanarang', 'sonamoo', 'straykids', 'taewookim', 'tj3579', 'twice', 'vikon', 'wannaonego', 'wekimeki', 'wheesung', 'winner', 'wjsnvlive', 'woojinyoung', 'yabukinako', 'zico']
['dmlwlsska', 'idlesong', 'jeonsomi', 'miyazakimiho', 'shitaomiu', 'take_miyu']
6.4. Page Rank and Degree Centrality in G1, G2
pagerank = nx.pagerank(G1)
pagerank = viewer(pagerank, 'pagerank')
pagerank.head(10).index
Index(['wannaonego', 'real__izo', 'jsh', 'nuest', 'winner', 'roykim',
'lovelyz', 'ohmygirl', 'gf', 'twice'],
dtype='object')
pagerank = nx.pagerank(G2)
pagerank = viewer(pagerank, 'pagerank')
pagerank.head(10).index
Index(['lovelyz', 'bts', 'jsh', 'wannaonego', 'fromis', 'nuest', 'gf', 'twice',
'highfiveofteenager', 'roykim'],
dtype='object')
degree_centrality = nx.degree_centrality(G1)
degree_centrality = viewer(degree_centrality, 'degree_centrality')
degree_centrality.head(10).index
Index(['wannaonego', 'nuest', 'real__izo', 'winner', 'mamamoo', 'roykim',
'jsh', 'bts', 'lovelyz', 'twice'],
dtype='object')
degree_centrality = nx.degree_centrality(G2)
degree_centrality = viewer(degree_centrality, 'degree_centrality')
degree_centrality.head(10).index
Index(['lovelyz', 'bts', 'wannaonego', 'winner', 'twice', 'jsh', 'fromis',
'gf', 'nuest', 'btob'],
dtype='object')
7. Effect of the fandom activities on the monthly chart
monthly_chart = pd.read_json('./dataset/monthly_chart_2018.json')
m11 = monthly_chart[monthly_chart.start==201811]
m12 = monthly_chart[monthly_chart.start==201812]
df_chart = df[['fandom_id','chart_name','total_activity','nov_supported','dec_supported','total_supported']]
in_nov = df_chart['chart_name'].isin(m11['artist'])
df_nov = df_chart[in_nov].groupby(['chart_name']).sum()
rank_nov = m11[m11['artist'].isin(df_chart['chart_name'])].groupby(['artist'])['rank'].mean()
chart_nov = pd.merge(df_nov,rank_nov,left_index=True,right_index=True,how='left')
chart_nov.sort_values(by='rank')
total_activity | nov_supported | dec_supported | total_supported | rank | |
---|---|---|---|---|---|
chart_name | |||||
바이브 | 1088 | 70.0 | 7.0 | 77.0 | 4.000000 |
임창정 | 1649 | 6.0 | 1.0 | 7.0 | 7.000000 |
벤 | 304 | 4.0 | 0.0 | 4.0 | 16.000000 |
TWICE (트와이스) | 606440 | 8341.0 | 5549.0 | 13890.0 | 17.500000 |
아이유 | 130843 | 2467.0 | 1407.0 | 3874.0 | 19.000000 |
선미 | 718 | 36.0 | 5.0 | 41.0 | 20.000000 |
IZ*ONE (아이즈원) | 309295 | 37770.0 | 18416.0 | 56186.0 | 22.000000 |
로이킴 | 30003 | 10979.0 | 3288.0 | 14267.0 | 29.500000 |
iKON | 33653 | 9430.0 | 5150.0 | 14580.0 | 39.500000 |
비투비 | 47090 | 6551.0 | 4942.0 | 11493.0 | 41.250000 |
BLACKPINK | 104426 | 7314.0 | 2318.0 | 9632.0 | 45.500000 |
Red Velvet (레드벨벳) | 194521 | 6117.0 | 3132.0 | 9249.0 | 50.000000 |
EXO | 28649 | 5.0 | 1.0 | 6.0 | 55.272727 |
방탄소년단 | 125849 | 14519.0 | 10346.0 | 24865.0 | 63.300000 |
Apink (에이핑크) | 89709 | 507.0 | 492.0 | 999.0 | 71.000000 |
하이라이트 (Highlight) | 29392 | 4317.0 | 3357.0 | 7674.0 | 75.000000 |
Wanna One (워너원) | 427571 | 47720.0 | 25996.0 | 73716.0 | 77.800000 |
정승환 | 20872 | 9704.0 | 4733.0 | 14437.0 | 78.000000 |
지코 (ZICO) | 19315 | 2272.0 | 566.0 | 2838.0 | 84.000000 |
(여자)아이들 | 22860 | 1500.0 | 830.0 | 2330.0 | 91.000000 |
df_chart = df[['fandom_id','chart_name','total_activity','nov_supported','dec_supported','total_supported']]
in_dec = df_chart['chart_name'].isin(m12['artist'])
df_dec = df_chart[in_dec].groupby(['chart_name']).sum()
rank_dec = m12[m12['artist'].isin(df_chart['chart_name'])].groupby(['artist'])['rank'].mean()
chart_dec = pd.merge(df_dec,rank_dec,left_index=True,right_index=True,how='left')
chart_dec.sort_values(by='rank')
total_activity | nov_supported | dec_supported | total_supported | rank | |
---|---|---|---|---|---|
chart_name | |||||
바이브 | 1088 | 70.0 | 7.0 | 77.0 | 6.000000 |
벤 | 304 | 4.0 | 0.0 | 4.0 | 12.000000 |
임창정 | 1649 | 6.0 | 1.0 | 7.0 | 15.000000 |
허각 | 336 | 0.0 | 0.0 | 0.0 | 17.000000 |
WINNER | 69689 | 17153.0 | 11363.0 | 28516.0 | 21.000000 |
TWICE (트와이스) | 606440 | 8341.0 | 5549.0 | 13890.0 | 25.500000 |
아이유 | 130843 | 2467.0 | 1407.0 | 3874.0 | 33.000000 |
IZ*ONE (아이즈원) | 309295 | 37770.0 | 18416.0 | 56186.0 | 37.000000 |
비투비 | 47090 | 6551.0 | 4942.0 | 11493.0 | 38.666667 |
BLACKPINK | 104426 | 7314.0 | 2318.0 | 9632.0 | 39.000000 |
선미 | 718 | 36.0 | 5.0 | 41.0 | 40.000000 |
로이킴 | 30003 | 10979.0 | 3288.0 | 14267.0 | 45.000000 |
EXO | 28649 | 5.0 | 1.0 | 6.0 | 60.000000 |
Red Velvet (레드벨벳) | 194521 | 6117.0 | 3132.0 | 9249.0 | 66.000000 |
EXID | 28513 | 19.0 | 6.0 | 25.0 | 66.000000 |
방탄소년단 | 125849 | 14519.0 | 10346.0 | 24865.0 | 67.875000 |
iKON | 33653 | 9430.0 | 5150.0 | 14580.0 | 71.000000 |
Wanna One (워너원) | 427571 | 47720.0 | 25996.0 | 73716.0 | 73.000000 |
print(stats.spearmanr(chart_nov['nov_supported'],chart_nov['rank']))
print(stats.spearmanr(chart_dec['dec_supported'],chart_dec['rank']))
SpearmanrResult(correlation=0.2721804511278195, pvalue=0.2456688782803774)
SpearmanrResult(correlation=0.42480625827858726, pvalue=0.07887457444015573)
common = pd.merge(chart_nov[chart_nov.index.isin(chart_dec.index)],chart_dec['rank'],left_index=True,right_index=True,how='left')
common.sort_values(by=['rank_x','rank_y'])
total_activity | nov_supported | dec_supported | total_supported | rank_x | rank_y | |
---|---|---|---|---|---|---|
chart_name | ||||||
바이브 | 1088 | 70.0 | 7.0 | 77.0 | 4.000000 | 6.000000 |
임창정 | 1649 | 6.0 | 1.0 | 7.0 | 7.000000 | 15.000000 |
벤 | 304 | 4.0 | 0.0 | 4.0 | 16.000000 | 12.000000 |
TWICE (트와이스) | 606440 | 8341.0 | 5549.0 | 13890.0 | 17.500000 | 25.500000 |
아이유 | 130843 | 2467.0 | 1407.0 | 3874.0 | 19.000000 | 33.000000 |
선미 | 718 | 36.0 | 5.0 | 41.0 | 20.000000 | 40.000000 |
IZ*ONE (아이즈원) | 309295 | 37770.0 | 18416.0 | 56186.0 | 22.000000 | 37.000000 |
로이킴 | 30003 | 10979.0 | 3288.0 | 14267.0 | 29.500000 | 45.000000 |
iKON | 33653 | 9430.0 | 5150.0 | 14580.0 | 39.500000 | 71.000000 |
비투비 | 47090 | 6551.0 | 4942.0 | 11493.0 | 41.250000 | 38.666667 |
BLACKPINK | 104426 | 7314.0 | 2318.0 | 9632.0 | 45.500000 | 39.000000 |
Red Velvet (레드벨벳) | 194521 | 6117.0 | 3132.0 | 9249.0 | 50.000000 | 66.000000 |
EXO | 28649 | 5.0 | 1.0 | 6.0 | 55.272727 | 60.000000 |
방탄소년단 | 125849 | 14519.0 | 10346.0 | 24865.0 | 63.300000 | 67.875000 |
Wanna One (워너원) | 427571 | 47720.0 | 25996.0 | 73716.0 | 77.800000 | 73.000000 |
print(stats.spearmanr(common['nov_supported'],common['rank_x']))
print(stats.spearmanr(common['dec_supported'],common['rank_y']))
SpearmanrResult(correlation=0.4892857142857142, pvalue=0.06416038631454055)
SpearmanrResult(correlation=0.5004470273579545, pvalue=0.057440003802729525)