/ PROJECTS

K-POP Fandom Data Analysis with networkX

1. Introduction

  • In K-POP scene, a fandom plays a critical role in the success of their idol. They even cooperate with other fandoms to achieve their goal. For example, they do “supports”; steam albums or encourage to vote in awards for another idol and get helped in the same way when their idol release an album or be nominated for an award. With these data collected in 2018, we will find out the effect of the fandom activities.

  • fandom information data: {fandom_id, fandom_name, nov_post, dec_post, nov_support, dec_support, type(gender)}
  • fandom support activities data: {source, target, nov_support, dec_support}
  • monthly chart data: {album, artist, rank, title, start, end}
  • fandoms metadata: {fandom_name, artist, agent, year(debut), #articles}
import pandas as pd
from pandas import DataFrame, Series
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import networkx as nx

2. Supporting Activities

2.1. Fandom Supporting Ratio (Total Supporting activities in Total activities)

nodes = pd.read_csv('./dataset/fandom_nodes_2018.csv')
nodes['total_support'] = nodes.loc[:,['nov_support','dec_support']].sum(axis=1)
nodes['total_activity'] = nodes.loc[:,['nov_post','dec_post','nov_support','dec_support']].sum(axis=1)
nodes['support_ratio'] = nodes['total_support'] / nodes['total_activity']
nodes['support_ratio'] = nodes['support_ratio'].fillna(0)
nodes.loc[:,['fandom_id','total_support','total_activity','support_ratio']].sort_values(by=['support_ratio'],ascending=False).head(10)
fandom_id total_support total_activity support_ratio
213 oh_soul 14 28 0.500000
283 tbz1206 1 2 0.500000
196 myname 1 2 0.500000
3 2PM 1 2 0.500000
98 imcenter 1 2 0.500000
259 shinjihoon 921 1848 0.498377
180 livematilda0317 1458 2932 0.497271
315 xtracam 2527 5083 0.497147
303 we100 2848 5759 0.494530
201 namyujin 516 1048 0.492366
  • 5 fandoms with very few activities seems to be not quite active in the period.
nodes.describe()
nov_post dec_post nov_support dec_support type total_support total_activity support_ratio
count 330.000000 330.000000 330.000000 330.000000 330.000000 330.000000 330.000000 330.000000
mean 5854.921212 5249.675758 1455.627273 859.090909 0.493939 2314.718182 13419.315152 0.226563
std 17573.678298 16329.999791 2329.251725 1166.459438 0.500723 3310.463410 35436.387630 0.174810
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 250.000000 127.500000 19.750000 7.250000 0.000000 53.750000 656.250000 0.043461
50% 1739.000000 1472.000000 535.500000 403.500000 0.000000 984.500000 4776.000000 0.221352
75% 5047.000000 4258.000000 1736.750000 1187.500000 1.000000 3077.750000 13687.000000 0.397054
max 261517.000000 249246.000000 12462.000000 6989.000000 1.000000 16917.000000 525546.000000 0.500000

2.2. Case by Gender Type

edges = pd.read_csv('./dataset/fandom_edges_2018.csv')
edges['total_support'] = edges.loc[:,['nov_support','dec_support']].sum(axis=1)
df_supported = edges.groupby(['target']).sum()
df_supported = df_supported.reset_index()
df_supported.columns = ['fandom_id','nov_supported','dec_supported','total_supported']
df = pd.merge(nodes,df_supported,on='fandom_id',how='outer')
df = df.fillna(0)
# Supporting Top 10 fandoms for both genders
df.loc[:,['fandom_id','total_support']].sort_values(by=['total_support'],ascending=False).head(10)
fandom_id total_support
308 winner 16917
301 wannaonego 16036
31 btob 15739
181 lovelyz 15328
209 nuest 14895
292 twice 14783
183 mamamoo 14472
32 bts 14006
243 roykim 13149
81 gx9 11954
# Supporting Top 10 fandoms for boy groups
df.loc[df['type']==1,['fandom_id','total_support']].sort_values(by=['total_support'],ascending=False).head(10)
fandom_id total_support
308 winner 16917
301 wannaonego 16036
31 btob 15739
209 nuest 14895
32 bts 14006
243 roykim 13149
298 vikon 10589
88 highlight 10575
124 jsh 9673
288 tj3579 9314
# Supporting Top 10 fandoms for girl groups
df.loc[df['type']==0,['fandom_id','total_support']].sort_values(by=['total_support'],ascending=False).head(10)
fandom_id total_support
181 lovelyz 15328
292 twice 14783
183 mamamoo 14472
81 gx9 11954
67 gf 11564
234 real__izo 11104
26 blackpink 10678
214 ohmygirl 9091
236 redvelvetreveluv 8155
64 fromis 8003
# Supported Top 10 fandoms for both genders
df.loc[:,['fandom_id','total_supported']].sort_values(by=['total_supported'],ascending=False).head(10)
fandom_id total_supported
301 wannaonego 33609.0
181 lovelyz 29083.0
234 real__izo 25459.0
32 bts 24865.0
308 winner 21896.0
209 nuest 19312.0
183 mamamoo 18795.0
67 gf 14899.0
124 jsh 14437.0
243 roykim 14267.0
# Supported Top 10 fandoms for boyt groups
df.loc[df['type']==1,['fandom_id','total_supported']].sort_values(by=['total_supported'],ascending=False).head(10)
fandom_id total_supported
301 wannaonego 33609.0
32 bts 24865.0
308 winner 21896.0
209 nuest 19312.0
124 jsh 14437.0
243 roykim 14267.0
298 vikon 14245.0
137 kim 13884.0
4 6kies 13565.0
31 btob 11493.0
# Supported Top 10 fandoms for girl groups
df.loc[df['type']==0,['fandom_id','total_supported']].sort_values(by=['total_supported'],ascending=False).head(10)
fandom_id total_supported
181 lovelyz 29083.0
234 real__izo 25459.0
183 mamamoo 18795.0
67 gf 14899.0
64 fromis 13957.0
292 twice 12942.0
214 ohmygirl 12335.0
26 blackpink 9632.0
262 shitaomiu 8563.0
125 jungchaeyeon 6820.0

2.3. Case by Agents (Sum of artists data in a company)

meta = pd.read_json('./dataset/fandom_meta_2018.json')
df_meta = meta.copy()
df_meta.rename(columns={'fandom_name':'fandom_id'},inplace=True)
df_meta
df = pd.merge(df,df_meta,on='fandom_id')
# Supporting Top 10 Agents
df.groupby(['agent'])[['total_support','total_supported']].sum().sort_values(by='total_support',ascending=False).head(10)
total_support total_supported
agent
YG Entertainment 67790 81735.0
JYP Entertainment 54025 41464.0
Off The Record 47300 75508.0
Swing Entertainment 41180 73716.0
SM Entertainment 38862 23736.0
PLEDIS Entertainment 38839 43115.0
Cube Entertainment 30991 20990.0
Woollim Entertainment 22780 35219.0
Jellyfish Entertainment 17606 7583.0
Fantagio Music 15853 8202.0
# Supported Top 10 Agents
df.groupby(['agent'])[['total_support','total_supported']].sum().sort_values(by='total_supported',ascending=False).head(10)
total_support total_supported
agent
YG Entertainment 67790 81735.0
Off The Record 47300 75508.0
Swing Entertainment 41180 73716.0
PLEDIS Entertainment 38839 43115.0
JYP Entertainment 54025 41464.0
Woollim Entertainment 22780 35219.0
Big Hit Entertainment 14006 24865.0
SM Entertainment 38862 23736.0
Cube Entertainment 30991 20990.0
AKS 8554 19646.0

3. Correlation between supporting and supported activities

3.1. by Pearson correlation

from scipy import stats
print(stats.pearsonr(df['total_activity'],df['total_support']))
print(stats.pearsonr(df['total_activity'],df['total_supported']))
(0.538005581735917, 3.725917667518625e-26)
(0.5114446175904475, 2.1548883750451727e-23)

3.2. by Spearman rank correlation

print(stats.spearmanr(df['total_activity'],df['total_support']))
print(stats.spearmanr(df['total_activity'],df['total_supported']))
SpearmanrResult(correlation=0.8461473515842707, pvalue=1.1711298194140583e-91)
SpearmanrResult(correlation=0.8244727920052708, pvalue=4.2257040130501645e-83)
  • Spearman rank method shows better correlation and confidence level in average.

4. Hyphothesis test by independent t-test

4.1. Supported activities and gender types

  • H_0: There is a difference in supported activities between girl and boy groups.
  • H_1: There is no difference in supported activities between girl and boy groups.
boy = df.loc[df['type']==1,['fandom_id','total_supported','total_support']]
girl = df.loc[df['type']==0,['fandom_id','total_supported','total_support']]
len(boy),len(girl)
(163, 167)
# t-test
stats.ttest_ind(boy['total_supported'], girl['total_supported'])
Ttest_indResult(statistic=1.494285307111671, pvalue=0.1360624474957505)

4.2. Supporting activities and gender types

  • H_0: There is a difference in supporting activities between girl and boy groups.
  • H_1: There is no difference in supporting activities between girl and boy groups.
# t-test
stats.ttest_ind(boy['total_support'], girl['total_support'])
Ttest_indResult(statistic=1.33823756824641, pvalue=0.1817459557355244)
  • both H_0’s would not be rejected

5. Graph Analysis

5.1. Generating graphs with different time periods

  • G0; graph with data in entire period
    G1; grpah with data in november
    G2; grpah with data in december
total_edges = edges[['source','target','total_support']].sort_values(by='total_support',ascending=False)
total_edges = total_edges.iloc[:int(len(total_edges)*0.01),]
G0 = nx.DiGraph(total_edges.loc[:,('source','target')].values.tolist())
nx.set_edge_attributes(G0, total_edges.set_index(['source', 'target'])['total_support'], 'weight')
nx.set_node_attributes(G0,df.set_index(['fandom_id'])['agent'],'agent')
nx.set_node_attributes(G0,df.set_index(['fandom_id'])['year(debut)'],'debut_year')
nx.set_node_attributes(G0,df.set_index(['fandom_id'])['type'],'gender_type')
nov_edges = edges[['source','target','nov_support']].sort_values(by='nov_support',ascending=False)
nov_edges = nov_edges.iloc[:int(len(nov_edges)*0.01),]
G1 = nx.DiGraph(nov_edges.loc[:,('source','target')].values.tolist())
nx.set_edge_attributes(G1, nov_edges.set_index(['source', 'target'])['nov_support'], 'weight')
nx.set_node_attributes(G1,df.set_index(['fandom_id'])['agent'],'agent')
nx.set_node_attributes(G1,df.set_index(['fandom_id'])['year(debut)'],'debut_year')
nx.set_node_attributes(G1,df.set_index(['fandom_id'])['type'],'gender_type')
dec_edges = edges[['source','target','dec_support']].sort_values(by='dec_support',ascending=False)
dec_edges = dec_edges.iloc[:int(len(dec_edges)*0.01),]
G2 = nx.DiGraph(dec_edges.loc[:,('source','target')].values.tolist())
nx.set_edge_attributes(G2, dec_edges.set_index(['source', 'target'])['dec_support'], 'weight')
nx.set_node_attributes(G2,df.set_index(['fandom_id'])['agent'],'agent')
nx.set_node_attributes(G2,df.set_index(['fandom_id'])['year(debut)'],'debut_year')
nx.set_node_attributes(G2,df.set_index(['fandom_id'])['type'],'gender_type')
print(len(G0.nodes),len(G0.edges)) 
print(len(G1.nodes),len(G1.edges)) 
print(len(G2.nodes),len(G2.edges)) 
59 236
62 236
90 236

5.2. Cumulative Distribution Function(CDF) with G0’s degree

G0_degree = dict(nx.degree(G0))
h = plt.hist(G0_degree.values())

png

cdf = stats.norm().cdf(sorted(G0_degree.values()))
plt.plot(sorted(G0_degree.values()), cdf)
plt.show()

png

5.3. Top 5 fandoms in G0 sorted by Pagerank and Degree Centrality

  • in both results, top fandoms shows very high scores in features - supporting, supported, total activites, while the average of supporting/supported activities for all data is 2,314 and the average of total activities for all data is about 13,419.
def viewer(_dict_data, col_name):
    result = pd.DataFrame().from_dict(_dict_data, orient='index', columns=[col_name])
    return df.set_index('fandom_id').join(result, how='right').sort_values(col_name, ascending=False)
pagerank = nx.pagerank(G0)
pagerank = viewer(pagerank, 'pagerank')
pagerank.head(5)[['type','total_support','total_activity','total_supported','agent','year(debut)','pagerank']]
type total_support total_activity total_supported agent year(debut) pagerank
wannaonego 1 16036 77608 33609.0 Swing Entertainment 2017 0.131164
real__izo 0 11104 90757 25459.0 Off The Record 2018 0.081406
lovelyz 0 15328 171752 29083.0 Woollim Entertainment 2014 0.076592
jsh 1 9673 20872 14437.0 Antenna 2016 0.065368
nuest 1 14895 56741 19312.0 PLEDIS Entertainment 2012 0.064216
degree_centrality = nx.degree_centrality(G0)
degree_centrality = viewer(degree_centrality, 'degree_centrality')
degree_centrality.head(5)[['type','total_support','total_activity','total_supported','agent','year(debut)','degree_centrality']]
type total_support total_activity total_supported agent year(debut) degree_centrality
wannaonego 1 16036 77608 33609.0 Swing Entertainment 2017 0.827586
lovelyz 0 15328 171752 29083.0 Woollim Entertainment 2014 0.482759
winner 1 16917 41928 21896.0 YG Entertainment 2014 0.465517
nuest 1 14895 56741 19312.0 PLEDIS Entertainment 2012 0.448276
bts 1 14006 125849 24865.0 Big Hit Entertainment 2013 0.431034
df[['total_support','total_activity','total_supported']].describe()
total_support total_activity total_supported
count 330.000000 330.000000 330.000000
mean 2314.718182 13419.315152 2314.718182
std 3310.463410 35436.387630 4464.948830
min 0.000000 0.000000 0.000000
25% 53.750000 656.250000 2.000000
50% 984.500000 4776.000000 511.500000
75% 3077.750000 13687.000000 2923.500000
max 16917.000000 525546.000000 33609.000000

5.4. Supporting-supported cases by gender types in G0

  • Generally, boy group fandoms are more supporting.
G0_support = pd.DataFrame().from_dict({e for e in G0.edges})
G0_support.columns = ['source','target']
tmp = pd.DataFrame().from_dict({n for n in G0.nodes.data('gender_type')})
tmp.columns = ['source','source_gender']
G0_support = pd.merge(G0_support,tmp,how='left')
tmp.columns = ['target','target_gender']
G0_support = pd.merge(G0_support,tmp,how='left')
G0_support
source target source_gender target_gender
0 winner day6 1 1
1 btob mamamoo 1 0
2 mmld ohmygirl 0 0
3 nuest got7vlive 1 1
4 nuest ch_freemonth 1 0
... ... ... ... ...
231 sanarang twice 0 0
232 6kies wannaonego 1 1
233 redvelvetreveluv wannaonego 0 1
234 mamamoo twice 0 0
235 gx9 vikon 0 1

236 rows × 4 columns

# boy -> boy
print(len(G0_support[(G0_support.source_gender==1) & (G0_support.target_gender==1)]) / len(G0_support))
# boy -> girl
print(len(G0_support[(G0_support.source_gender==1) & (G0_support.target_gender==0)]) / len(G0_support))
# girl -> boy
print(len(G0_support[(G0_support.source_gender==0) & (G0_support.target_gender==1)]) / len(G0_support))
# girl -> girl
print(len(G0_support[(G0_support.source_gender==0) & (G0_support.target_gender==0)]) / len(G0_support))
0.3220338983050847
0.2627118644067797
0.2542372881355932
0.16101694915254236

6. Comparison between G1 and G2

6.1. Persistence of edges-supporting and gender types

G1_support = set(e for e in G1.edges)
G2_support = set(e for e in G2.edges)
G1_only = pd.DataFrame.from_dict(G1_support - G2_support)
G1_only.columns = ['source','target']
tmp = pd.DataFrame().from_dict({n for n in G1.nodes.data('gender_type')})
tmp.columns = ['source','source_gender']
G1_only = pd.merge(G1_only,tmp,how='left')
tmp.columns = ['target','target_gender']
G1_only = pd.merge(G1_only,tmp,how='left')
# Supporting in november only
G1_only
source target source_gender target_gender
0 winner day6 1 1
1 mamamoo wannaonego 0 1
2 btob mamamoo 1 0
3 mmld ohmygirl 0 0
4 nuest got7vlive 1 1
... ... ... ... ...
139 anyujin wannaonego 0 1
140 buzz mamamoo 1 0
141 redvelvetreveluv wannaonego 0 1
142 idlesong lovelyz 0 0
143 gx9 vikon 0 1

144 rows × 4 columns

G2_only = pd.DataFrame.from_dict(G2_support-G1_support)
G2_only.columns = ['source','target']
tmp = pd.DataFrame().from_dict({n for n in G2.nodes.data('gender_type')})
tmp.columns = ['source','source_gender']
G2_only = pd.merge(G2_only,tmp,how='left')
tmp.columns = ['target','target_gender']
G2_only = pd.merge(G2_only,tmp,how='left')
# Supporting in december only
G2_only
source target source_gender target_gender
0 wheesung bts 1 1
1 mkyunghoon lovelyz 1 0
2 bts nuest 1 1
3 god bts 1 1
4 onairpril lovelyz 0 0
... ... ... ... ...
139 jjy vikon 1 1
140 sanarang twice 0 0
141 onf lovelyz 1 0
142 got7vlive fromis 1 0
143 idlesong blackpink 0 0

144 rows × 4 columns

G1_n_G2 = pd.DataFrame.from_dict(G1_support & G2_support)
G1_n_G2.columns = ['source','target']
tmp = pd.DataFrame().from_dict({n for n in G1.nodes.data('gender_type')})
tmp.columns = ['source','source_gender']
G1_n_G2 = pd.merge(G1_n_G2,tmp,how='left')
tmp.columns = ['target','target_gender']
G1_n_G2 = pd.merge(G1_n_G2,tmp,how='left')
# Supporting in both months
G1_n_G2
source target source_gender target_gender
0 twice roykim 0 1
1 guckkasten wannaonego 1 1
2 bigbang wannaonego 1 1
3 btob bts 1 1
4 ohmygirl fromis 0 0
... ... ... ... ...
87 real__izo wannaonego 0 1
88 tj3579 jsh 1 1
89 nuest jsh 1 1
90 6kies wannaonego 1 1
91 mamamoo twice 0 0

92 rows × 4 columns

# boy -> boy
print(len(G1_n_G2[(G1_n_G2.source_gender==1) & (G1_n_G2.target_gender==1)]) / len(G1_n_G2))
# boy -> girl
print(len(G1_n_G2[(G1_n_G2.source_gender==1) & (G1_n_G2.target_gender==0)]) / len(G1_n_G2))
# girl -> boy
print(len(G1_n_G2[(G1_n_G2.source_gender==0) & (G1_n_G2.target_gender==1)]) / len(G1_n_G2))
# girl -> girl
print(len(G1_n_G2[(G1_n_G2.source_gender==0) & (G1_n_G2.target_gender==0)]) / len(G1_n_G2))
0.40217391304347827
0.21739130434782608
0.25
0.13043478260869565

6.2. Graph Role Extraction by RolX algorithm

from pprint import pprint
import seaborn as sns
from graphrole import RecursiveFeatureExtractor, RoleExtractor
feature_extractor = RecursiveFeatureExtractor(G1)
features = feature_extractor.extract_features()
role_extractor = RoleExtractor(n_roles=None)
role_extractor.extract_role_factors(features)
G1_roles = role_extractor.roles
G1_role_groups = {}
for n,r in G1_roles.items():
    if r in G1_role_groups:
        G1_role_groups[r].append(n)
    else:
        G1_role_groups[r] = [n]
for r,g in G1_role_groups.items():
    print(r,g)
role_2 ['6kies', 'ahnhyungsub', 'anyujin', 'apink', 'boa', 'dmlwlsska', 'girlsgeneration_new', 'girlsofthemonth', 'haonkim', 'highfiveofteenager', 'idlesong', 'jjy', 'jungsewoon', 'jungyeon', 'kgkg', 'kim', 'kimsejeong', 'mkyunghoon', 'mmld', 'nth0510', 'onairpril', 'paka', 'pjb', 'sakura0319', 'samkim', 'shinhwa', 'taeyeon_new1', 'unitb', 'wheesung', 'yuseonho']
role_4 ['bigbang', 'blackpink', 'ch_freemonth', 'chungha', 'fromis', 'guckkasten', 'iu_tv', 'redvelvetreveluv', 'tj3579', 'vikon']
role_1 ['btob', 'bts', 'gx9', 'jsh', 'mamamoo', 'nuest', 'real__izo', 'roykim', 'straykids', 'wannaonego', 'winner', 'zico']
role_3 ['buzz', 'got7vlive', 'kimdonghan', 'madewg', 'twice']
role_0 ['day6', 'gf', 'lovelyz', 'miyazakimiho', 'ohmygirl']
feature_extractor = RecursiveFeatureExtractor(G2)
features = feature_extractor.extract_features()
role_extractor = RoleExtractor(n_roles=5) # set 'n_roles' same as G1`s
role_extractor.extract_role_factors(features)
G2_roles = role_extractor.roles
G2_role_groups = {}
for n,r in G2_roles.items():
    if r in G2_role_groups:
        G2_role_groups[r].append(n)
    else:
        G2_role_groups[r] = [n]
for r,g in G2_role_groups.items():
    print(r,g)
role_2 ['6kies', 'ahnhyungsub', 'berrygood', 'blackpink', 'buzz', 'dickpunks', 'fromis', 'highfiveofteenager', 'jjy', 'jungchaeyeon', 'kgkg', 'kimsejeong', 'lovelyz', 'mmld', 'pjb', 'real__izo', 'sanarang', 'wheesung', 'wjsnvlive', 'woojinyoung', 'zico']
role_4 ['astro', 'b1a4', 'bigbang', 'doakim', 'doitamazing7', 'dongbang_new', 'fortediquattro', 'girllaboum', 'girlsgeneration_new', 'god', 'godgayoung', 'goldenchild', 'guckkasten', 'gx9', 'highlight', 'hotshot', 'in2it', 'infinite', 'ioi', 'iu_tv', 'jsh', 'kanghyewon', 'kim', 'kimchaewon', 'kimdonghan', 'ksy', 'ladiescode', 'mamamoo', 'mino0330', 'mkyunghoon', 'nojisun', 'nth0510', 'nuest', 'ohmygirl', 'onairpril', 'onf', 'paka', 'pentagon', 'pledis', 'pushkang', 'redvelvetreveluv', 'sonamoo', 'taewookim', 'tj3579', 'vikon', 'wekimeki']
role_1 ['btob', 'bts', 'day6', 'gf', 'got7vlive', 'idlesong', 'madewg', 'roykim', 'straykids', 'twice', 'wannaonego', 'winner']
role_0 ['dmlwlsska', 'eunjiwon', 'jdh', 'jeonsomi', 'jungsewoon', 'kdani', 'leechaeyeon', 'miyazakimiho', 'yabukinako']
role_3 ['shitaomiu', 'take_miyu']

6.3. Community Detection by Girvan-Newman algorithm

from networkx.algorithms import community
comp = community.girvan_newman(G1.to_undirected())
communities = tuple(sorted(c) for c in next(comp))
print([len(c) for c in communities])
for c in communities:
    print(c)
[39, 23]
['6kies', 'apink', 'bigbang', 'blackpink', 'btob', 'bts', 'ch_freemonth', 'chungha', 'day6', 'fromis', 'gf', 'girlsgeneration_new', 'girlsofthemonth', 'got7vlive', 'guckkasten', 'gx9', 'haonkim', 'idlesong', 'iu_tv', 'jsh', 'jungyeon', 'kimdonghan', 'lovelyz', 'madewg', 'mamamoo', 'mkyunghoon', 'mmld', 'nuest', 'ohmygirl', 'pjb', 'real__izo', 'redvelvetreveluv', 'roykim', 'straykids', 'tj3579', 'twice', 'vikon', 'winner', 'zico']
['ahnhyungsub', 'anyujin', 'boa', 'buzz', 'dmlwlsska', 'highfiveofteenager', 'jjy', 'jungsewoon', 'kgkg', 'kim', 'kimsejeong', 'miyazakimiho', 'nth0510', 'onairpril', 'paka', 'sakura0319', 'samkim', 'shinhwa', 'taeyeon_new1', 'unitb', 'wannaonego', 'wheesung', 'yuseonho']
comp = community.girvan_newman(G2.to_undirected())
communities = tuple(sorted(c) for c in next(comp))
print([len(c) for c in communities])
for c in communities:
    print(c)
[84, 6]
['6kies', 'ahnhyungsub', 'astro', 'b1a4', 'berrygood', 'bigbang', 'blackpink', 'btob', 'bts', 'buzz', 'day6', 'dickpunks', 'doakim', 'doitamazing7', 'dongbang_new', 'eunjiwon', 'fortediquattro', 'fromis', 'gf', 'girllaboum', 'girlsgeneration_new', 'god', 'godgayoung', 'goldenchild', 'got7vlive', 'guckkasten', 'gx9', 'highfiveofteenager', 'highlight', 'hotshot', 'in2it', 'infinite', 'ioi', 'iu_tv', 'jdh', 'jjy', 'jsh', 'jungchaeyeon', 'jungsewoon', 'kanghyewon', 'kdani', 'kgkg', 'kim', 'kimchaewon', 'kimdonghan', 'kimsejeong', 'ksy', 'ladiescode', 'leechaeyeon', 'lovelyz', 'madewg', 'mamamoo', 'mino0330', 'mkyunghoon', 'mmld', 'nojisun', 'nth0510', 'nuest', 'ohmygirl', 'onairpril', 'onf', 'paka', 'pentagon', 'pjb', 'pledis', 'pushkang', 'real__izo', 'redvelvetreveluv', 'roykim', 'sanarang', 'sonamoo', 'straykids', 'taewookim', 'tj3579', 'twice', 'vikon', 'wannaonego', 'wekimeki', 'wheesung', 'winner', 'wjsnvlive', 'woojinyoung', 'yabukinako', 'zico']
['dmlwlsska', 'idlesong', 'jeonsomi', 'miyazakimiho', 'shitaomiu', 'take_miyu']

6.4. Page Rank and Degree Centrality in G1, G2

pagerank = nx.pagerank(G1)
pagerank = viewer(pagerank, 'pagerank')
pagerank.head(10).index
Index(['wannaonego', 'real__izo', 'jsh', 'nuest', 'winner', 'roykim',
       'lovelyz', 'ohmygirl', 'gf', 'twice'],
      dtype='object')
pagerank = nx.pagerank(G2)
pagerank = viewer(pagerank, 'pagerank')
pagerank.head(10).index
Index(['lovelyz', 'bts', 'jsh', 'wannaonego', 'fromis', 'nuest', 'gf', 'twice',
       'highfiveofteenager', 'roykim'],
      dtype='object')
degree_centrality = nx.degree_centrality(G1)
degree_centrality = viewer(degree_centrality, 'degree_centrality')
degree_centrality.head(10).index
Index(['wannaonego', 'nuest', 'real__izo', 'winner', 'mamamoo', 'roykim',
       'jsh', 'bts', 'lovelyz', 'twice'],
      dtype='object')
degree_centrality = nx.degree_centrality(G2)
degree_centrality = viewer(degree_centrality, 'degree_centrality')
degree_centrality.head(10).index
Index(['lovelyz', 'bts', 'wannaonego', 'winner', 'twice', 'jsh', 'fromis',
       'gf', 'nuest', 'btob'],
      dtype='object')

7. Effect of the fandom activities on the monthly chart

monthly_chart = pd.read_json('./dataset/monthly_chart_2018.json')
m11 = monthly_chart[monthly_chart.start==201811]
m12 = monthly_chart[monthly_chart.start==201812]
df_chart = df[['fandom_id','chart_name','total_activity','nov_supported','dec_supported','total_supported']]
in_nov = df_chart['chart_name'].isin(m11['artist'])
df_nov = df_chart[in_nov].groupby(['chart_name']).sum()
rank_nov = m11[m11['artist'].isin(df_chart['chart_name'])].groupby(['artist'])['rank'].mean()
chart_nov = pd.merge(df_nov,rank_nov,left_index=True,right_index=True,how='left')
chart_nov.sort_values(by='rank')
total_activity nov_supported dec_supported total_supported rank
chart_name
바이브 1088 70.0 7.0 77.0 4.000000
임창정 1649 6.0 1.0 7.0 7.000000
304 4.0 0.0 4.0 16.000000
TWICE (트와이스) 606440 8341.0 5549.0 13890.0 17.500000
아이유 130843 2467.0 1407.0 3874.0 19.000000
선미 718 36.0 5.0 41.0 20.000000
IZ*ONE (아이즈원) 309295 37770.0 18416.0 56186.0 22.000000
로이킴 30003 10979.0 3288.0 14267.0 29.500000
iKON 33653 9430.0 5150.0 14580.0 39.500000
비투비 47090 6551.0 4942.0 11493.0 41.250000
BLACKPINK 104426 7314.0 2318.0 9632.0 45.500000
Red Velvet (레드벨벳) 194521 6117.0 3132.0 9249.0 50.000000
EXO 28649 5.0 1.0 6.0 55.272727
방탄소년단 125849 14519.0 10346.0 24865.0 63.300000
Apink (에이핑크) 89709 507.0 492.0 999.0 71.000000
하이라이트 (Highlight) 29392 4317.0 3357.0 7674.0 75.000000
Wanna One (워너원) 427571 47720.0 25996.0 73716.0 77.800000
정승환 20872 9704.0 4733.0 14437.0 78.000000
지코 (ZICO) 19315 2272.0 566.0 2838.0 84.000000
(여자)아이들 22860 1500.0 830.0 2330.0 91.000000
df_chart = df[['fandom_id','chart_name','total_activity','nov_supported','dec_supported','total_supported']]
in_dec = df_chart['chart_name'].isin(m12['artist'])
df_dec = df_chart[in_dec].groupby(['chart_name']).sum()
rank_dec = m12[m12['artist'].isin(df_chart['chart_name'])].groupby(['artist'])['rank'].mean()
chart_dec = pd.merge(df_dec,rank_dec,left_index=True,right_index=True,how='left')
chart_dec.sort_values(by='rank')
total_activity nov_supported dec_supported total_supported rank
chart_name
바이브 1088 70.0 7.0 77.0 6.000000
304 4.0 0.0 4.0 12.000000
임창정 1649 6.0 1.0 7.0 15.000000
허각 336 0.0 0.0 0.0 17.000000
WINNER 69689 17153.0 11363.0 28516.0 21.000000
TWICE (트와이스) 606440 8341.0 5549.0 13890.0 25.500000
아이유 130843 2467.0 1407.0 3874.0 33.000000
IZ*ONE (아이즈원) 309295 37770.0 18416.0 56186.0 37.000000
비투비 47090 6551.0 4942.0 11493.0 38.666667
BLACKPINK 104426 7314.0 2318.0 9632.0 39.000000
선미 718 36.0 5.0 41.0 40.000000
로이킴 30003 10979.0 3288.0 14267.0 45.000000
EXO 28649 5.0 1.0 6.0 60.000000
Red Velvet (레드벨벳) 194521 6117.0 3132.0 9249.0 66.000000
EXID 28513 19.0 6.0 25.0 66.000000
방탄소년단 125849 14519.0 10346.0 24865.0 67.875000
iKON 33653 9430.0 5150.0 14580.0 71.000000
Wanna One (워너원) 427571 47720.0 25996.0 73716.0 73.000000
print(stats.spearmanr(chart_nov['nov_supported'],chart_nov['rank']))
print(stats.spearmanr(chart_dec['dec_supported'],chart_dec['rank']))
SpearmanrResult(correlation=0.2721804511278195, pvalue=0.2456688782803774)
SpearmanrResult(correlation=0.42480625827858726, pvalue=0.07887457444015573)
common = pd.merge(chart_nov[chart_nov.index.isin(chart_dec.index)],chart_dec['rank'],left_index=True,right_index=True,how='left')
common.sort_values(by=['rank_x','rank_y'])
total_activity nov_supported dec_supported total_supported rank_x rank_y
chart_name
바이브 1088 70.0 7.0 77.0 4.000000 6.000000
임창정 1649 6.0 1.0 7.0 7.000000 15.000000
304 4.0 0.0 4.0 16.000000 12.000000
TWICE (트와이스) 606440 8341.0 5549.0 13890.0 17.500000 25.500000
아이유 130843 2467.0 1407.0 3874.0 19.000000 33.000000
선미 718 36.0 5.0 41.0 20.000000 40.000000
IZ*ONE (아이즈원) 309295 37770.0 18416.0 56186.0 22.000000 37.000000
로이킴 30003 10979.0 3288.0 14267.0 29.500000 45.000000
iKON 33653 9430.0 5150.0 14580.0 39.500000 71.000000
비투비 47090 6551.0 4942.0 11493.0 41.250000 38.666667
BLACKPINK 104426 7314.0 2318.0 9632.0 45.500000 39.000000
Red Velvet (레드벨벳) 194521 6117.0 3132.0 9249.0 50.000000 66.000000
EXO 28649 5.0 1.0 6.0 55.272727 60.000000
방탄소년단 125849 14519.0 10346.0 24865.0 63.300000 67.875000
Wanna One (워너원) 427571 47720.0 25996.0 73716.0 77.800000 73.000000
print(stats.spearmanr(common['nov_supported'],common['rank_x']))
print(stats.spearmanr(common['dec_supported'],common['rank_y']))
SpearmanrResult(correlation=0.4892857142857142, pvalue=0.06416038631454055)
SpearmanrResult(correlation=0.5004470273579545, pvalue=0.057440003802729525)