A simple way to visualize a bipartite graph

Overview

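A bipartite graph is a graph made up of two different types of nodes, where edges only connect nodes of different types; the user-item relationship in a recommendation problem is a typical example. To get a quick visual look at graph data, you can use the networkx library. There are commercial tools and JavaScript-based libraries that draw prettier pictures, but the approach introduced here is a bare-bones one, mostly meant for a Jupyter notebook, for getting a rough overall view of how all the nodes relate to each other.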
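To make the definition concrete, here is a minimal sketch on toy data. The node names and edges are made up for illustration and are not taken from MovieLens; only the networkx calls are real.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy data (hypothetical): three users, three items, four ratings
users = ['U1', 'U2', 'U3']
items = ['C10', 'C20', 'C30']
ratings = [('U1', 'C10'), ('U1', 'C20'), ('U2', 'C20'), ('U3', 'C30')]

G = nx.Graph()
G.add_nodes_from(users, bipartite='user')
G.add_nodes_from(items, bipartite='item')
G.add_edges_from(ratings)

# Every edge joins a user to an item, so the graph is bipartite
user_item_only = bipartite.is_bipartite(G)

# A user-user edge that closes a triangle (U1-C20-U2-U1) breaks the property
H = G.copy()
H.add_edge('U1', 'U2')
with_user_user_edge = bipartite.is_bipartite(H)

print(user_item_only, with_user_user_edge)
```

The `bipartite` node attribute is only a label that networkx's bipartite functions conventionally rely on; networkx does not enforce it when edges are added.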

1. Example setup

Tool: Python's networkx library (pip install networkx)
Data: MovieLens 100K

1) Load the libraries for visualization

import os

import pandas as pd

import matplotlib.pyplot as plt

from networkx.algorithms import bipartite
import networkx as nx

2) Load the data

data_dir = '[my-path]/ml-100k/'

# Load the movie rating data
ratings_df = pd.read_csv(os.path.join(data_dir, 'ua.base'), sep='\t', header=None)
ratings_df.columns = ['uid', 'cid', 'rating', 'timestamp']
print(ratings_df.shape)

The shape is (90570, 4), so there are a great many relations (user-item pairs linked by a rating).

3) Sample the data to visualize

# Randomly undersample
df = ratings_df.sample(n=2000)

# Drop users and items that appear only once in the sample
df = df[df.duplicated(subset='uid', keep=False) & df.duplicated(subset='cid', keep=False)]

# Manually remove nodes that clutter the plot
df = df[(df['uid'] != 206) & (df['uid'] != 589) & (df['uid'] != 844)]

print(df.shape)

This leaves a sample of shape (1458, 4), i.e. 1,458 rows.

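With too much data the plot becomes unreadable, so we sample down to a small number of rows. Even then, a few nodes that were not connected to the giant connected component got in the way of the visualization, so I removed them by hand. Because sample() is called without a random_state, the exact shape and the offending node IDs will differ from run to run.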
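As an alternative to removing stray nodes by hand, one could keep only the largest connected component once the sample has been turned into a graph. This is a minimal sketch on toy edges, not the method used in this post; the node names are hypothetical.

```python
import networkx as nx

# Toy edges (hypothetical): one larger component plus a stray user-item pair
edges = [('U1', 'C1'), ('U1', 'C2'), ('U2', 'C2'), ('U2', 'C3'), ('U9', 'C9')]
G = nx.Graph()
G.add_edges_from(edges)

# connected_components yields one set of nodes per component; keep the biggest
largest = max(nx.connected_components(G), key=len)
G_main = G.subgraph(largest).copy()

print(sorted(G_main.nodes()))
print(G_main.number_of_edges())
```

The stray pair `U9`-`C9` is dropped without naming it explicitly, so the same code works for whatever the random sample happens to contain.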

2. Visualizing the bipartite graph with a different color per node type

The code below draws the resulting network, with user nodes in orange and item nodes in blue, connected by their ratings.

# Build the graph
bipartite_G = nx.Graph()

# Add nodes and edges to the graph along with their metadata
for _, d in df.iterrows():
    pid = 'P{0}'.format(d['uid'])  # user node, prefixed with 'P'
    cid = 'C{0}'.format(d['cid'])  # item node, prefixed with 'C'
    bipartite_G.add_node(pid, bipartite='user')
    bipartite_G.add_node(cid, bipartite='item')
    bipartite_G.add_edge(pid, cid, role=d['rating'])  # rating stored in the 'role' edge attribute

color_map = []
for node in bipartite_G:
    if node.startswith('P'):  # user nodes in orange
        color_map.append('orange')
    else:                     # item nodes in blue
        color_map.append('blue')

plt.figure(figsize=(20,20))
nx.draw(bipartite_G, node_color=color_map, with_labels=True)
plt.show()

3. Visualizing the bipartite graph with the two node types aligned

Placing the user nodes in blue on the left and the item nodes in red on the right gives a much neater picture. In an asymmetric graph you can clearly see edges concentrating on a handful of nodes, but this data shows almost none of that.

# Tag the IDs so user and item node names do not collide
df = df.copy()
df['uid'] = df['uid'].apply(lambda x: f'U{x}')
df['cid'] = df['cid'].apply(lambda x: f'C{x}')

# Define nodes and edges
users = df.uid.unique()
contents = df.cid.unique()
relations = df[['uid', 'cid']].values

# Create the graph
B = nx.Graph()

B.add_nodes_from(users, bipartite='user')  # add the node attribute "bipartite"
B.add_nodes_from(contents, bipartite='item')
B.add_edges_from(relations)

# Pass the user set explicitly: bipartite.sets(B) alone raises
# AmbiguousSolution when the graph is disconnected
left_nodes, right_nodes = bipartite.sets(B, top_nodes=users)

# Node colors: users blue, items red
color_dict = {'user': 'b', 'item': 'r'}
color_list = [color_dict[i[1]] for i in B.nodes.data('bipartite')]

# Node positions for the two-column layout
pos = dict()
pos.update((n, (1, i)) for i, n in enumerate(left_nodes))   # users at x=1
pos.update((n, (2, i)) for i, n in enumerate(right_nodes))  # items at x=2

plt.figure(figsize=(20,20))
nx.draw(B, pos=pos, with_labels=False, node_color=color_list, node_size=40, width=.3)
plt.show()
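The remark above about edges concentrating on a few nodes can also be checked numerically rather than by eye. The sketch below lists the highest-degree nodes on each side of a toy graph; the data and the helper name `top_degrees` are hypothetical and not part of networkx.

```python
import networkx as nx

# Toy bipartite graph (hypothetical): U1 and C1 are deliberately made into hubs
edges = [('U1', 'C1'), ('U1', 'C2'), ('U1', 'C3'), ('U2', 'C1'), ('U3', 'C1')]
B = nx.Graph()
B.add_nodes_from(['U1', 'U2', 'U3'], bipartite='user')
B.add_nodes_from(['C1', 'C2', 'C3'], bipartite='item')
B.add_edges_from(edges)

def top_degrees(graph, side, k=3):
    """Return the k highest-degree (node, degree) pairs on one side of the graph."""
    nodes = [n for n, s in graph.nodes(data='bipartite') if s == side]
    # Sort by degree descending, then by node name for a stable order
    return sorted(graph.degree(nodes), key=lambda pair: (-pair[1], pair[0]))[:k]

top_users = top_degrees(B, 'user')
top_items = top_degrees(B, 'item')
print(top_users)
print(top_items)
```

If the top few degrees are far larger than the rest, the graph is skewed toward hub nodes; roughly equal degrees match the even picture described for this sample.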

Wrap-up

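This was actually my first time visualizing a graph, and since it was unfamiliar it felt a bit trickier than other kinds of visualization. Still, plots like these should be handy to attach to a report, or to explain the structure of your data when it takes the form of a bipartite graph.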
