Machine Learning Journal Club

Overview

Mission Statement

We read and discuss the latest papers in machine learning and computational neuroscience. The hope is to generate new ideas and keep up-to-date with the literature.

Weekly meetings will focus on three themes: graph-based machine learning, deep generative models, and ML4Neuroscience.
All meetings will be in person or on Zoom. Graduate students and postdocs are especially encouraged to attend and propose/present papers. Appropriate papers present foundational principles for computation, new statistical, mathematical, or machine learning methods, etc.

Time & Place

Thursday, 2:00pm - 3:00pm KST
R&D Building(산학기술관), 702-1 (google maps), Hanyang University, Seoul

Mailing List

Organizers: Please contact HyunGeun Lee (didls1228@hanyang.ac.kr) if you are interested in presenting or have any questions about the journal club.

Meetings

Below is the list of upcoming, as well as all prior journal club meetings.

Date	Presenter	Reading	Location
04.18.2024	HyunGeun Lee	LFADS - Latent Factor Analysis via Dynamical Systems, Sussillo et al. (link, video) summary	산학기술관 702-1
04.11.2024	Wooyul Jung	Discrete Graph Structure Learning for Forecasting Multiple Time Series, Shang et al. (link, video) summary summary : 본 논문은 graph structure를 학습하는 multivariate time series forecasting model을 소개합니다. GTS(graph for time series)는 NRI, DCRNN를 포함하는 baselines과 비교해서 time series forecasting 성능에 더 좋은 성능을 가집니다. 논문은 graph structure의 priori knowledge가 더 나은 forecasting quality를 만드는 것에 도움을 준다고 말합니다. strength : NRI와는 달리 같은 time series에서 같은 graph structure를 생성합니다. DCRNN을 사용함으로써 multivariate time series 데이터를 잘 분석할 수 있습니다. weakness : 상황에 따라 다른 structure를 가지는 것이 더 real world data에 가까울 것입니다. question : Dot product를 사용하지 않고 concat을 하는 이유가 궁금합니다. 저희 연구에서 T-GCN이나 DCGRU 대신 일반적인 gru를 사용한 이유가 궁금합니다. Mulivariate Dynamic의 real connectivity를 추론하는 논문의 발전 방향이 궁금합니다.	산학기술관 702-1
04.04.2024	Seungwon Yu	Parameterized Explainer for Graph Neural Network, Luo et al. (link, video) summary summary : 본 논문은 single instance 를 target으로 하는 GNNExplainer의 성능을 개선하기 위해 mutual information 값의 loss로 직접적인 mask update를 진행하지 않고 mask의 확률값을 binary concrete distribution으로 나타내고 해당 분포의 변수인 latent variable w 의 값을 학습하는 방식을 사용합니다. 학습된 parameter 각 edge가 존재할 확률값을 구하는데 쓰이며 해당 값은 양쪽 node의 feature representation을 concat한 값을 입력으로 합니다. strength : 해당 논문은 GNNExplainer 대비 높은 성능과 훨씬 빠른 속도를 보여줍니다. 각 node 및 각 graph에 대하여 새로 학습을 진행하고 inference를 해야하는 GNNExplainer 대비 연산량의 큰 감소를 보이고 있습니다. weekness : 표현되는 subgraph의 크기를 제한하는 buget constraint term과 같은 node를 포함하는 edge를 연결하는 connectivity constraint term 에 대한 설명이 조금 부실합니다. 해당 constraint는 subgraph를 구성하는 중요한 요소임에도 불구하고 너무 간략하게만 설명되어있습니다.	산학기술관 702-1
03.28.2024	Ung Hwang	Future Directions in Foundations of Graph Machine Learning, Morris et al. (link, video) summary 본 논문에서는 기존의 1-WL test의 기준에만 focus를 둔 취지의 논문들과 또 그것을 용인하는 듯한 학회들의 반응에 대해 의문을 던지며 본 논문을 작성하였습니다. 제안된 논문들이 현실에서 잘 작동 안되는 이유들이 있는데, 그것들을 표현력, 일반화 능력, 최적화의 문제로 접근하여 주로 레퍼런스를 같이 소개하면서 문제들을 조명합니다. Strength: 기존에 발표된 다수의 논문들과 GNN 모델의 성능 측정면에서 다각적인 차원(표현력, 일반화 능력, 최적화) 에서 바라 볼 수 있게 다양한 문제를 제기해줍니다. 그 중에서 그 문제를 해결하기 좋은 접근 방법을 제시한 논문들을 소개해주는 방식으로 어떤 포인트로 문제 해결을 위해 접근할 수 있는지를 알려줍니다. 기존의 실험된 사례들을 다수 소개하며 거기서 밝혀진 현상들에 대한 사실을 기반으로 문제를 제기합니다. 자체 실험은 없지만 실험된 결과들에 대한 기반으로 논지를 전개하였기에 길지 않은 내용 속에서도 여러 논문들의 실험 결과를 파악 할 수 있는 점이 장점입니다. Weakness : 논문 자체에서 새로 실험을 하거나 기발한 방법에 대해 소개하는 것은 아니었고 이러한 시각으로 GNN 논문 작성에 접근해보자는 논문이었기에 독자로 하여금 GNN 모델에 대한 더 깊은 고찰을 한번 더 생각 할 수 있는 부분에서 끝난 것은 장점이기도 하지만 본 논문의 특성상 한계라고 볼 수도 있습니다. Question : 본 논문에서 제기한 문제들에 대해 기존에 내가 알고 있는 잘 제안된 해결방법들을 가졌던 논문들이 있었나요? 아니면 이 논문에서 제기한 문제들은 전부 받아들이기 적합한 제안이었을까요? 무리하게 문제제기를 한 부분이 없진 않았을까요?	산학기술관 702-1
03.14.2024	Miseon Park	Graph U-Nets, Hongyang Gao & Shuiwang Ji (link, video) summary Summary : 본 논문은 기존 grid data에서 적용되던 U-Nets을 graph에서 적용하고자 하였습니다. Graph Pooling Layer와 Graph unPooling Layer를 통해 Network embedding을 수행합니다. gPool layer에서는 trainable projection vector p와 k-max pooling을 통해서 sub graph를 생성합니다. unPool layer에서는 만들어진 sub graph를 기반으로 기존의 그래프 형태로 복구시키는 작업을 수행합니다. Strengthen : 기존의 그래프에서 더 중요한 노드에 가중치를 줄 수 있습니다. 또한 graph connectivity augmentation via graph power 등의 방법을 이용하여 GCN을 더욱 효과적으로 사용하고 있습니다. Weakness : COLLAB datasets의 경우 비교 모델인 DiffPool-DET보다 낮은 성능을 보여주고 있습니다. 또한, 변형된 GCN을 사용하고 있는데, 기존의 GCN을 사용한 결과와의 차이를 보여주지 않습니다. Question : 변형된 GCN을 사용하고 있는데, 기존의 GCN을 사용한 결과와 얼마나 유의미한 차이가 있는가 궁금합니다. 또한 저널 클럽 시간에 이야기한 것처럼 edge feature를 이용해서 classification을 하는 방법에 대해서 궁금해졌습니다.	산학기술관 702-1
03.07.2024	YeongHak Jeong	Position-aware Graph Neural Networks, You et al. (link, video) summary 본 논문은 기존의 GNN이 isomorphic graph의 symmetric한 node를 구별하지 못하는 structure-aware embedding의 한계를 해결하기 위해, node의 position을 함꼐 저장하는 방법을 제안하고 있습니다. Anchor-set을 임의로 지정하여 node의 symmetry를 깨고, Anchor-set과 node 간의 거리 정보를 propagate하고자 하는 message에 포함시킴으로서 position-aware embedding 또한 가능하다고 주장하고 있습니다. Anchor-set의 개수를 적절히 정하기 위해 metric space와 관련된 수학 이론을 활용했습니다. 위와 같은 요약을 기반으로 만들어진 P-GNN은 다양한 dataset과 task에 대하여 GNN에 비해 더 좋은 성능을 보였습니다. 특히 position 정보가 이미 dataset에 포함되어 있거나 node 자체가 identifier를 가지고 있는 transductive learning에서는 GNN에 비해 그렇게 큰 성능 개선을 보이지 않았음을 보였습니다. 제안된 framework이 position-aware 하다는 것을 주장하려는 논조로 보였습니다. Position을 지정하기 위해 '최단 경로'를 쓰는 것이 모든 graph에 유효할지에 대한 의문이 있습니다. 그리고 isomorphic network 이외의 case에는 범용성있게 쓰일 수 있는 framework인가에 대한 생각이 듭니다. Anchor-set sampling, distance-weighted aggreation 연산이 추가되면서, 기존 GNN와 비교하여 계산 복잡도는 어떠한지, 모델이 어떻게 node를 분류했는지에 대해 해석하는데 어려움이 있었습니다.	산학기술관 702-1
02.28.2024	HyunGeun Lee	What Is a Cognitive Map? Organizing Knowledge for Flexible Behavior, Behrens et al. (link, video) summary Summary & Strength 본 논문은 Edward Tolman이 개발한 개념인 "Cognitive map"에 대한 리뷰를 다루고 있습니다. Tolman은 쥐가 먹이를 찾아나가는 실험에서 한 환경에서 먹이를 찾도록 한뒤, 환경을 바뀌어 기존 경로가 막혀있을 때 쥐가 먹이를 찾아나가도록 지름길을 탄다는 것을 발견하게 됩니다. 이는 향후 신경과학자들의 여러 연구를 통해 hippocampal-entorhinal system으로 구성된 네트워크로 그 작용이 이루어진다는 것이 밝혀지고 있고, 주로 공간 내에서의 탐색에 특화된 뉴런들로 구성되어 있다는 것을 알게됩니다. 이 논문은 이러한 시스템이 공간 이동뿐만 아니라 다른 비공간적인 작업에 대해서도 '구조'와 '예측'이라는 개념으로 접근하여 설명할 수 있는 통합적인 시스템으로 볼 수 있다고 주장하며 관련 토픽에 대하여 리뷰합니다. 이를 뒷받침하기 위해, 특정 위치에서 발화하는 뉴런인 place cell이 특정 주파수의 소리를 들을 때도 반응하는 예시, 다리와 목이 늘어나는 새의 영상에 규칙적으로 반응하는 GridCell 등을 소개합니다. 또한, Medial entorhinal cortex에서는 태스크의 구조적 정보를 처리하고, lateral entorhinal cortex에서는 감각 입력을 처리하고 이를 통합하여 hippocampus에 전달하는 과정인 hippocampal remapping, factorization에 대한 개념과 연구를 소개합니다. 더불어, 구조적 추상화와 추론을 위한 중요한 요소로서 인공지능의 일반화와 유연한 행동의 성능 향상에 기여할 것이라 주장합니다.	산학기술관 702-1
02.21.2024	Seungwon Yu	GNNExplainer: Generating Explanations for Graph Neural Networks, Ying et al. (link, video) summary summary : 본 논문은 graph를 해석하기 위한 explainable AI 방식을 제안합니다. Graph의 경우 feature 의 정보와는 다르게 Node 간의 연결성이 중요하여 두가지의 정보를 모두 고려하여야 합니다. 이러한 정보를 고려하여 저자들은 sub graph 형태를 생성하여 원본 graph 형태와 Mutual Information(MI)를 비교하여 가장 높은 값을 가지는 sub graph 의 구조를 찾아갑니다. Strength : Graph 형태를 해석하기에 수월한 형태를 처음으로 제안하였으며 효율적으로 그래프 형태의 Explainable Ai 방식을 제안하였습니다. Weakness : 현재 진행하고 있는 Pose Estimation에 적용하기가 어려워 보입니다. 해당 방식은 Ground Truth 의 subgraph 를 가지고 학습되는데 현재 사용하는 Data에서는 그러한 정보를 얻기가 쉽지 않을 것 같습니다. 또한 Node classification 시 각각의 node에 대하여 모든 적용을 해야하여 연산량의 증가가 필수적입니다.	산학기술관 702-1
02.07.2024	Wooyul Jung	Graph Neural Networks with Learnable Structural and Positional Representations, Dwivedi et al. (link, video) summary Summary : 이 논문은 structural과 positionial representations를 따로 학습하고자 하는 아이디어를 제시합니다. 제시하는 architecture인 LSPE는 random walk feature를 Positional Encoding initialization 으로 사용하여 high-order position information을 가집니다. 본 구조는 어떤 sparse GNNs이나 transformer GNNs에서도 적용이 가능하며 성능을 향상시킵니다. Strength : 기존의 positional representation과는 다르게 structure와 따로 학습을 시켜 더 정확하고 추론에 도움이 되는 PE를 생성할 수 있습니다. Laplacian PE와는 다르게 Random Walk PE는 1-WL test를 통과하며 sign shifting 현상을 가지지 않습니다. Weakness : positional encoding에 대한 사전 조사나 비교 실험이 부족합니다. PE를 학습하고자 하는 아이디어에 대한 근거를 가지지 않습니다. Question : 과연 positional encoding으로써 효과가 있는 것인지 아니면 multihead적 접근으로써의 연산이 효과가 있는 것일지 궁금합니다. 또한 RWPE 대신에 random initialization을 한다면 결과가 어떻게 달리지는지 실험해보고 싶습니다. positional representation을 학습하는 것의 의미를 파악하고자 훈련을 마치고 난 후의 positional feature을 RWPE initial 값와 비교하여 보고 싶습니다.	산학기술관 702-1
01.31.2024	Ung Hwang	Identity-aware Graph Neural Networks, You et al. (link, video) summary Summary : ID-GNN (Identity-aware Graph Neural Network)는 기존 그래프 신경망(GNN)의 표현력 한계를 극복하기 위해 제안된 모델입니다. 기존 GNN 모델들이 그래프의 구조적 정보들을 주로 사용하여 노드의 특징을 업데이트하기 때문에, 이러한 모델들은 서로 다른 그래프 구조에서 동일한 이웃 구성을 가진 노드들을 구별하는 데 어려움을 겪습니다. ID-GNN은 각 노드의 고유 신원(identity) 정보를 ego network의 구성들로 활용해 학습하는 방식으로 모델에 통합함으로써 이러한 문제를 해결하고자 했습니다. Strength : 향상된 표현력: ID-GNN은 ego-network를 통합 노드의 고유 신원 정보를 활용함으로써, 기존 GNN 모델들이 구별하지 못하는 노드들을 효과적으로 구별할 수 있습니다. 이는 모델의 표현력을 크게 향상시킵니다. 유연성: ID-GNN은 다양한 유형의 그래프 데이터에 적용될 수 있는 유연성을 제공합니다. 이는 신원 정보의 통합 방식이 다양한 그래프 구조와 호환될 수 있기 때문입니다. 이론적 근거: ID-GNN의 설계는 그래프의 동형이 아님(isomorphism)을 판별할 수 있는 이론적 능력에 근거하고 있어, 모델의 설계 및 분석에 강력한 이론적 근거를 제공합니다. Weakness : 계산 복잡도: 각 노드에 대해 1부터 K까지의 길이를 가진 사이클의 개수를 계산하고, 이를 노드의 특징 벡터에 추가하는 간단한 방식인 IDGNN-FAST 방법을 추가로 제안했지만 ID-GNN 본 내용 자체는 기존 방식 대비 4배가량의 연산속도를 가집니다. 이는 노드의 신원 정보를 모델에 통합하기 위해 추가적인 계산을 수행해야 한다는 것을 의미하며 본 논문을 직접 적용하고자 하는데 있어 단점이 됩니다. Question : 실험결과에서 ID-GNN은 대개의 경우 기존의 알고리듬의 성능을 상회하는 결과를 보였지만, 약간의 경우에서는 더 낮은 결과를 보이기도 했습니다. ID-GNN이 신원 정보를 보다 이전의 방식에 비해 더 정확하게 제공한다면, 이러한 결과는 어떻게 나오는 것일까요?	산학기술관 702-1
01.24.2024	Miseon Park	Multivariate Time-series Anomaly Detection via Graph Attention Network, Zhao et al. (link, video) summary 본 논문에서는 다변량 시계열 데이터의 이상탐지를 위해 self-supervised 모델인 Multivariate Time series Anomaly Detection via Graph Attention Network(MTAD-GAT)를 제안합니다. 이 모델은 normalization과 spectral residual을 이용하여 data preprocessing을 진행합니다. 이후 두 개의 병렬 graph attention network를 이용하여 feature dependency와 time dependency를 잘 파악합니다. 또한 이 모델은 forecasting based model과 reconstruction based model을 모두 사용함으로써, 한 가지 방법만을 사용한 기존의 모델들에 비해 높은 탐지율을 보여줍니다. 두 모델을 사용해서 계산한 inference score는 peaks over threshold(POT)를 이용하여 이상 또는 정상으로 구분되게 됩니다.	산학기술관 702-1
01.17.2024	YeongHak Jeong	Variational Graph Auto-Encoders, Kipf et al. (link, video) summary 본 논문에서는 Graph Distribution을 어떻게 Variational Auto Encoder로 학습할 것인가에 대한 아이디어를 설명하고 있습니다. Graph를 Generate하는 함수를 정확하게 알기 어렵기 때문에, 이를 잘 근사할 수 있는 함수를 이용하여 추론하는 것이 목적입니다. 그리고 Generative 함수를 만들기 위해, Graph의 Distribution을 low dimension의 latent space에 저장할 필요가 있는데, 이 때 Graph를 Gaussian의 parameter (mean, variance)로 저장한다는 점에도 주목할 필요가 있습니다. Link prediction task를 수행한 결과, Cora, Citesser에 대해 Variational Inference 한 경우, 그리고 Cora, Citeseer, Pubmed에 대해 node feature를 함께 학습시킨 경우애 더 좋은 성능을 보인다는 결과를 도출했습니다. 더 나은 모델 생성과 학습을 위해, 기존의 Gaussian보다 better suit한 prior distribution을 찾아보아야 할 필요가 있다고 주장하고 있습니다.	산학기술관 702-1
01.10.2024	HyunGeun Lee	AMAG: Additive, Multiplicative and Adaptive Graph Neural Network For Forecasting Neuron Activity, Li et al. (link, video) summary Summary & Strengths: 이 논문은 신경 반응에 대한 forecasting task를 통해 Neural population dynamics를 모델링하는 것을 다룹니다. 저자들은 해당 태스크에서 adjacency matrix를 random initialization할 경우 학습과정이 불안정해진다고 주장하며, correlation matrix의 형태의 prior를 도입하며 성능을 향상했습니다. 또한 Additive, Modulator Message Passing과 더불어 샘플별로 적응할 수 있는 Adaptor를 도입한 AMAG라는 그래프 신경망을 개발하여 Synthetic data와 Monkey data에서의 높은 성능을 보임을 서술하였습니다. Weaknesses: Correlation matrix를 도입함으로써 성능이 향상된다는 것을 empirical analysis를 통해 보였는데 이에 대한 이론적인 배경이 추가된다면 더욱 좋을 것 같다는 생각이 들었습니다. [1]번 논문에 서술된 바와 같이 Neurons의 반응 패턴에 correlation이 존재한다는 것은 direct connection을 의미하기도 하지만, Correlation이 높은 두 뉴런에 동일한 input이 들어오는 것을 의미하기도 합니다. 즉, 이러한 ‘explain away’ correlation이 존재하는 뉴런들 사이의 연결관계 또한 AMAG 모델이 설명할 수 있는 것인지 의문입니다. [1] Systematic errors in connectivity inferred from activity in strongly recurrent networks	산학기술관 702-1
01.03.2024	Wooyul Jung	Neural Relational Inference for Interacting Systems, Kipf et al. (link, video) summary 본 논문은 Interacting system에서 dynamic model을 훈련하며 relational structure를 동시에 추론하는 방법인 NRI를 제안하였습니다. Neural relational inference (NRI) model은 discrete latent graph를 통해 GNN으로 dynamics를 학습하고 추론된 edge type으로 interaction을 분류하게 됩니다. 본 논문에서는 Physics simulation task를 집중적으로 다뤘습니다. 다른 연구와는 달리 explicit interaction structure를 사용하여 dynamic model의 interaction을 학습합니다. Latent graph structure를 추론하기 위해 VAE를 사용하였고, 물리현상에 대해서는 Markovian assumption를 적용하여 dymanic을 예측한다는 특징을 가집니다. Real-world의 많은 예시에서 interaction이 dynamic하나 본 모델은 static interaction graph만을 이해할 수 있다는 한계점을 가집니다.	산학기술관 702-1
12.27.2023	Seungwon Yu	Graph Transformer Networks, Yun et al. (link, video) summary summary: 본 논문은 기존의 대부분의 GNN model 이 graph를 fixed & homogenous graph로 생각하고 해석하는데 한계점을 제시하며 다양한 edge와 node type을 가진 heterogenous graph 로 해석하는 방법을 제시합니다. 또한 기존의 heterogeneous graph 해석 논문에서 제시하는 고정된 meta-path 방법의 문제점을 제시하고 meta-path 자체를 추정하는 Network를 제안합니다. Adjacency matrix 의 곱으로 연결되지 않은 node 끼리의 meta-path 생성이 가능하다는 점을 이용하여 1D-conv 를 사용하여 Selected Adjacency Matrices 를 생성하게 됩니다. 해당 1D-conv는 softmax 함수를 적용하여 meta-path 생성 시 adjancy matrix에 대한 attention score 역할을 하게 됩니다. 이후 생성된 meta-path 들에 전부 GCN, GAT 를 적용하고 해당 정보를 concat 하여 최종 downstream task를 진행합니다. strength : 본 논문은 기존의 사람의 지식에 기반하여 pre-define 된 meta-path 가 실제 gnn network 가 참조하는 metapath와 차이가 있다는 점을 밝혀내고 그래프의 정보에 따라 meta-path를 fit하게 생성이 가능합니다 또한 전체 알고리즘이 아닌 meta-path 생성 알고리즘을 제안하여 해당 meta-path를 기반으로 기존의 고성능 GNN 알고리즘을 함께 사용할 수 있습니다 weakness : 생성된 metapath 전체에 대해서 GNN 알고리즘을 적용해야 하기 때문에 연산량이 높아지며 해당 논문에서도 2개의 Selected Adjacency Matrices 에 대한 결과만 보여주고 있습니다.	산학기술관 702-1
12.2023		ML4H & NeurIPS Conference	New Orleans
11.30.2023	HyunGeun Lee	Modular meta-learning, Alet et al. (link, video) summary 본 논문에서는 다양한 방식으로 결합할 수 있는 신경망 모듈 세트를 학습하기 위한 전략을 소개합니다. 서로 다른 관련 태스크에 대한 훈련을 통해 이러한 모듈을 재사용 및 결합하여 새로운 태스크에 적용하여 해결할 수 있으며, 이를 통해 더 나은 예측 성능을 얻을 수 있음을 확인할 수 있습니다. 저자들은 두 가지 로봇공학 관련 문제에서 접근 방식의 효과를 입증하였습니다.	산학기술관 702-1
11.23.2023	YeongHak Jeong	Clone-structured graph representations enable flexible learning and vicarious evaluation of cognitive maps, George et al. (link, video) summary 본 논문은, human brain이 spatial & conceptual 정보를 어떻게 cognition 할 수 있을까에 대한 motivation에서 시작되었습니다. 여기서 이 cognition을 표현하기 위해 'Cognitive map'을 만들고자 하는데, 사람이 실제로 cognition을 할 때는 어떠한 관찰이 다양한 맥락에서 이루어지는 aliased 특성을 가지기 때문에 - 이 현상을 모두 대표할 수 있는 model을 만드는 것은 어렵고, 수많은 시도 또한 이루어져 왔습니다. 이 논문에서는 Cloned-structured cognitive graph (CSCG)를 제안하여, 기존의 이론을 보완할 수 있는 cognitive map 표현 방식을 제안했습니다. 여기서 표현 방식이란, cognitive map을 cloned hidden markov model에 기반한 probablistic sequential model로 만들고자 한다는 것입니다. cloned hidden markov model에, action을 augmented 한 것을 CSCG로 정의하였고, CSCG를 구성하는 parameters를 optimize하기 위한 이론으로 Expecation-Maximization (EM) Algorithm을 제시했습니다. CSCG는 관찰(observation)과 관련된 맥락(context)을 latent space에 저장할 수 있고, 이러한 관찰과 맥락의 연결성이 Graph의 형태를 띕니다. CSCG을 python으로 구현하여, 기존의 cognition과 관련된 neuroscience의 연구 결과를 이를 이용하여 재현하였고, CSCG가 neuroscience에서 밝혀낸 결과를 잘 설명하고 있다고 주장하고 있습니다.	산학기술관 702-1
11.21.2023	Miseon Park	An Efficient Graph Convolutional Network Technique for the Traveling Salesman Problem, Joshi et al. (link, video) summary 본 논문은 TSP 문제 해결을 위해 새로운 non-autoregressive learning-based 방법을 제안합니다. 먼저, Graph Convolutional Network를 사용하여 그래프의 표현을 heat map으로 나타냅니다. 그 다음 만들어진 heat map을 Beam Search를 통해 non-autoregressive하게 경로를 찾아냅니다. 이러한 방법은 고정된 크기의 그래프인 경우 해결 품질, 추론 속도, 샘플 효율성 측면에서 기존의 autoregressive deep learning model을 능가했습니다. 그러나, 표준 Operation Research solver에는 미치지 못했습니다.	산학기술관 702-1
11.16.2023	Seungwon Yu	Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition, Yan et al. (link, video) summary summary : 본 논문은 graph-based 로 video action recognition을 진행하는 방법을 제안합니다. RGB 형태의 video를 openpose 알고리즘을 사용하여 2D skeleton으로 변환한 후 skeleton의 형태를 graph로 해석하여 각 feature 간의 update를 진행합니다. Update 시 1d convolution을 사용하여 다른 timestamp의 같은 joint 의 정보를 message-passing 한 후 spatial data의 정보를 gcn을 통하여 update 하는 방식을 사용합니다. strength : 본 논문에서 저자들은 2D skeleton 및 3D skeleton data를 전부 사용하였는데 이는 2D, 3D data의 입력 pixel 위치만을 참고하여 해석을 진행하기 때문에 가능하며 2D, 3D 형태를 전부 적용이 가능하다는 장점을 가집니다. 실제 실험 시 CNN Based 방법보다 연산량이 월등하게 작아 학습 및 추론 속도에서의 강점을 가집니다. weakness : 전체 timestamp를 참조하여 action recognition을 진행하기 때문에 여러 영상이 함께 나오는 task에서는 사용하기 힘들 수 있습니다. sliding window 방식을 접목하면 해당 문제점을 해결할 수 있을수도 있습니다.	산학기술관 702-1
11.09.2023	HyunGeun Lee	Graph Element Networks: adaptive, structured computation and memory, Alet et al. (link, video) summary Graph Element Networks는 partial differential equation의 solver 중 하나인 Finite element method에서 아이디어를 얻은 모델로서, function은 locally linear하다는 inductive bias를 가지고 있습니다. 저자는 이러한 특성을 바탕으로 function to function mapping task에 GENs 모델을 적용하였습니다. GENs 모델은 기존의 그래프의 데이터 포인트를 직접적으로 GNN의 node 혹은 edge entities에 상응시켜서 사용하는 것이 아닌 metric space 상에 정의된 data points를 무게중심의 일반화된 방법론과 softmax를 활용한 attention과 흡사한 방식을 통해 grid라는 일종의 template graph의 initial node representation으로 맵핑을 합니다. 이러한 특성 덕분에 데이터가 그래프 set에 국한되는 것이 아니라 metric space 상에 자유롭게 위치할 수 있게 되고, template grid graph의 density를 조정함으로써 computaional cost 와 accuracy 사이의 trade-off에 있어 유연성을 가진다고 볼 수 있습니다. 이 논문에 대하여 한 가지 의문이 드는 점은 GENs의 output이 grid graph의 위치에 대한 미분이 가능하여 template graph의 topology를 조정할 수 있다는 부분입니다. 이러한 topology는 철저히 data-driven으로 학습이 되는 것이기 때문에 특정 영역으로 데이터 쏠림이 발생한다면 template graph의 노드 또한 그 쪽으로 모이는 현상이 발생하게 됩니다. 이렇게 되면 추후 extrapolation test 혹은 OOD generalization에 있어서 이 현상이 단점으로 부각될 수도 있지 않을까하는 우려가 있습니다. 이에 대한 검증을 위한 추가 실험이 있었다면 더욱 좋았을 것 같습니다.	산학기술관 702-1
11.02.2023	Wooyul Jung	Principal Neighbourhood Aggregation for Graph Nets, Corso et al. (link, video) summary 위 논문은 'How powerful are Graph Neural Networks?'애 이어서 그래프의 isomorphism problem을 해결하고자 하는 시도입니다. discrete feature에서는 sum aggregator가 유용하다는 'How powerful are Graph Neural Networks?'의 주장과는 달리 본 논문은 continuous feature에 대해서는 multiple aggregators가 필요하다고 말합니다. 이 주장에 대한 근거로 여러 수학적 접근으로 엄밀성을 갖추었으며 real world bentchmarks에서 기존 방법론보다 훌륭한 결과를 보임을 밝힙니다.	산학기술관 702-1
10.12.2023	Woong Hwang	How Powerful are Graph Neural Networks?, Xu et al. (link, video) summary 저자들은 Graph Neural Network의 aggregation mechanism의 expressive power에 대한 연구를 진행하였습니다. 그리고 이를 Graph isomorphism test를 위한 1-WL(Weisfeiler-Lehman) test와 연결지어 설명하며, 1-hop neighbourhood의 information을 aggregation하는 spatial GNN은 이 1-WL test보다 더 좋은 성능을 낼 수 없다는 것을 보입니다. 또한, discrete한 feature를 가지는 그래프의 message passing과정을 tree형태로 inject하여 해석하면서 non-isomorphic 그래프 쌍의 aggregation에 대한 구분에 있어 실패하는 사례를 보여줍니다. 저자들은 이러한 한계점을 극복하기 위해 Sum aggregation의 중요성을 언급하며, Expressive power를 극대화한 GIN 아키텍쳐를 제안하며 graph-level task에서 우수한 성능을 보여주었습니다.	산학기술관 702-1
10.05.2023	HyunGeun Lee	Neural Message Passing for Quantum Chemistry, Gilmer et al. (link, video) summary 본 논문은 기존의 다양한 GNN architectures를 하나의 framework로 통합했다는 것에 의미가 큰 논문입니다. 그 framework는 Message Passing Neural Network framework라고 부르며, 이는 총 3가지 단계, Message passing, Update, 그리고 Readout로 구성되어 있습니다. 먼저, Message Passing 단계에서는 인접 노드의 정보를 계산하고 그 정보를 sum, mean, max, min 등의 연산을 통해 aggregation합니다. 이 때, aggregation된 정보를 message라고 합니다. 다음으로, 업데이트 단계에서 업데이트 함수 $U_t$는 target node와 메시지의 hidden state를 입력으로 받아 target node의 hidden state를 업데이트합니다. 마지막으로, readout 단계에서는 전체 그래프에 대한 representation vector를 계산합니다. 저자들은 이 Framework를 통해 MPNN Variant를 개발하여 QM9 데이터셋에 대하여 Chemical Accuracy를 달성하였으며, 이 결과에 대한 더 자세한 내용은 논문에 기술되어 있습니다.	산학기술관 702-1
9.21.2023	Seungwon Yu	Graph Attention Networks, Veličković et al. (link, video) summary Graph attention networks(GAT)는 2018년도 ICLR에서 발표된 논문입니다. GAT는 마스킹된 self-attention layer를 활용하여 기존의 spectral graph convolution의 단점을 해결했다고 합니다. 인접한 노드에 정규화된 attention coefficient를 곱하여 합산하는 aggregation 방법론을 사용함으로써, 행렬 연산 및 그래프 구조에 의존하지 않는 방법론을 제안합니다. GAT는 그래프의 구조에 영향을 받지 않기 때문에 기존의 transductive task 뿐만 아니라 inductive task에서도 좋은 성능을 보여준다고 합니다. 이를 검증하기 위해 저자는 transductive task 벤치마킹을 위해 Cora, Citeseer 그리고 Pubmed 데이터셋을 사용하였습니다. 또한, inductive task에 대한 벤치마킹을 위해 PPI(Protein-protein interaction) 데이터셋을 사용하여 GAT의 SOTA 성능을 검증하였습니다. 이번 미팅을 통해 GAT에서 사용한 self-attention layer의 수식에 대해 더 깊은 이해를 하였고, GNN 도메인에서 중요하게 여겨지는 방법론에 대한 기초를 다지는 시간을 가졌습니다.	산학기술관 702-1
9.14.2023	Wooyul Jung	Semi-Supervised Classification with Graph Convolutional Networks, Kipf & Welling (link, video) summary 이번 논문은 spectral graph convolution의 first-order approximation으로 유도되는 graph convolutional network를 제안합니다. 그동안의 graph based semi-supervised learning에서 graph의 smoothness만 고려한 것과는 달리 위 모델은 edge weight과 node간의 전달을 통해 고차원적인 정보를 다룰 수 있습니다. 다양한 citation network datasets에서 실험한 결과 다른 모델들에 비해 모두 뛰어난 성능을 보였습니다. 발표에서는 수식의 이해를 돕기 위해 graph laplacian과 graph fourier transform에 대한 설명을 더하였으며 마지막에는 transformer와 GCN의 유사성을 다뤘습니다.	산학기술관 702-1
7.19.2023	Hyungeun Lee	Variance-invariance-covariance regularization for self-supervised learning, Bardes et al. (link, video) summary	산학기술관 702-1
7.19.2023	Hyungeun Lee	Unsupervised learning of visual features by contrasting cluster assignments., Caron et al. (link, video) summary	산학기술관 702-1
7.19.2023	Seungwon Yu	Emerging Properties in Self-Supervised Vision Transformers, Caron et al. (link, video) summary	산학기술관 702-1
7.19.2023	Seungwon Yu	Exploring Simple Siamese Representation Learning, Chen et al. (link, video) summary	산학기술관 702-1
7.19.2023	Seungwon Yu	Bootstrap your own latent: A new approach to self-supervised Learning, Grill et al. (link, video) summary	산학기술관 702-1
7.19.2023	Woong Hwang	With a little help from my friends: Nearest-neighbor contrastive learning of visual representations, Dwibedi et al. (link, video) summary	산학기술관 702-1
7.19.2023	Woong Hwang	A Simple Framework for Contrastive Learning of Visual Representations, Chen et al. (link, video) summary	산학기술관 702-1
10.15.2021	Taehoon Park	Learning Neural Causal Models from Unknown Interventions, Ke et al. (link, video) summary 이번 논문은 system을 contional probability로 modeling 하는 경우에서 graph sturcture recovery에 관한 내용입니다. 사용된 graph의 structure는 DAG(Directed Acyclic Graph)입니다. Modeling을 할 때 기존의 system에서 관측가능한 data인 observational data 뿐만 아니라, system에서 일어날 것 같지 않은 condition의 값을 할당한 data인 interventional data까지 training 단계에서 사용할 경우 sturcture recovery가 더 좋은 결과를 낸다는 것이 주요내용입니다. 또한 모든 실험에서 Intervention을 apply할 때 sparse하게 하나의 node에만 intervention을 사용했습니다. Intervention을 사용할 경우 논문의 모든 dataset(synthetic/real)에서 best performance를 보였으며, 어떠한 node에서 intervention이 일어났는 지 모르는 경우, 이를 inference하는 방법도 제시하였으며 괜찮은 성능을 가집니다. Intervention의 경우 data의 perturbation과 동일한 역할을 하는것으로 보여집니다.	Zoom meetings
10.22.2021	Donghee Kang	Contrastive Self-supervised Learning for Graph Classification, Zeng et al. (link, video) summary contrastive self supervised learning 방법을 이용해 graph classification이 발생할 수 있는 overfitting을 예방하려는 방법을 제안하였습니다. augmentation된 그래프들이 같은 그래프에서 나왓는지 아닌지를 prediction하는 모델을 학습한 후 fine-tuning을 통해 classification을 수행하는 방법과 prediction과 classification을 동시에 수행 하며 regularization을 동시에 수행하는 두가지 방법이 있습니다. 두 방법 모두 단순히 classification만을 수행하는 모델보다 좋은 성능을 보였고 overfitting문제를 더 잘 해결 할 수 있었습니다.	Zoom meetings
10.29.2021	Hyunmok Park	Scaling Graph-based Deep Learning models to larger networks, Ferriol-Galmés et al. (link, video) summary Graph Neural Networks (GNN) have shown a strong potential to be integrated into commercial products for network control and management. However, the Graph Neural Networking challenge 2021 brings a practical limitation of existing GNN-based solutions for networking: the lack of generalization to larger networks. The goal of this challenge is to create a scalable Network Digital Twin (i.e., a network model) based on neural networks, which can accurately estimate QoS performance metrics given a network state snapshot. The proposed GNN-Based solution pursues two main objectives: (1) Finding a good representation for the network components supported by the model. (2) Exploit scale-independent features of networks. There are two main properties for networks become larger: (i) higher link capacities and (ii) longer paths. By defining link capacities as the product of a virtual reference link capacity and a scale factor, GNN can learn to make accurate estimates on any combination of these two features. By defining Lmax as a configurable parameter and split flows exceeding Lmax into different queue-link sequences, the model can independently learn each representation. The model was trained on graphs size of \|V\| = 25 ~ 300 and tested with graphs size of \|V\| = 50 ~ 300. To generate a wide set of topologies of variable size, all graph topologies have been artificially generate using Power-Law Out-Degree Algorithm. By using grid search data augmentation, the model was trained to cover a broad combination of virtual reference link capacity and a scale factor. As a result, the model obtains better accuracy in topologies seen during the training phase (50 to 99 nodes), achieving an average error of 4.5%. As the topology size increases, the average error stabilizes to ≈10%. This paper have presented a novel GNN-based model that addresses a main limitation of existing ML-based models applied to networks: generalizing accurately to considerably larger networks.	Zoom meetings
11.05.2021	Juhyeon Kim	Connecting the dots: Multivariate time series forecasting with graph neural networks, Wu et al. (link, video) summary Modeling multivariate time series has long been a subject that has attracted researchers from a diverse range of fields. Until now, modeling multivariate time series forecasting they assume that each variable depends on its historical values. But in real world data each variable depends not only on its historical values but also on other variables. And existing methods fail to fully exploit latent spatial dependencies between pairs of variables. This paper propose “MTGNN” that can simultaneously learn the graph structure and the GNN for time series in an end-to-end framework. The Main frameworks of “MTGNN” are 1. Graph Learning Layer - Extract sparse graph adjacency matrix 2. Graph Convolution module - Address spatial dependencies among Variables 3. Temporal Convolution module - Capture temporal patterns MTGNN achieves great performance on two traffic dataset and show that learning dependencies between variables are more capable of indicating extreme traffic conditions of the central node in advance.	Zoom meetings
11.12.2021		AISTATS Review
11.19.2021	Haesung Pyeon	Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks, Csordás et al. (link, video) summary Neural Networks whose subnetworks implement reusable functions are expected to offer numerous advantages. Modularity provides a way of achieving compositionality which is essential for systemic generalization. This paper defines functional modules whether modules implementing specific functionality emerge. By training probabilistic, binary but differentiable masks for all weight, the result shows the module necessary to perform target function. This paper investigates whether NN uses different modules for very different functions and reuses the same modules for the same functions. Many typical NNs are good at specializing modules well but not reusing modules for identical functions. FNN and LSTM share weights depending more on the location of the inputs/outputs than similarity of the performed operations. CNN mostly depend on class-exclusive, non-shared weights in feature detection. Reusing property does not emerge naturally and the same functionality is re-learned (redundant and potentially harmful). There are several potential explanations about poor generalization: NN find it hard to learn to represent data similarly along different information routes that can in principle be processed by a single module. NN might not have learned the correct algorithm to solve some problems.	Zoom meetings
11.26.2021	Taehoon Park	Perturbation theory approach to study the latent space degeneracy of Variational Autoencoders, Andrés-Terré & Lió (link, video) summary VAEs provide a lower dimensional representation of the data, in the form of a set of generative functions capable to reproduce and reconstruct the original input. Inspired by perturbation analysis in quantum physics, we have developed an approach to unveil structures and the energy spectrum encoded in the data by using the generative functions extracted from a VAE Applying a certain perturbed Hamiltonian over the generated function. For implementation, encoder returns the function that returns unique solution for the system. To generate different energy level, perturbation approach linearly combines unperturbed energy and Hamiltonian. The embeddings generated from the fully trained VAE provide a space where the clusters are clearly seperable. This work is very beginning of the line of study, and further studies will promising for understanding dynamic system with new approach.	Zoom meetings
12.03.2021	Donghee Kang	Towards Efficient Point Cloud Graph Neural Netwokrs Through Architectural Simplification, Tailor et al. (link, video) summary This papers suggests a efficient method to learn point cloud data that reduces latency and memory consumption. Point cloud architectures incorporate geometric priors in the first layer but as it goes deeper geometric priors are lost. So this paper suggests featrue extractor at first layer with several geometric priors. (target postion, source position, relative postion, euclidean distance) By using feature extractor and simplifed DGCNN, we could get good performance in less time and memory.	Zoom meetings