News Board List | Sungkyunkwan University Graduate School of Artificial Intelligence


Prof. Jee-Hyong Lee and Prof. Eun-Ho Lee develop deep-learning technology for automatically checking automotive die CAD design drawings
- Posted 2024-01-12
- Views 0
Prof. Jae-Pil Heo's lab: four papers accepted to AAAI 2024

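Four papers from the Visual Computing Lab (advisor: Prof. Jae-Pil Heo) have been accepted to the AAAI Conference on Artificial Intelligence 2024 (AAAI-24), a leading conference in artificial intelligence.

Paper #1: "Towards Squeezing-Averse Virtual Try-On via Sequential Deformation" (Sang-Heon Shim, Ph.D. student, Department of Artificial Intelligence; Jiwoo Chung, M.S. student, Department of Artificial Intelligence)
Paper #2: "Noise-free Optimization in Early Training Steps for Image Super-Resolution" (MinKyu Lee, Ph.D. student, Department of Artificial Intelligence)
Paper #3: "VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting" (Seunggu Kang, M.S. student, Department of Artificial Intelligence; WonJun Moon, Ph.D. student, Department of Artificial Intelligence; Euiyeon Kim, M.S. graduate, Department of Artificial Intelligence)
Paper #4: "Task-disruptive Background Suppression for Few-Shot Segmentation" (Suho Park, undergraduate, Department of Software / School of Mechanical Engineering; SuBeen Lee, Sangeek Hyun, and Hyun Seok Seong, Ph.D. students, Department of Artificial Intelligence)

The paper "Towards Squeezing-Averse Virtual Try-On via Sequential Deformation" addresses visual quality degradation in high-resolution virtual try-on image generation. Specifically, as the upper row of Figure 1(a) shows, the clothing texture is squeezed at the sleeves. The main cause is a gradient conflict between two losses that are essential in this field: the Total Variation (TV) loss and the adversarial loss. The TV loss aims to separate the boundary between the sleeve and the torso in the warped clothing mask, whereas the adversarial loss aims to join them. These opposing objectives feed misaligned gradients back into the cascaded appearance flow estimation and produce the sleeve-squeezing artifacts. To address this, the paper approaches the problem from the viewpoint of the connections between network layers. It diagnoses that the squeezing occurs because the existing cascaded appearance flow estimation is linked by residual connections and is therefore strongly affected by the adversarial loss. To reduce this effect, it introduces a sequential connection between the cascaded appearance flows at the last layer of the network. The lower row of Figure 1(a) shows a different type of squeezing artifact around the waist. To address it, the study proposes first warping the clothes in a tucked-out shirt style and then partially erasing texture from the initial warping result, and implements an operation for this. The proposed technique was confirmed to resolve both types of artifacts.

The paper "Noise-free Optimization in Early Training Steps for Image Super-Resolution" addresses the limitations of conventional training methods and Knowledge Distillation in image super-resolution. Specifically, it decomposes a high-resolution image into two key components, the optimal centroid and the inherent noise, and analyzes them. The analysis shows that the inherent noise in the training data induces instability in the early stage of training. To solve this, the paper proposes a more stable training technique that removes the inherent noise during training by using Mixup and a pretrained network. The proposed technique was confirmed to bring consistent performance gains across multiple models in fidelity-oriented single image super-resolution.

The paper "VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting" addresses counting the objects in an image that are specified by text. It points out that the two-stage approach of prior work suffers from heavy computation and possible error propagation. To solve this, it proposes VLBase, a one-stage baseline, and VLCounter, which extends VLBase with three key techniques. First, instead of retraining CLIP, a large pretrained model, it adopts Visual Prompt Tuning (VPT) and adds text information to the learnable VPT tokens to obtain image features in which the corresponding objects are highlighted. Second, fine-tuning is performed to obtain a similarity map that highlights only the important parts rather than the entire object region, which increases object-centric activation. Third, to improve generalization and localize objects accurately, the image encoder features are integrated into decoding, and the similarity map is multiplied into the features to focus on object regions. The proposed technique not only substantially outperforms existing methods but also, thanks to its lightweight model, doubles training and inference speed.

The paper "Task-disruptive Background Suppression for Few-Shot Segmentation" addresses how to handle the Support background effectively in few-shot segmentation, the problem of finding an object in a new image (Query) by referring to a small number of images (Support) and their masks. Existing models compare Support and Query to perform segmentation, and comparing their backgrounds causes the following problems. First, when the Support and Query backgrounds differ greatly, the comparison can hinder segmentation. Second, an object in the Support background that resembles the target object can also hinder it. The paper therefore removes these two disruptive background elements using a Query-relevant score and a Target-relevant score. As a result, only the Support background related to the Query background remains, so the Support background is referenced more effectively. The proposed method was confirmed to improve performance across several few-shot segmentation models.

[Paper #1 information]
Towards Squeezing-Averse Virtual Try-On via Sequential Deformation
Sang-Heon Shim, Jiwoo Chung, and Jae-Pil Heo
Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2024
Abstract: In this paper, we first investigate a visual quality degradation problem observed in recent high-resolution virtual try-on approach. The tendency is empirically found that the textures of clothes are squeezed at the sleeve, as visualized in the upper row of Fig.1(a).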
A main reason for the issue arises from a gradient conflict between two popular losses, the Total Variation (TV) and adversarial losses. Specifically, the TV loss aims to disconnect boundaries between the sleeve and torso in a warped clothing mask, whereas the adversarial loss aims to combine between them. Such contrary objectives feedback the misaligned gradients to a cascaded appearance flow estimation, resulting in undesirable squeezing artifacts. To reduce this, we propose a Sequential Deformation (SD-VITON) that disentangles the appearance flow prediction layers into TV objective-dominant (TVOB) layers and a task-coexistence (TACO) layer. Specifically, we coarsely fit the clothes onto a human body via the TVOB layers, and then keep on refining via the TACO layer. In addition, the bottom row of Fig.1(a) shows a different type of squeezing artifacts around the waist. To address it, we further propose that we first warp the clothes into a tucked-out shirts style, and then partially erase the texture from the warped clothes without hurting the smoothness of the appearance flows. Experimental results show that our SD-VITON successfully resolves both types of artifacts and outperforms the baseline methods.

[Paper #2 information]
Noise-free Optimization in Early Training Steps for Image Super-Resolution
MinKyu Lee and Jae-Pil Heo
Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2024
Abstract: Recent deep-learning-based single image super-resolution (SISR) methods have shown impressive performance whereas typical methods train their networks by minimizing the pixel-wise distance with respect to a given high-resolution (HR) image. However, despite the basic training scheme being the predominant choice, its use in the context of ill-posed inverse problems has not been thoroughly investigated. In this work, we aim to provide a better comprehension of the underlying constituent by decomposing target HR images into two subcomponents: (1) the optimal centroid which is the expectation over multiple potential HR images, and (2) the inherent noise defined as the residual between the HR image and the centroid. Our findings show that the current training scheme cannot capture the ill-posed nature of SISR and becomes vulnerable to the inherent noise term, especially during early training steps. To tackle this issue, we propose a novel optimization method that can effectively remove the inherent noise term in the early steps of vanilla training by estimating the optimal centroid and directly optimizing toward the estimation. Experimental results show that the proposed method can effectively enhance the stability of vanilla training, leading to overall performance gain.

[Paper #3 information]
VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting
Seunggu Kang, WonJun Moon, Euiyeon Kim, and Jae-Pil Heo
Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2024
Abstract: Zero-Shot Object Counting (ZSOC) aims to count referred instances of arbitrary classes in a query image without human-annotated exemplars. To deal with ZSOC, preceding studies proposed a two-stage pipeline: discovering exemplars and counting. However, there remains a challenge of vulnerability to error propagation of the sequentially designed two-stage process. In this work, we propose an one-stage baseline, Visual-Language Baseline (VLBase), exploring the implicit association of the semantic-patch embeddings of CLIP. Subsequently, we extend the VLBase to Visual-language Counter (VLCounter) by incorporating three modules devised to tailor VLBase for object counting. First, we introduce Semantic-conditioned Prompt Tuning (SPT) within the image encoder to acquire target-highlighted representations. Second, Learnable Affine Transformation (LAT) is employed to translate the semantic-patch similarity map to be appropriate for the counting task. Lastly, we transfer the layer-wisely encoded features to the decoder through Segment-aware Skip Connection (SaSC) to keep the generalization capability for unseen classes. Through extensive experiments on FSC147, CARPK, and PUCPR+, we demonstrate the benefits of our end-to-end framework, VLCounter.

[Paper #4 information]
Task-disruptive Background Suppression for Few-Shot Segmentation
Suho Park, SuBeen Lee, Sangeek Hyun, Hyun Seok Seong, and Jae-Pil Heo
Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2024
Abstract: Few-shot segmentation aims to accurately segment novel target objects within query images using only a limited number of annotated support images. The recent works exploit support background as well as its foreground to precisely compute the dense correlations between query and support. However, they overlook the characteristics of the background that generally contains various types of objects. In this paper, we highlight this characteristic of background which can bring problematic cases as follows: (1) when the query and support backgrounds are dissimilar and (2) when objects in the support background are similar to the target object in the query. Without any consideration of the above cases, adopting the entire support background leads to a misprediction of the query foreground as background. To address this issue, we propose Task-disruptive Background Suppression (TBS), a module to suppress those disruptive support background features based on two spatial-wise scores: query-relevant and target-relevant scores. The former aims to mitigate the impact of unshared features solely existing in the support background, while the latter is to reduce the influence of target-similar support background features.
Based on these two scores, we define a query background relevant score which captures the similarity between the backgrounds of the query and the support, and utilize it to scale support background features to adaptively restrict the impact of disruptive support backgrounds. Our proposed method achieves state-of-the-art performance on PASCAL and COCO datasets on 1-shot segmentation.
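
The Total Variation (TV) loss named in the Paper #1 abstract can be illustrated with a minimal NumPy sketch. It assumes the common anisotropic form (the sum of absolute differences between neighboring entries). The source does not say which TV variant SD-VITON uses or exactly which tensor it is applied to, and the function name `total_variation` is hypothetical.

```python
import numpy as np

def total_variation(field: np.ndarray) -> float:
    """Anisotropic total variation of an (H, W) or (H, W, C) array."""
    # Sum of absolute differences between vertically adjacent entries.
    dy = np.abs(field[1:, ...] - field[:-1, ...]).sum()
    # Sum of absolute differences between horizontally adjacent entries.
    dx = np.abs(field[:, 1:, ...] - field[:, :-1, ...]).sum()
    return float(dx + dy)

# Toy inputs: a smooth horizontal ramp and a noisy version of it.
smooth = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
rng = np.random.default_rng(0)
noisy = smooth + rng.normal(scale=0.5, size=smooth.shape)

tv_smooth = total_variation(smooth)
tv_noisy = total_variation(noisy)  # larger: noise adds many sharp local changes
```

Minimizing such a term favors piecewise-smooth fields, which is consistent with the abstract's description of the TV loss shaping boundaries in the warped clothing mask.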
- Posted 2023-12-14
- Views 1729
Data Intelligence and Learning Lab (advisor: Prof. Jongwuk Lee): seven papers published at SIGIR, CIKM, and EMNLP 2023

The Data Intelligence and Learning (DIAL) Lab had three papers accepted to SIGIR 2023, the most prestigious conference in information retrieval, and presented them on July 23 in Taipei, Taiwan. On October 21, two papers accepted to CIKM 2023, the most prestigious conference in data mining, were presented in Birmingham, UK. In addition, two papers were accepted to EMNLP 2023, the most prestigious conference in natural language processing, and will be presented in Singapore in December.

[Paper list]
1. It’s Enough: Relaxing Diagonal Constraints in Linear Autoencoders for Recommendation (SIGIR'23)
2. uCTRL: Unbiased Contrastive Representation Learning via Alignment and Uniformity for Collaborative Filtering (SIGIR'23)
3. ConQueR: Contextualized Query Reduction using Search Logs (SIGIR'23)
4. Forgetting-aware Linear Bias for Attentive Knowledge Tracing (CIKM'23)
5. Toward a Better Understanding of Loss Functions for Collaborative Filtering (CIKM'23)
6. GLEN: Generative Retrieval via Lexical Index Learning (EMNLP'23)
7. It Ain't Over: A Multi-aspect Diverse Math Word Problem Dataset (EMNLP'23)

Study 1: Jaewan Moon, Hye-young Kim, and Jongwuk Lee, “It’s Enough: Relaxing Diagonal Constraints in Linear Autoencoders for Recommendation”, 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023

This study theoretically analyzes the diagonal constraint in linear autoencoder-based recommender systems and proposes relaxed linear autoencoders (RLAE), which relax that constraint. Linear autoencoder models learn an item-to-item weight matrix through convex optimization with L2 regularization and a zero-diagonal constraint. The paper aims to understand the properties of these two constraints theoretically. By analyzing the weight matrix with singular value decomposition (SVD) and principal component analysis (PCA), it shows that L2 regularization promotes the effect of high-ranked principal components. In contrast, the zero-diagonal constraint reduces the influence of low-ranked principal components, which can degrade performance on unpopular items. Motivated by this analysis, the paper proposes the Relaxed Linear AutoEncoder (RLAE) and the Relaxed Denoising Linear AutoEncoder (RDLAE), simple yet effective linear autoencoder models that use a diagonal inequality constraint. It also proves that the proposed method, which adjusts the degree of the diagonal constraint, generalizes existing linear models.
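
The two constraints discussed for Study 1 can be sketched in NumPy. The zero-diagonal closed form below is the well-known ridge-regression solution for linear autoencoders (often called EASE, a name not used in the source). The relaxed version assumes the inequality constraint takes the form diag(B) <= xi and is derived here from the KKT conditions of that assumed problem; it is an illustration, not the formula quoted from the RLAE paper. The function names and the toy matrix `X` are hypothetical.

```python
import numpy as np

def zero_diagonal_lae(X: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form item-item weights with L2 penalty lam and diag(B) = 0."""
    gram = X.T @ X
    P = np.linalg.inv(gram + lam * np.eye(gram.shape[0]))
    # B = I - P @ diagMat(1 / diag(P)): column j of P is scaled by 1 / P[j, j].
    return np.eye(gram.shape[0]) - P / np.diag(P)

def relaxed_lae(X: np.ndarray, lam: float, xi: float) -> np.ndarray:
    """Same objective under the assumed inequality constraint diag(B) <= xi."""
    gram = X.T @ X
    P = np.linalg.inv(gram + lam * np.eye(gram.shape[0]))
    # KKT conditions: the multiplier is active only where the plain ridge
    # solution's diagonal (1 - lam * P[j, j]) would exceed xi.
    gamma = np.maximum(lam, (1.0 - xi) / np.diag(P))
    return np.eye(gram.shape[0]) - P * gamma

rng = np.random.default_rng(0)
X = (rng.random((30, 8)) < 0.3).astype(float)  # toy user-item interactions
B_zero = zero_diagonal_lae(X, lam=1.0)
B_relaxed = relaxed_lae(X, lam=1.0, xi=0.3)
scores = X @ B_relaxed  # predicted preference scores for each user-item pair
```

Under this assumed form, xi = 0 recovers the zero-diagonal model and xi = 1 recovers plain ridge regression, which mirrors the summary's statement that adjusting the degree of the diagonal constraint generalizes existing linear models.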
Experiments on six benchmark datasets show that our models are comparable to or better than state-of-the-art linear and nonlinear models. This supports the theoretical insights on the diagonal constraint. In particular, we observed substantial gains on low-popularity items and under unbiased evaluation, which removes popularity bias. For more details on the paper, see: https://dial.skku.edu/blog/sigir2023_itsenough
<Figure 1: Oral session presentation photo> <Figure 2: Proposed model RDLAE>

Study 2: Jae-woong Lee, Seongmin Park, Mincheol Yoon, and Jongwuk Lee, “uCTRL: Unbiased Contrastive Representation Learning via Alignment and Uniformity for Collaborative Filtering”, 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR, short paper), 2023

This study addresses a problem that arises when recommender systems are trained on implicit feedback (e.g., clicks): implicit feedback is biased toward popular users and items, so the learned user and item representations deviate from the true preferences of users and items. We point out that existing debiasing studies for recommendation (i) do not consider the contrastive loss, which is widely used for better representation learning, and (ii) do not consider both users and items when debiasing, and we address both shortcomings. We propose Unbiased ConTrastive Representation Learning (uCTRL). First, inspired by DirectAU, an existing recommendation model that uses a contrastive loss, we express contrastive representation learning as two loss functions: alignment and uniformity. The alignment function makes the representations of a user and an item similar for each user-item interaction. The uniformity function makes the user and item distributions each uniform. We found that the alignment function is biased toward user and item popularity, and we remove this bias with inverse propensity weighting (IPW), which estimates the bias and then uses the estimate to remove it. In addition, we developed a new method that estimates the bias used in IPW by considering both users and items. Experiments show that uCTRL outperforms state-of-the-art debiasing models on four benchmark datasets (MovieLens 1M, Gowalla, Yelp, and Yahoo! R3). For more details on the paper, see: https://dial.skku.edu/blog/2023_uctrl
<Figure 3: Poster session presentation photo> <Figure 4: Structure of the proposed method uCTRL>

Study 3: Hye-young Kim*, Minjin Choi*, Sunkyung Lee, Eunseong Choi, Young-In Song, and Jongwuk Lee, “ConQueR: Contextualized Query Reduction using Search Logs”, 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR, short paper), 2023

This study proposes a query reduction model that uses a pre-trained language model.
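Query reduction means removing unnecessary words from a query when a user has entered an overly long query (search terms) and failed to get results matching their intent, so that the desired search results can be found. The proposed model, ConQueR, tackles this from two perspectives: (i) core term extraction and (ii) sub-query selection. Core term extraction extracts the core terms of the original query at the word level, and sub-query selection determines at the sentence level whether a given sub-query is a correct reduction of the original query. Because the two perspectives operate at different levels and complement each other, ConQueR combines them to obtain the final reduction. In addition, a truncated loss training scheme is introduced to handle the erroneous samples that frequently occur in search logs, so that training proceeds smoothly. Performance experiments and a satisfaction survey on search log data collected from a real search engine demonstrate that the proposed model performs query reduction effectively. For more details on the paper, see: https://dial.skku.edu/blog/2023_conquer
<Figure 5: Poster session presentation photo> <Figure 6: Structure of the proposed method ConQueR>

Study 4: Yoonjin Im*, Eunseong Choi*, Heejin Kook, and Jongwuk Lee, “Forgetting-aware Linear Bias for Attentive Knowledge Tracing”, The 32nd ACM International Conference on Information and Knowledge Management (CIKM, short paper), 2023

Knowledge tracing models a learner's proficiency through the task of predicting whether the learner will answer a new target question correctly, based on the learner's sequential history of past problem solving. To predict proficiency accurately, it is important to learn the correlations between questions and the learner's characteristics (e.g., forgetting behavior). Some attention-based knowledge tracing models have therefore modeled forgetting behavior by introducing a relative time interval bias in place of absolute position embeddings. This implements forgetting by lowering the model's attention to records that are older relative to the current time step. With existing methods, however, the effect of modeling forgetting diminishes as the problem-solving history grows longer. Through a generalized formula analysis, this study finds that question correlations are unnecessarily involved in computing the existing relative time interval bias. To solve this, it proposes FoLiBi (Forgetting-aware Linear Bias for Attentive Knowledge Tracing), a linear-bias-based method that decouples the two. The proposed method can be applied easily to existing attention-based knowledge tracing models and, despite its simplicity, consistently improves AUC by up to 2.58% over state-of-the-art knowledge tracing models on four benchmark datasets. For more details on the paper, see: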
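https://dial.skku.edu/blog/2023_folibi
<Figure 7: Structure of an attention-based knowledge tracing model>

Study 5: Seongmin Park, Mincheol Yoon, Jae-woong Lee, Hogun Park, and Jongwuk Lee, “Toward a Better Understanding of Loss Functions for Collaborative Filtering”, The 32nd ACM International Conference on Information and Knowledge Management (CIKM), 2023

This study analyzes the mathematical relationships among the various loss functions used in collaborative filtering, one pillar of recommender systems, and proposes a new loss function based on those relationships. Collaborative filtering is a core technique in modern recommender systems, and the training of a collaborative filtering model generally consists of three components: an interaction encoder, a loss function, and negative sampling. Many previous studies proposed various collaborative filtering models in order to design sophisticated interaction encoders, but recent work shows that simply replacing the loss function can yield large performance gains. This paper analyzes the relationships among existing loss functions and finds that they can be interpreted in terms of alignment and uniformity. (i) Alignment matches user and item representations, and (ii) uniformity disperses the user and item distributions. Inspired by this analysis, the paper proposes a new loss function, Margin-aware Alignment and Weighted Uniformity (MAWU), which improves the design of alignment and uniformity by taking each dataset's unique patterns into account. (i) Margin-aware Alignment (MA) mitigates user- and item-specific popularity bias, and (ii) Weighted Uniformity (WU) adjusts user and item uniformity to reflect the inherent characteristics of the dataset. In experiments, MF and LightGCN equipped with MAWU were comparable to or better than state-of-the-art collaborative filtering models using various loss functions on three benchmark datasets. For more details on the paper, see: https://dial.skku.edu/blog/2023_mawu
<Figure 8: Mathematical relationships among eight loss functions>

Study 6: Sunkyung Lee*, Minjin Choi*, Jongwuk Lee (* : equal contribution), “GLEN: Generative Retrieval via Lexical Index Learning”, The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 (To appear)

This study proposes GLEN (Generative retrieval model via LExical INdex Learning), a new generative retrieval model based on lexical index learning. Generative retrieval is a new paradigm for document retrieval that aims to directly generate the identifiers of documents relevant to a query. However, existing generative retrieval studies have two main limitations. First, generating document identifiers differs greatly in semantics from conventional natural language generation, but this difference is not taken into account. Second, there is a training-inference discrepancy: training focuses only on identifier generation, whereas inference must rank similar documents against each other. To overcome these limitations, the study proposes a new generative retrieval method that learns a lexical index dynamically.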
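Through a two-phase lexical index learning strategy, the proposed method (i) performs an additional pre-training phase that generates keyword-based fixed document identifiers, and (ii) learns dynamic document identifiers from the relevance between queries and documents. Experiments show that GLEN achieves the best or competitive performance against existing generative retrieval models and traditional retrieval models on various benchmark datasets, including NQ320k, MS MARCO, and BEIR. The code is available at https://github.com/skleee/GLEN. For more details on the paper, see: https://dial.skku.edu/blog/2023_glen
<Figure 9: Structure of the proposed model GLEN>

Study 7: Jiwoo Kim, Youngbin Kim, Ilwoong Baek, JinYeong Bak, Jongwuk Lee, “It Ain't Over: A Multi-aspect Diverse Math Word Problem Dataset”, The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 (To appear)

This study analyzes the mathematical reasoning ability of large language models (LLMs) and proposes DMath (Diverse Math Word Problems), a new 10K dataset for improving it. The math word problem (MWP) task is a complex and interesting task that requires a language model to understand natural language sentences deeply and reason logically, and it has been widely used to evaluate the reasoning ability of language models. With the recent emergence of LLMs, high performance has been achieved on existing MWP benchmarks, and LLMs are therefore believed to have good mathematical reasoning ability. However, this is a consequence of limited benchmarks: the paper points out the low diversity of existing benchmarks and shows that it needs to be increased. The paper defines four kinds of diversity that an MWP dataset should have: problem types (reasoning types), lexical usage patterns, languages, and intermediate solution forms. To define the reasoning types, the study referred to the mathematics curricula of the United States and Korea and defined five types: arithmetic calculation, comparison, correspondence, geometry, and possibility. Because previous studies focused on arithmetic calculation, little was known about how LLMs perform on other types of mathematical reasoning. The experiments in this study show that the reasoning ability of LLMs varies considerably by reasoning type. The dataset also pursues high diversity in lexical usage patterns, languages, and intermediate solution forms, which makes DMath a more challenging dataset than those of previous studies. In addition, 43 people took part in constructing the data, and careful verification was used to ensure high quality. Thanks to its high diversity, DMath can help examine and evaluate the various reasoning abilities of LLMs. The data is available at https://github.com/JiwooKimAR/dmath. For more details on the paper, see: https://dial.skku.edu/blog/2023_dmath
<Figure 10: MWP dataset performance comparison for two LLMs> <Figure 11: Performance of two LLMs on several reasoning types>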
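
The alignment and uniformity losses that Studies 2 and 5 build on can be sketched as follows. This uses the commonly cited definitions (mean squared distance between paired embeddings, and the log of the mean Gaussian potential over pairs). The optional `weights` argument is an assumed, self-normalized form of inverse propensity weighting, not the estimator from the uCTRL paper, and all names here are hypothetical.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Project each row onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def alignment(user_emb, item_emb, weights=None) -> float:
    """Mean squared distance between paired user and item embeddings."""
    dist = ((user_emb - item_emb) ** 2).sum(axis=1)
    if weights is None:
        return float(dist.mean())
    # Assumed IPW variant: weights = 1 / estimated propensity, self-normalized.
    return float((weights * dist).sum() / weights.sum())

def uniformity(emb, t: float = 2.0) -> float:
    """Log of the mean pairwise Gaussian potential; lower means more spread out."""
    sq_dist = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=-1)
    upper = np.triu_indices(len(emb), k=1)  # each unordered pair once
    return float(np.log(np.exp(-t * sq_dist[upper]).mean()))

rng = np.random.default_rng(0)
users = l2_normalize(rng.normal(size=(6, 4)))
items = l2_normalize(rng.normal(size=(6, 4)))
propensity = rng.uniform(0.2, 1.0, size=6)  # toy propensity scores

loss = alignment(users, items, weights=1.0 / propensity) \
    + 0.5 * (uniformity(users) + uniformity(items))
```

Down-weighting interactions with high propensity is one way to counter the popularity bias that the Study 2 summary attributes to the alignment term.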
- Posted 2023-11-24
- Views 1604
Prof. Kang-Yoon Lee receives a commendation from the Minister of Trade, Industry and Energy at the 16th Semiconductor Day

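Prof. Kang-Yoon Lee of the School of Electronic and Electrical Engineering and the Department of Artificial Intelligence received a commendation from the Ministry of Trade, Industry and Energy, in his capacity as CEO of SKAIChips, a Sungkyunkwan University faculty startup, at the 16th Semiconductor Day ceremony held on the afternoon of the 26th at The-K Hotel in Seoul. Semiconductor Day has been held since 2008 to commemorate October 29, 1994, the day semiconductor exports reached USD 10 billion. Each year it honors people who have contributed to the growth of the semiconductor industry, which is Korea's largest export by value and a backbone national industry (ranked second globally for ten consecutive years).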
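- Posted 2023-10-27
- Views 991
Prof. Eunbyung Park's research team develops a method that greatly improves the accuracy and training speed of physics-informed neural networks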

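Prof. Eunbyung Park's research team develops a method that greatly improves the accuracy and training speed of physics-informed neural networks - training speed improved more than 50-fold, with improved accuracy
- Posted 2023-10-17
- Views 0
Prof. Simon S. Woo's DASH Lab: three papers accepted to the CIKM 2023 international conference, and an anomaly detection workshop to be held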

Three papers by DASH Lab members Eun-Ju Park (Ph.D. student), Binh M. Le (Ph.D. student), Beomsang Cho (M.S. student), Sangyong Lee (M.S. student, Department of Artificial Intelligence), Seung-Yeon Back (M.S. student, Department of Artificial Intelligence), and Jiwon Kim (M.S. student, Department of Artificial Intelligence) have been accepted to CIKM (Conference on Information and Knowledge Management) 2023, a top-tier international conference in artificial intelligence and information retrieval, and will be presented in October.

1. Deepfake research with CSIRO Data61 (Australia)
2. Research on a dataset for classifying ID card authenticity
3. Machine unlearning research

In addition, the 1st workshop on anomaly detection in satellites and unmanned aerial vehicles, led by Prof. Simon S. Woo of DASH Lab, will be held at CIKM 2023.

1. Beomsang Cho, Binh M. Le, Jiwon Kim, Simon S. Woo, Shahroz Tariq, Alsharif Abuadbba, and Kristen Moore, “Toward Understanding of Deepfake Videos in the Wild”, Proceedings of the 32nd ACM International Conference on Information & Knowledge Management. 2023.

This study addresses the growing deepfake problem and was started to overcome the limitation that existing datasets do not adequately reflect the latest techniques. We propose RWDF-23, an up-to-date deepfake dataset. RWDF-23 consists of 2,000 deepfake videos collected from Reddit, YouTube, TikTok, and Bilibili, targeting four different languages. With it we extend the scope of previous datasets and analyze how widely the latest deepfake techniques are used on current online platforms. Beyond analyzing the people who create deepfakes, we also collect viewer comments and interaction data to examine how viewers engage with deepfakes. Taking this rich information into account, we provide a comprehensive understanding of ever-evolving deepfakes and their impact on real online platforms.

Deepfakes have become a growing concern in recent years, prompting researchers to develop benchmark datasets and detection algorithms to tackle the issue. However, existing datasets suffer from significant drawbacks that hamper their effectiveness. Notably, these datasets fail to encompass the latest deepfake videos produced by state-of-the-art methods that are being shared across various platforms. This limitation impedes the ability to keep pace with the rapid evolution of generative AI techniques employed in real-world deepfake production. Our contributions in this IRB-approved study are to bridge this knowledge gap from current real-world deepfakes by providing in-depth analysis.
We first present the largest and most diverse and recent deepfake dataset (RWDF-23) collected from the wild to date, consisting of 2,000 deepfake videos collected from 4 platforms targeting 4 different languages span created from 21 countries: Reddit, YouTube, TikTok, and Bilibili. By expanding the dataset’s scope beyond the previous research, we capture a broader range of real-world deepfake content, reflecting the ever-evolving landscape of online platforms. Also, we conduct a comprehensive analysis encompassing various aspects of deepfakes, including creators, manipulation strategies, purposes, and real-world content production methods. This allows us to gain valuable insights into the nuances and characteristics of deepfakes in different contexts. Lastly, in addition to the video content, we also collect viewer comments and interactions, enabling us to explore the engagements of internet users with deepfake content. By considering this rich contextual information, we aim to provide a holistic understanding of the evolving deepfake phenomenon and its impact on online platforms.

2. Eun-Ju Park, Seung-Yeon Back, Jeongho Kim, and Simon S. Woo, “KID34K: A Dataset for Online Identity Card Fraud Detection”, Proceedings of the 32nd ACM International Conference on Information & Knowledge Management. 2023.

This study provides a dataset for strengthening the security of mobile ID card verification systems. Identity verification on mobile platforms is now commonly based on ID cards, and as non-face-to-face financial transactions increase, verifying that the party to a transaction is the holder of the ID card is becoming more important. However, current systems do not distinguish whether the photo a user submits was taken directly of their own ID card or of someone else's ID card displayed on a monitor or printed on paper. This study provides an ID card image dataset that considers two aspects: strengthening the reliability of ID verification systems, and preventing the leakage of personal information on ID cards.

Though digital financial systems have provided users with convenient and accessible services, such as supporting banking or payment services anywhere, it is necessary to have robust security to protect against identity misuse.
Thus, online digital identity (ID) verification plays a crucial role in securing financial services on mobile platforms. One of the most widely employed techniques for digital ID verification is that mobile applications request users to take and upload a picture of their own ID cards. However, this approach has vulnerabilities where someone takes pictures of the ID cards belonging to another person displayed on a screen, or printed on paper to be verified as the ID card owner. To mitigate the risks associated with fraudulent ID card verification, we present a novel dataset for classifying cases where the ID card images that users upload to the verification system are genuine or digitally represented. Our dataset is replicas designed to resemble real ID cards, making it available while avoiding privacy issues. Through extensive experiments, we demonstrate that our dataset is effective for detecting digitally represented ID card images, not only in our replica dataset but also in the dataset consisting of real ID cards.

3. Sangyong Lee and Simon Woo, “UNDO: Effective and Accurate Unlearning Method for Deep Neural Networks”, Proceedings of the 32nd ACM International Conference on Information & Knowledge Management. 2023.

This study proposes UNDO, a simple yet effective machine unlearning method. The method consists of two steps for erasing the information of one class from a trained model. First, at the coarse-grained level, it assigns different labels to the data to be forgotten and trains briefly for just one epoch, which breaks down the decision boundary. Then, at the fine-grained level, it trains to forget the data that the previous step failed to forget while repairing the side effects on the data to be retained. Only a small amount of retained data that was not used in training is used here, which speeds up training. Various experiments show that UNDO is faster and more effective than existing machine unlearning methods.

Machine learning has evolved through extensive data usage, including personal and private information. Regulations like GDPR highlight the "Right to be forgotten" for user and data privacy. Research in machine unlearning aims to remove specific data from pre-trained models. We introduce a novel two-step unlearning method, UNDO.
First, we selectively disrupt the decision boundary of forgetting data at the coarse-grained level. However, this can also inadvertently affect the decision boundary of other remaining data, lowering the overall performance of the classification task. Hence, we subsequently repair and refine the decision boundary for each class at the fine-grained level by introducing a loss to maintain the overall performance while completely removing the class. Our approach is validated through experiments on two datasets, outperforming other methods in effectiveness and efficiency.

4. The 1st International Workshop on Anomaly and Novelty detection in Satellite and Drones systems (ANSD '23)

The 1st workshop on anomaly detection in satellites and unmanned aerial vehicles will be held at CIKM 2023. The workshop is led by Prof. Simon S. Woo (Sungkyunkwan University), Shahroz Tariq (CSIRO’s Data61), Prof. Youjin Shin (The Catholic University of Korea), and Daewon Chung (Korea Aerospace Research Institute), and covers detecting anomalies in time-series and image data from unmanned aerial vehicles.

The workshop on Anomaly and Novelty Detection in Drones and Satellite data at CIKM 2023 aims to bring together researchers, practitioners, and industry experts to discuss the latest advancements and challenges in detecting anomalies and novelties in drone and satellite data. With the increasing availability of such data, the workshop seeks to explore the potential of machine learning and data mining techniques to enable the timely and accurate detection of unexpected events or changes. The workshop will include presentations of research papers, keynote talks, panel discussions, and poster sessions, with a focus on promoting interdisciplinary collaboration and fostering new ideas for tackling real-world problems.

For inquiries or questions, please contact Prof. Simon Woo (swoo@g.skku.edu) at DASH Lab (https://dash.skku.edu).
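
The coarse-grained step described for UNDO (paper 3), assigning different labels to the data to be forgotten before a short round of training, can be sketched as follows. The source only says that other labels are assigned; choosing them uniformly at random is an assumption, the function name `relabel_forget_class` is hypothetical, and the one-epoch training and the fine-grained repair step are not shown.

```python
import numpy as np

def relabel_forget_class(labels, forget_class: int, num_classes: int, seed: int = 0):
    """Return a copy of labels where every forget_class sample gets another class."""
    rng = np.random.default_rng(seed)
    relabeled = np.asarray(labels).copy()
    mask = relabeled == forget_class
    # Offsets in [1, num_classes - 1] guarantee the new label differs from forget_class.
    offsets = rng.integers(1, num_classes, size=int(mask.sum()))
    relabeled[mask] = (forget_class + offsets) % num_classes
    return relabeled

labels = np.array([0, 1, 2, 1, 3, 1])
relabeled = relabel_forget_class(labels, forget_class=1, num_classes=4)
```

Training briefly on the relabeled samples is what the summary describes as breaking down the decision boundary of the class to be forgotten.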
- Posted 2023-09-19
- Views 1824
Prof. Jee-Hyong Lee's lab (IIS Lab.), Department of Artificial Intelligence, wins awards at the 2023 AI Graduate School Challenge with KT Mi:dm

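Two teams from the IIS Lab., Department of Artificial Intelligence, won awards in the final round of the '2023 AI Graduate School Challenge with KT Mi:dm', co-hosted by KT and the Ministry of Science and ICT. The team of Cheolwon Na (integrated M.S./Ph.D. program, 8th term), Jimin An (M.S. program, 3rd term), and Hanbyeol Kim (M.S. program, 1st term), team name ‘해 치원나’, received the KT CTO Award (2nd place). The team of Hyojun Kim (M.S. program, 4th term), Jeongan Yang (M.S. program, 2nd term), and Jihyeong Lee (M.S. program, 1st term), team name ‘차별없는사회’ ("Society Without Discrimination"), received the Excellence Award (4th place).

The AI Graduate School Challenge asks participants to propose new task ideas that can be solved with KT's hyperscale AI 'Mi:dm' and to develop AI models that apply them. It was held to discover practice-oriented core AI talent. The competition consisted of a preliminary round and a main round. The preliminary topic was to propose a new task using the hyperscale AI Mi:dm. The ten teams that passed the preliminary round then took part in the main round, tuning and developing an AI model for their proposed task. Team ‘해 치원나’ developed a few-shot voice phishing detection model that can respond to new types of voice phishing, and team ‘차별없는사회’ developed a discrimination mitigation model based on detecting discriminatory sentences. They received the KT CTO Award (2nd place) and the KT Hyperscale AI Mi:dm Excellence Award (4th place), respectively.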
- Posted 2023-09-13
- Views 2221
Prof. Yusung Kim's lab (CSI Lab.), Department of Software, wins first place at the 2023 Spectrum Challenge

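Prof. Yusung Kim's lab (CSI Lab.), Department of Software, wins first place at the 2023 Spectrum Challenge
- CSI Lab team takes first place for four consecutive years since 2020
- Devises an efficient spectrum-sharing scheme for next-generation Wi-Fi environments

The team of Taegun Park (M.S. student), Inho Na (undergraduate researcher), Chanyong Heo (intern), and Prof. Yusung Kim of the CSI Lab., Department of Software, won first place in the final round of the 2023 Spectrum Challenge hosted by the Electronics and Telecommunications Research Institute (ETRI). The Spectrum Challenge has been held annually since 2019 under the national R&D plan, with the aim of evolving the government-led R&D system into an open, challenge- and competition-based one. This year's Spectrum Challenge was a five-week final round among the top teams that advanced to the winners' bracket from the 40 teams that have taken part so far. There were two problem types: Type 1 was to identify the characteristics of the radio signals in use in a given radio environment with AI techniques, and Type 2 was to develop a reinforcement learning algorithm for resource allocation and scheduling that is robust to dynamic environmental changes. Along with our university, Hanyang University, Handong Global University, and Korea University were selected as top teams. The Sungkyunkwan University CSI Lab (Computer Systems and Intelligence Lab) team won first place in Type 2, which called for an algorithm that uses AI techniques to find a radio resource allocation scheme allowing wireless service devices in unlicensed bands to communicate with optimized spectrum efficiency. Notably, the CSI Lab team has now taken first place for four consecutive years.

News articles
Electronic Times (etnews): https://www.etnews.com/20230825000155
News1: https://www.news1.kr/articles/?5151113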
- Posted 2023-09-05
- Views 1341
Prof. Jong Hwan Ko's lab: two papers accepted to ICCV 2023

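Two papers by Prof. Jong Hwan Ko of the IRIS Lab and Wencan Cheng, a Ph.D. student in the Department of Artificial Intelligence, have been accepted to the International Conference on Computer Vision (ICCV 2023), a premier international conference in artificial intelligence and computer vision (BK21 CS IF=4).

Paper #1: "Multi-Scale Bidirectional Recurrent Network with Hybrid Correlation for Point Cloud-Based Scene Flow Estimation", Wencan Cheng, Jong Hwan Ko
Paper #2: "HandR2N2: Iterative 3D Hand Pose Estimation Using a Residual Recurrent Neural Network", Wencan Cheng, Jong Hwan Ko

The paper "Multi-Scale Bidirectional Recurrent Network with Hybrid Correlation for Point Cloud-Based Scene Flow Estimation" proposes a bidirectional recurrent method for predicting 3D motion accurately and efficiently in autonomous driving environments. Building on the work presented at ECCV 2022 last year, it adopts the bidirectional recurrent network structure from natural language processing, reducing prediction error by a factor of more than two and running more than three times faster than existing state-of-the-art recurrent methods.

The paper "HandR2N2: Iterative 3D Hand Pose Estimation Using a Residual Recurrent Neural Network" proposes an accurate 3D hand pose estimation method for AR glasses. The method uses a new recurrent module (Residual Recurrent Unit) to iteratively refine the estimated position of each joint, and offers a flexible way to adjust the number of refinement iterations and the amount of computation dynamically according to the computing resources of different devices. It achieves the best performance on various benchmark datasets while demonstrating both computational efficiency and flexibility.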
- Posted 2023-08-14
- Views 1629
Prof. Simon Woo's lab (DASH Lab): paper accepted to ICCV 2023

A paper by Prof. Simon Woo of DASH Lab and Binh M. Le, a Ph.D. student in the Department of Software, has been accepted to the IEEE/CVF International Conference on Computer Vision (ICCV), a premier international conference in computer vision. The paper will be presented in Paris, France, in October 2023.

Social abuse and crime that exploit deepfakes (deep + fake) are on the rise, and much research on deepfake detection is under way. However, low-quality deepfake images carry less information and are much harder to detect than high-quality ones, and building a high-performing, generalized detection model is challenging. This study proposes a quality-agnostic deepfake detection model that can efficiently detect deepfake images of varying quality at the same time. Using the Hilbert-Schmidt Independence Criterion (HSIC), it maximizes the geometric similarity between the intermediate representations of high-quality and low-quality deepfakes. This increases robustness under various input corruptions and improves the generality of the model, which achieved the best performance on various benchmark datasets.

[Abstract] Deepfake has recently raised a plethora of societal concerns over its possible security threats and dissemination of fake information. Much research on deepfake detection has been undertaken. However, detecting low quality as well as simultaneously detecting different qualities of deepfakes still remains a grave challenge. Most SOTA approaches are limited by using a single specific model for detecting certain deepfake video quality type. When constructing multiple models with prior information about video quality, this kind of strategy incurs significant computational cost, as well as model and training data overhead. Further, it cannot be scalable and practical to deploy in real-world settings. In this work, we propose a universal intra-model collaborative learning framework to enable the effective and simultaneous detection of different quality of deepfakes. That is, our approach is the quality-agnostic deepfake detection method, dubbed QAD. In particular, by observing the upper bound of general error expectation, we maximize the dependency between intermediate representations of images from different quality levels via Hilbert-Schmidt Independence Criterion.
In addition, an Adversarial Weight Perturbation module is carefully devised to enable the model to be more robust against image corruption while boosting the overall model's performance. Extensive experiments over seven popular deepfake datasets demonstrate the superiority of our QAD model over prior SOTA benchmarks. Contact for Questions: swoo@g.skku.edu
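
The Hilbert-Schmidt Independence Criterion (HSIC) named in the abstract can be illustrated with the standard biased empirical estimator, tr(KHLH) / (n - 1)^2. This sketch assumes linear kernels and random toy features; the source does not state which kernel QAD uses, and the function name `hsic` and the arrays below are hypothetical stand-ins for intermediate representations of images at different quality levels.

```python
import numpy as np

def hsic(x: np.ndarray, y: np.ndarray) -> float:
    """Biased empirical HSIC between paired samples x (n, d1) and y (n, d2)."""
    n = x.shape[0]
    K = x @ x.T                               # linear kernel on x (assumed choice)
    L = y @ y.T                               # linear kernel on y (assumed choice)
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

rng = np.random.default_rng(0)
high_quality = rng.normal(size=(50, 3))                          # toy features
low_quality = high_quality + 0.1 * rng.normal(size=(50, 3))     # correlated view
unrelated = rng.normal(size=(50, 3))                             # independent features

dependent_score = hsic(high_quality, low_quality)
independent_score = hsic(high_quality, unrelated)  # much smaller than dependent_score
```

A training objective that maximizes this quantity between paired representations pushes them to stay statistically dependent, which is the role the abstract assigns to HSIC.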
- Posted 2023-07-28
- Views 1704