Matching Long Text Documents via Graph Convolutional Networks.

Authors
Bang Liu, Ting Zhang, Di Niu, Jinghong Lin, Kunfeng Lai, Yu Xu

Identifying the relationship between two text objects is a core research problem underlying many natural language processing tasks. A wide range of deep learning schemes have been proposed for text matching, mainly focusing on sentence matching, question answering, or query-document matching. We point out that existing approaches do not perform well at matching long documents, which is critical, for example, to AI-based news article understanding and event or story formation. The reason is that these methods either omit or fail to fully utilize the complicated semantic structures in long documents. In this paper, we propose a graph approach to text matching, especially targeting long document matching, such as identifying whether two news articles report the same event in the real world, possibly with different narratives. We propose the Concept Interaction Graph to yield a graph representation for a document, with vertices representing different concepts, each being one keyword or a group of coherent keywords in the document, and with edges representing the interactions between different concepts, connected by sentences in the document. Based on the graph representation of document pairs, we further propose a Siamese Encoded Graph Convolutional Network that learns vertex representations through a Siamese neural network and aggregates the vertex features through Graph Convolutional Networks to generate the matching result. Extensive evaluation of the proposed approach on two labeled news article datasets created at Tencent for its intelligent news products shows that the proposed graph approach to long document matching significantly outperforms a wide range of state-of-the-art methods.
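
The following minimal Python sketch illustrates the two ideas above. It is not the authors' implementation. It assumes that concepts are supplied as hand-picked keyword sets (the hypothetical `concepts` argument). Two concepts are linked whenever a single sentence mentions keywords from both, and each edge is weighted by a simple co-occurrence count. The sketch then applies one standard graph convolution step, ReLU(D^-1/2 (A + I) D^-1/2 X W), to made-up vertex features. It does not reproduce the paper's own concept-detection and edge-weighting choices, or the Siamese encoder that produces vertex features from a document pair. The function names `build_concept_graph` and `gcn_layer` are hypothetical.

```python
import math
from itertools import combinations


def build_concept_graph(sentences, concepts):
    """Build a simplified, hypothetical concept interaction graph.

    sentences: list of whitespace-tokenizable strings from one document.
    concepts:  dict mapping a concept name to a set of keywords
               (assumed given here; not how the paper derives concepts).
    Returns (vertex_sentences, edge_weights):
      vertex_sentences[c] lists indices of sentences attached to concept c;
      edge_weights[(a, b)] counts sentences that mention both a and b.
    """
    vertex_sentences = {c: [] for c in concepts}
    edge_weights = {}
    for i, sentence in enumerate(sentences):
        tokens = set(sentence.lower().split())
        # Concepts whose keywords occur in this sentence (sorted for stable keys).
        hit = sorted(c for c, kws in concepts.items() if tokens & kws)
        for c in hit:
            vertex_sentences[c].append(i)
        # A sentence touching two concepts connects them (assumed weighting).
        for a, b in combinations(hit, 2):
            edge_weights[(a, b)] = edge_weights.get((a, b), 0) + 1
    return vertex_sentences, edge_weights


def gcn_layer(nodes, edge_weights, features, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    nodes:    list of concept names, aligned with rows of `features`.
    features: list of feature vectors, one per node.
    weight:   d_in x d_out matrix as a list of lists.
    """
    idx = {name: k for k, name in enumerate(nodes)}
    n = len(nodes)
    # Start from the identity matrix to add self-loops (A + I).
    adj = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for (a, b), w in edge_weights.items():
        adj[idx[a]][idx[b]] += w
        adj[idx[b]][idx[a]] += w
    deg = [sum(row) for row in adj]
    out = []
    for r in range(n):
        # Aggregate neighbour features with symmetric degree normalization.
        agg = [0.0] * len(features[0])
        for c in range(n):
            coef = adj[r][c] / math.sqrt(deg[r] * deg[c])
            for j, x in enumerate(features[c]):
                agg[j] += coef * x
        # Linear transform by `weight`, then ReLU.
        out.append([
            max(0.0, sum(agg[j] * weight[j][k] for j in range(len(agg))))
            for k in range(len(weight[0]))
        ])
    return out
```

Consider the sentences "the storm hit the coast", "officials closed schools after the storm" and "schools reopen monday", with keyword sets for storm, schools and monday. The second sentence links storm and schools, and the third links schools and monday. `gcn_layer` then mixes each concept's feature vector with those of its neighbours. In the model described above, the vertex features being aggregated would instead come from a Siamese network applied to both documents.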
