Published online: 2016
Abstract
Text document clustering groups documents into clusters based on their similarity. Most clustering algorithms build disjoint clusters, but clusters should be allowed to overlap, because real-world documents may belong to two or more categories. For example, an article discussing the Apple Watch may be categorized under any of 3C, Fashion, or Clothing and Shoes. This paper proposes an overlapping clustering algorithm based on Formal Concept Analysis, which allows a document to be assigned to two or more clusters. Moreover, our algorithm reduces the dimensionality of the vector space and runs more efficiently than existing clustering methods.
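To illustrate how Formal Concept Analysis can yield overlapping clusters, the following minimal sketch enumerates the formal concepts of a small document-term context by brute force and treats each concept's extent as a cluster. It is not the algorithm proposed in this paper: the documents, the terms, and the helper names (`docs`, `extent`, `intent`, `formal_concepts`) are hypothetical, and the dimensionality-reduction step is not shown.

```python
from itertools import combinations

# Hypothetical toy context: document -> set of index terms (not data from the paper).
docs = {
    "apple_watch": {"gadget", "fashion", "wearable"},
    "smartphone": {"gadget"},
    "handbag": {"fashion"},
    "sneakers": {"fashion", "wearable"},
}
all_terms = set().union(*docs.values())


def extent(terms):
    """Return the documents that contain every term in `terms`."""
    return frozenset(d for d, t in docs.items() if terms <= t)


def intent(doc_set):
    """Return the terms shared by every document in `doc_set`."""
    term_sets = [docs[d] for d in doc_set]
    if not term_sets:
        # By convention, the empty set of documents shares all terms.
        return frozenset(all_terms)
    return frozenset(set.intersection(*term_sets))


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every term subset.

    Brute force, exponential in the number of terms; for illustration only.
    """
    concepts = set()
    for r in range(len(all_terms) + 1):
        for combo in combinations(sorted(all_terms), r):
            e = extent(set(combo))
            concepts.add((e, intent(e)))
    return concepts


concepts = formal_concepts()

# Treat each concept with a non-empty intent and extent as one cluster.
clusters = [(sorted(e), sorted(i)) for e, i in concepts if e and i]
membership = {d: sum(d in e for e, _ in clusters) for d in docs}

for members, shared_terms in sorted(clusters):
    print(members, "share", shared_terms)
```

In this toy context the document `apple_watch` appears in more than one cluster extent, which is the overlapping behavior described above; a practical implementation would use a dedicated concept-lattice algorithm rather than enumerating every term subset.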