DenForestis an incremental density-based clustering algoritm for data streams (sliding window model).- It supports
insert/batch_insertanddelete/batch_deleteoperations to incrementally update clusters over data streams. - It supports
labelAndReturnto compute the labels of the data points in the window.
- maven
mvn package -DskipTests- JAR file will be created in the
targetfolder.
mvn test- Run several tests using
sample_dataset.
You can find example codes in following files.
src/test/java/example/Denforest_test.javasrc/test/java/example/Denforest_optimized_test.java
DenForest-SIGMOD2022
├── pom.xml // for maven
├── README.md
├── sample_dataset
└── src
├── main
│ └── java
│ └── denforest
│ ├── datapoint // datapoint class
│ ├── DenForest.java // denforest algorithm without optimization
│ ├── DenForestStrided.java // denforest algorithm with batch optimization
│ ├── linkcuttree // link-cut tree implementation
│ ├── rtree // rtree implementation
│ └── unionfind // union-find implementation
└── test
└── java
├── denforest
│ └── DenForestTest.java // test code using below examples
└── example
├── Denforest_test.java // example code fore denforest
└── Denforest_optimized_test.java // example code for batch-optimized denforest
- The density-based clustering is utilized for various applications such as hot spot detection or segmentation. To serve those applications in real time, it is desired to update clusters incrementally by capturing only the recent data.
- The previous incremental density-based clustering algorithms often represent clusters as a graph and suffer serious performance degradation.
- In order to address the problem of slow deletion, we proposes a novel incremental density-based clustering algorithm called
DenForest. - DenForest` can determine efficiently and accurately whether a cluster is to be split by a point removed from the window in logarithmic time.
- With extensive evaluations, it is demonstrated that
DenForestoutperforms the state-of-the-art density-based clustering algorithms significantly and achieves the clustering quality comparable with that of DBSCAN.
Bogyeong Kim, Kyoseung Koo, Undraa Enkhbat, and Bongki Moon. 2022. DenForest: Enabling Fast Deletion in Incremental Density-Based Clustering over Sliding Windows. In Proceedings of the 2022 International Conference on Management of Data (SIGMOD ’22), June 12–17, 2022, Philadelphia, PA, USA. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3514221.3517833