ScSLINT is a scalable interlinking framework designed for time and memory efficiency. ScSLINT can be used to detect all co-referent resources between two given repositories, Rsource and Rtarget. ScSLINT is a generalization of SLINT+. It is written in C++ and is currently compatible with Windows and Linux systems.
Summary

The architecture of ScSLINT
- Step 1. Property alignment generator creates the property alignments between Rsource and Rtarget.
- Step 2. Similarity function generator assigns similarity measures for each property alignment.
- Step 3. Candidate generator detects candidate pairs of potentially co-referent instances.
- Step 4. Configuration creator generates a default configuration. A configuration describes similarity functions, similarity aggregator, and co-reference filter.
- Step 5. Similarity aggregator executes similarity functions and computes final matching score.
- Step 6. Co-reference filter applies the declared constraints (e.g., thresholding and stable matching) to the matching scores and produces the final result.
- Remark: ScSLINT can work fully automatically and also supports intervention (e.g., from the user) at each step. Users can also introduce new mechanisms (e.g., similarity functions and similarity aggregators) at any step.
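To make Steps 5 and 6 concrete, here is a minimal C++ sketch of a configuration being applied to candidate pairs. It is not ScSLINT's API: the `Candidate` struct, `aggregate`, `filterLinks`, the weighted-average aggregator, and the greedy one-to-one filter are all hypothetical choices made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical candidate pair from Step 3: indices of a source and a target
// instance, plus one score per similarity function (Steps 2 and 5).
struct Candidate {
    std::size_t source;
    std::size_t target;
    std::vector<double> scores;
};

// Similarity aggregator (Step 5), assumed here to be a weighted average.
double aggregate(const std::vector<double>& scores,
                 const std::vector<double>& weights) {
    assert(scores.size() == weights.size());
    double weighted = 0.0, total = 0.0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        weighted += weights[i] * scores[i];
        total += weights[i];
    }
    return total > 0.0 ? weighted / total : 0.0;
}

// Co-reference filter (Step 6): drop pairs below the threshold, then keep
// one-to-one links greedily from the highest score down. Because both sides
// rank partners by the same shared score, the result is a stable matching
// when there are no ties.
std::vector<std::pair<std::size_t, std::size_t>> filterLinks(
        const std::vector<Candidate>& candidates,
        const std::vector<double>& weights, double threshold) {
    std::vector<std::tuple<double, std::size_t, std::size_t>> accepted;
    for (const auto& c : candidates) {
        double score = aggregate(c.scores, weights);
        if (score >= threshold) accepted.emplace_back(score, c.source, c.target);
    }
    // Highest aggregated score first; stable_sort keeps input order on ties.
    std::stable_sort(accepted.begin(), accepted.end(),
                     [](const auto& a, const auto& b) {
                         return std::get<0>(a) > std::get<0>(b);
                     });
    std::unordered_set<std::size_t> usedSource, usedTarget;
    std::vector<std::pair<std::size_t, std::size_t>> links;
    for (const auto& entry : accepted) {
        std::size_t s = std::get<1>(entry), t = std::get<2>(entry);
        if (usedSource.count(s) || usedTarget.count(t)) continue;  // already linked
        usedSource.insert(s);
        usedTarget.insert(t);
        links.emplace_back(s, t);
    }
    return links;
}
```

With weights `{1.0, 1.0}` and threshold `0.5`, a source instance whose only acceptable target has already been claimed by a higher-scoring pair stays unlinked; a real configuration could of course use other aggregators or constraints.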
Systems and Algorithms that use ScSLINT as the base framework
- cLearn: Heuristic-based configuration learning algorithm. cLearn finds the optimal configuration using labeled pairs of instances.
- cLink: Supervised instance matching system.
- ScLink: Scalable supervised instance matching system.
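cLearn's actual search procedure is described in the cited papers and is not reproduced here. As a hypothetical illustration of what learning from labeled pairs can involve, the sketch below picks the co-reference filter threshold that maximizes F1 on labeled candidate pairs. The function names and the use of F1 as the objective are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: F1 of the rule "predict co-referent if score >= threshold".
double f1AtThreshold(const std::vector<double>& scores,
                     const std::vector<bool>& labels, double threshold) {
    assert(scores.size() == labels.size());
    std::size_t tp = 0, fp = 0, fn = 0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        bool predicted = scores[i] >= threshold;
        if (predicted && labels[i]) ++tp;
        else if (predicted && !labels[i]) ++fp;
        else if (!predicted && labels[i]) ++fn;
    }
    if (tp == 0) return 0.0;  // no true positives: precision or recall is zero
    double precision = static_cast<double>(tp) / static_cast<double>(tp + fp);
    double recall = static_cast<double>(tp) / static_cast<double>(tp + fn);
    return 2.0 * precision * recall / (precision + recall);
}

// Try every observed score as a threshold and keep the one with the best F1
// (the first one wins on ties; an empty input returns 1.0).
double bestThreshold(const std::vector<double>& scores,
                     const std::vector<bool>& labels) {
    double best = 1.0, bestF1 = -1.0;
    for (double t : scores) {
        double f1 = f1AtThreshold(scores, labels, t);
        if (f1 > bestF1) {
            bestF1 = f1;
            best = t;
        }
    }
    return best;
}
```

Trying every observed score is quadratic in the number of labeled pairs; sorting the scores first would allow a single linear scan, but the brute-force version is easier to verify.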
Benchmark
The following table shows the results of ScSLINT on the OAEI 2012 dataset, using the default configuration generated by ScSLINT. Two complex similarity functions are used for strings: Levenshtein and TF-IDF cosine. The times for Steps 1, 3, 5, and 6 were measured on an Intel Core i7-4770 CPU with 8GB RAM.
Dataset | Size (x10^9) | Candidates (x10^6) | Similarity functions | Step 1 | Step 3 | Step 5 | Step 6 |
---|---|---|---|---|---|---|---|
nyt.loc.gn | 32.69 | 32.2 | 12 | 37s | 7s | 70s | 3s |
nyt.loc.db | 16.06 | 38.2 | 25 | 43s | 8s | 268s | 6s |
nyt.org.db | 25.47 | 61.7 | 17 | 46s | 11s | 404s | 6s |
nyt.peo.db | 41.66 | 46.9 | 22 | 46s | 11s | 251s | 6s |
nyt.loc.fr | 154.97 | 222.7 | 23 | 14s | 111s | 641s | 29s |
nyt.org.fr | 245.7 | 357.4 | 16 | 14s | 268s | 1023s | 46s |
nyt.peo.fr | 401.89 | 620.1 | 18 | 15s | 507s | 1578s | 78s |
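The benchmark relies on two string similarity functions, Levenshtein and TF-IDF cosine. The C++ sketch below shows one common way to compute each. ScSLINT's exact normalization, tokenization, and IDF weighting are not documented here, so the choices below (normalizing the edit distance by the longer string, whitespace tokenization, smoothed IDF) and all function names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Normalized Levenshtein similarity: 1 - editDistance / max(length).
double levenshteinSimilarity(const std::string& a, const std::string& b) {
    if (a.empty() && b.empty()) return 1.0;
    // Two-row dynamic programming over the edit-distance table.
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, curr);
    }
    double dist = static_cast<double>(prev[b.size()]);
    return 1.0 - dist / static_cast<double>(std::max(a.size(), b.size()));
}

using TermVector = std::map<std::string, double>;

// Whitespace tokenization (an assumption; real systems often normalize case too).
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) tokens.push_back(tok);
    return tokens;
}

// Number of strings in the corpus that contain each token at least once.
std::map<std::string, std::size_t> documentFrequency(
        const std::vector<std::string>& corpus) {
    std::map<std::string, std::size_t> df;
    for (const auto& doc : corpus) {
        auto tokens = tokenize(doc);
        std::set<std::string> unique(tokens.begin(), tokens.end());
        for (const auto& tok : unique) ++df[tok];
    }
    return df;
}

// TF-IDF vector with a smoothed IDF: log((1 + N) / (1 + df)) + 1.
TermVector tfidf(const std::string& text,
                 const std::map<std::string, std::size_t>& df,
                 std::size_t corpusSize) {
    TermVector v;
    for (const auto& tok : tokenize(text)) v[tok] += 1.0;  // raw term frequency
    for (auto& [term, weight] : v) {
        auto it = df.find(term);
        std::size_t freq = (it == df.end()) ? 0 : it->second;
        assert(freq <= corpusSize);  // df cannot exceed the number of documents
        weight *= std::log((1.0 + corpusSize) / (1.0 + freq)) + 1.0;
    }
    return v;
}

// Cosine similarity of two sparse term vectors.
double cosine(const TermVector& x, const TermVector& y) {
    double dot = 0.0, nx = 0.0, ny = 0.0;
    for (const auto& entry : x) {
        nx += entry.second * entry.second;
        auto it = y.find(entry.first);
        if (it != y.end()) dot += entry.second * it->second;
    }
    for (const auto& entry : y) ny += entry.second * entry.second;
    if (nx == 0.0 || ny == 0.0) return 0.0;
    return dot / (std::sqrt(nx) * std::sqrt(ny));
}
```

Levenshtein handles typos in short values, while TF-IDF cosine down-weights tokens that are frequent across the corpus, which tends to suit longer, multi-token values.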
Download
- Binary
- Source code
- Datasets
We provide binary indexed data that is compatible with ScSLINT. For the original raw data (e.g., plain RDF), please download it from the original data providers.
- E-commerce data (small size)
- OAEI 2010 (12GB, 48GB uncompressed)
- OAEI 2012 (38GB, 144GB uncompressed)
- Other tools
- KViewer is a GUI module that supports large-scale exploration of data in ScSLINT-compatible format
- KIndexer is an indexing module that converts N-Triples files (*.nt) into ScSLINT-compatible format
Click here to download KViewer and KIndexer
Send us a request email to obtain the dataset files.
Contact
Khai Nguyen:
Ryutaro Ichise:
Reference
- Khai Nguyen, Ryutaro Ichise, Bac Le. Interlinking Linked Data Sources Using a Domain-Independent System. In Proceedings of the 2nd Joint International Semantic Technology Conference. LNCS, vol. 7774, pp. 113-128. Springer (2013)
- Khai Nguyen, Ryutaro Ichise, Bac Le. SLINT: A Schema-Independent Linked Data Interlinking System. In Proceedings of the 7th Ontology Matching, CEUR-WS.org, vol. 946. (2012)
- Khai Nguyen, Ryutaro Ichise. ScSLINT: Time and Memory Efficient Interlinking Framework for Linked Data. In Proceedings of the 14th International Semantic Web Conference Posters and Demonstrations Track, CEUR-WS.org, vol. 1486. (2015)
- Khai Nguyen, Ryutaro Ichise. A Heuristic Approach for Configuration Learning of Supervised Instance Matching. In Proceedings of the 14th International Semantic Web Conference Posters and Demonstrations Track. (2015)
- Khai Nguyen, Ryutaro Ichise. Heuristic-based Configuration Learning for Linked Data Instance Matching. In Proceedings of the 5th Joint International Semantic Technology Conference. LNCS, vol. 9544, pp. 56-72. Springer (2015)
- Khai Nguyen, Ryutaro Ichise. Linked Data Entity Resolution System Enhanced by Configuration Learning Algorithm. IEICE Transactions on Information and Systems, Vol.E99-D, No.6, pp. 1521-1530. (2016)
- Khai Nguyen, Ryutaro Ichise. ScLink: supervised instance matching system for heterogeneous repositories. Journal of Intelligent Information Systems, DOI: 10.1007/s10844-016-0426-3. Springer. (2016)