Data Use Detector

This Space demonstrates our fine-tuned GLiNER model’s ability to detect dataset mentions and their relations in input text. It first identifies dataset names with named-entity recognition (NER), then uses relation extraction (RE) to find attributes such as publisher, acronym, publication year, and data geography, among others.

How it works

NER: Recognizes dataset names in your text.
RE: Links each dataset to its attributes (e.g., publisher, year, acronym).
Visualization: Highlights entities and relation spans inline.
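The visualization step can be pictured with a minimal Python sketch. It assumes each detected span comes with character offsets (`start`, `end`) and a `label`, loosely similar to the span dicts GLiNER returns; the `highlight` and `span_of` helpers and the label names are hypothetical, and the Space's own renderer may work differently. Instead of colored markup, the sketch wraps each span inline as `[text](label)`.

```python
from typing import TypedDict


class Span(TypedDict):
    # Hypothetical span record: character offsets [start, end) into the
    # input text plus a label, modeled loosely on GLiNER-style output.
    start: int
    end: int
    label: str


def span_of(text: str, surface: str, label: str) -> Span:
    """Hypothetical helper: build a Span for the first occurrence of `surface`."""
    start = text.index(surface)
    return {"start": start, "end": start + len(surface), "label": label}


def highlight(text: str, spans: list[Span]) -> str:
    """Wrap each span inline as [surface text](label), left to right.

    Spans that overlap an earlier, already highlighted span are skipped.
    """
    pieces: list[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        if span["start"] < cursor:
            continue  # overlaps the previously highlighted span
        pieces.append(text[cursor:span["start"]])  # plain text before the span
        pieces.append(f"[{text[span['start']:span['end']]}]({span['label']})")
        cursor = span["end"]
    pieces.append(text[cursor:])  # trailing text after the last span
    return "".join(pieces)


text = "We use the Demographic and Health Survey (DHS) from 2018."
spans = [
    span_of(text, "Demographic and Health Survey", "dataset"),
    span_of(text, "DHS", "acronym"),
    span_of(text, "2018", "year"),
]
print(highlight(text, spans))
# We use the [Demographic and Health Survey](dataset) ([DHS](acronym)) from [2018](year).
```

Skipping overlaps keeps the output well-formed; a real renderer might instead nest or merge overlapping spans.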

Instructions

Paste or edit your text in the input box.
Adjust the NER and RE confidence thresholds with the sliders.
Click Submit to see highlights.
Click Get Model Predictions to view the raw JSON output.
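The raw JSON output lends itself to post-processing, for example collecting every attribute of a dataset into one record, as the RE step in "How it works" describes. This minimal sketch assumes a hypothetical schema (a list of `entities` plus a list of `relations` given as `head`/`relation`/`tail` triples with scores, and an entity label named `dataset`); the Space's actual JSON may be structured differently.

```python
import json
from collections import defaultdict

# Hypothetical raw-prediction payload; the Space's real JSON schema
# may use different keys and label names.
RAW = """
{
  "entities": [
    {"text": "Demographic and Health Survey", "label": "dataset", "score": 0.97},
    {"text": "DHS", "label": "acronym", "score": 0.95}
  ],
  "relations": [
    {"head": "Demographic and Health Survey", "relation": "acronym",
     "tail": "DHS", "score": 0.91},
    {"head": "Demographic and Health Survey", "relation": "publication year",
     "tail": "2018", "score": 0.84}
  ]
}
"""


def group_by_dataset(raw: str) -> dict[str, dict[str, list[str]]]:
    """Turn relation triples into one attribute record per dataset."""
    data = json.loads(raw)
    records: dict[str, defaultdict[str, list[str]]] = {}
    # Seed a record for every entity labelled "dataset" (an assumed label),
    # so datasets without any relations still appear in the result.
    for ent in data["entities"]:
        if ent["label"] == "dataset":
            records[ent["text"]] = defaultdict(list)
    # Attach each relation's tail to its head dataset under the relation type.
    for rel in data["relations"]:
        attrs = records.setdefault(rel["head"], defaultdict(list))
        attrs[rel["relation"]].append(rel["tail"])
    # Convert the inner defaultdicts to plain dicts for clean output.
    return {name: dict(attrs) for name, attrs in records.items()}


print(group_by_dataset(RAW))
# {'Demographic and Health Survey': {'acronym': ['DHS'], 'publication year': ['2018']}}
```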

Resources

Model: rafmacalaba/datause-extraction-v0
Paper: Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers (arXiv:2502.10263)
GLiNER GitHub Repo
Project Docs

Confidence thresholds

NER Threshold: minimum confidence, from 0 to 1, for named-entity spans.
RE Threshold: minimum confidence, from 0 to 1, for relation extractions.
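How the two thresholds act on predictions can be sketched as a simple score filter. This assumes every prediction carries a `score` in [0, 1] and that relations are dicts with a `head` field (hypothetical field names); dropping relations whose head entity falls below the NER threshold is a choice made for this sketch, not documented behavior of the Space.

```python
def apply_thresholds(
    entities: list[dict],
    relations: list[dict],
    ner_threshold: float,
    re_threshold: float,
) -> tuple[list[dict], list[dict]]:
    """Keep only predictions whose score meets the matching threshold."""
    for t in (ner_threshold, re_threshold):
        if not 0.0 <= t <= 1.0:
            raise ValueError(f"threshold {t} is outside [0, 1]")
    kept_entities = [e for e in entities if e["score"] >= ner_threshold]
    kept_names = {e["text"] for e in kept_entities}
    # A relation survives only if it clears the RE threshold AND its head
    # entity survived the NER threshold (an assumption of this sketch).
    kept_relations = [
        r for r in relations
        if r["score"] >= re_threshold and r["head"] in kept_names
    ]
    return kept_entities, kept_relations


# Hypothetical predictions with made-up scores.
entities = [
    {"text": "Example Survey", "score": 0.92},
    {"text": "Example Census", "score": 0.41},
]
relations = [
    {"head": "Example Survey", "relation": "publisher", "tail": "Example Agency", "score": 0.70},
    {"head": "Example Survey", "relation": "publication year", "tail": "2018", "score": 0.30},
    {"head": "Example Census", "relation": "publisher", "tail": "Example Office", "score": 0.90},
]
ents, rels = apply_thresholds(entities, relations, ner_threshold=0.5, re_threshold=0.5)
print([e["text"] for e in ents])  # ['Example Survey']
print([r["tail"] for r in rels])  # ['Example Agency']
```

Raising either threshold trades recall for precision: fewer, but more confident, spans and relations remain.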