Soin works with structured markdown files extracted from art-world websites. Each file should follow this format:
Header Block
# [ID] — [Source Name]
- **URL**: https://example.com/page
- **Source type**: institution
- **Location**: City, Country
- **Description**: Brief description
Source types (values for the **Source type** field): institution, gallery, festival, residency, prize, collective, project, network
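A minimal Python sketch of how the header block above could be read, assuming the ID contains no spaces and field names are matched case-insensitively. `parse_header`, `TITLE_RE`, and `FIELD_RE` are hypothetical names, not Soin's actual parser. Rejecting unlisted source types is also an illustrative choice.

```python
import re

# Values listed under "Source types"; the field itself is optional.
SOURCE_TYPES = {"institution", "gallery", "festival", "residency",
                "prize", "collective", "project", "network"}

# Title line "# [ID] <em dash> [Source Name]"; \u2014 is the em dash the format uses.
TITLE_RE = re.compile(r"^#\s+(?P<id>\S+)\s+\u2014\s+(?P<name>.+?)\s*$")
# Field line "- **Key**: value"
FIELD_RE = re.compile(r"^-\s+\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.*)$")


def parse_header(lines):
    """Parse the title line and the '- **Key**: value' fields before the first section."""
    header = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("## "):
            break  # the first section heading ends the header block
        title = TITLE_RE.match(line)
        if title:
            header["id"] = title["id"]
            header["name"] = title["name"]
            continue
        field = FIELD_RE.match(line)
        if field:
            # Keys are lowercased, e.g. "Source type" -> "source type"
            header[field["key"].strip().lower()] = field["value"].strip()
    if "id" not in header:
        raise ValueError("missing '# [ID] ... [Source Name]' title line")
    source_type = header.get("source type")
    if source_type is not None and source_type.lower() not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type!r}")
    return header
```

Lowercasing the keys lets later code look up fields without caring how the file capitalized them.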
People Section
## People
### Full Name
- **Role**: Position — Organization Name
- **Other affiliations**: Org 1, Org 2
- **Bio**: Background info
- **Works/shows**: Artwork titles
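People entries can be read the same way. In this hypothetical sketch (`parse_people`, `split_role`, and `split_affiliations` are illustrative names, not Soin's code), each `### Full Name` heading starts a new person, the Role value is split on its spaced em dash into a position and an organization, and Other affiliations is split on commas. How these values feed the member-of and affiliated-with relations is an assumption left out here.

```python
import re

FIELD_RE = re.compile(r"^-\s+\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.*)$")


def parse_people(section_lines):
    """Turn '### Full Name' blocks and their field bullets into one dict per person."""
    people = []
    for raw in section_lines:
        line = raw.strip()
        if line.startswith("### "):
            people.append({"name": line[4:].strip()})
            continue
        field = FIELD_RE.match(line)
        if field and people:
            # Attach the field to the most recent person; keys are lowercased
            people[-1][field["key"].strip().lower()] = field["value"].strip()
    return people


def split_role(role):
    """Split the Role value on its spaced em dash into (position, organization or None)."""
    position, sep, org = role.partition(" \u2014 ")
    return position.strip(), (org.strip() if sep else None)


def split_affiliations(value):
    """'Org 1, Org 2' -> ['Org 1', 'Org 2']."""
    return [part.strip() for part in value.split(",") if part.strip()]
```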
Organizations Section
## Organizations
- Organization Name (COUNTRY_CODE) -- description
- Multi-country: Org (FR/BE) -- description
Use ISO 3166-1 alpha-2 country codes: FR, AT, DE, US, IT, etc. Separate multiple countries with a slash, as in FR/BE.
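A hedged sketch of parsing one Organizations bullet, assuming the codes are uppercase two-letter codes joined by slashes and the separator is the double hyphen shown above. `ORG_RE` and `parse_org_line` are hypothetical names.

```python
import re

# "- Organization Name (FR) -- description" or "- Org (FR/BE) -- description"
ORG_RE = re.compile(
    r"^-\s+(?P<name>.+?)\s+\((?P<codes>[A-Z]{2}(?:/[A-Z]{2})*)\)\s+--\s*(?P<desc>.*)$"
)


def parse_org_line(line):
    """Return (name, [country codes], description), or None if the line doesn't match."""
    match = ORG_RE.match(line.strip())
    if match is None:
        return None
    return match["name"], match["codes"].split("/"), match["desc"].strip()
```

Because the name group is non-greedy but must be followed by a parenthesized code list, a name such as "MoMA (New York) (US)" still resolves to the name "MoMA (New York)" with code US.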
Artworks Section
## Artworks
### Title (year)
- **By**: Artist Name
- **Medium**: installation, video, etc.
- **Shown at**: Venue Name
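A sketch of reading Artworks blocks, assuming the year is a four-digit number and may be omitted. Mapping **By** to created-by and **Shown at** to exhibited-at is an assumption based on the relation names listed under What Soin Extracts; `parse_artworks` and `artwork_relations` are hypothetical.

```python
import re

# "### Title (year)"; the year group is optional so "### Untitled" also parses
HEADING_RE = re.compile(r"^###\s+(?P<title>.+?)(?:\s+\((?P<year>\d{4})\))?\s*$")
FIELD_RE = re.compile(r"^-\s+\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.*)$")


def parse_artworks(section_lines):
    """One dict per '### Title (year)' block, with its field bullets as lowercased keys."""
    works = []
    for raw in section_lines:
        line = raw.strip()
        heading = HEADING_RE.match(line)
        if heading:
            year = heading["year"]
            works.append({"title": heading["title"], "year": int(year) if year else None})
            continue
        field = FIELD_RE.match(line)
        if field and works:
            works[-1][field["key"].strip().lower()] = field["value"].strip()
    return works


def artwork_relations(work):
    """Hypothetical mapping: By -> created-by, Shown at -> exhibited-at."""
    relations = []
    if "by" in work:
        relations.append((work["title"], "created-by", work["by"]))
    if "shown at" in work:
        relations.append((work["title"], "exhibited-at", work["shown at"]))
    return relations
```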
Exhibitions Section
## Exhibitions
- Exhibition Name (year) — Venue, City
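An Exhibitions bullet can be split with one regular expression. This sketch assumes a four-digit year and treats the text after the last comma as the city, so venue names may themselves contain commas. The venue/city pair is a plausible source for the located-in relation, but that mapping is an assumption. `EXHIBITION_RE` and `parse_exhibition` are hypothetical.

```python
import re

# "- Exhibition Name (year) <em dash> Venue, City"; \u2014 is the em dash the format uses.
# The greedy venue group backtracks to the last comma, which separates the city.
EXHIBITION_RE = re.compile(
    r"^-\s+(?P<name>.+?)\s+\((?P<year>\d{4})\)\s+\u2014\s+(?P<venue>.+),\s*(?P<city>[^,]+)$"
)


def parse_exhibition(line):
    """Return name, year, venue, and city for one Exhibitions bullet, or None."""
    match = EXHIBITION_RE.match(line.strip())
    if match is None:
        return None
    return {
        "name": match["name"],
        "year": int(match["year"]),
        "venue": match["venue"].strip(),
        "city": match["city"].strip(),
    }
```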
What Soin Extracts
- Entities: people, organizations, places, events, artworks
- Relations: member-of, affiliated-with, created-by, exhibited-at, located-in, co-occurs
- Deduplication: the same entity appearing in several files is merged automatically
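Cross-file deduplication needs some matching key. The sketch below assumes entities match when they have the same type and the same name after stripping accents, case-folding, and collapsing whitespace. It also assumes co-occurs links every pair of entities found in the same file. Both rules, and the names `normalize_name`, `merge_entities`, and `co_occurrences`, are hypothetical; Soin's actual rules may differ.

```python
import unicodedata
from itertools import combinations


def normalize_name(name):
    """Assumed matching key: accents stripped, case folded, whitespace collapsed."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def merge_entities(files):
    """Merge entities with the same type and normalized name across files.

    `files` maps a file name to a list of (entity_type, name) pairs. The result maps
    (entity_type, normalized name) to the first spelling seen and the files it came from.
    """
    merged = {}
    for file_name, entities in files.items():
        for entity_type, name in entities:
            key = (entity_type, normalize_name(name))
            entry = merged.setdefault(key, {"name": name, "files": set()})
            entry["files"].add(file_name)
    return merged


def co_occurrences(files):
    """Assumed co-occurs rule: every pair of distinct entities found in the same file."""
    pairs = set()
    for entities in files.values():
        keys = sorted({(entity_type, normalize_name(name)) for entity_type, name in entities})
        pairs.update(combinations(keys, 2))
    return pairs
```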
Discover Tab — CV Confidence Scoring
Each search result is scored from 0 to 100 by how likely it is to be an artist CV:
- Year patterns (up to +35): CVs mention many years (exhibitions, education). 20+ year mentions = +35, 10+ = +25, 5+ = +15
- CV keywords (up to +35): "solo exhibition", "residency", "education", "award", "bibliography", "born in", etc. 8+ hits = +35
- List structure (up to +20): Lines starting with years or bullets. 15+ list lines = +20
- Length bonus (up to +10): the ideal CV length is 2K–30K characters
- PDF bonus (+15): PDFs are commonly CVs
- CV-like URL (+15): URLs containing /cv, /curriculum, or =cv
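A minimal sketch of the scoring rules above. The point values and thresholds come from the list; everything else is an assumption. Years are matched as four-digit 19xx/20xx tokens, keyword hits are raw occurrence counts of only the six keywords named above, and the keyword and list bonuses scale linearly below their thresholds. The length bonus is all-or-nothing, and the total is clamped to 100 because the six maxima add up to 130. `cv_score` is a hypothetical name.

```python
import re

YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
# A list line starts with a year or a bullet character
LIST_LINE_RE = re.compile(r"^\s*(?:(?:19|20)\d{2}\b|[-*\u2022])")
# Only the keywords named in the docs; the real list is longer ("etc.")
CV_KEYWORDS = ("solo exhibition", "residency", "education", "award",
               "bibliography", "born in")
CV_URL_MARKERS = ("/cv", "/curriculum", "=cv")


def cv_score(text, url, is_pdf=False):
    """Score 0-100 for how CV-like a search result looks."""
    score = 0

    # Year patterns (up to +35): tiers at 20, 10, and 5 year mentions
    years = len(YEAR_RE.findall(text))
    if years >= 20:
        score += 35
    elif years >= 10:
        score += 25
    elif years >= 5:
        score += 15

    # CV keywords (up to +35): full bonus at 8+ hits, linear below (assumption)
    lowered = text.lower()
    hits = sum(lowered.count(keyword) for keyword in CV_KEYWORDS)
    score += 35 * min(hits, 8) // 8

    # List structure (up to +20): full bonus at 15+ list lines, linear below (assumption)
    list_lines = sum(1 for line in text.splitlines() if LIST_LINE_RE.match(line))
    score += 20 * min(list_lines, 15) // 15

    # Length bonus (+10): all-or-nothing inside 2K-30K characters (assumption)
    if 2_000 <= len(text) <= 30_000:
        score += 10

    # PDF bonus (+15)
    lowered_url = url.lower()
    if is_pdf or lowered_url.endswith(".pdf"):
        score += 15

    # CV-like URL (+15)
    if any(marker in lowered_url for marker in CV_URL_MARKERS):
        score += 15

    # The maxima add up to 130, so clamp to the documented 0-100 range (assumption)
    return min(score, 100)
```

For example, twenty short lines of the form "2005 solo exhibition, Example Gallery" earn the full year, keyword, and list bonuses (90 points) but no length bonus, since they total well under 2K characters.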
"Extract High-Confidence" picks the single best result per person among results scoring 70 or higher, preferring PDFs and CV-like URLs.
Brave Search API free tier: 1,000 queries/month ($5 plan). A counter is shown next to the button.
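One way to express the selection rule, assuming scores are already computed and that "preferring PDFs and CV-containing URLs" breaks ties between results with equal scores (it could instead be a hard filter). `SearchResult`, `has_cv_url`, and `pick_high_confidence` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    person: str
    url: str
    score: int  # 0-100 CV confidence
    is_pdf: bool = False


def has_cv_url(url):
    lowered = url.lower()
    return any(marker in lowered for marker in ("/cv", "/curriculum", "=cv"))


def pick_high_confidence(results, threshold=70):
    """Best result per person among those scoring >= threshold.

    Assumption: equal scores are broken by preferring PDFs, then CV-like URLs.
    """
    best = {}
    for result in results:
        if result.score < threshold:
            continue
        # Tuples compare element by element: score first, then PDF, then CV URL
        rank = (result.score, result.is_pdf, has_cv_url(result.url))
        if result.person not in best or rank > best[result.person][0]:
            best[result.person] = (rank, result)
    return {person: result for person, (_, result) in best.items()}
```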
Graph Tab
- Delete Entity: Click a node, then "Delete Entity" in the detail panel. Removes the entity and all its relations (entries are kept).
- Merge Entities: Click a node, then "Select for Merge". Click a second node and "Select for Merge" again. A merge bar appears; click "Merge" to combine them. The first selected entity is merged into the second, which keeps its name.
- Exclude Names: Type comma-separated names in the "Exclude names" input to hide specific entities (e.g., "wikipedia, artforum").
- "UNKNOWN" entities are always filtered out automatically.
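A toy in-memory model of the three graph operations, for illustration only. It assumes entities are keyed by id and relations are (source, relation, target) triples, and that "Exclude names" matches whole names case-insensitively. It also assumes merging drops any relation that would link the target to itself. `Graph` and its methods are hypothetical, not Soin's code.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    entities: dict = field(default_factory=dict)  # entity id -> display name
    relations: set = field(default_factory=set)   # (source id, relation, target id)

    def delete_entity(self, entity_id):
        """Remove the entity and every relation that touches it."""
        self.entities.pop(entity_id, None)
        self.relations = {r for r in self.relations if entity_id not in (r[0], r[2])}

    def merge(self, source_id, target_id):
        """Fold source into target: relations are redirected and the target keeps its name."""
        def redirect(node):
            return target_id if node == source_id else node

        redirected = {(redirect(src), rel, redirect(dst)) for src, rel, dst in self.relations}
        # Assumption: drop relations that would link the merged entity to itself
        self.relations = {r for r in redirected if not (r[0] == r[2] == target_id)}
        self.entities.pop(source_id, None)

    def visible(self, exclude_names=""):
        """Entity ids left after the 'Exclude names' filter and the automatic UNKNOWN filter."""
        excluded = {n.strip().lower() for n in exclude_names.split(",") if n.strip()}
        return {
            eid for eid, name in self.entities.items()
            if name != "UNKNOWN" and name.strip().lower() not in excluded
        }
```

Storing relations as a set means duplicates created by a merge (two people both linked to the same gallery) collapse automatically.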
Tips
- Section names are flexible: "People", "Team Members", and "Artworks / Projects" all work
- All header fields except the title line are optional
- Source types are optional; sources don't need to be categorized
- Files with no sections are treated as single entries
- Unknown sections are kept as plain text; no entities are extracted from them
- Batch-upload related files so entities can be linked across files
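Flexible section names suggest an alias table. In this sketch only "People", "Team Members", and "Artworks / Projects" are confirmed spellings. The rest of the table, the whitespace and case normalization, and the names `canonical_section` and `section_kinds` are hypothetical. A kind of `None` marks an unknown section kept as plain text, and a file with no `## ` headings at all is reported as a single entry.

```python
import re

# Hypothetical alias table: only "People", "Team Members", and
# "Artworks / Projects" are confirmed as accepted spellings.
SECTION_ALIASES = {
    "people": "people",
    "team members": "people",
    "organizations": "organizations",
    "artworks": "artworks",
    "artworks / projects": "artworks",
    "exhibitions": "exhibitions",
}


def canonical_section(heading):
    """Map a section heading to a known kind, or None for an unknown (plain-text) section."""
    title = re.sub(r"\s+", " ", heading.lstrip("#").strip().lower())
    return SECTION_ALIASES.get(title)


def section_kinds(text):
    """List (heading, kind) for every '## ' heading in a file.

    Returns None when the file has no sections, i.e. it is treated as a single entry.
    """
    headings = [line[3:].strip() for line in text.splitlines() if line.startswith("## ")]
    if not headings:
        return None
    return [(heading, canonical_section(heading)) for heading in headings]
```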
Full documentation: /media/lvk/4tbexfat/common/soin/MARKDOWN_FORMAT.md