soin

art world knowledge base

Markdown Format Guide

Soin works with structured markdown files extracted from art world websites. Each file should follow this format:

Header Block

# [ID] — [Source Name]
- **URL**: https://example.com/page
- **Source type**: institution
- **Location**: City, Country
- **Description**: Brief description

Source types: institution, gallery, festival, residency, prize, collective, project, network
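As a rough illustration of the header block above, the sketch below reads the title line and the `- **Key**: value` fields into a dictionary. The function name `parse_header`, the regular expressions, and the reading of `[ID]` and `[Source Name]` as placeholders (no literal brackets) are assumptions made for this example, not Soin's actual parser.

```python
import re

# Source types listed in the guide.
SOURCE_TYPES = {"institution", "gallery", "festival", "residency",
                "prize", "collective", "project", "network"}

# "# ID <dash> Source Name"; accepts an em dash (\u2014), en dash (\u2013), or "--".
TITLE_RE = re.compile(r"^#\s+(?P<id>.+?)\s+(?:\u2014|\u2013|--)\s+(?P<name>.+)$")
# "- **Key**: value"
FIELD_RE = re.compile(r"^-\s+\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.*)$")


def parse_header(text):
    """Parse the header block into a dict (hypothetical helper)."""
    header = {"id": None, "name": None, "fields": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if header["name"] is None:
            # Skip everything until the title line is found.
            m = TITLE_RE.match(line)
            if m:
                header["id"] = m.group("id")
                header["name"] = m.group("name")
            continue
        if line.startswith("#"):
            break  # next heading: the header block is over
        m = FIELD_RE.match(line)
        if m:
            key = m.group("key").strip().lower()
            header["fields"][key] = m.group("value").strip()
    return header


def has_known_source_type(header):
    """True if the optional 'Source type' field is absent or one of the listed types."""
    value = header["fields"].get("source type")
    return value is None or value in SOURCE_TYPES
```

Because every field except the title is optional, the sketch treats a missing source type as valid rather than as an error.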

People Section

## People

### Full Name
- **Role**: Position — Organization Name
- **Other affiliations**: Org 1, Org 2
- **Bio**: Background info
- **Works/shows**: Artwork titles
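To make the People section layout concrete, here is a minimal sketch that turns each `### Full Name` block into a record and splits the Role field into position and organization at the dash. The helper name `parse_people` and the output keys (`position`, `organization`, `affiliations`) are hypothetical; Soin's real extraction may differ.

```python
import re

# "- **Key**: value"
FIELD_RE = re.compile(r"^-\s+\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.*)$")
# Separator between position and organization: em dash, en dash, or "--".
DASH_RE = re.compile(r"\s+(?:\u2014|\u2013|--)\s+")


def parse_people(section_text):
    """Return one dict per '### Full Name' block (hypothetical helper)."""
    people = []
    current = None
    for raw in section_text.splitlines():
        line = raw.strip()
        if line.startswith("### "):
            current = {"name": line[4:].strip()}
            people.append(current)
            continue
        if current is None:
            continue  # text before the first person heading
        m = FIELD_RE.match(line)
        if not m:
            continue
        key = m.group("key").strip().lower()
        value = m.group("value").strip()
        if key == "role":
            parts = DASH_RE.split(value, maxsplit=1)
            current["position"] = parts[0]
            current["organization"] = parts[1] if len(parts) == 2 else None
        elif key == "other affiliations":
            current["affiliations"] = [a.strip() for a in value.split(",") if a.strip()]
        else:
            current[key] = value
    return people
```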

Organizations Section

## Organizations
- Organization Name (COUNTRY_CODE) -- description
- Multi-country: Org (FR/BE) -- description

Use ISO country codes: FR, AT, DE, US, IT, etc.
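The organization line format can be sketched with a single regular expression. This example assumes that "Multi-country:" in the sample above is an annotation rather than part of the line, that country codes are two uppercase letters separated by "/", and that the description is optional; `parse_org_line` is a hypothetical name.

```python
import re

# "- Name (FR) -- description" or "- Name (FR/BE) -- description"
ORG_RE = re.compile(
    r"^-\s+(?P<name>.+?)\s+\((?P<codes>[A-Z]{2}(?:/[A-Z]{2})*)\)"
    r"(?:\s+(?:--|\u2014|\u2013)\s+(?P<description>.*))?$"
)


def parse_org_line(line):
    """Parse one organization bullet, or return None if it does not match."""
    m = ORG_RE.match(line.strip())
    if not m:
        return None
    return {
        "name": m.group("name"),
        "countries": m.group("codes").split("/"),
        "description": (m.group("description") or "").strip(),
    }
```

The pattern only checks the shape of the codes; it does not verify that a code such as "FR" is actually assigned in ISO 3166-1.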

Artworks Section

## Artworks

### Title (year)
- **By**: Artist Name
- **Medium**: installation, video, etc.
- **Shown at**: Venue Name

Exhibitions Section

## Exhibitions
- Exhibition Name (year) — Venue, City
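An exhibition line can be read the same way. The sketch below assumes a four-digit year and that the text after the last comma is the city; both the function name `parse_exhibition_line` and those two rules are assumptions for illustration.

```python
import re

# "- Exhibition Name (year) <dash> Venue, City"
EXHIBITION_RE = re.compile(
    r"^-\s+(?P<name>.+?)\s+\((?P<year>\d{4})\)"
    r"\s+(?:\u2014|\u2013|--)\s+(?P<place>.+)$"
)


def parse_exhibition_line(line):
    """Parse one exhibition bullet, or return None if it does not match."""
    m = EXHIBITION_RE.match(line.strip())
    if not m:
        return None
    place = m.group("place").strip()
    venue, sep, city = place.rpartition(",")
    if not sep:
        # No comma: treat the whole text as the venue.
        venue, city = place, None
    return {
        "name": m.group("name"),
        "year": int(m.group("year")),
        "venue": venue.strip(),
        "city": city.strip() if city is not None else None,
    }
```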

What Soin Extracts

  • Entities: People, organizations, places, events, artworks
  • Relations: member-of, affiliated-with, created-by, exhibited-at, located-in, co-occurs
  • Deduplication: Same entities across files are automatically merged

Discover Tab — CV Confidence Scoring

Each search result is scored 0–100 on how likely it is an artist CV:

  • Year patterns (up to +35): CVs mention many years (exhibition dates, education). 20+ year mentions = +35, 10+ = +25, 5+ = +15
  • CV keywords (up to +35): "solo exhibition", "residency", "education", "award", "bibliography", "born in", etc. 8+ hits = +35
  • List structure (up to +20): Lines starting with years or bullets. 15+ list lines = +20
  • Length bonus (up to +10): Ideal CV length is 2K–30K chars
  • PDF bonus (+15): PDFs are commonly CVs
  • URL contains "cv" (+15): URLs with /cv, /curriculum, =cv
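The scoring rules above can be sketched as a single function. This is an approximation under stated assumptions, not Soin's implementation: years are matched as 1900 to 2099, keyword "hits" are counted as total occurrences of the listed phrases, only the tiers the guide actually states are implemented (lower tiers for keywords, list lines, and length are not documented, so they add nothing here), "2K" is read as 2000 characters, and the total is capped at 100 because the listed maxima add up to 130 while the score range is 0 to 100.

```python
import re

YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
# A "list line" starts with a year or a bullet character (-, *, or \u2022).
LIST_LINE_RE = re.compile(r"^\s*(?:(?:19|20)\d{2}\b|[-*\u2022])")
# Only the keywords named in the guide; its "etc." is left unfilled.
CV_KEYWORDS = ("solo exhibition", "residency", "education",
               "award", "bibliography", "born in")
CV_URL_MARKERS = ("/cv", "/curriculum", "=cv")


def cv_confidence(text, url="", is_pdf=False):
    """Score 0-100 for how likely `text` is an artist CV (illustrative sketch)."""
    score = 0

    # Year patterns: 20+ = +35, 10+ = +25, 5+ = +15.
    years = len(YEAR_RE.findall(text))
    if years >= 20:
        score += 35
    elif years >= 10:
        score += 25
    elif years >= 5:
        score += 15

    # CV keywords: only the 8+ tier is documented.
    lowered = text.lower()
    hits = sum(lowered.count(keyword) for keyword in CV_KEYWORDS)
    if hits >= 8:
        score += 35

    # List structure: only the 15+ tier is documented.
    list_lines = sum(1 for line in text.splitlines() if LIST_LINE_RE.match(line))
    if list_lines >= 15:
        score += 20

    # Length bonus: full bonus inside the ideal 2K-30K range (assumed 2000-30000).
    if 2000 <= len(text) <= 30000:
        score += 10

    if is_pdf:
        score += 15
    if any(marker in url.lower() for marker in CV_URL_MARKERS):
        score += 15

    return min(score, 100)  # cap is an assumption, see the note above
```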

Extract High-Confidence picks the single best result per person (score >= 70), preferring PDFs and URLs that contain "cv". Brave Search API: the free tier allows 1000 queries/month ($5 plan). A counter is shown next to the button.
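One way to read the High-Confidence selection rule is sketched below. The guide does not say how the PDF and URL preferences interact with the score, so this example assumes they rank first among results that pass the threshold, with the score as the final tie-breaker. The `Result` class and `pick_high_confidence` are hypothetical names.

```python
from dataclasses import dataclass

CV_URL_MARKERS = ("/cv", "/curriculum", "=cv")


@dataclass
class Result:
    person: str
    url: str
    score: int
    is_pdf: bool = False


def url_mentions_cv(url):
    """True if the URL contains one of the CV markers from the scoring rules."""
    return any(marker in url.lower() for marker in CV_URL_MARKERS)


def pick_high_confidence(results, threshold=70):
    """Return {person: best Result} for results with score >= threshold."""
    best = {}
    for result in results:
        if result.score < threshold:
            continue
        # Assumed ordering: PDF first, then CV-like URL, then score.
        key = (result.is_pdf, url_mentions_cv(result.url), result.score)
        if result.person not in best or key > best[result.person][0]:
            best[result.person] = (key, result)
    return {person: result for person, (_, result) in best.items()}
```

If the score should dominate instead, moving `result.score` to the front of the key tuple gives that behavior.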

Graph Tab

  • Delete Entity: Click a node, then "Delete Entity" in the detail panel. Removes the entity and all its relations (entries are kept).
  • Merge Entities: Click a node, then "Select for Merge". Click a second node and "Select for Merge" again. A merge bar appears; click "Merge" to combine them. The first entity is merged into the second, and the second (the target) keeps its name.
  • Exclude Names: Type comma-separated names in the "Exclude names" input to hide specific entities (e.g., "wikipedia, artforum").
  • "UNKNOWN" entities are always filtered out automatically.
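The graph operations above can be modeled on a plain set of relation triples. This sketch assumes relations are stored as `(subject, relation, object)` tuples, that name exclusion is a case-insensitive exact match, and that self-loops created by a merge are dropped; none of these details are confirmed by the guide, and the function names are hypothetical.

```python
def visible_entities(names, exclude_input=""):
    """Hide 'UNKNOWN' and any name listed in the comma-separated exclude input."""
    excluded = {n.strip().lower() for n in exclude_input.split(",") if n.strip()}
    return [
        name for name in names
        if name.upper() != "UNKNOWN" and name.lower() not in excluded
    ]


def merge_entities(relations, source, target):
    """Fold `source` into `target`; the target keeps its name."""
    merged = set()
    for subj, rel, obj in relations:
        subj = target if subj == source else subj
        obj = target if obj == source else obj
        if subj != obj:  # drop self-loops created by the merge
            merged.add((subj, rel, obj))
    return merged


def delete_entity(relations, name):
    """Remove every relation that touches `name`; entries are stored elsewhere."""
    return {(s, r, o) for (s, r, o) in relations if name not in (s, o)}
```

Using a set means that relations which become identical after a merge collapse into one.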

Tips

  • Section names are flexible: "People", "Team Members", "Artworks / Projects" all work
  • All header fields except title are optional
  • Source types are optional — sources don't need to be categorized
  • Files with no sections are treated as single entries
  • Unknown sections are kept as plain text and skip entity extraction
  • Batch upload related files for cross-file entity linking
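The tips on flexible section names, files without sections, and unknown sections can be illustrated with a small classifier. The keyword table below is invented for this example (the guide does not document how names are matched), as are the function names and the "plain" and "entry" labels. The sketch also ignores any text before the first `## ` heading.

```python
# Hypothetical keyword table; the real matching rules are not documented.
SECTION_KEYWORDS = {
    "people": ("people", "team", "members"),
    "organizations": ("organization",),
    "artworks": ("artwork", "project"),
    "exhibitions": ("exhibition",),
}


def classify_section(heading):
    """Map a section heading to a known kind, or 'plain' if unrecognized."""
    lowered = heading.lower()
    for kind, keywords in SECTION_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return kind
    return "plain"


def split_sections(text):
    """Return (kind, body) pairs; a file with no '## ' headings is one entry."""
    sections = []
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = (classify_section(line[3:]), [])
            sections.append(current)
        elif current is not None:
            current[1].append(line)
    if not sections:
        return [("entry", text)]
    return [(kind, "\n".join(body).strip()) for kind, body in sections]
```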

Full documentation: /media/lvk/4tbexfat/common/soin/MARKDOWN_FORMAT.md