soin

Soin works with structured Markdown files extracted from art-world websites. Each file should follow this format:

Header Block

# [ID] — [Source Name]
- **URL**: https://example.com/page
- **Source type**: institution
- **Location**: City, Country
- **Description**: Brief description

Source types: institution, gallery, festival, residency, prize, collective, project, network
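As an illustration only, the header block could be read with a small Python parser like the one below. It is a sketch, not Soin's own parser. The function name `parse_header`, the assumption that the ID is a single token, and the snake_case field keys are all hypothetical. It accepts the title line plus any subset of the `- **Key**: value` fields, and it rejects a source type outside the list above.

```python
import re

# Source types listed in the format description above.
SOURCE_TYPES = {"institution", "gallery", "festival", "residency",
                "prize", "collective", "project", "network"}

# Title line "# [ID] — [Source Name]"; assumes the ID is one token without spaces.
TITLE_RE = re.compile(r"^#\s+(?P<id>\S+)\s+—\s+(?P<name>.+?)\s*$")
# Field line "- **Key**: value"
FIELD_RE = re.compile(r"^-\s+\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.*)$")


def parse_header(text: str) -> dict:
    """Hypothetical parser: every field except the title line is optional."""
    lines = text.strip().splitlines()
    m = TITLE_RE.match(lines[0]) if lines else None
    if m is None:
        raise ValueError("missing '# [ID] — [Source Name]' title line")
    header = {"id": m["id"], "name": m["name"]}
    for line in lines[1:]:
        f = FIELD_RE.match(line.strip())
        if f is None:
            break  # the header block ends at the first non-field line
        # "Source type" -> "source_type"
        header[f["key"].strip().lower().replace(" ", "_")] = f["value"].strip()
    stype = header.get("source_type")
    if stype is not None and stype not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {stype}")
    return header
```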

People Section

## People

### Full Name
- **Role**: Position — Organization Name
- **Other affiliations**: Org 1, Org 2
- **Bio**: Background info
- **Works/shows**: Artwork titles
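A similar sketch for the People section, again hypothetical (`parse_people` is not part of Soin): each `### Full Name` heading starts a record and the bullets below it fill it in. The Role value is split on the em dash into position and organization, and Other affiliations is split on commas.

```python
import re

NAME_RE = re.compile(r"^###\s+(.+?)\s*$")
FIELD_RE = re.compile(r"^-\s+\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.*)$")


def parse_people(section: str) -> list[dict]:
    """Hypothetical parser for '### Full Name' blocks and their bullets."""
    people = []
    for raw in section.splitlines():
        line = raw.strip()
        if (m := NAME_RE.match(line)):
            people.append({"name": m[1]})  # a new person starts here
        elif people and (f := FIELD_RE.match(line)):
            people[-1][f["key"].strip().lower()] = f["value"].strip()
    for person in people:
        # "Position — Organization Name" -> separate position and organization
        if " — " in person.get("role", ""):
            position, org = person["role"].split(" — ", 1)
            person["position"], person["organization"] = position.strip(), org.strip()
        # "Org 1, Org 2" -> list of affiliation names
        if "other affiliations" in person:
            person["affiliations"] = [
                a.strip() for a in person["other affiliations"].split(",") if a.strip()
            ]
    return people
```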

Organizations Section

## Organizations
- Organization Name (COUNTRY_CODE) -- description
- Multi-country: Org (FR/BE) -- description

Use ISO 3166-1 alpha-2 country codes: FR, AT, DE, US, IT, etc.
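Organization lines can be matched with one regular expression. In this hypothetical sketch (`parse_org_line` is an illustrative name), the codes in parentheses are assumed to be two uppercase letters, with several countries separated by slashes as in FR/BE, and the `-- description` part is treated as optional.

```python
import re
from typing import Optional

# "- Organization Name (FR) -- description" or "- Org (FR/BE) -- description"
ORG_RE = re.compile(
    r"^-\s+(?P<name>.+?)\s+\((?P<codes>[A-Z]{2}(?:/[A-Z]{2})*)\)"
    r"\s*(?:--\s*(?P<desc>.*))?$"
)


def parse_org_line(line: str) -> Optional[dict]:
    m = ORG_RE.match(line.strip())
    if m is None:
        return None  # not an organization line
    return {
        "name": m["name"],
        "countries": m["codes"].split("/"),  # "FR/BE" -> ["FR", "BE"]
        "description": (m["desc"] or "").strip(),
    }
```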

Artworks Section

## Artworks

### Title (year)
- **By**: Artist Name
- **Medium**: installation, video, etc.
- **Shown at**: Venue Name
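For artworks, the heading carries the title and an optional year in parentheses. The sketch below (hypothetical names `parse_artworks` and `artwork_relations`) also shows one plausible way the By and Shown at fields could map onto the created-by and exhibited-at relations that Soin extracts; that mapping is an assumption, not documented behavior.

```python
import re

TITLE_RE = re.compile(r"^###\s+(?P<title>.+?)(?:\s+\((?P<year>\d{4})\))?\s*$")
FIELD_RE = re.compile(r"^-\s+\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.*)$")


def parse_artworks(section: str) -> list[dict]:
    """Hypothetical parser for '### Title (year)' blocks."""
    works = []
    for raw in section.splitlines():
        line = raw.strip()
        if (m := TITLE_RE.match(line)):
            year = int(m["year"]) if m["year"] else None  # the year is optional
            works.append({"title": m["title"], "year": year})
        elif works and (f := FIELD_RE.match(line)):
            works[-1][f["key"].strip().lower()] = f["value"].strip()
    return works


def artwork_relations(work: dict) -> list[tuple[str, str, str]]:
    """Assumed mapping: 'By' -> created-by, 'Shown at' -> exhibited-at."""
    rels = []
    if "by" in work:
        rels.append((work["title"], "created-by", work["by"]))
    if "shown at" in work:
        rels.append((work["title"], "exhibited-at", work["shown at"]))
    return rels
```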

Exhibitions Section

## Exhibitions
- Exhibition Name (year) — Venue, City
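Exhibition lines follow a fixed name, year, venue, city pattern. This hypothetical `parse_exhibition` sketch assumes the venue name contains no comma, so everything after the first comma becomes the city; venues whose names contain commas would need a different rule.

```python
import re
from typing import Optional

# "- Exhibition Name (year) — Venue, City"; the city part is optional here.
EXHIBITION_RE = re.compile(
    r"^-\s+(?P<name>.+?)\s+\((?P<year>\d{4})\)\s+—\s+"
    r"(?P<venue>[^,]+?)(?:,\s*(?P<city>.+))?$"
)


def parse_exhibition(line: str) -> Optional[dict]:
    m = EXHIBITION_RE.match(line.strip())
    if m is None:
        return None  # not an exhibition line
    return {
        "name": m["name"],
        "year": int(m["year"]),
        "venue": m["venue"].strip(),
        "city": m["city"].strip() if m["city"] else None,
    }
```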

What Soin Extracts

Entities: people, organizations, places, events, artworks
Relations: member-of, affiliated-with, created-by, exhibited-at, located-in, co-occurs
Deduplication: the same entity appearing in several files is merged automatically
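The deduplication rule itself is not specified here. One simple approach, shown purely as an assumption, is to key each entity on its type plus a normalized name (Unicode NFKC, case-folded, whitespace collapsed) and to merge mentions that share a key, keeping the first spelling seen as the display name. The names `entity_key` and `dedupe` are hypothetical, and real-world matching (initials, aliases, transliterations) would need more than this.

```python
import unicodedata


def entity_key(kind: str, name: str) -> tuple[str, str]:
    """Hypothetical merge key: entity type plus a normalized name."""
    # NFKC folds compatibility characters, casefold() is a stronger lower(),
    # and split()/join collapses runs of whitespace.
    norm = " ".join(unicodedata.normalize("NFKC", name).casefold().split())
    return kind, norm


def dedupe(mentions):
    """mentions: iterable of (kind, name, source_file) tuples."""
    merged = {}
    for kind, name, source in mentions:
        entry = merged.setdefault(
            entity_key(kind, name), {"kind": kind, "name": name, "sources": set()}
        )
        entry["sources"].add(source)  # remember every file the entity came from
    return merged
```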

Discover Tab — CV Confidence Scoring

Each search result is scored 0–100 on how likely it is to be an artist CV:

Year patterns (up to +35): CVs list many years (exhibitions, education). 20+ year mentions = +35, 10+ = +25, 5+ = +15
CV keywords (up to +35): "solo exhibition", "residency", "education", "award", "bibliography", "born in", etc. 8+ hits = +35
List structure (up to +20): Lines starting with years or bullets. 15+ list lines = +20
Length bonus (up to +10): Ideal CV length is 2K–30K chars
PDF bonus (+15): PDFs are commonly CVs
URL contains "cv" (+15): URLs with /cv, /curriculum, =cv
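The scoring rules above translate fairly directly into code. This is a sketch under stated assumptions, with `cv_score` as a hypothetical name: the year and keyword patterns, the proportional scaling below the documented top tiers for keywords and list lines, the all-or-nothing length bonus, and the clamp to 100 (the listed components can add up to 130) are not taken from Soin.

```python
import re

YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")  # assumed year pattern
CV_KEYWORDS = ("solo exhibition", "residency", "education", "award",
               "bibliography", "born in")
LIST_LINE_RE = re.compile(r"^\s*(?:(?:19|20)\d{2}\b|[-•*])")  # year or bullet
CV_URL_MARKERS = ("/cv", "/curriculum", "=cv")


def cv_score(text: str, url: str) -> int:
    years = len(YEAR_RE.findall(text))
    year_pts = 35 if years >= 20 else 25 if years >= 10 else 15 if years >= 5 else 0

    lower = text.lower()
    hits = sum(lower.count(k) for k in CV_KEYWORDS)
    # Only the 8+ tier is documented; scaling below it is an assumption.
    kw_pts = min(35, round(35 * hits / 8))

    list_lines = sum(1 for line in text.splitlines() if LIST_LINE_RE.match(line))
    list_pts = min(20, round(20 * list_lines / 15))  # assumed proportional below 15

    length_pts = 10 if 2_000 <= len(text) <= 30_000 else 0  # assumed all-or-nothing

    u = url.lower()
    pdf_pts = 15 if u.endswith(".pdf") else 0
    url_pts = 15 if any(marker in u for marker in CV_URL_MARKERS) else 0

    # Clamp to the documented 0-100 range (assumption).
    return min(100, year_pts + kw_pts + list_pts + length_pts + pdf_pts + url_pts)
```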

"Extract High-Confidence" picks the single best result per person among results scoring 70 or higher, preferring PDFs and URLs that contain "cv".

Brave Search API: the free tier allows 1,000 queries per month ($5 plan). A usage counter is shown next to the button.
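Selecting one result per person can be a single pass over the scored results. In this hypothetical sketch (`Result` and `pick_high_confidence` are illustrative names), results below 70 are discarded and the rest are ranked by score, with PDFs and CV-style URLs used only as tie-breakers; whether Soin applies that preference as a tie-breaker or more strongly is not stated.

```python
from dataclasses import dataclass

CV_URL_MARKERS = ("/cv", "/curriculum", "=cv")


@dataclass
class Result:
    person: str
    url: str
    score: int


def pick_high_confidence(results, threshold=70):
    """Keep the best result per person among those scoring >= threshold."""
    best = {}
    for r in results:
        if r.score < threshold:
            continue
        u = r.url.lower()
        # Rank by score first; PDF and CV-style URL only break ties (assumption).
        rank = (r.score, u.endswith(".pdf"), any(m in u for m in CV_URL_MARKERS))
        if r.person not in best or rank > best[r.person][0]:
            best[r.person] = (rank, r)
    return {person: r for person, (_, r) in best.items()}
```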

Graph Tab

Delete Entity: Click a node, then "Delete Entity" in the detail panel. Removes the entity and all its relations (entries are kept).
Merge Entities: Click a node, then "Select for Merge". Click a second node and "Select for Merge" again. In the merge bar that appears, click "Merge" to combine them. The first entity is merged into the second, which keeps its name.
Exclude Names: Type comma-separated names in the "Exclude names" input to hide specific entities (e.g., "wikipedia, artforum").
"UNKNOWN" entities are always filtered out automatically.
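The delete, merge, and filter behavior can be modeled on a minimal graph of entity names plus relation triples. The data model and the functions `delete_entity`, `merge_into`, and `visible_entities` are hypothetical, and so are the choices to drop self-loops and duplicate edges after a merge and to match excluded names exactly but case-insensitively.

```python
def delete_entity(names, relations, entity):
    """Remove an entity and every relation touching it."""
    remaining = {k: v for k, v in names.items() if k != entity}
    return remaining, [r for r in relations if entity not in (r[0], r[2])]


def merge_into(names, relations, source, target):
    """Merge entity `source` into `target`; the target keeps its name and id."""
    merged, seen = [], set()
    for a, rel, b in relations:
        # Redirect every edge that touched the source entity to the target.
        a = target if a == source else a
        b = target if b == source else b
        if a == b or (a, rel, b) in seen:
            continue  # assumption: drop self-loops and duplicate edges
        seen.add((a, rel, b))
        merged.append((a, rel, b))
    remaining = {k: v for k, v in names.items() if k != source}
    return remaining, merged


def visible_entities(names, exclude_input):
    """Apply the comma-separated "Exclude names" input and the UNKNOWN filter."""
    excluded = {s.strip().casefold() for s in exclude_input.split(",") if s.strip()}
    return {
        k: v for k, v in names.items()
        if v != "UNKNOWN" and v.casefold() not in excluded
    }
```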

Tips

Section names are flexible: "People", "Team Members", "Artworks / Projects" all work
All header fields except the title line are optional
Source types are optional — sources don't need to be categorized
Files with no sections are treated as single entries
Unrecognized sections are kept as plain text, without entity extraction
Batch upload related files for cross-file entity linking
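Flexible section names suggest matching headings by keyword rather than by exact text. The alias table and the function `classify_section` below are hypothetical; the table holds only the variants mentioned in these tips plus the canonical section names. A heading that matches none of them returns `None`, mirroring the rule that unrecognized sections are kept as plain text.

```python
import re

# Hypothetical alias table; Soin's actual matching rules are not documented here.
SECTION_ALIASES = {
    "people": ("people", "team", "members"),
    "organizations": ("organizations",),
    "artworks": ("artworks", "projects"),
    "exhibitions": ("exhibitions",),
}


def classify_section(heading: str):
    """Map a '## Heading' text to a canonical section name, or None."""
    words = set(re.findall(r"[a-z]+", heading.lower()))
    for canonical, aliases in SECTION_ALIASES.items():
        if words & set(aliases):
            return canonical
    return None  # unrecognized: keep as plain text, no entity extraction
```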

Full documentation: /media/lvk/4tbexfat/common/soin/MARKDOWN_FORMAT.md