Methodology

How Pair works.

Pair is a directory of 1,790 ingredients and 9,304 ranked pairings, derived from the Epicure embeddings (Radzikowski & Chen, 2026).

§ I · The three siblings

The three sibling embeddings

Epicure trains three different 300-dimensional Metapath2Vec models on the same ingredient vocabulary but with different random-walk schemas. The result is three distinct lenses on the same 1,790 ingredients:

Cooc: Recipe co-occurrence. Walks the ingredient–ingredient NPMI graph derived from 4M recipes. Neighbours are ingredients that get cooked with the seed. Best for "what else do I cook with X."
Core: Blended. Typed FlavorDB compound walks mixed with injected ingredient–ingredient walks. The middle ground: chemistry-aware, but it keeps recipe context.
Chem: Aroma chemistry. Pure FlavorDB compound metapaths. Neighbours are flavor-profile peers — things that share volatile compounds with the seed. Best for substitution.
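A minimal sketch of what "neighbours" means in practice: rank every other ingredient by cosine similarity to a seed, once per sibling. The function name `top_neighbours`, the 3-D toy vectors and the ingredient names are illustrative assumptions, not part of the Epicure release; the real vectors are 300-dimensional.

```python
import numpy as np

def top_neighbours(vectors, names, seed, k=3):
    """Return the k ingredients closest to `seed` by cosine similarity."""
    # L2-normalise rows so a dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = unit @ unit[names.index(seed)]
    # Sort descending and skip the seed itself.
    order = [i for i in np.argsort(-scores) if names[i] != seed]
    return [(names[i], float(scores[i])) for i in order[:k]]

# Toy 3-D stand-ins for the 300-D vectors; all values are made up.
names = ["tomato", "basil", "strawberry", "cumin"]
siblings = {
    "cooc": np.array([[1.0, 0.2, 0.0], [0.9, 0.3, 0.1],
                      [0.1, 1.0, 0.0], [0.0, 0.1, 1.0]]),
    "chem": np.array([[1.0, 0.1, 0.0], [0.2, 0.1, 1.0],
                      [0.9, 0.3, 0.0], [0.0, 1.0, 0.2]]),
}

for lens, vectors in siblings.items():
    print(lens, top_neighbours(vectors, names, "tomato", k=1))
```

With these invented vectors the same seed gets a different nearest neighbour under each lens, which is the point of keeping the siblings separate.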

§ II · Reading the scores

What the scores mean

All scores are cosine similarities between L2-normalised 300-D vectors, so every ingredient sits on the unit sphere. A score of 1.0 is identity; 0.0 is orthogonal (no learned relationship); negative scores are possible but rare in these embeddings.
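The three reference points of the scale can be reproduced with a few lines of NumPy. This is a sketch on 2-D toy vectors; `cosine_score` is a hypothetical helper name, not something the site or the Epicure release exposes.

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity: normalise both vectors, then take the dot product."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

v = np.array([3.0, 4.0])
same_direction = cosine_score(v, 2.0 * v)                 # identity end of the scale
orthogonal = cosine_score(v, np.array([-4.0, 3.0]))       # no relationship
opposite = cosine_score(v, -v)                            # negative end of the scale
print(same_direction, orthogonal, opposite)
```

Scaling a vector does not change its score, which is why normalising once up front and storing unit vectors is enough.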

Practical ranges differ by sibling. Cooc scores peak around 0.55 (recipes are sparse, so co-occurrence cosines stay moderate). Chem scores peak around 0.81 (FlavorDB compounds cluster tightly). A 0.40 Cooc and a 0.40 Chem are not the same strength of signal — they're at different percentiles of their own distribution.
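One way to compare scores across siblings is to convert each raw score to a percentile within its own distribution. The sketch below uses synthetic uniform samples whose upper bounds are borrowed from the figures above; the shapes of the real distributions are not given here, so the sample data and the helper name `percentile_of` are assumptions.

```python
import numpy as np

def percentile_of(score, distribution):
    """Percent of values in `distribution` that are at or below `score`."""
    ordered = np.sort(np.asarray(distribution))
    return 100.0 * np.searchsorted(ordered, score, side="right") / len(ordered)

rng = np.random.default_rng(0)
# Synthetic stand-ins: only the upper bounds (0.55, 0.81) come from the text.
cooc_scores = rng.uniform(0.0, 0.55, size=10_000)
chem_scores = rng.uniform(0.0, 0.81, size=10_000)

cooc_pct = percentile_of(0.40, cooc_scores)
chem_pct = percentile_of(0.40, chem_scores)
print(f"0.40 cooc sits at percentile {cooc_pct:.1f}")
print(f"0.40 chem sits at percentile {chem_pct:.1f}")
```

On any pair of distributions with these ranges, the same raw 0.40 lands higher on the Cooc axis than on the Chem axis.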

§ III · Classification

The four quadrants

Every pair page classifies the relationship by where it sits on the cooc/chem grid. Thresholds are calibrated to the ~70th percentile per axis (cooc ≥ 0.30, chem ≥ 0.40):

Classic (high cooc · high chem): Cooked together often, share aroma chemistry. The "of course" pairs.
Complementary (high cooc · low chem): Cooked together despite chemistry differences. Work through contrast. The most editorially interesting category.
Substitute (low cooc · high chem): Similar chemistry but rarely in the same dish. Candidates for one-for-one swaps.
Neutral (low cooc · low chem): No strong signal either way. Combine creatively at your own risk.
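The grid above reduces to two threshold tests. A minimal sketch, using the thresholds quoted in this section; the function and constant names are hypothetical.

```python
COOC_THRESHOLD = 0.30  # ~70th percentile on the cooc axis, per the text
CHEM_THRESHOLD = 0.40  # ~70th percentile on the chem axis, per the text

def quadrant(cooc, chem):
    """Classify a pair by which side of each threshold its scores fall on."""
    high_cooc = cooc >= COOC_THRESHOLD
    high_chem = chem >= CHEM_THRESHOLD
    if high_cooc and high_chem:
        return "classic"
    if high_cooc:
        return "complementary"
    if high_chem:
        return "substitute"
    return "neutral"

# Made-up score pairs, one per quadrant.
for cooc, chem in [(0.45, 0.60), (0.45, 0.10), (0.05, 0.60), (0.05, 0.10)]:
    print(cooc, chem, quadrant(cooc, chem))
```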

§ IV · Editorial selection

Why we ranked these specific pairs

There are 1,601,155 possible ingredient pairs in the vocabulary. We can't materialize all of them as static pages, so we rank and surface the most interesting 9,304.
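The pair count is the number of unordered pairs that can be drawn from 1,790 ingredients, which is quick to check:

```python
import math

n_ingredients = 1_790
n_surfaced = 9_304

# Unordered pairs without repetition: n * (n - 1) / 2.
n_pairs = math.comb(n_ingredients, 2)
print(n_pairs)  # 1601155

# Share of all possible pairs that gets a page.
share = n_surfaced / n_pairs
print(f"{share:.2%}")
```

The surfaced set is a bit over half a percent of everything the vocabulary allows.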

Stratified ranking: take the top 5,000 by Cooc score, plus the top 4,000 by Chem score, plus a 2,000-pair "complementarity bonus" tier (pairs whose Cooc − Chem gap is largest). Take the union, dedupe, and keep the top 9,304. This guarantees every quadrant gets editorial surface area. A naive max(cooc, chem) ranking starved the complementary category, which is the most interesting.
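The tiered selection can be sketched as three independent top-n cuts followed by an order-preserving dedupe. This is an illustration under assumptions, not the production pipeline: the function name, the toy pairs and their scores are invented, and the final cap on the result count is left out because the ordering it uses is not specified here.

```python
def stratified_selection(pairs, n_cooc, n_chem, n_gap):
    """pairs maps (a, b) -> (cooc, chem). Returns the deduped union of three tiers."""
    def top(key, n):
        return sorted(pairs, key=key, reverse=True)[:n]

    tiers = [
        top(lambda p: pairs[p][0], n_cooc),               # highest Cooc
        top(lambda p: pairs[p][1], n_chem),               # highest Chem
        top(lambda p: pairs[p][0] - pairs[p][1], n_gap),  # largest Cooc - Chem gap
    ]
    selected, seen = [], set()
    for tier in tiers:
        for pair in tier:
            if pair not in seen:  # a pair can qualify through more than one tier
                seen.add(pair)
                selected.append(pair)
    return selected

# Toy data; every score is made up.
pairs = {
    ("tomato", "basil"): (0.50, 0.45),
    ("lemon", "thyme"): (0.45, 0.10),
    ("strawberry", "tomato"): (0.05, 0.70),
    ("cumin", "vanilla"): (0.02, 0.05),
    ("garlic", "onion"): (0.48, 0.60),
}
selected = stratified_selection(pairs, n_cooc=2, n_chem=2, n_gap=1)
print(selected)
```

In this toy run the lemon and thyme pair enters only through the gap tier, which is the behaviour the bonus tier exists to provide; the site's tier sizes would be 5,000, 4,000 and 2,000.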

§ V · Caveats

What this is not

Not a recipe site. Pair tells you what relates to what; it doesn't give you the dish.
Not a chemistry lab. Chem scores come from FlavorDB compound co-membership, not from your specific batch of tomatoes. They're a useful prior, not ground truth.
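Not editorial in its scores. Every score here is reproducible from the published embeddings. No hand-tuning, no taste preferences baked in. The editorial choices are limited to the quadrant thresholds and to which pairs get surfaced.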

§ VI · Data, license & attribution

Where the data comes from

The embeddings are the Epicure release by Radzikowski & Chen (KAIKAKU.AI), used under CC BY 4.0. We redistribute the published vectors with modifications: L2 normalisation, pairwise cosine scoring, the four-quadrant classification, and an editorial selection of ranked pairs. Beyond normalisation, the vectors themselves are unaltered and not retrained.

Epicure was in turn trained on FlavorDB (Garg et al., 2018) for the chem/core chemistry signal, and on recipe corpora including RecipeNLG (Bień et al., 2020) for the cooc signal. Those source datasets carry their own (largely non-commercial) terms and are not redistributed here.

Radzikowski, J. & Chen, J. (2026). Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings. arXiv:2605.22391. Paper · Embeddings on HF