Pair
Appendix · descriptive statistics

The data behind the directory.

Two views, side by side: the full population of 1,601,155 candidate pairs (everything the embedding can produce), and the curated 9,304 pairs surfaced as pages. The curated set is a deliberate editorial slice — these numbers tell you how it differs from the underlying population, and where the biases come from.

Ingredients: 1,790 · Pairs scanned: 1,601,155 · Ranked & curated: 9,304 · Flavor modes: 543
Source corpus: the Epicure embeddings were trained on 4,135,189 recipes across 8 languages (en, zh, ru, vi, es, tr, id, de). The language mix is uneven and this shows up in the data — Asian ingredients tend to appear more in dense cooc clusters than European or Latin American ones.

Where pairs sit, before any curation

Quadrant distribution — full population (1.6M pairs)

Across all candidate pairs, 99.04% are neutral (no strong signal either way). The "interesting" pairs — classic, complementary, substitute — together account for just 0.96%. The curated set below cherry-picks from these.

Classic: 2,912 · Complementary: 9,222 · Substitute: 3,244 · Neutral: 1,585,777
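The percentages quoted above follow directly from the four counts. A minimal check in Python, using only the numbers on this page (the variable names are illustrative, not the site's):

```python
# Quadrant counts for the full population, copied from this page.
population = {
    "classic": 2_912,
    "complementary": 9_222,
    "substitute": 3_244,
    "neutral": 1_585_777,
}

total = sum(population.values())

# Share of each quadrant, as a percentage of all candidate pairs.
shares = {name: 100 * count / total for name, count in population.items()}

# "Interesting" pairs are everything that is not neutral.
interesting_share = 100 - shares["neutral"]

for name, share in shares.items():
    print(f"{name:>14}: {share:6.2f}%")
print(f"{'interesting':>14}: {interesting_share:6.2f}%")
```

The total also equals 1,790 × 1,789 / 2, the number of unordered pairs over the 1,790-ingredient vocabulary.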
Quadrant distribution — curated 9,304 (what the site exposes)

After stratified ranking, the curated set deliberately over-represents classic and complementary pairs. This is editorial choice, not a property of the data.

Classic: 2,551 · Complementary: 3,407 · Substitute: 1,837 · Neutral: 1,509
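One way to quantify how far the curated slice departs from the population is to compare quadrant shares. The sketch below uses only the counts listed on this page. The terms "kept" and "enrichment" are labels chosen for this example, not the site's own vocabulary.

```python
population = {"classic": 2_912, "complementary": 9_222,
              "substitute": 3_244, "neutral": 1_585_777}
curated = {"classic": 2_551, "complementary": 3_407,
           "substitute": 1_837, "neutral": 1_509}

pop_total = sum(population.values())
cur_total = sum(curated.values())

kept = {}        # fraction of each population quadrant that survives curation
enrichment = {}  # curated share divided by population share

for quadrant, pop_count in population.items():
    cur_count = curated[quadrant]
    kept[quadrant] = cur_count / pop_count
    enrichment[quadrant] = (cur_count / cur_total) / (pop_count / pop_total)
    print(f"{quadrant:>14}: kept {kept[quadrant]:8.3%}, "
          f"enrichment x{enrichment[quadrant]:.2f}")
```

Read this way, about 88% of classic pairs survive the cut, against roughly 37% of complementary, 57% of substitute, and under 0.1% of neutral pairs.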
Per-sibling score summary

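Min / median / p95 / p99 / max for each cosine measure. Population row is the truth; curated row shows how aggressive the cut is. Negative cosines exist in all three populations. Most users never see them because curated scores sit far above the population median (cooc median 0.336 vs 0.091), although a few negative core and chem scores do survive curation (minimums -0.031 and -0.084).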

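Sibling | Set | Min | Median | p95 | p99 | Max
Cooks with (cooc) | Population | -0.180 | 0.091 | 0.221 | 0.289 | 0.642
Cooks with (cooc) | Curated | 0.040 | 0.336 | 0.415 | 0.465 | 0.642
Blended (core) | Population | -0.088 | 0.336 | 0.531 | 0.636 | 0.931
Blended (core) | Curated | -0.031 | 0.625 | 0.795 | 0.843 | 0.931
Tastes like (chem) | Population | -0.139 | 0.107 | 0.234 | 0.335 | 0.810
Tastes like (chem) | Curated | -0.084 | 0.381 | 0.568 | 0.654 | 0.810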
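For readers who want to reproduce a row of this summary from their own list of cosines, a minimal sketch follows. It assumes linear interpolation between ranks for the percentiles; the page does not say which percentile method was used, so small differences in p95 and p99 are possible. The function names are hypothetical.

```python
import math

def percentile(sorted_values, q):
    """Linear-interpolated percentile for q in [0, 100]; input must be sorted."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    pos = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def summarize(scores):
    """Return the five summary columns for one sibling and one set."""
    ordered = sorted(scores)
    return {
        "min": ordered[0],
        "median": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }

# Toy input, not site data: the integers 0..100 make the result easy to read.
print(summarize(range(101)))
```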
The shape of the curated subset

2,000 of the curated pairs plotted on (cooks-with, tastes-like). Sample is deterministic across builds. Dashed lines mark the quadrant thresholds (cooc ≥ 0.30, chem ≥ 0.40). The full 1.6M population would be one dense blob near the origin — this is the editorial selection on top.
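The two thresholds split the plane into four quadrants. The sketch below shows one plausible assignment rule. The cutoffs are the ones stated on this page; the mapping of regions to names is inferred from the canon table later on this page (Onion + Garlic is classic with both scores high, Tomato + Basil is complementary with only cooc high), and the substitute rule (high chem, low cooc) is an assumption.

```python
COOC_THRESHOLD = 0.30  # stated on this page
CHEM_THRESHOLD = 0.40  # stated on this page

def quadrant(cooc: float, chem: float) -> str:
    """Assign a pair to a quadrant from its cooc and chem cosines."""
    high_cooc = cooc >= COOC_THRESHOLD
    high_chem = chem >= CHEM_THRESHOLD
    if high_cooc and high_chem:
        return "classic"
    if high_cooc:
        return "complementary"
    if high_chem:
        return "substitute"  # assumption: aroma overlap without co-occurrence
    return "neutral"

print(quadrant(0.45, 0.53))  # Onion + Garlic scores from the canon table
print(quadrant(0.32, 0.15))  # Tomato + Basil
print(quadrant(0.26, 0.13))  # Lemon + Garlic
```

Scores shown on the page are rounded to two decimals, so a pair displayed exactly at a cutoff may classify differently from its unrounded value.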

Scatter of 2,000 pairs on (cooks-with, tastes-like): 559 classic, 720 complementary, 392 substitute, 329 neutral. Quadrant thresholds: cooc 0.30, chem 0.40. X-axis: Cooc (recipe co-occurrence), 0.00 to 1.00. Y-axis: Chem (aroma chemistry), 0.00 to 1.00.

Score distributions — population (top row) vs curated (bottom row)

Same x-axis range in each column so you can read the curation cut directly. Each sibling has its own range because the geometries differ. Out-of-range counts: cooc 0 · core 0 · chem 0.
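The shared-range convention can be sketched as a fixed-edge histogram that also reports how many values fall outside the range. This is a minimal illustration, not the site's build code; the function name and the choice to place the right edge in the last bin are assumptions.

```python
def histogram(values, lo, hi, bins=30):
    """Equal-width bin counts over [lo, hi]; returns (counts, out_of_range)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    out_of_range = 0
    for v in values:
        if v < lo or v > hi:
            out_of_range += 1
            continue
        # A value exactly on the right edge belongs to the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, out_of_range

# Toy cosines, not site data, binned on the cooc range used on this page.
counts, dropped = histogram([-0.20, 0.0, 0.07, 0.65, 0.90], -0.20, 0.65)
print(sum(counts), dropped)
```

Calling the function with the same `lo`, `hi`, and `bins` for the population and the curated set gives bins that line up column by column.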

Cooks with — population

All 1,601,155 pairs. Range [-0.20, 0.65].

Histogram, 30 bins from -0.20 to 0.65, 1,601,155 total observations, peak 284,689 at 0.05–0.08. X-axis: cosine.
Cooks with — curated

9,304 pairs. Same range, so heights compare directly.

Histogram, 30 bins from -0.20 to 0.65, 9,304 total observations, peak 2,393 at 0.34–0.37. X-axis: cosine.
Blended — population

All 1,601,155 pairs. Range [-0.10, 0.95].

Histogram, 30 bins from -0.10 to 0.95, 1,601,155 total observations, peak 244,016 at 0.29–0.32. X-axis: cosine.
Blended — curated

9,304 pairs. Same range, so heights compare directly.

Histogram, 30 bins from -0.10 to 0.95, 9,304 total observations, peak 1,033 at 0.67–0.71. X-axis: cosine.
Tastes like — population

All 1,601,155 pairs. Range [-0.15, 0.85].

Histogram, 30 bins from -0.15 to 0.85, 1,601,155 total observations, peak 399,394 at 0.08–0.12. X-axis: cosine.
Tastes like — curated

9,304 pairs. Same range, so heights compare directly.

Histogram, 30 bins from -0.15 to 0.85, 9,304 total observations, peak 1,190 at 0.42–0.45. X-axis: cosine.
Most-present ingredients in the curated set

Number of curated pairs each ingredient appears in. NOT a measure of 'culinary centrality' — it's a measure of which ingredients survived the stratified top-9,304 cut. Asian ingredients dominate because the source corpus has dense cooc clusters around Asian dishes; reranking the curation tier would shift this list.

The 543 flavor modes

Modes by property axis

The paper labels each mode with an axis: either a continuous flavor property (sweet_score, nova_level, fatty_score…) or one of the emergent factor axes (F_0, F_3, …). Count per property tells you which axes were most prolific in the clustering.

cf_savory: 20
nova_level: 18
cf_minty: 17
F_8: 17
bitter_score: 15
fatty_score: 15
F_0: 15
F_3: 15
F_9: 15
F_10: 15
F_13: 15
sweet_score: 14
pungent_score: 14
F_7: 14
F_11: 14
F_12: 14
cf_sweet: 13
cf_balsamic: 13
cf_citrus: 13
cf_woody: 13
F_2: 13
F_4: 13
umami_score: 12
usda_fiber_g: 12
F_1: 12
cf_meaty: 11
sour_score: 11
F_6: 11
F_14: 11
cf_earthy: 10
usda_protein_g: 10
usda_caloric_density: 10
fg_Vegetable: 10
F_15: 10
F_5: 10
fg_Pantry: 9
fg_Beverage: 9
fg_Spice: 9
fg_Dairy: 9
usda_sugars_g: 8
fg_Grain: 8
F_19: 8
fg_Fruit: 6
F_18: 5
F_17: 4
F_16: 3
Does the embedding get culinary canon right?

Hand-curated list of obvious classic pairings. Each row shows the actual cosine scores and the quadrant the model assigns. Look for surprises — these reveal where the embedding's recipe corpus is weak.

Canon pair | Cooks with | Tastes like | Blended | Verdict
Tomato + Basil | 0.32 | 0.15 | 0.47 | Complementary
Lemon + Garlic | 0.26 | 0.13 | 0.45 | Neutral
Beef + Rosemary | 0.22 | 0.06 | 0.39 | Neutral
Chocolate + Strawberry | 0.31 | 0.32 | 0.38 | Complementary
Peanut + Chocolate | 0.15 | 0.18 | 0.35 | Neutral
Apple + Cinnamon | 0.39 | 0.22 | 0.46 | Complementary
Coffee + Cream | 0.34 | 0.34 | 0.31 | Complementary
Bacon + Egg | 0.11 | 0.24 | 0.44 | Neutral
Lime + Cilantro | Not in vocabulary
Honey + Thyme | 0.26 | 0.10 | 0.31 | Neutral
Tomato + Mozzarella cheese | 0.27 | 0.23 | 0.51 | Neutral
Chicken + Lemon | 0.14 | 0.09 | 0.28 | Neutral
Pork + Apple | 0.10 | 0.10 | 0.36 | Neutral
Miso + Ginger | 0.18 | 0.22 | 0.46 | Neutral
Soy sauce + Ginger | 0.47 | 0.23 | 0.55 | Complementary
Vanilla + Chocolate | 0.49 | 0.40 | 0.51 | Complementary
Mint + Lamb | 0.25 | 0.18 | 0.42 | Neutral
Fish + Lemon | 0.22 | 0.13 | 0.42 | Neutral
Onion + Garlic | 0.45 | 0.53 | 0.73 | Classic
Olive oil + Garlic | 0.40 | 0.28 | 0.69 | Complementary
Dill + Salmon | 0.27 | 0.17 | 0.39 | Neutral
Caraway + Rye flour | Not in vocabulary
Fennel + Sausage | 0.13 | 0.13 | 0.44 | Neutral
Maple syrup + Bacon | 0.16 | 0.22 | 0.49 | Neutral
Banana + Peanut | 0.20 | 0.06 | 0.34 | Neutral

The corpus is heavily East-Asian-weighted (~38% of recipes). Western European classics like lemon+garlic and beef+rosemary score weaker than intuition expects. This is the corpus bias showing through, not a model failure.

How much do the three siblings agree?

Pearson r over the full 1.6M pair population. Higher = the two siblings say similar things about pair strength. Lower = the three-lens framing has real signal.

Cooks ↔ Tastes: 0.541 (moderate overlap)
Cooks ↔ Blended: 0.542 (moderate overlap)
Tastes ↔ Blended: 0.619 (moderate overlap)

All three correlate moderately (0.54–0.62). They don't collapse into one another, which justifies showing all three on every pair page. Blended (Core) correlates highest with both of the others, as expected since it's designed to blend the signals, though its edge on the Cooks side is marginal (0.542 vs 0.541).
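For reference, Pearson r between two siblings is the covariance of their scores divided by the product of their standard deviations. A minimal stdlib sketch follows; the function name and the toy inputs are illustrative, and the site's own computation over 1.6M pairs is not shown on this page.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Toy data, not site data: perfectly aligned and perfectly opposed scores.
print(pearson_r([0.1, 0.2, 0.3], [0.2, 0.4, 0.6]))
print(pearson_r([0.1, 0.2, 0.3], [0.3, 0.2, 0.1]))
```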

Outlier pairs in the curated set

Highest Cooc — most-co-occurring

Often near-substitutes: things that always appear together because they're regional cousins.

Highest Chem — most aromatic overlap

The strongest aroma-chemistry matches in the curated set.

Biggest surprise — siblings disagree most

Largest |cooc − chem| gap. Where one lens says yes and the other says no.
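The disagreement ranking is a sort on the absolute gap between the two lenses. The sketch below applies it to a handful of rows copied from the canon table on this page. These rows are only an illustration and are not the curated set's actual outliers; the function name is hypothetical.

```python
# (pair, cooc, chem) rows copied from the canon table on this page.
canon_rows = [
    ("Tomato + Basil", 0.32, 0.15),
    ("Soy sauce + Ginger", 0.47, 0.23),
    ("Onion + Garlic", 0.45, 0.53),
    ("Coffee + Cream", 0.34, 0.34),
    ("Olive oil + Garlic", 0.40, 0.28),
]

def by_disagreement(rows):
    """Sort rows by |cooc - chem|, largest gap first."""
    return sorted(rows, key=lambda row: abs(row[1] - row[2]), reverse=True)

ranked = by_disagreement(canon_rows)
for pair, cooc, chem in ranked:
    print(f"{pair:<20} gap {abs(cooc - chem):.2f}")
```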

Recipe corpus by cuisine — the bias source

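The Epicure corpus is heavily skewed: East Asian recipes make up ~38% of the source data (1,549,034 of 4,135,189 recipes is about 37.5%). The Share column below uses the 2,295,673 recipes listed in the table as its denominator, which is why East Asian reads 67.5% there. This shapes every cooc-based statistic on this page.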

Region | Recipes | Share | Modes | Traditions
East Asian | 1,549,034 | 67.5% | 77 | Chinese, Korean
Western Atlantic | 198,086 | 8.6% | 7 | American, British, German, Scandinavian
Mediterranean | 164,107 | 7.1% | 41 | Italian, French, Iberian, Greek, Levantine, North African, Turkish
Eastern European | 154,479 | 6.7% | 1 | Russian, Ukrainian, Polish, Hungarian, Georgian
Southeast Asian | 107,964 | 4.7% | 6 | Thai, Vietnamese, Filipino, Indonesian, Malay
South Asian | 47,462 | 2.1% | 8 | Indian, Pakistani, Sri Lankan, Bangladeshi
Latin American | 40,618 | 1.8% | 14 | Mexican, Caribbean, Brazilian, Peruvian, Colombian
Japanese | 33,923 | 1.5% | 2 | Japanese
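The East Asian share depends on which denominator is used. The short check below uses only figures from this page. That the Share column is computed over the table's own rows is an inference from the arithmetic, not something the page states.

```python
# Recipe counts per region, copied from the table on this page.
regions = {
    "East Asian": 1_549_034,
    "Western Atlantic": 198_086,
    "Mediterranean": 164_107,
    "Eastern European": 154_479,
    "Southeast Asian": 107_964,
    "South Asian": 47_462,
    "Latin American": 40_618,
    "Japanese": 33_923,
}

table_total = sum(regions.values())  # recipes listed in the table
corpus_total = 4_135_189             # full source corpus, from the intro

east_asian = regions["East Asian"]
share_of_table = 100 * east_asian / table_total
share_of_corpus = 100 * east_asian / corpus_total

print(f"table total:     {table_total:,}")
print(f"share of table:  {share_of_table:.1f}%")
print(f"share of corpus: {share_of_corpus:.1f}%")
```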
Vocabulary by ingredient type

Rule-based classifier (regex over canonical names). 736 of 1790 (41%) didn't match any category — heuristics are conservative. Real coverage is better than this number suggests.

other: 736
vegetable: 152
fruit: 119
herb spice: 118
seafood: 86
sauce condiment: 82
grain starch: 80
cheese: 64
wine spirit: 63
oil fat: 53
dairy egg: 47
sweetener: 41
meat: 33
nut seed: 29
vinegar acid: 25
beverage: 25
poultry: 25
broth stock: 6
bean legume: 6
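A rule-based classifier of this kind can be sketched as an ordered list of regular expressions with a fallback to "other". The patterns below are hypothetical and cover only a few of the categories above; the site's actual rules are not shown on this page.

```python
import re

# Hypothetical patterns for illustration only. Order matters: first match wins.
RULES = [
    ("cheese", re.compile(r"\b(cheese|mozzarella|parmesan)\b")),
    ("oil fat", re.compile(r"\b(oil|lard|ghee)\b")),
    ("vinegar acid", re.compile(r"\bvinegar\b")),
    ("broth stock", re.compile(r"\b(broth|stock)\b")),
]

def classify(canonical_name: str) -> str:
    """Return the first matching category, or 'other' when nothing matches."""
    lowered = canonical_name.lower()
    for category, pattern in RULES:
        if pattern.search(lowered):
            return category
    return "other"

for name in ["Olive oil", "Mozzarella cheese", "Chicken stock", "Miso"]:
    print(f"{name:<18} -> {classify(name)}")
```

Conservative patterns like these leave names such as "Miso" in "other", which is the same effect that produces the large "other" bucket above.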
Pairs per ingredient in the curated set

Distribution of how many curated pairs each ingredient appears in. Of 1,790 ingredients, 1,671 appear in at least one curated pair; 119 are absent from the curated set (they exist in the vocab but no pair survived the cut).

Histogram, 30 bins from 0.00 to 72.00, 1,671 total observations, peak 322 at 0.00–2.40. X-axis: curated pairs per ingredient.
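Counting pairs per ingredient amounts to tallying both members of every curated pair and then listing vocabulary entries with a zero tally. A minimal sketch with toy data follows; the pair list and vocabulary here are made up for the example, and the function name is hypothetical.

```python
from collections import Counter

def pairs_per_ingredient(pairs, vocabulary):
    """Count how many pairs each ingredient appears in; list absent ones."""
    counts = Counter()
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    # Counter returns 0 for missing keys without inserting them.
    absent = [name for name in vocabulary if counts[name] == 0]
    return counts, absent

# Toy data, not the real curated set.
toy_pairs = [("onion", "garlic"), ("olive oil", "garlic"), ("tomato", "basil")]
toy_vocab = ["onion", "garlic", "olive oil", "tomato", "basil", "honey"]

counts, absent = pairs_per_ingredient(toy_pairs, toy_vocab)
print(counts.most_common(2))
print(absent)
```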

Mode cohesion — how tight are the clusters?

Mean intra-mode cosine across all member pairs. Higher = members really do cluster together. Range: 0.10 (loosest single mode) to 0.61 (tightest); median 0.20.
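Mean intra-mode cosine can be sketched as the average cosine over every unordered pair of member vectors. The page does not say which sibling's vectors are used for this statistic, so the inputs below are toy vectors and the function names are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mode_cohesion(member_vectors):
    """Mean cosine across all unordered member pairs of one mode."""
    pairs = list(combinations(member_vectors, 2))
    if not pairs:
        raise ValueError("a mode needs at least two members")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Toy mode: two identical directions and one orthogonal member.
print(mode_cohesion([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]))
```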

What these numbers don't tell you

  • Recipe-language bias: the corpus is multilingual but uneven. Chinese and other Asian recipes are over-represented in the dense cooc clusters, which inflates Asian ingredients in every cooc-driven stat above.
  • Ingredient-cut bias: the vocab uses base entries (chicken, beef, pork) without cut-level subdivisions. "Chicken breast" and "chicken thigh" both fold into "chicken." Cuisines with fine-grained cut vocabulary lose detail.
  • Chemistry coverage: Chem is from FlavorDB, which catalogs aroma compounds for ~1,500 ingredients globally. Anything outside FlavorDB has a sparse Chem profile — particularly fermented and processed ingredients.
  • No quantitative recipe weight: the cooc graph treats "1 cup of garlic" and "1 clove of garlic" identically. It counts co-presence, not relative importance.
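  • Quadrant thresholds are pragmatic: cooc ≥ 0.30 and chem ≥ 0.40 are calibrated cutoffs (nominally ~70th percentiles per axis), not derived from theory. In the full pair population both sit above p99 (0.289 for cooc, 0.335 for chem), so the percentile figure does not describe that population. Adjusting them reshapes every quadrant count.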
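The co-presence caveat in the list above can be made concrete with a small counting sketch. It treats each recipe as a set of ingredients, so quantities and repeats vanish before counting. How the site turns such counts into the cooc embedding is not described on this page; the function name and recipes here are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(recipes):
    """Count, for each unordered ingredient pair, the recipes containing both."""
    counts = Counter()
    for ingredients in recipes:
        # A set drops repeats and carries no quantity information.
        for a, b in combinations(sorted(set(ingredients)), 2):
            counts[(a, b)] += 1
    return counts

# Toy recipes: one uses garlic heavily, one barely. Both count the same.
toy_recipes = [
    ["garlic", "garlic", "garlic", "chicken"],
    ["garlic", "chicken"],
    ["garlic", "lemon"],
]

counts = cooccurrence_counts(toy_recipes)
print(counts[("chicken", "garlic")])
print(counts[("garlic", "lemon")])
```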

Last regenerated