Multi-modal Prediction
Age 55-89 | FreeSurfer + Cognitive Assessment
MRI: 56 brain regions (FreeSurfer), Total Intracranial Volume
RAVLT Memory: Learning trials (A1-A7), interference (B1), delayed recall
Language: Naming & comprehension tasks (accuracy + reaction time)
Speech Fluency: 154 variables (pause, phonation, rate, voicing)
Demographics: Age, sex, education, MoCA, MMSE, depression
Dataset: 12,992 rows × 76 columns
Structure: 232 participants × 56 brain regions = 12,992 observations
Format: Each participant has 56 rows (one per brain region)
| ID | Region | GM_Vol | age | sex | MoCA | naming_acc_% | RAVLT_A1 | RAVLT_A5 | RAVLT_A7 |
|---|---|---|---|---|---|---|---|---|---|
| AA1010 | lSupFroG | 29.9 | 74 | F | 24 | 90% | 7 | 14 | 12 |
| AA1010 | rSupFroG | 28.1 | 74 | F | 24 | 90% | 7 | 14 | 12 |
| AA1010 | lMidFroG | 20.9 | 74 | F | 24 | 90% | 7 | 14 | 12 |
| ... 53 more brain regions for AA1010 ... | | | | | | | | | |
| AB4388 | lSupFroG | 31.2 | 76 | F | 26 | 100% | 5 | 11 | 9 |
| AB4388 | rSupFroG | 29.8 | 76 | F | 26 | 100% | 5 | 11 | 9 |
Note: Cognitive scores (MoCA, Naming, RAVLT) are repeated across all brain regions for each participant. Brain volume varies by region.
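Because participant-level scores repeat on every region row, most models need one row per participant. A minimal pandas sketch of that reshaping follows, using a few rows from the sample table above. The `GM_` column prefix and the choice of `subject_cols` are illustrative assumptions, not the project's actual code.

```python
import pandas as pd

# Toy rows copied from the sample table above (only a few regions shown).
long_df = pd.DataFrame({
    "ID":       ["AA1010", "AA1010", "AA1010", "AB4388", "AB4388"],
    "Region":   ["lSupFroG", "rSupFroG", "lMidFroG", "lSupFroG", "rSupFroG"],
    "GM_Vol":   [29.9, 28.1, 20.9, 31.2, 29.8],
    "age":      [74, 74, 74, 76, 76],
    "MoCA":     [24, 24, 24, 26, 26],
    "RAVLT_A7": [12, 12, 12, 9, 9],
})

# Participant-level columns are constant within ID, so the first row suffices.
subject_cols = ["age", "MoCA", "RAVLT_A7"]  # assumed subset, for illustration
subjects = long_df.groupby("ID")[subject_cols].first()

# One column per region; a region absent for a participant becomes NaN.
volumes = long_df.pivot(index="ID", columns="Region", values="GM_Vol").add_prefix("GM_")

wide_df = subjects.join(volumes)
print(wide_df.shape)  # (2, 6)
```

On the full dataset the same steps would give 232 rows, one per participant, with one volume column per brain region.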
| Variable | Range | Mean ± SD | Missing |
|---|---|---|---|
| Age | 55 - 89 years | 70.2 ± 7.0 | 0% |
| Education | 0 - 21 years | 14.4 ± 2.7 | 0.9% |
| MoCA (cognitive screening) | 13 - 29 / 30 | 23.9 ± 3.2 | 0.4% |
| Naming accuracy | 3% - 100% | 89.0 ± 14.2% | 9.1% |
| RAVLT A1 (first trial) | 1 - 10 words | 5.1 ± 1.7 | 0% |
| RAVLT A5 (fifth trial) | 0 - 15 words | 10.8 ± 2.9 | 0% |
| RAVLT A7 (delayed recall) | 0 - 15 words | 8.8 ± 3.6 | 0% |
| Gray matter volume | 1 - 107 mm³ | Varies by region | 0% |
| Scenario | Data Merged | N | Purpose |
|---|---|---|---|
| Scenario 1 | MRI + RAVLT + Language | 232 | Brain-memory-language |
| Scenario 2 | MRI + Language + Speech | 197 | Brain-speech production |
| Scenario 3 | All modalities | 183 | Rigorous multi-modal validation |
Data Structure: Long format with 12,992 rows (232 participants × 56 brain regions)
Coverage: 112 MRI regions + RAVLT memory scores + Language tasks (naming, comprehension) + Demographics
Design: Phase 1-3 (9 initial tasks, 42 models) + Phase 4 (18 new tasks)
| Description | N | Features | Target | Type | Best Model | Performance |
|---|---|---|---|---|---|---|
| MRI → MoCA | 231 | 56 GM + 56 WM regions | MoCA total score | Regression | ExtraTrees | R²=0.126 |
| MRI → MoCA<26 | 232 | 56 GM + 56 WM regions | MoCA < 26 | Classification | ExtraTrees | AUC=0.658 |
| MRI → Naming accuracy | 211 | 56 GM + 56 WM regions | naming_accuracy_% | Regression | ElasticNet | R²=0.016 |
| MRI → Naming binary | 232 | 56 GM + 56 WM regions | naming_acc < 80% | Classification | GaussianNB | AUC=0.615 |
| Cognition → Naming | 210 | MoCA + MMSE + age | naming_accuracy_% | Regression | ElasticNet | R²=0.098 |
| MRI+Cog → RAVLT A5 | 231 | 56 GM + 56 WM + MoCA + age | RAVLT_A5 (trial 5) | Regression | ExtraTrees | R²=0.195 |
| MRI+Demo → Age group | 231 | 56 GM + 56 WM + sex + edu | age_group (<70 vs ≥70) | Classification | SVC-RBF | AUC=0.768 ✓ |
| MRI clustering | 232 | 56 GM + 56 WM regions | - | Clustering | KMeans-2 | Sil=0.223 |
| Time+MRI → RAVLT A5 | 232 | 56 GM + 56 WM + time_diff | RAVLT_A5 | Regression | ExtraTrees | R²=-0.003 |
Phase 1-3 success rate: 1/9 (11%) | ~150 experiments (9 tasks × 18 models)
| Category | Description | N | Features | Target | Type | Performance |
|---|---|---|---|---|---|---|
| Behavioral | A7 delayed recall prediction | 204 | A1-A5, B6 + naming/comp acc + MoCA + MMSE + age + edu + sex | RAVLT_A7 | Regression | R²=0.782 ✓ |
| Behavioral | RAVLT learning curve clustering | 232 | A1,A2,A3,A4,A5,A7,A8,B6 | cluster_label | Clustering | Sil=0.405 ✓ |
| Behavioral | Fast/slow learners | 232 | A1,A2,A3,A4,A5,A7,A8,B6 | learning_speed (binary) | Classification | AUC=0.69 |
| Behavioral | RT consistency analysis | 211 | naming_rt, comprehension_rt | Pearson correlation | Correlation | r=0.382*** ✓ |
| Behavioral | RT-accuracy tradeoff × age | 210 | naming_rt, naming_acc, age | RT×age interaction | ANCOVA | t=2.87** ✓ |
| Behavioral | Memory → Language | 210 | A1,A2,A3,A4,A5,A7,A8,B6 | naming_accuracy_% | Regression | R²=0.08 |
| Behavioral | Education × Age interaction | 231 | edu_years, age, edu×age | MoCA total | Regression | R²=0.07 |
| MRI refined | ROI-specific prediction | 231 | Temporal+IFG GM regions | naming_accuracy_% | Regression | R²<0 |
| MRI refined | Gray matter vs white matter | 231 | 56 GM vs 56 WM (separate models) | MoCA total | Regression | R²<0 |
| Multi-modal | Age → Hippocampus → Memory mediation | 231 | age → hippocampus_GM → RAVLT_A5 | indirect_effect | Mediation | 27.2% ⚠ |
Phase 4 success rate: 4/10 analyzed (40%) | Key pattern: Behavioral 4/7 (57%) vs MRI refined 0/2 (0%)
Overall Scenario 1: 4/27 tasks succeeded (14.8%)
Data Structure: Long format with 11,032 rows (197 participants × 56 brain regions)
Coverage: 112 MRI regions + 154 speech features + Language tasks + HADS mental health
Design: 7 categories (40 tasks × 11 models) | 3-round methodological audit
| Description | Features | Target | Type | Best Model | Performance |
|---|---|---|---|---|---|
| MRI → Speech rate | 56 GM + 56 WM regions | speech_rate (words/sec) | Regression | - | R² < 0 |
| MRI → Speech clarity | 56 GM + 56 WM regions | speech_clarity_score | Regression | - | R² < 0 |
| MRI → Naming accuracy | Temporal+IFG GM regions | naming_accuracy_% | Regression | - | R² < 0 |
| MRI → Age group | 56 GM + 56 WM regions | age_group (<70 vs ≥70) | Classification | LogisticReg | AUC=0.759 ✓ |
| MRI → Cognitive risk | 56 GM + 56 WM regions | cognitive_risk (MoCA<26 or naming<80%) | Classification | - | AUC < 0.65 |
| MRI → Depression risk | 56 GM + 56 WM regions | depression_risk (HADS_D ≥ 8) | Classification | Lasso | AUC=0.655 |
| Description | Features | Target | Type | Best Model | Performance |
|---|---|---|---|---|---|
| Speech → Brain health index | 154 speech fluency vars (pause/phonation/rate/voicing) | brain_health_index (MRI composite) | Regression | - | R² < 0 |
| Behavior → Cognitive risk | 154 speech + naming/comp + MoCA + demographics | cognitive_risk (binary) | Classification | AdaBoost | AUC=0.685 ✓ |
| Speech → Frontal GM | 154 speech fluency vars | frontal_lobe_GM_volume | Regression | - | R² < 0 |
| Speech → Temporal GM | 154 speech fluency vars | temporal_lobe_GM_volume | Regression | - | R² < 0 |
| Behavior → Age | 154 speech + naming/comp + MoCA | age (continuous) | Regression | - | R² < 0 |
| Speech rate → Gender | speech_rate (single feature) | sex (M/F) | Classification | ElasticNet | AUC=0.669 |
Category A+B: 3/14 tasks succeeded (21%) | Pattern: MRI predicts age well, behavior→brain mostly fails
| Description | Features | Target | Type | Best Model | Performance |
|---|---|---|---|---|---|
| Speech → HADS Depression score | 154 speech fluency vars | HADS_depression_score (0-21) | Regression | - | R² < 0 |
| Speech → HADS Anxiety score | 154 speech fluency vars | HADS_anxiety_score (0-21) | Regression | - | R² < 0 |
| Pause composite → Depression risk | pause_composite (single feature) | depression_risk (HADS_D ≥ 8) | Classification | Multiple | AUC=0.621 ✓ MOST RELIABLE |
| MRI+Speech → Depression risk | 56 GM + 56 WM + selected speech vars (60+ total) | depression_risk (HADS_D ≥ 8) | Classification | - | AUC=0.640 |
| Mental health → Speech | HADS_depression + HADS_anxiety | 154 speech features (multi-target) | Regression | - | R² < 0 |
| Demographics+Mental → Cognitive risk | age + sex + edu + HADS_D + HADS_A | cognitive_risk (binary) | Classification | - | AUC < 0.65 |
Category E: 1/6 tasks succeeded (17%) | Depression detectable from speech pauses
| Category | Features → Target | Count | Type | Success | Result |
|---|---|---|---|---|---|
| C: Speech → Language/Cognition | 154 speech fluency → naming_acc, comp_acc, naming_rt, comp_rt | 8 | Reg/Class | 0/8 | All failed (R² < 0) |
| D: Language → Speech | naming_acc, comp_acc, naming_rt, comp_rt → speech_rate/clarity/fluency | 4 | Regression | 0/4 | All failed (R² < 0) |
| F: Multi-modal Fusion | (56 GM + 56 WM) + 154 speech + naming/comp → cognitive_risk | 5 | Mixed | 1/5 | Cognition (AUC=0.695) ✓ |
| G: Simple Baseline | age + sex → age_group, brain_health, speech_disfluency | 3 | Reg/Class | 0/3 | All failed (corrected) |
Overall Scenario 2: 2/40 tasks succeeded (5.0%) | 428 experiments (40 tasks × 11 models)
Methodological note: Results affected by 5 types of data leakage; extensive corrections applied in 3-round audit
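The five leakage types are not listed here. One risk the long format makes easy, shown purely as an illustration and not necessarily one of the five, is splitting by row so that the same participant lands in both train and test. A minimal scikit-learn sketch on placeholder data compares a row-wise split with a participant-wise split using `GroupKFold`; the helper `shared_participants` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

# Toy size for illustration; Scenario 2 itself has 197 participants.
n_participants, n_regions = 20, 56
groups = np.repeat(np.arange(n_participants), n_regions)  # participant ID per row
X = np.zeros((groups.size, 1))                            # placeholder features

def shared_participants(splitter, **split_kwargs):
    """Count participants present in both train and test, summed over folds."""
    return sum(
        len(set(groups[train]) & set(groups[test]))
        for train, test in splitter.split(X, **split_kwargs)
    )

# Row-wise split: rows of one participant are scattered across folds.
row_leak = shared_participants(KFold(n_splits=5, shuffle=True, random_state=0))

# Participant-wise split: all rows of a participant stay in the same fold.
group_leak = shared_participants(GroupKFold(n_splits=5), groups=groups)

print(row_leak, group_leak)
```

With `GroupKFold` the overlap is zero by construction. Reshaping to one row per participant before splitting avoids the issue altogether.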
Data Structure: Long format with 10,864 rows → 194 participants × 230+ variables
Complete multi-modal: All sources merged (MRI + RAVLT + Language + Speech + Demographics)
Goal: Gold-standard methodology with 5×5 nested CV, permutation testing, multiple comparison correction
| Tier | Purpose | Method |
|---|---|---|
| Tier 1: Confirmatory | Verify Scenario 1 findings with strict controls | Bonferroni correction; permutation testing (200 iterations) |
| Tier 2: Exploratory | Test new hypotheses on learning efficiency | FDR correction; 95% confidence intervals |
| Tier 3: Fusion | Test multi-modal integration benefits | Nested cross-validation; multiple random seeds |
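The two correction methods named in the table can be sketched in a few lines of NumPy. The source does not say which FDR procedure was used; Benjamini-Hochberg is assumed here, and the p-values are hypothetical, not taken from the study.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(pvals, alpha=0.05):
    """Reject H0 for the k smallest p-values, where k is the largest rank
    with p_(k) <= (k / m) * alpha (controls the false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()   # index of the last rank that passes
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p-values for four tests (illustration only).
pvals = [0.001, 0.012, 0.030, 0.200]
print(bonferroni(pvals))          # [ True  True False False]
print(benjamini_hochberg(pvals))  # [ True  True  True False]
```

Bonferroni is the stricter of the two, which fits its use for the confirmatory tier, while FDR control tolerates a bounded share of false discoveries in exploratory work.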
✓ 5×5 nested cross-validation (vs basic 5-fold)
✓ Pre-planned corrections (vs post-hoc)
✓ Pre-emptive leakage prevention (vs post-detection)
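A 5×5 nested cross-validation of the kind listed above could look like this with scikit-learn. This is a minimal sketch on synthetic data: the Ridge model, the alpha grid, and the random seeds are placeholders, not the study's configuration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in sized like the Tier 1 behavioral model (183 x 13).
X, y = make_regression(n_samples=183, n_features=13, noise=10.0, random_state=0)

inner = KFold(n_splits=5, shuffle=True, random_state=1)  # tunes hyperparameters
outer = KFold(n_splits=5, shuffle=True, random_state=2)  # estimates performance

# Scaling sits inside the pipeline so it is refit on every training fold.
model = make_pipeline(StandardScaler(), Ridge())
search = GridSearchCV(model, {"ridge__alpha": [0.1, 1.0, 10.0]}, cv=inner, scoring="r2")

# Each outer fold reruns the whole inner search on its own training part,
# so the outer test fold never influences hyperparameter selection.
outer_r2 = cross_val_score(search, X, y, cv=outer, scoring="r2")
print(outer_r2.mean(), outer_r2.std())
```

Repeating the procedure with several seeds for `inner` and `outer` gives a sense of how much the estimate depends on the particular split.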
| Tier | Features | Target | N | Result |
|---|---|---|---|---|
| Tier 1: Confirmatory | A1,A2,A3,A4,A5,A8,B6 + naming_acc + comp_acc + MoCA + MMSE + age + edu | A7 (delayed recall) | 183 | R² = 0.8148 ✓ |
| Tier 1: Confirmatory | MoCA + MMSE + age + sex + edu | depression_risk (HADS_D ≥ 8) | 183 | AUC = 0.62 ✓ |
| Tier 1: Confirmatory | 20 selected MRI regions (GM volumes) | A7 (delayed recall) | 183 | R² = -0.03 |
| Tier 1: Confirmatory | 20 MRI + A1,A2,A3,A4,A5,A8,B6 + naming/comp + demographics (33 total) | A7 (delayed recall) | 183 | R² = 0.81 (no gain) |
| Tier 2: Exploratory | learning_efficiency = (A5-A1)/4 | learning_efficiency_score | 194 | R² = 0.41 ✓ |
| Tier 2: Exploratory | naming_rt | comprehension_rt (correlation) | 189 | r = 0.38 ✓ |
| Tier 2: Exploratory | L_temporal_GM - R_temporal_GM (asymmetry) | language_lateralization | 189 | R² < 0 |
| Tier 2: Exploratory | age_group (55-65 vs 66-75 vs 76-89) | A7, MoCA, naming (group comparisons) | 194 | No age effects |
| Tier 3: Fusion | 20 MRI + RAVLT + Language + 154 Speech (200+ total) | A7 (delayed recall) | 183 | R² = 0.78 (worse) |
| Tier 3: Fusion | 20 MRI + pause_composite + speech_rate (22 total) | depression_risk | 183 | AUC = 0.59 (worse) |
Success rate: 4/10 (40.0%) | Pattern: Behavioral-only succeeds, MRI fails, multi-modal hurts
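The Tier 2 feature learning_efficiency = (A5-A1)/4 is the average number of words gained per trial between the first and fifth RAVLT learning trials. As a worked example, the sketch below applies it to the two participants from the sample table earlier in this document; the function name is illustrative.

```python
def learning_efficiency(a1: int, a5: int) -> float:
    """Average words gained per trial between RAVLT trial 1 and trial 5."""
    return (a5 - a1) / 4

# (A1, A5) values from the sample rows shown earlier in this document.
scores = {"AA1010": (7, 14), "AB4388": (5, 11)}

for participant, (a1, a5) in scores.items():
    print(participant, learning_efficiency(a1, a5))
# AA1010 1.75
# AB4388 1.5
```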
Cognitive/behavioral features (RAVLT, language) consistently outperform brain morphometry for predicting cognitive outcomes in healthy aging.
MRI succeeds for one discrete outcome, age group (AUC=0.76), but fails for continuous prediction: MRI-only models reach at most R²=0.126 in Scenario 1 and R² < 0 in Scenarios 2 and 3.
Adding MRI to behavioral features provides no benefit and often degrades performance due to overfitting (high p/n ratios).
Data leakage: 5 types identified in Scenario 2. Rigorous pre-emptive design is essential.
N=183-232 limits feature capacity. More features ≠ better prediction.
Nested CV + permutation testing + correction methods reduce false positives.
Best result: R²=0.81 for RAVLT A7 prediction using behavioral features only
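The permutation testing listed among the safeguards above can be sketched with scikit-learn's `permutation_test_score`, which refits the model on label-shuffled copies of the data to build a null distribution for the cross-validated score. The synthetic data and the Ridge model are placeholders. The 200 permutations match the iteration count stated for Tier 1, but the rest of the setup is an assumption.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, permutation_test_score

# Synthetic stand-in data; the real features and targets are not reproduced here.
X, y = make_regression(n_samples=183, n_features=13, noise=10.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# score: cross-validated R² on the true labels
# perm_scores: the same metric on 200 label-shuffled copies (null distribution)
# p_value: (number of perm_scores >= score, plus 1) / (200 + 1)
score, perm_scores, p_value = permutation_test_score(
    Ridge(alpha=1.0), X, y, cv=cv, scoring="r2",
    n_permutations=200, random_state=0,
)
print(score, perm_scores.mean(), p_value)
```

With 200 permutations the smallest attainable p-value is 1/201, so a stricter corrected threshold would need more permutations.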