1.1.5. Reconciliation Overview¶
Understanding and using WHG’s reconciliation system to link your data with existing place records.
1.1.5.1. Note to Documentation Team¶
Reconciliation is critical but conceptually challenging. Consider:
Visual flowcharts showing reconciliation workflow
Before/after examples of reconciled vs unreconciled data
Interactive tutorial allowing users to practice reconciliation decisions
Decision trees: “Should I accept this match?”
Common pitfalls and how to avoid them
Metrics showing value of reconciliation (network effects)
Case studies of successful reconciliation projects
Video demonstrations of reconciliation UI
Comparison to other reconciliation services (OpenRefine, etc.)
Technical documentation for API-based reconciliation
Explanation of matching algorithms (without overwhelming users)
Discussion of false positives vs false negatives tradeoffs
1.1.5.2. What is Reconciliation?¶
Definition: Reconciliation is the process of identifying when your place data refers to the same place as an existing WHG record, and creating a link between them.
Why It Matters:
Enrichment: Your places inherit context from linked records
Disambiguation: Clarify which “Paris” or “Springfield” you mean
Network Effects: Your data becomes part of the knowledge graph
Discovery: Your data appears when others search for that place
Research Value: Enables cross-dataset analysis
Reduce Duplication: Avoid creating redundant records
Analogy: Think of reconciliation like connecting your family tree to a genealogical database - you identify where your ancestors match people already in the system, enriching both datasets.
1.1.5.3. Reconciliation in Practice¶
1.1.5.3.1. Example Scenario¶
You have a dataset of 200 medieval trading posts with:
Names in various languages
Approximate coordinates
Date ranges
Without Reconciliation:
Your 200 places exist in isolation
Duplicate records may exist in WHG
No connections to other datasets
Limited discoverability
With Reconciliation:
150 match existing WHG places → linked
50 are new → added as new records
Your data now:
Inherits alternate names from other sources
Connects to trade networks
Appears in searches for those places
Contributes to scholarly consensus
1.1.5.4. The Reconciliation Workflow¶
1.1.5.4.1. High-Level Process¶
1. Upload Your Data
↓
2. WHG Suggests Matches (automated)
↓
3. You Review Suggestions (manual)
↓
4. Accept, Reject, or Defer Each Match
↓
5. Unmatched Records → New Places
↓
6. Matched Records → Linked to Existing
↓
7. Publication
1.1.5.4.2. Step 1: Automated Matching¶
When you upload data, WHG automatically:
Searches for potential matches using:
Name similarity (fuzzy matching, transliteration)
Geographic proximity
Temporal overlap
Type similarity
External identifier matches (if you provide them)
Generates match confidence scores:
0.95 - 1.0: Very likely match
0.80 - 0.95: Probable match
0.60 - 0.80: Possible match
Below 0.60: Unlikely (not shown by default)
Presents ranked suggestions:
Top 1-5 candidates per place
With evidence for each match
Visual comparison tools
1.1.5.4.3. Step 2: Manual Review¶
You review each suggested match and decide:
Accept:
Yes, this is the same place
Creates link between your record and WHG record
Your attestations are added to that place
Reject:
No, these are different places
Your place will become a new record
Rejection is recorded to improve matching
Defer:
Unsure, need more research
Mark for later review
Record remains unprocessed
Split Decision:
Partially the same (rare)
Some attributes match, others don’t
Requires consultation with WHG team
1.1.5.4.4. Step 3: Post-Reconciliation Actions¶
After review:
For Matched Places:
Your data contributes attestations
Place gets richer, more comprehensive
You’re credited as contributor
For Unmatched Places:
New place records created
Still searchable and discoverable
May match future contributions
For Deferred:
Saved for later review
Can consult with experts
Not blocking publication (if you choose)
1.1.5.5. Reconciliation Strategies¶
1.1.5.5.1. Strategy 1: Conservative (High Precision)¶
Approach: Only accept very confident matches (≥0.9)
Best For:
Datasets where precision is critical
When false positives would be problematic
Initial passes on unfamiliar data
Tradeoff: May miss valid matches → more new records created
1.1.5.5.2. Strategy 2: Aggressive (High Recall)¶
Approach: Accept lower confidence matches (≥0.7)
Best For:
Well-known, unambiguous places
When you can verify matches externally
Datasets with reliable coordinates
Tradeoff: May create false positive matches → requires careful review
1.1.5.5.3. Strategy 3: Mixed¶
Approach:
Auto-accept very high confidence (≥0.95)
Manually review medium confidence (0.75-0.95)
Auto-reject low confidence (<0.75)
Best For: Most use cases
Tradeoff: Balanced, but requires time investment
1.1.5.5.4. Strategy 4: Identifier-Based¶
Approach: If you have external identifiers (Pleiades, GeoNames, Wikidata), use them
Best For:
Datasets derived from existing gazetteers
Data already linked to authority files
Tradeoff: Limited to places with existing identifiers
1.1.5.6. What Makes a Good Match?¶
1.1.5.6.1. Strong Evidence For Match¶
✅ Name Match: Identical or very similar name ✅ Geographic Proximity: Within expected margin of error ✅ Temporal Overlap: Date ranges overlap or are adjacent ✅ Type Agreement: Same or compatible place types ✅ Source Overlap: Same external sources mentioned ✅ External ID: Matching Pleiades/GeoNames/Wikidata ID
Example Good Match:
Your Record:
Name: "Konstantinopolis"
Coordinates: 41.01°N, 28.98°E
Dates: 330-1453 CE
Type: City
WHG Candidate:
Name: "Constantinople"
Coordinates: 41.01°N, 28.98°E (± 1km)
Dates: 330-1930 CE
Type: Imperial Capital
Match Score: 0.98 → ACCEPT
1.1.5.6.2. Weak Evidence Against Match¶
⚠️ Name Similarity: Common names (e.g., “Springfield”, “San Juan”) ⚠️ Geographic Distance: >50km apart (depends on precision) ⚠️ Temporal Gap: No overlap in date ranges ⚠️ Type Mismatch: Incompatible types (city vs river) ⚠️ Source Conflict: Sources explicitly distinguish them
Example False Positive:
Your Record:
Name: "Alexandria"
Coordinates: 31.20°N, 29.92°E (± 50km)
Dates: 300 BCE - 700 CE
Type: City
WHG Candidate:
Name: "Alexandria"
Coordinates: 38.80°N, -77.05°E
Dates: 1749 CE - present
Type: City
Match Score: 0.75 (name match, but wrong location)
→ REJECT (different places, same name)
1.1.5.7. Handling Ambiguous Cases¶
1.1.5.7.1. Ambiguity Type 1: Name Reuse¶
Problem: Same name applied to different places
Example: Multiple cities named “Alexandria” founded by Alexander
Solution:
Check geographic coordinates carefully
Verify temporal context
Look at type and regional context
Consult external sources
Decision Rule: When in doubt, reject - false negative better than false positive
1.1.5.7.2. Ambiguity Type 2: Coordinate Uncertainty¶
Problem: Historical coordinates often imprecise
Example: Your data says “45.5°N, 10.2°E ± 20km”, candidate is at 45.6°N, 10.3°E
Solution:
Consider uncertainty margins
Look for other evidence (names, dates)
Check if uncertainty regions overlap
Decision Rule: If uncertainty regions overlap AND other evidence aligns, accept
1.1.5.7.3. Ambiguity Type 3: Temporal Edge Cases¶
Problem: Uncertain whether date ranges represent same vs successive places
Example:
Your record: “Sarai” (1250-1395 CE)
Candidate: “New Sarai” (1380-1502 CE)
Solution:
Research whether these are same place (name change) or different places (relocation)
Check sources for clarification
Consider “succeeded_by” relationship instead of equivalence
Decision Rule: Defer and consult sources; when in doubt, use succession relation
1.1.5.7.4. Ambiguity Type 4: Changing Boundaries¶
Problem: Place expanded, contracted, or moved over time
Example: A city whose limits changed dramatically
Solution:
Consider whether core identity remained constant
Multiple geometries can coexist on same place record
Accept match, contribute new temporal geometry
Decision Rule: Usually accept - temporal change is captured via attestations
1.1.5.8. Tools for Reconciliation¶
1.1.5.8.1. WHG Reconciliation Interface¶
Features:
Side-by-side comparison view
Map overlay showing coordinate distance
Timeline showing temporal overlap
Confidence score with explanation
Quick accept/reject/defer buttons
Bulk operations for confident matches
Notes field for decision rationale
See Manual Reconciliation for interface guide
1.1.5.8.2. Automated Reconciliation¶
For larger datasets:
Set confidence threshold
Auto-accept above threshold
Auto-reject below threshold
Manual review middle range
1.1.5.8.3. Reconciliation API¶
For programmatic access:
Submit place data
Receive match candidates
Return decisions
Integrated into your workflow
1.1.5.8.4. External Tools Integration¶
OpenRefine Reconciliation Service:
WHG provides OpenRefine-compatible endpoint
Reconcile in OpenRefine, then upload to WHG
Familiar interface for experienced users
1.1.5.9. Reconciliation Best Practices¶
1.1.5.9.1. Before You Start¶
Clean your data - Fix obvious errors in names, coordinates
Standardize formats - Consistent date formats, coordinate systems
Add external IDs if available - Pleiades, GeoNames, etc.
Document uncertainty - Mark uncertain coordinates/dates
Review a sample - Manually check 10-20 records to understand your data
1.1.5.9.2. During Reconciliation¶
Start with high-confidence matches - Build pattern recognition
Use filtering - Review similar types together
Take notes - Document ambiguous decisions
Take breaks - Decision fatigue is real
Consult sources - When uncertain, go back to original sources
Ask for help - Use WHG forums for difficult cases
Be conservative - When truly unsure, defer rather than guess
1.1.5.9.3. After Reconciliation¶
Review statistics - Match rate, rejection rate (red flags?)
Spot check - Review sample of accepted matches
Document methodology - For transparency and reproducibility
Report issues - If matching algorithm seems problematic
Update records - As you learn more, can revisit decisions
1.1.5.10. Common Pitfalls¶
1.1.5.10.1. Pitfall 1: Over-Trusting Automation¶
Problem: Accepting all high-score matches without review
Why Bad: Algorithms can be wrong; local knowledge matters
Solution: Always review, especially for historically complex regions
1.1.5.10.2. Pitfall 2: Under-Trusting Automation¶
Problem: Rejecting obvious matches due to minor discrepancies
Why Bad: Wastes opportunity for enrichment; creates duplicates
Solution: Allow for spelling variations, coordinate imprecision
1.1.5.10.4. Pitfall 4: Ignoring Temporal Context¶
Problem: Matching based on name alone without checking dates
Why Bad: Many places reuse names across time
Solution: Always verify temporal overlap
1.1.5.10.5. Pitfall 5: Batch Accepting Without Sampling¶
Problem: Auto-accepting hundreds of matches without spot-checking
Why Bad: Errors compound; hard to undo
Solution: Sample review 10% before bulk operations
1.1.5.11. Measuring Reconciliation Success¶
1.1.5.11.1. Key Metrics¶
Match Rate: Percentage of your records matched
Good: 60-80% for well-known places
Expected: 30-60% for specialized datasets
Low: <30% may indicate matching issues or genuinely new places
False Positive Rate: Incorrectly accepted matches
Goal: <5%
Monitor via post-review sampling
False Negative Rate: Incorrectly rejected matches
Harder to measure
Look for duplicates in published data
Review Time: Hours per 100 records
Varies widely: 0.5-5 hours
Decreases with experience
1.1.5.11.2. Quality Indicators¶
✅ Matched records have strong supporting evidence ✅ Rejection rationales documented ✅ Deferred cases have research notes ✅ No obvious duplicates post-publication ✅ Community feedback is positive
1.1.5.12. Getting Help with Reconciliation¶
1.1.5.12.1. Resources¶
Documentation: This guide and Manual Reconciliation
Video Tutorials: [link to video library]
Community Forum: Ask experienced contributors
Office Hours: Weekly reconciliation Q&A sessions
Consultation: Email reconciliation@whgazetteer.org
1.1.5.12.2. When to Seek Help¶
Unfamiliar historical/geographic context
Systematic matching problems
Ambiguous case patterns
Technical issues with interface
Large dataset (>1000 records) planning
1.1.5.13. Next Steps¶
Learn the Interface: → Manual Reconciliation Guide
Automate at Scale: → Automated Reconciliation
Use External Tools: → Reconciliation API
Start Practicing: → Reconciliation Tutorial
Review Matches: → Reviewing Matches