# Wiki Parsers Documentation
The parser system extracts and processes data from the Granblue Fantasy Wiki. It fetches wiki pages, parses wikitext format, and extracts structured data for characters, weapons, and summons.
## Architecture
### Base Parser
All parsers inherit from `BaseParser`, which provides (see the sketch after this list):
- Wiki page fetching via MediaWiki API
- Redirect handling
- Wikitext parsing
- Template extraction
- Error handling and debugging
- Local cache support
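
Putting those responsibilities together, the fetch flow through `BaseParser` looks roughly like the sketch below. This is an illustration of the flow only: helper names such as `fetch_wiki_page` and `persist` are assumptions, while `fetch`, `parse_content`, `use_local`, and `debug` match the usage shown later in this document.
```ruby
# Rough sketch of the BaseParser fetch flow; fetch_wiki_page and persist
# are illustrative stand-ins, not the actual implementation.
module Granblue
  module Parsers
    class BaseParser
      def initialize(object, use_local: false, debug: false)
        @object = object
        @use_local = use_local
        @debug = debug
      end

      def fetch(save: false)
        # Prefer locally cached wikitext when use_local is set
        wikitext = @use_local && @object.wiki_raw.presence
        wikitext ||= fetch_wiki_page(@object.wiki_en) # MediaWiki API call

        data = parse_content(wikitext) # implemented by each subclass
        persist(data) if save
        data
      end
    end
  end
end
```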
### Wiki Client
The `Wiki` class handles API communication (a minimal fetch sketch follows the list):
- MediaWiki API integration
- Page content fetching
- Redirect detection
- Rate limiting
- Error handling
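
Behind that interface is a standard MediaWiki `action=query` request. The sketch below shows a minimal version of such a fetch; the base URL and the `fetch_wikitext` helper name are assumptions, and the real client also layers in redirect detection and rate limiting.
```ruby
require 'net/http'
require 'json'

# Minimal sketch of pulling raw wikitext for a page from a MediaWiki API.
# The base URL and helper name are assumptions for illustration.
WIKI_API = URI('https://gbf.wiki/api.php')

def fetch_wikitext(page_name)
  uri = WIKI_API.dup
  uri.query = URI.encode_www_form(
    action: 'query',
    prop: 'revisions',
    rvprop: 'content',
    rvslots: 'main',
    format: 'json',
    titles: page_name
  )

  response = Net::HTTP.get_response(uri)
  pages = JSON.parse(response.body).dig('query', 'pages') || {}
  pages.values.first&.dig('revisions', 0, 'slots', 'main', '*')
end
```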
### Available Parsers
#### CharacterParser
Extracts character data from wiki pages.
**Extracted Data:**
- Character stats (HP, ATK)
- Skills and abilities
- Charge attack details
- Voice actor information
- Release dates
- Character metadata
**Usage:**
```ruby
character = Character.find_by(granblue_id: "3040001000")
parser = Granblue::Parsers::CharacterParser.new(character)

# Fetch and parse wiki data
data = parser.fetch(save: false)

# Fetch, parse, and save to database
parser.fetch(save: true)

# Use local cached wiki data
parser = Granblue::Parsers::CharacterParser.new(character, use_local: true)
data = parser.fetch
```
#### WeaponParser
Extracts weapon data from wiki pages.
**Extracted Data:**
- Weapon stats (HP, ATK)
- Weapon skills
- Ougi (charge attack) effects
- Crafting requirements
- Upgrade materials
**Usage:**
```ruby
weapon = Weapon.find_by(granblue_id: "1040001000")
parser = Granblue::Parsers::WeaponParser.new(weapon)
data = parser.fetch(save: true)
```
#### SummonParser
Extracts summon data from wiki pages.
**Extracted Data:**
- Summon stats (HP, ATK)
- Call effects
- Aura effects
- Cooldown information
- Sub-aura details
**Usage:**
```ruby
summon = Summon.find_by(granblue_id: "2040001000")
parser = Granblue::Parsers::SummonParser.new(summon)
data = parser.fetch(save: true)
```
#### CharacterSkillParser
Parses individual character skills.
**Extracted Data:**
- Skill name and description
- Cooldown and duration
- Effect values by level
- Skill upgrade requirements
**Usage:**
```ruby
parser = Granblue::Parsers::CharacterSkillParser.new(skill_text)
skill_data = parser.parse
```
#### WeaponSkillParser
Parses weapon skill information.
**Extracted Data:**
- Skill name and type
- Effect percentages
- Skill level scaling
- Awakening effects
**Usage:**
```ruby
parser = Granblue::Parsers::WeaponSkillParser.new(skill_text)
skill_data = parser.parse
```
## Rake Tasks
### Fetch Wiki Data
```bash
# Fetch all characters
rake granblue:fetch_wiki_data
# Fetch specific type
rake granblue:fetch_wiki_data type=Weapon
rake granblue:fetch_wiki_data type=Summon
# Fetch specific item
rake granblue:fetch_wiki_data type=Character id=3040001000
# Force re-fetch even if data exists
rake granblue:fetch_wiki_data force=true
```
### Parameters
| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `type` | Character, Weapon, Summon | Character | Type of object to fetch |
| `id` | Granblue ID | all | Specific item or all |
| `force` | true/false | false | Re-fetch even if wiki_raw exists |
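
Internally the task reads these parameters from environment variables. A simplified sketch of the dispatch is shown below; the actual task may differ in structure, but the parameter handling follows the table above.
```ruby
# Simplified sketch of how the rake task could dispatch on its parameters.
namespace :granblue do
  desc 'Fetch wiki data for characters, weapons, or summons'
  task fetch_wiki_data: :environment do
    type  = ENV.fetch('type', 'Character')
    force = ENV['force'] == 'true'

    scope = type.constantize
    scope = scope.where(granblue_id: ENV['id']) if ENV['id'].present?

    scope.find_each do |object|
      next if object.wiki_raw.present? && !force

      parser = "Granblue::Parsers::#{type}Parser".constantize.new(object)
      parser.fetch(save: true)
      sleep(1) # respect wiki rate limits
    end
  end
end
```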
## Wiki Data Storage
### Database Fields
Each model has wiki-related fields (a hypothetical migration is sketched below):
- `wiki_en` - English wiki page name
- `wiki_jp` - Japanese wiki page name (if available)
- `wiki_raw` - Raw wikitext cache
- `wiki_updated_at` - Last fetch timestamp
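
A hypothetical migration adding these columns to one of the models might look like the sketch below; the real schema and migration version may differ.
```ruby
# Hypothetical migration; column names follow the list above.
class AddWikiFieldsToCharacters < ActiveRecord::Migration[7.0]
  def change
    add_column :characters, :wiki_en, :string
    add_column :characters, :wiki_jp, :string
    add_column :characters, :wiki_raw, :text
    add_column :characters, :wiki_updated_at, :datetime
  end
end
```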
### Caching Strategy
1. **Initial Fetch**: Wiki data fetched from API
2. **Raw Storage**: Wikitext stored in `wiki_raw`
3. **Local Parsing**: Parsers use cached data when available (see the sketch after this list)
4. **Refresh**: Force flag bypasses cache
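
In code, that strategy reduces to a single guard around the fetch. The sketch below assumes the hypothetical `fetch_wikitext` helper from the Wiki client example above.
```ruby
# Sketch of the cache decision: reuse wiki_raw unless a refresh is forced.
def wikitext_for(object, force: false)
  return object.wiki_raw if object.wiki_raw.present? && !force

  raw = fetch_wikitext(object.wiki_en) # see the Wiki client sketch above
  object.update!(wiki_raw: raw, wiki_updated_at: Time.current)
  raw
end
```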
## Wikitext Format
### Templates
Wiki pages use templates for structured data:
```
{{Character
|id=3040001000
|name=Katalina
|element=Water
|rarity=SSR
|hp=1680
|atk=7200
}}
```
### Tables
Stats and skills in table format:
```
{| class="wikitable"
! Level !! HP !! ATK
|-
| 1 || 280 || 1200
|-
| 100 || 1680 || 7200
|}
```
### Skills
Skill descriptions with effects:
```
|skill1_name = Blade of Light
|skill1_desc = 400% Water damage to one enemy
|skill1_cd = 7 turns
```
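Numbered parameters like these can be collected with a single scan. The sketch below assumes the `|skillN_field = value` layout shown above; the real parser may be more involved.
```ruby
# Sketch: gather skill1_name / skill1_desc / skill1_cd style parameters
# into one hash per skill.
def parse_skills(wikitext)
  skills = Hash.new { |hash, key| hash[key] = {} }
  wikitext.scan(/\|skill(\d+)_(\w+)\s*=\s*(.+)/) do |index, field, value|
    skills[index.to_i][field] = value.strip
  end
  skills.values
end
```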
## Parser Implementation
### Basic Parser Structure
```ruby
module Granblue
  module Parsers
    class CustomParser < BaseParser
      def parse_content(wikitext)
        data = {}

        # Extract template data
        template = extract_template(wikitext)
        data[:name] = template['name']
        data[:element] = parse_element(template['element'])

        # Parse tables
        tables = extract_tables(wikitext)
        data[:stats] = parse_stat_table(tables.first)

        # Parse skills
        data[:skills] = parse_skills(wikitext)

        data
      end

      private

      def parse_element(element_text)
        case element_text.to_s.downcase
        when 'fire' then 2
        when 'water' then 3
        when 'earth' then 4
        when 'wind' then 1
        when 'light' then 6
        when 'dark' then 5
        else 0
        end
      end
    end
  end
end
```
### Template Extraction
```ruby
def extract_template(wikitext)
  template_match = wikitext.match(/\{\{(\w+)(.*?)\}\}/m)
  return {} unless template_match

  template_name = template_match[1]
  template_content = template_match[2]

  params = {}
  template_content.scan(/\|(\w+)\s*=\s*([^\|]*)/) do |key, value|
    params[key] = value.strip
  end
  params
end
```
### Table Parsing
```ruby
def extract_tables(wikitext)
  tables = []
  wikitext.scan(/\{\|.*?\|\}/m) do |table|
    rows = []
    table.scan(/\|-\s*(.*?)(?=\|-|\|\})/m) do |row|
      # Rows look like "| 1 || 280 || 1200": drop the leading pipe, then split
      cells = row[0].sub(/\A\|/, '').split('||').map(&:strip)
      rows << cells unless cells.empty?
    end
    tables << rows
  end
  tables
end
```
## Error Handling
### Redirect Handling
When a page redirects:
```ruby
# Automatic redirect detection
redirect_match = wikitext.match(/#REDIRECT \[\[(.*?)\]\]/)
if redirect_match
  # Update wiki_en to new page
  object.update!(wiki_en: redirect_match[1])
  # Fetch new page
  fetch_wiki_info(redirect_match[1])
end
```
### API Errors
Common errors and handling:
```ruby
begin
  response = wiki_client.fetch(page_name)
rescue Net::ReadTimeout
  Rails.logger.error "Wiki API timeout for #{page_name}"
  return nil
rescue JSON::ParserError => e
  Rails.logger.error "Invalid wiki response: #{e.message}"
  return nil
end
```
### Parse Errors
Safe parsing with defaults:
```ruby
def safe_parse_integer(value, default = 0)
  Integer(value.to_s.gsub(/[^\d]/, ''))
rescue ArgumentError
  default
end
```
## Best Practices
### 1. Cache Wiki Data
```bash
# Fetch and cache all wiki data first
rake granblue:fetch_wiki_data type=Character
rake granblue:fetch_wiki_data type=Weapon
rake granblue:fetch_wiki_data type=Summon
```
```ruby
# Then parse using cached data
parser = CharacterParser.new(character, use_local: true)
```
### 2. Handle Missing Pages
```ruby
if object.wiki_en.blank?
  Rails.logger.warn "No wiki page for #{object.name_en}"
  return nil
end
```
### 3. Validate Parsed Data
```ruby
data = parser.fetch
if data[:hp].nil? || data[:atk].nil?
  Rails.logger.error "Missing required stats for #{object.name_en}"
end
```
### 4. Rate Limiting
```ruby
# Add delays between requests
objects.each do |object|
  parser = CharacterParser.new(object)
  parser.fetch
  sleep(1) # Respect wiki rate limits
end
```
### 5. Error Recovery
```ruby
begin
  data = parser.fetch(save: true)
rescue => e
  Rails.logger.error "Parse failed: #{e.message}"
  # Try with cached data
  parser = CharacterParser.new(object, use_local: true)
  data = parser.fetch
end
```
## Debugging
### Enable Debug Mode
```ruby
parser = Granblue::Parsers::CharacterParser.new(
  character,
  debug: true
)
data = parser.fetch
```
Debug output shows:
- API requests made
- Template data extracted
- Parsing steps
- Data transformations
### Inspect Raw Wiki Data
```ruby
# In Rails console
character = Character.find_by(granblue_id: "3040001000")
puts character.wiki_raw
# Check for specific content
character.wiki_raw.include?("charge_attack")
```
### Test Parsing
```ruby
# Test with sample wikitext
sample = "{{Character|name=Test|hp=1000}}"
parser = CharacterParser.new(character)
data = parser.parse_content(sample)
```
## Advanced Usage
### Custom Field Extraction
```ruby
class CustomParser < BaseParser
  def parse_custom_field(wikitext)
    # Extract custom pattern
    if (match = wikitext.match(/custom_pattern:\s*(.+)/))
      match[1].strip
    end
  end
end
```
### Batch Processing
```ruby
# Process in batches to avoid memory issues
Character.find_in_batches(batch_size: 100) do |batch|
  batch.each do |character|
    next if character.wiki_raw.present?

    parser = CharacterParser.new(character)
    parser.fetch(save: true)
    sleep(1)
  end
end
```
### Parallel Processing
```ruby
require 'parallel'

characters = Character.where(wiki_raw: nil)
Parallel.each(characters, in_threads: 4) do |character|
  ActiveRecord::Base.connection_pool.with_connection do
    parser = CharacterParser.new(character)
    parser.fetch(save: true)
  end
end
```
## Troubleshooting
### Wiki Page Not Found
1. Verify wiki_en field has correct page name
2. Check for redirects on wiki
3. Try searching the wiki manually or via the search API (see the sketch below)
4. Update wiki_en if page moved
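
For steps 3 and 4, the MediaWiki search endpoint can confirm the current page title before `wiki_en` is updated. A sketch, with the base URL assumed as in the Wiki client example:
```ruby
require 'net/http'
require 'json'

# Look up the current page title via the opensearch endpoint, then update wiki_en.
uri = URI('https://gbf.wiki/api.php')
uri.query = URI.encode_www_form(action: 'opensearch', search: 'Katalina', format: 'json')
titles = JSON.parse(Net::HTTP.get(uri))[1]

character.update!(wiki_en: titles.first) if titles.any?
```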
### Parsing Returns Empty Data
1. Check wiki_raw has content
2. Verify template format hasn't changed
3. Enable debug mode to see parsing steps
4. Check for wiki page format changes
### API Timeouts
1. Increase timeout in Wiki client
2. Add retry logic (see the sketch below)
3. Use cached data when available
4. Process in smaller batches
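
Points 1 and 2 can be covered with a small retry wrapper. A sketch, assuming the hypothetical `fetch_with_retry` helper name:
```ruby
# Sketch: retry transient wiki failures with a growing delay before giving up.
def fetch_with_retry(parser, attempts: 3)
  tries = 0
  begin
    parser.fetch(save: true)
  rescue Net::ReadTimeout, Net::OpenTimeout => e
    tries += 1
    raise if tries >= attempts

    Rails.logger.warn "Retrying after #{e.class} (attempt #{tries})"
    sleep(2**tries) # back off: 2s, 4s, ...
    retry
  end
end
```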
### Data Inconsistencies
1. Force re-fetch with `force=true`
2. Clear wiki_raw and re-fetch (see the sketch below)
3. Check wiki edit history for changes
4. Compare with other items of same type
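
Clearing the cache for a single item from the Rails console and re-fetching looks like this, using the same parser API shown earlier:
```ruby
# Clear the cached wikitext for one item, then re-fetch and re-parse it.
character = Character.find_by(granblue_id: "3040001000")
character.update!(wiki_raw: nil)

parser = Granblue::Parsers::CharacterParser.new(character)
parser.fetch(save: true)
```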