Wiki Parsers Documentation

The parser system extracts and processes data from the Granblue Fantasy Wiki. It fetches wiki pages, parses the wikitext markup, and extracts structured data for characters, weapons, and summons.

Architecture

Base Parser

All parsers inherit from BaseParser, which provides:

  • Wiki page fetching via MediaWiki API
  • Redirect handling
  • Wikitext parsing
  • Template extraction
  • Error handling and debugging
  • Local cache support
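
As the parser implementation examples later in this document show, a subclass typically only needs to implement parse_content; BaseParser supplies the fetching, caching, and helper methods listed above. A minimal sketch (the class name is illustrative):

module Granblue
  module Parsers
    # Illustrative subclass: BaseParser handles fetching, redirects, and
    # caching; the subclass only turns wikitext into a data hash.
    class ExampleParser < BaseParser
      def parse_content(wikitext)
        template = extract_template(wikitext)
        { name: template['name'] }
      end
    end
  end
end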

Wiki Client

The Wiki class handles API communication:

  • MediaWiki API integration
  • Page content fetching
  • Redirect detection
  • Rate limiting
  • Error handling
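
At its core this is a MediaWiki revisions query. The sketch below shows what such a request looks like with plain Net::HTTP; the endpoint URL and the direct HTTP usage are assumptions for illustration, not necessarily how the Wiki class is implemented:

require 'net/http'
require 'json'

uri = URI('https://gbf.wiki/api.php')
uri.query = URI.encode_www_form(
  action: 'query',
  prop: 'revisions',
  rvprop: 'content',
  rvslots: 'main',
  titles: 'Katalina',
  format: 'json',
  formatversion: 2
)

response = JSON.parse(Net::HTTP.get(uri))
# With formatversion=2, pages come back as an array
wikitext = response.dig('query', 'pages', 0, 'revisions', 0, 'slots', 'main', 'content')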

Available Parsers

CharacterParser

Extracts character data from wiki pages.

Extracted Data:

  • Character stats (HP, ATK)
  • Skills and abilities
  • Charge attack details
  • Voice actor information
  • Release dates
  • Character metadata

Usage:

character = Character.find_by(granblue_id: "3040001000")
parser = Granblue::Parsers::CharacterParser.new(character)

# Fetch and parse wiki data
data = parser.fetch(save: false)

# Fetch, parse, and save to database
parser.fetch(save: true)

# Use local cached wiki data
parser = Granblue::Parsers::CharacterParser.new(character, use_local: true)
data = parser.fetch

WeaponParser

Extracts weapon data from wiki pages.

Extracted Data:

  • Weapon stats (HP, ATK)
  • Weapon skills
  • Ougi (charge attack) effects
  • Crafting requirements
  • Upgrade materials

Usage:

weapon = Weapon.find_by(granblue_id: "1040001000")
parser = Granblue::Parsers::WeaponParser.new(weapon)
data = parser.fetch(save: true)

SummonParser

Extracts summon data from wiki pages.

Extracted Data:

  • Summon stats (HP, ATK)
  • Call effects
  • Aura effects
  • Cooldown information
  • Sub-aura details

Usage:

summon = Summon.find_by(granblue_id: "2040001000")
parser = Granblue::Parsers::SummonParser.new(summon)
data = parser.fetch(save: true)

CharacterSkillParser

Parses individual character skills.

Extracted Data:

  • Skill name and description
  • Cooldown and duration
  • Effect values by level
  • Skill upgrade requirements

Usage:

parser = Granblue::Parsers::CharacterSkillParser.new(skill_text)
skill_data = parser.parse

WeaponSkillParser

Parses weapon skill information.

Extracted Data:

  • Skill name and type
  • Effect percentages
  • Skill level scaling
  • Awakening effects

Usage:

parser = Granblue::Parsers::WeaponSkillParser.new(skill_text)
skill_data = parser.parse

Rake Tasks

Fetch Wiki Data

# Fetch all characters
rake granblue:fetch_wiki_data

# Fetch specific type
rake granblue:fetch_wiki_data type=Weapon
rake granblue:fetch_wiki_data type=Summon

# Fetch specific item
rake granblue:fetch_wiki_data type=Character id=3040001000

# Force re-fetch even if data exists
rake granblue:fetch_wiki_data force=true

Parameters

Parameter   Values                       Default     Description
type        Character, Weapon, Summon    Character   Type of object to fetch
id          Granblue ID                  all         Specific item to fetch, or all
force       true/false                   false       Re-fetch even if wiki_raw exists

Wiki Data Storage

Database Fields

Each model has wiki-related fields:

  • wiki_en - English wiki page name
  • wiki_jp - Japanese wiki page name (if available)
  • wiki_raw - Raw wikitext cache
  • wiki_updated_at - Last fetch timestamp
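
These fields make it straightforward to find records that still need a fetch, or whose cache has gone stale:

# Records that have never been fetched
Character.where(wiki_raw: nil)

# Records whose cached wikitext is more than 30 days old
Character.where("wiki_updated_at < ?", 30.days.ago)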

Caching Strategy

  1. Initial Fetch: Wiki data fetched from API
  2. Raw Storage: Wikitext stored in wiki_raw
  3. Local Parsing: Parsers use cached data when available
  4. Refresh: Force flag bypasses cache
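
Put together, the flow inside a parser's fetch looks roughly like the sketch below; fetch_from_api and persist are illustrative names, not the actual internals:

def fetch(save: false)
  wikitext =
    if @use_local && @object.wiki_raw.present?
      @object.wiki_raw                                   # 3. use cached wikitext
    else
      text = fetch_from_api(@object.wiki_en)             # 1. fetch from the API
      @object.update!(wiki_raw: text,                    # 2. store raw wikitext
                      wiki_updated_at: Time.current)
      text
    end

  data = parse_content(wikitext)
  persist(data) if save
  data
end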

Wikitext Format

Templates

Wiki pages use templates for structured data:

{{Character
|id=3040001000
|name=Katalina
|element=Water
|rarity=SSR
|hp=1680
|atk=7200
}}

Tables

Stats and skills appear in wikitable format:

{| class="wikitable"
! Level !! HP !! ATK
|-
| 1 || 280 || 1200
|-
| 100 || 1680 || 7200
|}

Skills

Skill data is stored as numbered template parameters:

|skill1_name = Blade of Light
|skill1_desc = 400% Water damage to one enemy
|skill1_cd = 7 turns
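
Given template parameters extracted into a hash, the numbered skill fields can be grouped into one hash per skill. A possible sketch (the parse_skills signature here is illustrative):

def parse_skills(params)
  skills = Hash.new { |hash, index| hash[index] = {} }

  params.each do |key, value|
    # Group skill1_name, skill1_desc, skill1_cd, ... by skill number
    if (match = key.match(/\Askill(\d+)_(\w+)\z/))
      skills[match[1].to_i][match[2]] = value
    end
  end

  skills.sort.map { |_, attributes| attributes }
end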

Parser Implementation

Basic Parser Structure

module Granblue
  module Parsers
    class CustomParser < BaseParser
      def parse_content(wikitext)
        data = {}

        # Extract template data
        template = extract_template(wikitext)
        data[:name] = template['name']
        data[:element] = parse_element(template['element'])

        # Parse tables
        tables = extract_tables(wikitext)
        data[:stats] = parse_stat_table(tables.first)

        # Parse skills
        data[:skills] = parse_skills(wikitext)

        data
      end

      private

      def parse_element(element_text)
        case element_text.downcase
        when 'fire' then 2
        when 'water' then 3
        when 'earth' then 4
        when 'wind' then 1
        when 'light' then 6
        when 'dark' then 5
        else 0
        end
      end
    end
  end
end

Template Extraction

def extract_template(wikitext)
  template_match = wikitext.match(/\{\{(\w+)(.*?)\}\}/m)
  return {} unless template_match

  template_name = template_match[1]
  template_content = template_match[2]

  params = {}
  template_content.scan(/\|(\w+)\s*=\s*([^\|]*)/) do |key, value|
    params[key] = value.strip
  end

  params
end

Table Parsing

def extract_tables(wikitext)
  tables = []
  wikitext.scan(/\{\|.*?\|\}/m) do |table|
    rows = []
    table.scan(/\|-\s*(.*?)(?=\|-|\|\})/m) do |row|
      cells = row[0].split('||').map(&:strip)
      rows << cells unless cells.empty?
    end
    tables << rows
  end
  tables
end

Error Handling

Redirect Handling

When a page redirects:

# Automatic redirect detection
redirect_match = wikitext.match(/#REDIRECT \[\[(.*?)\]\]/)
if redirect_match
  # Update wiki_en to new page
  object.update!(wiki_en: redirect_match[1])
  # Fetch new page
  fetch_wiki_info(redirect_match[1])
end

API Errors

Common errors and handling:

begin
  response = wiki_client.fetch(page_name)
rescue Net::ReadTimeout
  Rails.logger.error "Wiki API timeout for #{page_name}"
  return nil
rescue JSON::ParserError => e
  Rails.logger.error "Invalid wiki response: #{e.message}"
  return nil
end

Parse Errors

Safe parsing with defaults:

def safe_parse_integer(value, default = 0)
  Integer(value.to_s.gsub(/[^\d]/, ''))
rescue ArgumentError
  default
end

Best Practices

1. Cache Wiki Data

# Fetch and cache all wiki data first
rake granblue:fetch_wiki_data type=Character
rake granblue:fetch_wiki_data type=Weapon
rake granblue:fetch_wiki_data type=Summon

# Then parse using cached data
parser = CharacterParser.new(character, use_local: true)

2. Handle Missing Pages

if object.wiki_en.blank?
  Rails.logger.warn "No wiki page for #{object.name_en}"
  return nil
end

3. Validate Parsed Data

data = parser.fetch
if data[:hp].nil? || data[:atk].nil?
  Rails.logger.error "Missing required stats for #{object.name_en}"
end

4. Rate Limiting

# Add delays between requests
objects.each do |object|
  parser = CharacterParser.new(object)
  parser.fetch
  sleep(1) # Respect wiki rate limits
end

5. Error Recovery

begin
  data = parser.fetch(save: true)
rescue => e
  Rails.logger.error "Parse failed: #{e.message}"
  # Try with cached data
  parser = CharacterParser.new(object, use_local: true)
  data = parser.fetch
end

Debugging

Enable Debug Mode

parser = Granblue::Parsers::CharacterParser.new(
  character,
  debug: true
)
data = parser.fetch

Debug output shows:

  • API requests made
  • Template data extracted
  • Parsing steps
  • Data transformations

Inspect Raw Wiki Data

# In Rails console
character = Character.find_by(granblue_id: "3040001000")
puts character.wiki_raw

# Check for specific content
character.wiki_raw.include?("charge_attack")

Test Parsing

# Test with sample wikitext
sample = "{{Character|name=Test|hp=1000}}"
parser = CharacterParser.new(character)
data = parser.parse_content(sample)

Advanced Usage

Custom Field Extraction

class CustomParser < BaseParser
  def parse_custom_field(wikitext)
    # Extract custom pattern
    if match = wikitext.match(/custom_pattern:\s*(.+)/)
      match[1].strip
    end
  end
end

Batch Processing

# Process in batches to avoid memory issues
Character.find_in_batches(batch_size: 100) do |batch|
  batch.each do |character|
    next if character.wiki_raw.present?

    parser = CharacterParser.new(character)
    parser.fetch(save: true)
    sleep(1)
  end
end

Parallel Processing

require 'parallel'

characters = Character.where(wiki_raw: nil)
Parallel.each(characters, in_threads: 4) do |character|
  ActiveRecord::Base.connection_pool.with_connection do
    parser = CharacterParser.new(character)
    parser.fetch(save: true)
  end
end

Troubleshooting

Wiki Page Not Found

  1. Verify wiki_en field has correct page name
  2. Check for redirects on wiki
  3. Try searching wiki manually
  4. Update wiki_en if page moved
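
For a single record, steps 1 and 4 can be done from the Rails console:

character = Character.find_by(granblue_id: "3040001000")
character.wiki_en                      # current page name
character.update!(wiki_en: "Katalina") # point at the moved/correct page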

Parsing Returns Empty Data

  1. Check wiki_raw has content
  2. Verify template format hasn't changed
  3. Enable debug mode to see parsing steps
  4. Check for wiki page format changes

API Timeouts

  1. Increase timeout in Wiki client
  2. Add retry logic
  3. Use cached data when available
  4. Process in smaller batches
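
Retry logic is not built in; one simple pattern is to rescue the timeout, back off, and retry a bounded number of times:

attempts = 0
begin
  data = parser.fetch(save: true)
rescue Net::ReadTimeout
  attempts += 1
  if attempts < 3
    sleep(attempts * 2) # back off before retrying
    retry
  end
  Rails.logger.error "Wiki API timed out after #{attempts} attempts"
end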

Data Inconsistencies

  1. Force re-fetch with force=true
  2. Clear wiki_raw and re-fetch
  3. Check wiki edit history for changes
  4. Compare with other items of same type
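
To reset a single record, clear its cached wikitext and fetch again:

character = Character.find_by(granblue_id: "3040001000")
character.update!(wiki_raw: nil, wiki_updated_at: nil)
Granblue::Parsers::CharacterParser.new(character).fetch(save: true)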