This is a utility library for reading and manipulating treebanks that use the PROIEL annotation scheme and the PROIEL XML-based interchange format.
This library requires Ruby >= 2.7 (as this is what Nokogiri 1.14.x requires).
Install as
gem install proiel
The recommended way to use this library in your application is with bundler.
Create a Gemfile with the following content:
source 'https://rubygems.org'
gem 'proiel', '~> 1.0'
and then execute
bundle
To download a sample treebank, initialize a new git repository and add the PROIEL treebank as a submodule:
git init
mkdir vendor
git submodule add --depth 1 https://github.com/proiel/proiel-treebank.git vendor/proiel-treebank
Here is a skeleton programme to get you started. Save this as myproject.rb:
#!/usr/bin/env ruby
require 'proiel'
tb = PROIEL::Treebank.new
Dir[File.join('vendor', 'proiel-treebank', '*.xml')].each do |filename|
  puts "Reading #{filename}..."
  tb.load_from_xml(filename)
end
tb.sources.each do |source|
  source.divs.each do |div|
    div.sentences.each do |sentence|
      sentence.tokens.each do |token|
        # Do something
      end
    end
  end
end
You can now run this as:
bundle exec ruby myproject.rb
proiel aims to adhere to Semantic Versioning 2.0.0. This means that a patch version or minor version should not break backward compatibility of a public API, and that breaking changes should only be introduced with new major versions. When specifying a dependency on this gem it is best practice to use a pessimistic version constraint with two digits of precision:
spec.add_dependency 'proiel', '~> 1.0'
Check out the git repository from GitHub and run bundle install to install
all development dependencies. Then run bundle exec rake to run the tests.
To install a development version of this gem, run bundle exec rake install.
To release a new version:
- Update the version number in lib/proiel/version.rb.
- Run bundle exec rake release. This will:
  - Create a git tag for the version.
  - Push git commits and tags to the remote repository.
  - Push the .gem file to rubygems.org.
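When choosing the version number for a release, it may help to check which versions the recommended '~> 1.0' constraint actually admits. The following is a minimal sketch using RubyGems' own Gem::Requirement and Gem::Version classes; the candidate version numbers are made up for illustration and are not actual releases of this gem.

```ruby
require 'rubygems' # usually loaded already; provides Gem::Requirement and Gem::Version

# The pessimistic constraint recommended for depending on this gem.
requirement = Gem::Requirement.new('~> 1.0')

# Hypothetical version numbers, chosen purely for illustration.
candidates = %w[0.9.0 1.0.0 1.0.5 1.4.2 2.0.0]

# Map each candidate to whether the constraint accepts it.
results = candidates.to_h do |candidate|
  [candidate, requirement.satisfied_by?(Gem::Version.new(candidate))]
end

results.each do |candidate, accepted|
  puts format('%-6s %s', candidate, accepted ? 'accepted' : 'rejected')
end
```

With two digits of precision the constraint accepts any 1.x release from 1.0 upwards and rejects 2.0.0, which is why a breaking change has to go into a new major version.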
Documentation can be generated using YARD:
yard
Bug reports and pull requests are welcome on GitHub at https://github.com/syntacticus/proiel.
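As one possible way to fill in the "# Do something" placeholder in the skeleton programme, the sketch below counts tokens per part-of-speech tag. So that it runs without the gem or a treebank, it uses a stand-in Struct instead of real token objects. The reader names form and part_of_speech are assumptions about the token interface and should be checked against the generated documentation. FakeToken, count_parts_of_speech and the sample data are hypothetical.

```ruby
# Stand-in for a treebank token so the sketch runs on its own.
# ASSUMPTION: real tokens expose `form` and `part_of_speech` readers.
FakeToken = Struct.new(:form, :part_of_speech)

# Counts tokens per part-of-speech tag; tokens without a tag are skipped.
def count_parts_of_speech(tokens)
  tokens.map(&:part_of_speech).compact.tally
end

# Hypothetical sample data; the tags are illustrative only.
sample = [
  FakeToken.new('Gallia', 'Ne'),
  FakeToken.new('est', 'V-'),
  FakeToken.new('omnis', 'A-'),
  FakeToken.new('divisa', 'V-'),
  FakeToken.new(nil, nil) # a token with no tag, which the helper skips
]

counts = count_parts_of_speech(sample)
counts.each { |tag, n| puts "#{tag}\t#{n}" }
```

In the skeleton, the same helper could be called once per sentence with sentence.tokens, provided the assumed readers exist on the real token objects.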