Louis Mullie 8c6450a0d7 Merge pull request #120 from indentlabs/master
Use S3's static-public-assets bucket instead of louismullie.com
2017-05-05 18:31:43 -04:00
bin Manifest baby! 2012-06-17 01:52:01 -04:00
files Update files and manifests. 2012-06-29 16:44:37 -04:00
lib Merge pull request #120 from indentlabs/master 2017-05-05 18:31:43 -04:00
models Manifest baby! 2012-06-17 01:52:01 -04:00
spec tweak multi-line should expectations to be consistent with other specs 2014-01-28 12:08:44 +00:00
tmp Added manifests. 2012-06-16 01:50:26 -04:00
.gitignore Ignore benchmark results. 2012-10-21 16:59:51 -04:00
.rspec fixed line in .rspec file to invoke correct formatter and not cause a TravisCI bug 2014-08-25 23:13:29 -07:00
.travis.yml Update .travis.yml for Ruby 2+ 2015-05-28 18:42:37 -03:00
.treat Add tentative configuration options. 2013-01-02 20:25:43 -05:00
Gemfile Make nokogiri a required dependency, #121 2017-04-22 16:15:35 -04:00
LICENSE Bump version to 2.0.0. 2012-12-05 15:24:06 -05:00
README.md update code climate badge in readme 2014-05-07 18:18:57 +10:00
RELEASE Begin release notes for 2.0.0rc1. 2012-12-04 02:04:39 -05:00
Rakefile Put all specs under the Treat::Specs module. 2013-01-06 19:26:36 -05:00
treat.gemspec Add \n after post-install output 2016-05-24 14:04:48 -05:00

README.md


New in v2.0.5: OpenNLP integration and Yomu support

Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm-agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language parsing, part-of-speech tagging, keyword extraction and named entity recognition. Learn more by taking a quick tour or by reading the manual.

Features

  • Text extractors for PDF, HTML, XML, Word, AbiWord, OpenOffice and image formats (Ocropus).
  • Text chunkers, sentence segmenters, tokenizers, and parsers (Stanford & Enju).
  • Lexical resources (WordNet interface, several POS taggers for English).
  • Language, date/time, topic words (LDA) and keyword (TF*IDF) extraction.
  • Word inflectors, including stemmers, conjugators, declensors, and number inflection.
  • Serialization of annotated entities to YAML, XML or to MongoDB.
  • Visualization in ASCII tree, directed graph (DOT) and tag-bracketed (standoff) formats.
  • Linguistic resources, including language detection and tag alignments for several treebanks.
  • Machine learning (decision tree, multilayer perceptron, LIBLINEAR, LIBSVM).
  • Text retrieval with indexation and full-text search (Ferret).
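
To make the keyword (TF*IDF) feature above more concrete, here is a minimal, self-contained sketch of TF*IDF ranking in plain Ruby. It does not use Treat and is not Treat's implementation or API: the helper names (tokenize, term_frequencies, inverse_document_frequencies, keywords), the naive regex tokenizer and the sample documents are all assumptions made for illustration.

```ruby
# Illustrative sketch only: hypothetical helpers, not part of Treat.

# Split text into lowercase word tokens with a naive regex tokenizer.
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Term frequency: the share of a document's tokens taken up by each term.
def term_frequencies(tokens)
  counts = Hash.new(0)
  tokens.each { |token| counts[token] += 1 }
  counts.transform_values { |count| count.to_f / tokens.size }
end

# Inverse document frequency: log(N / number of documents containing the term).
def inverse_document_frequencies(tokenized_docs)
  total = tokenized_docs.size.to_f
  doc_counts = Hash.new(0)
  tokenized_docs.each do |tokens|
    tokens.uniq.each { |token| doc_counts[token] += 1 }
  end
  doc_counts.transform_values { |df| Math.log(total / df) }
end

# For each document, return its `limit` highest-scoring terms by TF*IDF.
# Ties are broken alphabetically so the result is deterministic.
def keywords(docs, limit: 3)
  tokenized = docs.map { |doc| tokenize(doc) }
  idf = inverse_document_frequencies(tokenized)
  tokenized.map do |tokens|
    term_frequencies(tokens)
      .map { |term, tf| [term, tf * idf[term]] }
      .sort_by { |term, score| [-score, term] }
      .first(limit)
      .map(&:first)
  end
end

docs = [
  "The parser builds a tree. The parser is fast.",
  "The tokenizer splits text. The tokenizer is simple.",
  "The stemmer reduces words. The stemmer is small."
]

p keywords(docs, limit: 1)
```

A word such as "the" appears in every sample document, so its inverse document frequency is log(3/3) = 0 and it can never outrank a term that is specific to one document. That is the property that makes TF*IDF useful for keyword extraction.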

Contributing

I am actively seeking developers who can help maintain and expand this project. You can find a list of ideas for contributing to the project here.

Authors

Lead developer: @louismullie [Twitter]

Contributors:

  • @bdigital
  • @automatedtendencies
  • @LeFnord
  • @darkphantum
  • @whistlerbrk
  • @smileart
  • @erol

License

This software is released under the GPL and includes software released under the GPL, Ruby, Apache 2.0 and MIT licenses.