All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Real DataFlows, see operations tutorial and usage examples
- Async helper `concurrently` got a `nocancel` optional keyword argument which, if set, is a set of tasks not to cancel when the `concurrently` execution loop completes.
- FileSourceTest has a `test_label` method which checks that a FileSource knows how to properly load and save repos under a given label.
- Test case for Merge CLI command
- Repo.feature method to select a single piece of feature data within a repo.
- Dev service to help with hacking on DFFML and to create models from templates in the skel/ directory.
- Classification type parameter to DNNClassifierModelConfig to specify the data type of the given classification options.
- util.cli CMD classes have their argparse description set to their docstring.
- util.cli CMD classes can specify the formatter class used in `argparse.ArgumentParser` via the `CLI_FORMATTER_CLASS` property.
- Skeleton for service creation was added
- Simple Linear Regression model from scratch
- Scikit Linear Regression model
- Community link in CONTRIBUTING.md.
- Explained three main parts of DFFML on docs homepage
- Documentation on how to use ML models on docs Models plugin page.
- Mailing list info
- Issue template for questions
- Multiple Scikit Models with dynamic config
- Entrypoint listing command to development service to aid in debugging issues with entrypoints.
- HTTP API service to enable interacting with DFFML over HTTP. Currently includes APIs for configuring and using Sources and Models.
- MySQL protocol source to work with data from a MySQL protocol compatible db
- shouldi example got a bandit operation which tells users not to install if there are more than 5 issues of high severity and confidence.
- dev service got the ability to run a single operation in a standalone fashion.
- About page to docs.
- Tensorflow DNNEstimator based regression model.
- feature/codesec became its own branch, binsec
- BaseOrchestratorContext `run_operations` `strict` now defaults to true. With `strict` set to true, errors will be raised and not just logged.
- MemoryInputNetworkContext got an `sadd` method which is shorthand for creating a MemoryInputSet with a StringInputSetContext.
- MemoryOrchestrator `basic_config` method takes a list of operations and optional config for them.
- shouldi example uses updated `MemoryOrchestrator.basic_config` method and includes more explanation in comments.
- CSVSource allows for setting the Repo's `src_url` from a csv column
- util Entrypoint defines a new class for each loaded class and sets the `ENTRY_POINT_LABEL` parameter within the newly defined class.
- Tensorflow model removed usages of repo.classifications methods.
- Entrypoint prints traceback of loaded classes to standard error if they fail to load.
- Updated Tensorflow model README.md to match functionality of DNNClassifierModel.
- DNNClassifierModel no longer splits data for the user.
- Update `pip` in Dockerfile.
- Restructured documentation
- Ran `black` on whole codebase, including all submodules
- CI style check now checks whole codebase
- Merged HACKING.md into CONTRIBUTING.md
- shouldi example runs bandit now in addition to safety
- The way safety gets called
- Switched documentation to Read The Docs theme
- Models yield only a repo object instead of the value and confidence of the prediction as well. Models are now responsible for calling the `predicted` method on the repo. This will ease the process of making predict feature specific.
- Updated Tensorflow model README.md to include usage of regression model
- Docs get version from dffml.version.VERSION.
- FileSource zipfiles are wrapped with TextIOWrapper because CSVSource expects the underlying file object to return str instances rather than bytes.
- FileSourceTest inherits from SourceTest and is used to test json and csv sources.
- A temporary directory is used to replicate `mktemp -u` functionality so as to provide tests using a FileSource with a valid tempfile name.
- Labels for JSON sources
- Labels for CSV sources
- util.cli CMDs correctly set the `description` of subparsers instead of their `help`; they also accept the `CLI_FORMATTER_CLASS` property.
- CSV source now has `entry_point` decoration
- JSON source now has `entry_point` decoration
- Strict flag in df.memory is now on by default
- Dynamically created scikit models get config args correctly
- Renamed `DNNClassifierModelContext` first init arg from `config` to `features`
- BaseSource now has `base_entry_point` decoration
- Repo objects are no longer classification specific. Their `classify`, `classified`, and `classification` methods were removed.
- Definition spec field to specify a class representative of key value pairs for definitions with primitives which are dictionaries
- Auto generation of documentation for operation implementations, models, and sources. Generated docs include information on configuration options and inputs and outputs for operation implementations.
- Async helpers got an `aenter_stack` method which creates and returns a `contextlib.AsyncExitStack` after entering all the contexts passed to it.
- Example of how to use Data Flow Facilitator / Orchestrator / Operations by writing a Python meta static analysis tool, shouldi
- OperationImplementation `add_label` and `add_orig_label` methods now use `op.name` instead of `ENTRY_POINT_ORIG_LABEL` and `ENTRY_POINT_NAME`.
- Make output specs and remap arguments optional for Operations CLI commands.
- Feature skeleton project is now operations skeleton project
- MemoryOperationImplementationNetwork instantiates OperationImplementations using their `withconfig()` method.
- MemorySource now decorated with `entry_point`
- MemorySource takes arguments correctly via `config_set` and `config_get`
- skel modules have `long_description_content_type` set to "text/markdown"
- Base Orchestrator `__aenter__` and `__aexit__` methods were moved to the Memory Orchestrator because they are specific to that config.
- Async helper `aenter_stack` uses `inspect.isfunction` so it will bind lambdas
- Support for zip file source
- Async helper for running tasks concurrently
- Gitter badge to README
- Documentation on the Data Flow Facilitator subsystem
- codesec plugin containing operations which gather security related metrics on code and binaries.
- auth plugin containing an scrypt operation as an example of thread pool usage.
- Standardized the API for most classes in DFFML via inheritance from dffml.base
- Configuration of classes is now done via the args() and config() methods
- Documentation is now generated using Sphinx
- Corrected maxsplit in util.cli.parser
- Check that dtype is a class in Tensorflow DNN
- CI script no longer always exits 0 for plugin tests
- Corrected render type in setup.py to markdown
- Contribution guidelines
- Logging documentation
- Example usage of Git features
- New Model and Feature creation script
- New Feature skeleton directory
- New Model skeleton directory
- New Feature creation tutorial
- New Model creation tutorial
- Added update functionality to the CSV source
- Added support for Gzip file source
- Added support for bz2 file source
- Travis checks for additions to CHANGELOG.md
- Travis checks for trailing whitespace
- Added support for lzma file source
- Added support for xz file source
- Added Data Flow Facilitator
- Restructured documentation to docs folder and moved from rST to markdown
- Git feature cloc logs if no binaries are in path
- Enable source.file to read from /dev/fd/XX
- Corrected formatting in README for PyPI
- Feature class to collect a feature in a dataset
- Git features to collect feature data from Git repos
- Model class to wrap implementations of machine learning models
- Tensorflow DNN model for generic usage of the DNN estimator
- CLI interface and framework
- Source class to manage dataset storage