- The Natural Capital Project changed its name to the Natural Capital Alliance. References to the old name have been updated to reflect this change. (#113)
- Using importlib.metadata or importlib_metadata, depending on the Python version, to read the version from package metadata. This is in response to pkg_resources being deprecated. (#100)
- Adding pyproject.toml for our build definitions.
- Python 3.6 has reached end-of-life and is no longer maintained, so it has been removed from the automated tests.
- Python 3.7 has reached end-of-life and is no longer maintained, so it has been removed from automated tests.
- Python 3.11 has been released, so taskgraph is now tested against this new version of the language.
- Python 3.12 has been released, so taskgraph is now tested against this new version of the language.
- Testing against Python 3.10 in GitHub Actions and officially noting support for 3.10 in setup.py.
- Testing against Python 3.9 in GitHub Actions and noting support in setup.py.
- Fixed an issue where exceptions raised during execution would not be raised if the task completed before TaskGraph.join() was called. Now, if a task raises an exception, its exception will always be raised when either Task.join() or TaskGraph.join() is called.
- Fixed an issue where tasks with hash_algorithm='sizetimestamp' would, under certain conditions, fail to re-execute when they should. This only occurred when a graph writing the same amount of (but possibly different) data was executed successively, with less than about 1.5 seconds between task executions.
- After many years with the Natural Capital Project, Rich Sharp has stepped down from the Project and as the maintainer of taskgraph. James Douglass is taking his place, and this change is now reflected in setup.py.
- Fixed an issue that caused an EOFError or BrokenPipeError to occur when the TaskGraph terminated.
- Updated the taskgraph example in the README for the latest API changes and to clarify the need for an if __name__ == '__main__': guard.
- Fixed an issue that could cause the TaskGraph object to hang if duplicate Task objects were created.
- Fixed an issue that was causing TaskGraph to ignore a changed hash_algorithm if the TaskGraph was created on one run, deconstructed, and then restarted. If the user chose a different hash, TaskGraph would use the hash that the target file was originally hashed under rather than the new algorithm.
- Removed the copy_duplicate_artifact and hardlink_allowed parameters and functionality from TaskGraph. This addresses a design error: TaskGraph is not well suited for caching file results to avoid recomputation. Rather than add complexity around the limitations of this feature, it is being removed to guide a design toward a standalone cache library if needed.
- Fixed issue that could cause combinatorial memory usage leading to poor runtime or MemoryError if a dictionary were passed that had thousands of elements.
- Fixed issue that would cause TaskGraph to not recognize a directory that was meant to be ignored and in some cases cause Task to unnecessarily reexecute.
- Fixed an issue that would raise an exception when __del__ was deconstructing a taskgraph object and a thread join() would cause a deadlock.
- Fixed an issue that would ignore the state of a transient_run flag if a previous Task run had run it with that flag set to False.
- Removed the limit on the number of times TaskGraph can attempt to update its database; updates are now retried for up to 5 minutes of continuous failures. This is to address expected issues when many parallel threads may compete for an update. Relevant information about why the database update fails is logged.
- Fixed an issue where the logging queue would always report an exception even if the logging thread shut down correctly.
- Fixed several race conditions that could cause the TaskGraph object to hang on an otherwise ordinary termination.
- Changed the logging level to "INFO" in cases where the taskgraph was not precalculated, since that is an expected path of execution in TaskGraph.
- Adding a hardlink_allowed parameter to add_task that allows TaskGraph to attempt to hardlink a file in cases where copy_artifact=True would permit one. This saves disk space as well as computation time when large files do not need to be copied.
- Adding a store_result flag to add_task that conditionally stores the func result in the database for a later .get. This was added to guard against return types that were not picklable and would otherwise cause an exception when being executed normally.
- Fixed an issue that would cause the logger thread to continue reporting status after all tasks were complete and the graph was closed.
- Fixed issue that would cause an infinite loop if a TaskGraph object were created with a database from an incompatible previous version. Behavior now is to log the issue, delete the old database, and create a new compatible one.
- Fixed issue that would cause some rare infinite loops if TaskGraph were to fail due to some kinds of task exceptions.
- Adding open source BSD-3-Clause license.
- Updating primary repository URL to GitHub.
- Adding support for Python 3.8.
- Removing the EncapsulatedOp abstract class. In practice, the development loop that encouraged the use of EncapsulatedOp is flawed and can lead to design errors.
- Removing unnecessary internal locks, which will improve runtime performance when processing many small Tasks.
- Refactor to support separate TaskGraph objects that use the same database.
- Removed the n_retries parameter from add_task. Users are recommended to handle retries within functions themselves.
- Added a hash_target_files flag to add_task that, when set to False, causes TaskGraph to only note the existence of target files after execution or as part of an evaluation to determine if the Task was precalculated. This is useful for operations that initialize a file that subsequent runs of the program then modify, such as a new database or a downloaded file.
- Fixed an issue on the monitor execution thread that caused shutdown of a TaskGraph object to be delayed by up to the monitor's reporting interval.
- Added a .get() function for Task objects that returns the result of the respective func call. This value is cached in the TaskGraph database and hence can be used to avoid repeated execution. Note that the addition of this function changes the behavior of calling add_task with no target path list. In previous versions the Task would execute once per TaskGraph instance; now successive Task objects with the same execution signature will use cached results.
- To support the addition of the .get() function, a transient_run parameter is added to add_task that causes TaskGraph to avoid recording a completed Task even if the execution hash would have been identical to a previously completed run where the target artifacts still existed.
- Dropped support for Python 2.7.
- Fixed an issue where paths in ignore_paths were not getting ignored in the case of copy_duplicate_artifact=True.
- Fixed an issue where the "percent completed" in the logging monitor would sometimes exceed 100%. This occurred when a duplicate task was added to the TaskGraph object.
- Fixed an issue where a relative path set as a target path would always cause TaskGraph to raise an exception after the task was complete.
- Fixed an issue where kwargs that were unhashable were not considered when determining if a Task should be re-run.
- Fixed an issue where files with almost identical modified times and sizes would in some cases hash equal even when the filenames were different.
- Fixed an exception that occurred when two tasks were constructed that targeted the same file but one path was relative and the other was absolute.
- Fixed an issue that would cause TaskGraph to raise an IOError if an add_task call was marked for copy_duplicate_artifact but the base target file was missing.
- Fixed an issue that would prevent the source distribution from installing.
- Taskgraph is now tested against Python versions 2.7, 3.6 and 3.7.
- Adjusted logging levels so that most chatty information is lowered to debug, and oddities in __del__ shutdown are downgraded from error to debug so as not to cause alarm.
- Fixed an issue that would cause a deadlock if two tasks were added that had the same function signature except different target paths.
- Fixed a race condition that would sometimes cause an exception when multiple threads attempted to read or write to the completed Task Database.
- Fixed an issue that could cause an exception in __del__ to print to stderr during Python interpreter shutdown.
- Added a hash_algorithm parameter to add_task that is a string of either 'sizetimestamp' or anything in hashlib.algorithms_available. This option tells TaskGraph how to fingerprint input and target files to determine the need for recomputation.
- Added a copy_duplicate_artifact parameter to add_task that, when True, tells TaskGraph to copy duplicate target results to a new target so long as all the parameters and base/target files fingerprint to the same value. This can save significant computation time when used in scenarios where a workflow changes only slightly but its filenames change significantly. This often occurs when putting timestamps or other suffixes on files that otherwise have identical content.
- TaskGraph now stores all task completion information in a single SQLite database stored in its cache directory. In previous versions TaskGraph would write a small text file for each task in a highly branching directory tree. This structure made removal of those directory trees computationally difficult.
- Fixed an issue that would cause TaskGraph to reexecute if the target path was included in the argument list and that path was not normalized to the operating system's path style.
- Fixed a deadlock in some cases where Tasks failed while other tasks checked for pre-execution clauses.
- Fixed an issue where very long strings might be interpreted as paths, causing a crash on Windows because the path was too long.
- Fixed a deadlock issue where a Task might raise an unhandled exception as a new task was added to the TaskGraph.
- Fixed the occasional BrokenPipeError that could occur when a Task encountered an unhandled exception.
- Added an n_retries parameter to add_task that lets TaskGraph attempt to reexecute a failing Task up to n_retries times before terminating the TaskGraph.
- Removed the delayed_start option.
- Resolving an issue with duplicate logging being printed to stdout when n_workers > 0. Logging is now only handled in the process that contains the TaskGraph instance.
- Updated main logging message to indicate which tasks, by task name, are currently active and how many tasks are ready to execute but can't because there is not an open worker.
- Attempted to fix an issue where processes in the process pool were not terminating on a Linux system by aggressively joining all threads and processes when possible.
- Fixed an issue that would cause tasks that had been previously calculated to prematurely trigger children tasks even if the parent tasks of the current task needed to be reexecuted.
- Added a delayed_start flag to TaskGraph to allow for delayed execution of taskgraph tasks. If enabled in threaded or multiprocess mode, calls to add_task will not execute tasks until the join method is invoked on taskgraph. This allows for finer control over execution order when tasks are passed non-equivalent priority levels.
- Fixing an issue where a non-JSON serializable object would cause add_task to crash. Now TaskGraph is more tolerant of non-JSON serializable objects and will log warnings when parameters cannot be serialized.
- The TaskGraph constructor has an option to report an ongoing logging message at a set interval. The message reports how many tasks have been committed and completed.
- Fixed a bug that would cause TaskGraph to needlessly reexecute a task if the only change was the order of the target_path_list or dependent_task_list variables.
- Fixed a bug that would cause a task to reexecute between runs if an input argument was a file that would be generated by a task that had not yet executed.
- Made a code change that makes it very likely that tasks will be executed in priority order if added to a TaskGraph in delayed execution mode.
- Refactored internal TaskGraph scheduling to fix a design error that made it likely tasks would be needlessly reexecuted. This also simplified TaskGraph flow control and caused slight performance improvements.
- Fixed an issue discovered when a scipy.sparse matrix was passed as an argument and add_task crashed on infinite recursion. Type checking of arguments has been simplified and now iteration only occurs on the Python set, dict, list, and tuple types.
- Fixed an issue where the TaskGraph was not joining the worker process pool on a closed/joined TaskGraph, or when the TaskGraph object was being deconstructed. This would occasionally cause a race condition where the TaskGraph may still have a cache .json file open. Discovered through a flaky build test.
- Added functionality to the TaskGraph object to propagate log messages from workers back to the parent process. This only applies for cases where a TaskGraph instance is started with n_workers > 0.
- Fixed an issue where a function that was passed as an argument would cause a reexecution on a separate run because the __repr__ of a function includes its pointer address.
- Adjusted logging levels so that detailed task information is shown on DEBUG but basic status updates are shown in INFO.
- Fixing an issue where a Task would hang on a join if the number of workers in TaskGraph was -1, a call to add_task had a non-None value passed to target_path_list, and the resulting task was joined after a second run of the same program.
- Fixing an issue where TaskGraph would hang on a join if the number of workers was -1 and a call to add_task had None passed to target_path_list.
- Taskgraph now supports Python versions 2 and 3 (tested with Python 2.7, 3.6).
- Fixed an issue with taskgraph.TaskGraph that prevented a multiprocessed graph from executing on POSIX systems when psutil was installed.
- Adding matrix-based test automation (python 2.7, python 3.6, with/without psutil) via tox.
- Updating repository path to https://bitbucket.org/natcap/taskgraph.
- Auto-versioning now happens via setuptools_scm, replacing previous calls to natcap.versioner.
- Added an option to the TaskGraph constructor to allow negative values in the n_workers argument to indicate that the entire object should run in the main thread. A value of 0 will indicate that no multiprocessing will be used but concurrency will be allowed for non-blocking add_task.
- Added an abstract class task.EncapsulatedTaskOp that can be used to instantiate a class that needs scope in order to be used as an operation passed to a process. The advantage of using EncapsulatedTaskOp is that the __name__ hash used by TaskGraph to determine if a task is unique is calculated in the superclass, and the subclass need only worry about implementation of __call__.
- Added a priority optional scalar argument to TaskGraph.add_task to indicate the priority preference of the task to be executed. A higher priority task whose dependencies are satisfied will be executed before one with a lower priority.
- Refactor of core scheduler. Old scheduler used asynchronicity to attempt to test if a Task was complete, occasionally testing all Tasks in potential work queue per task completion. Scheduler now uses bookkeeping to keep track of all dependencies and submits tasks for work only when all dependencies are satisfied.
- TaskGraph and Task .join methods now have a timeout parameter. Additionally, join now returns False if it terminates because of a timeout.
- More robust error reporting and shutdown of TaskGraph if any tasks fail during execution using pure threading or multiprocessing.
- Fixed a critical error from the last hotfix that prevented taskgraph from avoiding recomputation of already completed tasks.
- Fixed an issue from the previous hotfix that could cause taskgraph to exceed the number of available threads if enough tasks were added with long-running dependencies.
- Additional error checking and flow control ensures that a TaskGraph will catastrophically fail and report useful exception logging if a task fails during runtime.
- Fixed a deadlock issue where a failure on a subtask would occasionally cause a TaskGraph to hang.
- Task.is_complete raises a RuntimeError if the task is complete but failed.
- More efficient handling of topological progression of task execution to attempt to maximize total possible CPU load.
- Fixing an issue from the last release that caused the test cases to fail. (Don't use 0.2.5 at all).
- Fixed a bug where tasks with satisfied dependencies or no dependencies were blocked on dependent tasks added to the task graph earlier in the main thread execution.
- Indicating that psutil is an optional dependency through the setup function.
- Empty release. Possible bug with PyPI release, so re-releasing with a bumped up version.
- More robust testing on a chain of tasks that might fail because an ancestor failed.
- Changed how TaskGraph determines if work is complete. It now records target paths in a file token along with each file's modified time and size. When checking if work is complete, the token is loaded and the target file stats are compared for each file.
- Handling cases where a function might be an object or something else that can't import source code.
- Using natcap.versioner for versioning.
- Fixing an issue where types.StringType is not the same as types.StringTypes.
- Redefined target in add_task to func to avoid naming collision with target_path_list in the same function.
- Fixing a typo in the __version__ number scheme.
- Importing psutil if it exists.
- Initial release.
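The hash_algorithm option described in this changelog (either 'sizetimestamp' or any name in hashlib.algorithms_available) controls how files are fingerprinted to decide whether recomputation is needed. A minimal sketch of such a fingerprint follows. The fingerprint_file name, the string formats it returns, and the choice to fold the normalized path into the 'sizetimestamp' key (echoing the fix for files with nearly identical modified times and sizes but different names) are assumptions for illustration, not taskgraph's internals.

```python
import hashlib
import os


def fingerprint_file(path, hash_algorithm="sizetimestamp", buf_size=2**20):
    """Return a string fingerprint of the file at ``path``.

    Hypothetical helper for illustration only; not taskgraph's actual API.
    """
    if hash_algorithm == "sizetimestamp":
        # Cheap metadata-only fingerprint. The normalized path is included
        # so two different files with equal size and mtime do not collide.
        stat = os.stat(path)
        norm_path = os.path.normcase(os.path.abspath(path))
        return f"{norm_path}::{stat.st_size}::{stat.st_mtime_ns}"
    if hash_algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unknown hash_algorithm: {hash_algorithm!r}")
    # Content hash: read in chunks so large files are never fully in memory.
    hasher = hashlib.new(hash_algorithm)
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(buf_size), b""):
            hasher.update(chunk)
    # SHAKE algorithms are variable-length and need an explicit digest size.
    if hasher.name.startswith("shake_"):
        digest = hasher.hexdigest(32)
    else:
        digest = hasher.hexdigest()
    return f"{hash_algorithm}:{digest}"
```

A content hash costs a full read of every file but notices any change in its bytes; 'sizetimestamp' only reads metadata, so it is cheap but can miss a rewrite that leaves both the size and the modification time unchanged.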
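Several entries in this changelog concern when a Task should count as unchanged between runs: reordering target_path_list or dependent_task_list should not force reexecution, a function argument's __repr__ contains a pointer address that differs between runs, and relative and absolute spellings of the same path should agree. A hypothetical task_signature helper that respects all three might look like this; the payload layout and the use of JSON plus SHA-256 are illustrative choices, not TaskGraph's actual hashing scheme.

```python
import hashlib
import json
import os


def _stable(value):
    """Convert ``value`` into a form whose JSON encoding is run-independent.

    Only dict, list, tuple and set are traversed; any other object is left
    for ``json.dumps(default=repr)``, so its repr must itself be stable.
    """
    if isinstance(value, dict):
        # Keys become strings so json.dumps(sort_keys=True) can order them.
        return {str(key): _stable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [_stable(item) for item in value]
    if isinstance(value, set):
        # Sets have no stable iteration order, so sort by encoded form.
        items = [_stable(item) for item in value]
        return sorted(
            items, key=lambda item: json.dumps(item, sort_keys=True, default=repr))
    if callable(value) and hasattr(value, "__qualname__"):
        # repr(func) embeds a memory address; module + qualname does not.
        return f"<callable {value.__module__}.{value.__qualname__}>"
    return value


def task_signature(func, args=(), kwargs=None, target_path_list=(),
                   dependent_task_names=()):
    """Return a hex digest identifying one task execution (hypothetical)."""
    payload = {
        "func": _stable(func),
        "args": _stable(list(args)),
        "kwargs": _stable(kwargs or {}),
        # Normalize and sort paths so neither order nor relative/absolute
        # spelling changes the signature.
        "targets": sorted(
            os.path.normcase(os.path.abspath(path)) for path in target_path_list),
        "dependencies": sorted(dependent_task_names),
    }
    encoded = json.dumps(payload, sort_keys=True, default=repr)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

Restricting traversal to the four container types mirrors the scipy.sparse fix above: arbitrary objects are never iterated, only represented.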
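The scheduler refactor described in this changelog (tracking every dependency and submitting a task only once all of its dependencies are satisfied), together with the priority argument to add_task, can be illustrated with a single-worker simulation. The execution_order function and its input format are hypothetical; the real scheduler dispatches tasks concurrently to workers, which this sketch does not model.

```python
import heapq


def execution_order(tasks):
    """Return task names in the order a single worker would run them.

    ``tasks`` maps a task name to ``(priority, dependencies)``. Hypothetical
    helper for illustration; not taskgraph's scheduler.
    """
    # Bookkeeping: how many unfinished dependencies each task still has,
    # and which tasks are waiting on each task.
    remaining = {name: len(set(deps)) for name, (_, deps) in tasks.items()}
    dependents = {name: [] for name in tasks}
    for name, (_, deps) in tasks.items():
        for dep in set(deps):
            if dep not in tasks:
                raise KeyError(f"{name!r} depends on unknown task {dep!r}")
            dependents[dep].append(name)

    # Min-heap keyed on negated priority, so higher priority pops first;
    # ties fall back to the order in which tasks were added.
    insertion = {name: index for index, name in enumerate(tasks)}
    ready = []
    for name, count in remaining.items():
        if count == 0:
            heapq.heappush(ready, (-tasks[name][0], insertion[name], name))

    order = []
    while ready:
        _, _, name = heapq.heappop(ready)
        order.append(name)
        # Completing a task may release its dependents.
        for child in dependents[name]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, (-tasks[child][0], insertion[child], child))

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order
```

Because a task enters the ready heap only when its remaining-dependency count reaches zero, priority reorders tasks among those that are ready but can never move a task ahead of its own dependencies.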