
Configuration and Build System

Scott Purdy edited this page Jun 20, 2013 · 2 revisions

Introduction

This document describes the Numenta software development environment, build system, release structure and automated test system. Its intended audience is engineers who will be using the software development environment and building the software.

While this document includes a description of the build environment and how to use it, please refer to guides on the internal Wiki for up-to-date and streamlined instructions on how to build and test Numenta software.

Definitions

  • Architecture
    • A combination of processor architecture and operating system software. For example, the "linux32" architecture refers to Linux running on an x86 32-bit processor.
  • Autobuild System
    • Numenta’s continuous integration system, that builds and tests NuPIC on all architectures whenever there is a new check-in to the source repository.
  • Autotest
    • A test that can be run by the automated build system
  • Acceptance Test
    • A suite of tests that is run on all platforms prior to a release. It is a superset of the standard tests that are run for each check-in.
  • Binary release
    • Customer release of NuPIC that includes the runtime engine, tools, Basic/Learning plugins, pynodes (but not pynodes/extra), vision toolkit, and several examples. Synonymous with "platform release". OBSOLETE: NuPIC is no longer a product.
  • Build system
    • The set of scripts, makefiles, and other files that build and install Numenta software.
  • Build file
    • A configuration file that controls the build process. Usually a Makefile.am or a .vcproj file. But also *.sln, Includes.am, etc.
  • Build environment
    • The operating system environment and tools (e.g. compiler) required to build Numenta software from source code
  • Customer
    • Synonymous with "Developer"
  • Customer Release
    • A complete set of Numenta software, packaged for distribution to customers. There are two binary release types – binary (aka platform) npp and three source release types -- basic plugin, learning plugin, and tools. There is a binary and npp release for each architecture, but there is only one source release of each type. OBSOLETE: we no longer have source releases and NuPIC is no longer considered an end-user product.
  • Developer
    • Someone who licenses Numenta’s platform directly for the purpose of developing an HTM application. A customer of Numenta.
  • End-user
    • A customer of the Developer.
  • Engineer
    • A member of the Numenta engineering team. Not to be confused with "developer"
  • Engineering build
    • A software build created with make install (Unix) or by building a solution (Win32), usually by an Engineer. This build is not suitable for distribution to customers: it may have links back to the source directory, and there are no safeguards in place to make sure it is clean and functional. Compare to release build.
  • Installer
    • A program that installs a binary release using a GUI interface, and makes other system changes (e.g. installing python, setting environment variables) to make the release usable.
  • NPP release
    • The release of NuPIC given to NPP partners. It is a superset of the binary release and is equivalent to an Engineering build. OBSOLETE: we no longer have an NPP program, and the "NPP release" is no longer actually released, except in very special cases.
  • NuPIC
    • The Numenta Platform for Intelligent Computing. Includes the Network Engine, the Core API, Developer Tools. See NuPIC2 Overview for details and definitions of these components. Note: NuPIC used to be the Numenta "product" but its future role is not clear.
  • nupic
    • The Numenta Python package -- contains all Numenta Python modules (algorithms, bindings, network, session, analysis, math, pynodes, support)
  • Platform Release
    • Same as Binary Release. (Obsolete)
  • Release build
    • A software build that is a starting point for creating a customer release. It is similar to an engineering build, but has more safeguards in place, and does not include soft links to the source directory. It may be created by the autobuild.
  • Smoke test
    • A set of tests that must pass before QA tests a build.
  • Source tree
    • The set of files and directories in Subversion used to build NuPIC.

Development Philosophy

Numenta development processes are built around the following principles:

  • Continuous integration
    • The software is built and tested continuously, so that it is "releasable" (building and passing all tests) at all times. Continuous integration is automated.
  • Lightweight processes suitable for rapid algorithm development and for releasing software as a research tool. We may need to add processes and automation for a commercial product.
  • Senior engineers
    • Engineers are given broad responsibility and independence, and the processes do not include as much review as might be needed for more junior engineers. Developers are also responsible for writing tests for their own code.

We expect that more development process will be added as we begin development of a commercial product.

Historical Background

Significant pieces of the automated build and test system were designed around the idea of NuPIC as a base product, with a binary release, source releases, and an NPP release. The build system also supports building and testing standalone applications for Windows and OS X. We developed a custom system rather than an off-the-shelf one because of the special requirements of this release model. Neither NuPIC nor source releases nor standalone applications are part of the recent product vision, and many of the features to support these are disabled in the current build system.

Disabled features are sometimes not described in this document, because they would be more confusing than helpful, but in some cases documentation of older features has been retained and labeled "obsolete", as an aid to understanding the current code base.

This document describes the build and test system for what was formerly known as the "NPP Release", aka "Engineering Build". Other features are still present in the build system code, but are either commented out or unused. These include:

  • Support for NuPIC releases that are a subset of the NPP release (e.g. the binary release, but also releases used for building standalone applications)
  • Support for source releases
  • Support for regression testing
  • Support for building standalone applications.

Build Environment

The Numenta Build Environment is the environment required to build and install Numenta software from source code. Standardization of this environment is important, not just because the software cannot be built without certain tools (e.g. gcc 4) but because a nonstandard environment can create dependencies on software that is not available on a customer system or on the systems of other developers (e.g. nonstandard Python modules, or shared library dependencies) or cause irreproducible behavior.

Engineers should be particularly careful about dependencies that may be introduced by ez_install, pip, fink or darwinports. Historically, these tools have caused many hard-to-diagnose problems with the build environment.

Supported architectures

Numenta software is built on three architectures:

  • OS X - x86 ("darwin86")
  • Linux - x86_64 ("linux64")
  • Windows - x86 ("win32") - not currently working

The OS X distributions run on Snow Leopard (10.6) (out of date)

Build systems

Corresponding to each supported architecture, there used to be one build platform whose environment was carefully configured and which changed infrequently. This provided high reproducibility, and using a lowest-common-denominator build system increased portability, because subsequent OS releases strive for backward compatibility. (Historical note: this approach is more important for a product that will run on end-user systems than for a service running on systems we control.)

Since open-sourcing NuPIC, we use Travis CI as our continuous integration system. We essentially only guarantee that NuPIC will work in that environment, but we will help with other environments as our time permits.

Additional software required for Linux/OS X systems

In addition to a standard "user" installation of the base operating system, in some cases additional software must be installed to be able to run and/or build Numenta software.

In order of preference, required software should come from:

  • Base operating system.
  • Additional packages distributed by the OS vendor, e.g. Xcode, Linux packages available through the package updater.
  • Prebuilt standalone packages from a third party (e.g. prebuilt Python) and "blessed" by Numenta.
  • Build from source.

Engineers should be careful of (and avoid if possible) packages from Fink or DarwinPorts. Such tools can introduce dependencies (e.g. on shared libraries) and cause different, irreproducible behavior.

The following packages are required to build or run NuPIC. A "*" in the notes field means that this requirement is met by Snow Leopard and most recent (>2007) Linux distributions.

Software       Version     Notes
Python         2.6.1       4-byte unicode for Linux. (1)
GCC/Xcode 3    >=4.1       * (2)
Autoconf       >=2.61      * (3)
Automake       >=1.9.6     * (3)
Libtool        >=1.5.22    * (3)
Swig           =1.3.36     Included with NuPIC. Not required from base OS
wxPython       2.8.10.1    (4)

Notes:

  1. Python 2.6.1
  • Shared Python library required on all systems
  • 4-byte unicode support required on Linux systems (configure --with-unicode=ucs4 --enable-shared). Linux systems must not be upgraded from a lower version of Python using RPMs; replacing the default version of Python may cause system problems. Instead, it should be installed as an additional Python installation in /usr/local.
  • On Mac OS X we require the version of Python from python.org – we do not build or test with the native Snow Leopard version.
  2. GCC 4
  • g++ and gcc required
  • Xcode for OS X supplies GCC 4; Xcode must be installed separately.
  • gcc 4.2.1 (on Mac) and 4.1.2 (on Linux) are explicitly tested via automated builds. GCC 4.4 has been used, but is not tested by the automated build process.
  • If you want to use LLVM, install Xcode 4 and set these environment variables before running configure:
    • export CC=llvm-gcc
    • export CXX=llvm-g++
  3. GNU autotools
  • Different versions of autotools may be installed with different OSes. The versions listed here are strict requirements -- the build will fail in strange ways if you have the wrong version. Note that we use GNU libtool -- at least one non-GNU version is out there (e.g. on Mac OS).

Additional software required for Windows systems

Building and testing NuPIC on Windows requires the following tools:

  • Visual Studio 2008
  • Python pywin32 module
  • Python 2.6.2
  • wxPython

Additionally, the autobuild system requires

  • dnspython
  • putty (pscp, plink)

Additional software required for Mac systems

Building and testing NuPIC on OS X requires the following tools:

  • Xcode 3
  • Python 2.6.2 (from python.org)

Additionally, the autobuild system requires

  • dnspython

Repository structure

This section describes the structure of the source tree.

Overview

The top level of nupic has the following directories

  • nta
    • All C++ libraries, except for code for Python bindings and Python regions.
    • See section 4.4
  • qa
    • QA test applications/tools. See section 4.6
  • build_system
    • Files that are part of the build system. See section 6
    • autobuild
      • the autobuild system
    • pybuild
      • python modules used by the build system
    • setup.py
      • a script that generates build system files in the source tree
    • buildfile_gen
      • the subsystem that generates build system files
    • contrib
      • files not part of the formal build system, but useful for engineers
    • misc
      • other tools (mostly obsolete)
    • unix
      • unix-specific scripts and files
    • win32
      • windows-specific scripts and files
    • post_build
      • a build system project that copies extra (non-compiled) files into an installation.
  • examples
    • Example HTM applications for use in demos and tutorials. Projects appearing in an NPP release are determined by install rules in build_system/post_build/files_to_copy.txt.
  • py
    • All Python code; C++ code that enables Regions to be implemented in Python; math and algorithms bindings
  • lang
    • Bindings for the core C++ API
  • external
    • Third party code that is used in NuPIC.

Why the split between py and lang? The lang directory is for bindings that were intended to be available in all languages. The py directory is for functionality that would only be available in Python. It is useful to have Python implementations for experimentation but the C++ implementations are generally more CPU and memory efficient and we intend to make them compile into bindings for multiple languages.

Naming conventions for C++ code

File names should be all lower case unless they are named after a C++ object, in which case they have the same capitalization as the object. A few files (e.g. Makefile.am) are uppercase by standard convention.

A directory corresponding to an application or a library is flat. All .cpp and .hpp files associated with that library or application (except for unit tests) are in the same directory. We don’t use subdirectories for further grouping, except for directories in nta/python corresponding to Python modules.

Unit tests for C++ code

All C++ classes should have a set of unit tests. The unit test class for the class Foo should be called FooTest (not FooTester, even though it is a subclass of Tester, nor Foo_Test, even though some use that name today). Name standardization will facilitate future automation.

Unit tests are kept in subdirectories called "unittests" of the code that they test. For example, each library in nta has a subdirectory called "unittests". Applications have their own unittests subdirectories, for classes that are used by the application but are not part of a library. Please do not create directories called "unittests" that are not part of the automatic unit test system.

Unit tests are included automatically in the application "testeverything". They are distinct from other QA tests, including the "autotests" described in section 9.

Library structure

Most Numenta C++ source code is part of a library. Libraries allow more than one application to share the same code. Code that is used only by one application should remain in the source directory for that application, but may also be placed in a library to facilitate unit testing.

The name of the directory containing library source is "nta". The historical reason for this is so that the structure of the include directory in a source release can match the structure of the original source, meaning both internal code and end-user code can use the same #include paths. Since we no longer plan to ship C++ libraries to users, this is no longer important.

Dependencies between libraries are captured in the buildinfo.py file described below.

Python Modules

Most Numenta Python code is in the top-level "py" directory. The directory py/nupic contains most of the nupic Python package. All Python modules except for nupic.bindings are pure Python modules; that is, all C++ functionality is isolated to the nupic.bindings module. In the build process, the pure Python modules are copied directly from the source tree to the install tree. The nupic modules are:

algorithms

This module contains high-level algorithms implemented in Python. In some cases, these modules use C++ code via nupic.bindings.math or nupic.bindings.algorithms.

analysis

Most of the code in here is related to inspectors.

bindings

Wrappers for C++ code. There are four bindings libraries: math, algorithms, engine, and iorange. Math, algorithms, and engine are based on the corresponding trunk/nta libraries; the iorange bindings module is a small set of wrappers for arrays.

math

Imports the math library bindings and adds some statistics functions (which are used by nupic.algorithms)

engine

All functionality for the core HTM functionality (networks, regions, linking, running a network, region inputs/outputs, etc). See the NuPIC 2 architecture document for more information.

regions

HTM regions implemented in Python. This includes RecordSensor and CLARegion.

qa directory

The QA directory primarily contains tests, as well as some infrastructure (autotest) for running tests automatically. It does not directly include C++ unit tests, which are discussed above, but does include the testeverything application, which runs the C++ unit tests. C++ unit tests are distributed in directories called "unittests" near the C++ code that they are testing.

Most of the subdirectories of QA contain tests written in Python. In all cases, the test exits with status zero on success, and non-zero on failure -- this facilitates automation.
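The zero/non-zero exit-status convention can be illustrated with a minimal sketch; the test function and inline script here are hypothetical, not actual qa code:

```python
import subprocess
import sys
import textwrap

# Hypothetical minimal test script following the qa convention:
# exit status 0 on success, non-zero on failure.
test_script = textwrap.dedent("""
    import sys

    def test_addition():
        return 1 + 1 == 2

    # The harness only looks at the exit status, not the output.
    sys.exit(0 if test_addition() else 1)
""")

# An automation harness (such as autotest) can then decide pass/fail
# from the return code alone.
result = subprocess.run([sys.executable, "-c", test_script])
print("PASS" if result.returncode == 0 else "FAIL")
```

Because the harness needs nothing but the return code, tests can be written in any language without changing the automation.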

The directory qa/autotest contains code and configuration files for automated testing, and is described in section 9.3.

external directory

The external directory contains prebuilt executables, libraries and include files for 3rd party software that will be included in a release or is required to build Numenta software. Each architecture has its own separate directory, and each of those architecture-specific directories has bin, lib, and include subdirectories.

Includes and libraries that are common across all architectures go into the external/common directory. This includes common C and C++ header files, which go into external/common/include, and java jar files, which go into external/common/lib/java.

Source code for each external package is in the external/src directory. All packages have a script that builds and installs the software. At the top level of the source directory is a file "build_setup.py" that is included by each build script. You should not normally need to build or install this software. All third party software must be approved before being added to the repository. The license must allow Numenta to redistribute the software without restriction. Licenses are collected in the top level "licenses" directory in external.

The following sections describe third party software packages in the external directory.

Boost

Boost (www.boost.org) provides free peer-reviewed portable C++ source libraries. You can think of it as "everything that did not make it into the C++ standard". We are using version 1.37.

Boost is quite complex, and we try to minimize its use. Check with NuPIC dev team before using anything besides shared_ptr or scoped_array (both of these are very useful, though). We do not build any boost libraries – the only functionality we use is header-only.

png

png is a library for reading and writing PNG files. It is used by PIL. (May be obsolete – was used for vision applications.)

OpenSSL

OpenSSL is a library for doing SSL (Secure Socket Layer) network connections. We do not use this functionality. Instead, we use some of its cryptographic functions to implement license management. (It was used for this purpose in NuPIC 1, but is currently unused. It is probably not necessary for key management in a cloud-based service.).

zlib

zlib is a general purpose data compression library that supports gzip and other formats. It is not currently used.

python_modules

Numenta uses several external python modules:

  • numpy – provides basic numerical functionality, including support for arrays
  • PIL (Python Image Library – also labeled Imaging) – basic image manipulation (Obsolete?)
  • matplotlib -- plotting package
  • ipython – a very convenient interactive shell for Python
  • yaml – was used for Numenta web services. Obsolete
  • scipy – not currently used
  • psutil – used only for memory management in a sorting utility. Might be useful in a service.
  • Processing – was needed for Python 2.5. Can be deleted.

java_packages

Numenta uses several external java packages:

  • common-collections
  • common-beanutils
  • common-lang
  • ezmorph - required for the json-lib
  • json-lib - used for reading/writing JSON strings

swig

Swig is a preprocessor that allows you to create python (and java and other) bindings for C++ code. It used to be part of the build environment, but now is checked in to the trunk.

wxPython

wxPython is a Python module based on the wx toolkit that we use for creating GUIs (not checked in to external).

traits

traits is a Python package developed by Enthought that makes it easier to write GUIs using wxPython. We use it to implement the inspectors. We use version 2; the current version is version 3.

Adding software to the external directory

Installation tree structure

The "installation tree" or just "install tree" is what you get when you build with make install. The structure is a unix-like directory structure, and was originally designed with a NuPIC release in mind. Some of this structure may not make sense for a different type of product.

Goals of the release tree structure (as originally conceived/described for NuPIC 1):

  • Everything installs in one neat package. We will not scatter files all over the file system.
  • A consequence of the above: multiple NuPIC installations can coexist on the same machine.
  • The public NuPIC platform (binary) release is a subset of the release tree. (Obsolete)
  • Use standard Unix conventions as much as possible, both to make it easier for us to work with third party tools and so that it is familiar to developers.
  • The tree built by an engineering build will have the same structure as a release.
  • Automate the process of building the entire release structure, as much as possible.

Overview of the release tree

This is outdated and we don't do releases anymore.

This section describes the release that is created by the autobuild system.

  • The top level directory for customer releases is nupic-- for the binary release, and nupic-npp-- for NPP releases.
  • The version number is "r" followed by the subversion revision number, for normal builds, and "1.6", "1.7", etc., for public releases.
  • An "architecture" may be defined by processor type, operating system, and OS version. Right now we have darwin, darwin86, linux64 and win32
  • The substructure is the standard Unix directory layout – bin, lib, share, etc, man, include, even on Windows systems.
  • Unix: Building Numenta software with make install builds into a release directory structure, with a release named "eng". For example, building on a Mac will install files into ~/nta/eng. Engineers can set their "current" link to point to the "eng" directory.
  • Windows: Building the top level solution builds into a release directory specified by NTAX_INSTALL_DIR.
  • All names are lowercase unless the name reflects c++ class capitalization.

A standard directory structure is represented below, for the 1.0 platform release.

$HOME/nta
    current -> nupic-1.0.0-npp
    nupic-1.0.0-npp
        bin/
        lib/
        share/

On windows, the directory structure is the same, except that there is no current link.

bin directory

All executable programs go in here. The only two applications currently built are testeverything and htmtest, which are both test programs for C++ code in the Network Engine.

lib directory

The lib directory contains libraries, python modules, and java packages. Python modules are in the subdirectory python2.6/site-packages. Java jar files are in the subdirectory lib/java. The only library currently built is libcpp_region.dylib/so/dll, which allows NuPIC to use regions implemented in Python.

share directory

The share directory contains everything that does not fit somewhere else. It is architecture-independent. It includes:

  • doc -- currently includes only licenses for 3rd-party software.
  • projects -- all examples. The examples included in this directory are explicitly installed via a rule in the top-level Makefile.am file. The platform release contains a subset of the examples, as defined by the platform release manifest files.
  • prediction – experiments and data for the prediction toolkit

The build system

Getting started (Unix systems)

See [Developer Setup] for details

The Build File Generation System (Metabuild System)

The metabuild system exists because it is otherwise very difficult to keep the Windows and unix build systems in sync. The basic idea is that the Unix build files (Makefile.am) and Windows build files (*.vcproj) are generated from templates using a common methodology.

The long-term goal is to have an "invisible build system", in which all build files will be generated automatically. Currently, "project" files are generated mostly automatically, as described below. Other files (intermediate Makefile.am’s on Unix, and the solution file on Windows) are hand-crafted.

Projects

A "project" (the name is borrowed from Visual Studio terminology) is a set of rules that builds a single entity (static library, executable, python module). Each project corresponds to a directory in the tree and the name of the project is the name of the directory. For example, nta/engine is a project that builds the engine library (libengine.a/.lib).

The Metabuild system generates buildfiles (a Makefile.am and a .vcproj file) for each project. Project directories are identified by the existence of a file named "buildinfo.py". generate_buildfiles.py creates a Unix and Windows build file for each directory that contains buildinfo.py. Note that merely generating a project build file does not make it part of the build -- it must be added to one of the handcrafted files (Makefile.am or trunk.sln) to make it a part of the build.
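Project discovery by the presence of buildinfo.py can be sketched as follows; this is a simplified illustration, not the actual generate_buildfiles.py code:

```python
import os

# Walk the source tree and treat every directory containing a file
# named "buildinfo.py" as a project (simplified sketch).
def find_projects(root):
    projects = []
    for dirpath, dirnames, filenames in os.walk(root):
        if "buildinfo.py" in filenames:
            projects.append(os.path.relpath(dirpath, root))
    return sorted(projects)
```

Each discovered directory would then get a generated Makefile.am and .vcproj; as noted above, it still has to be added by hand to the top-level Makefile.am or trunk.sln to actually build.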

buildinfo.py

buildinfo.py is a python-code configuration file that specifies:

  • Template type (see trunk/build_system/buildfile_gen/templates for a list)
  • "file collector" -- the name of a method used to decide which files to include in the project. Currently either "defaultFileCollector" or "testeverythingFileCollector" (discussed below).
  • Dependencies on other projects (which must be built before this project is built)
  • A guid for the windows .vcproj file. This guid also appears in the handcrafted trunk.sln.
  • The name of the win32 project if not the same as the directory name
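A buildinfo.py along these lines might look like the following sketch; the field names here are illustrative assumptions, not the actual schema used by the metabuild system:

```python
# Hypothetical buildinfo.py for a project such as nta/engine.
# All field names below are assumptions for illustration only.
templateType = "library"                  # template type (see templates dir)
fileCollector = "defaultFileCollector"    # how source files are gathered
dependencies = ["nta/utils"]              # projects built before this one
guid = "00000000-0000-0000-0000-000000000000"  # matches the entry in trunk.sln
win32ProjectName = None                   # only if different from the dir name
```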

Invoking the Metabuild System

The metabuild system is invoked with the python script build_system/setup.py.

The setup.py script uses code and templates in build_system/buildfile_gen, and in particular its core functionality is in build_system/buildfile_gen/generate_buildfiles.py. It creates "Makefile.am" and ".vcproj" in every directory with a buildinfo.py file.

Additionally, setup.py installs hand-crafted build files. These files are checked into subversion as "Makefile.ami" and "trunk.slni", and are copied by setup.py to Makefile.am and trunk.sln, respectively. This ensures that trunk.sln will not be accidentally checked in (it is frequently modified in the normal course of development) and that there are no files named "Makefile.am" checked into subversion (any such files must have been accidentally added, and may be deleted).

Finally, setup.py creates a file "project.manifest" in each directory, which is a list of all of the source files used in that project and is used for creating source releases. (This feature is obsolete, as we no longer create source releases.)

Dependencies

Projects are identified as a relative path from the root of the source tree, e.g. "nta/utils". This relative path is specified in the project dependencies entry in buildinfo.py. The list of dependencies specified in buildinfo.py uses these relative path names.

The buildfile generator computes a full set of dependencies based on the direct dependencies specified in the buildinfo.py file. I.e. if A depends on B and B depends on C, then it will figure out that A depends on C. The code that computes dependencies is in the _generateProject() method, and was pretty tricky to get right -- don’t modify it unless you really understand it and then test it thoroughly.

Because we link statically, the build will fail if the dependencies are incorrect or circular.
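The transitive expansion described above (A depends on B, B depends on C, therefore A depends on C) can be sketched like this; the project names and the cycle check are illustrative, not the actual _generateProject() code:

```python
# Direct dependencies per project, keyed by relative path
# (hypothetical projects for illustration).
direct = {
    "nta/engine": ["nta/utils"],
    "nta/utils": ["nta/os"],
    "nta/os": [],
}

def all_deps(project, seen=frozenset()):
    """Return the full dependency set, failing loudly on a cycle."""
    result = set()
    for dep in direct[project]:
        if dep in seen:
            raise ValueError("circular dependency involving " + dep)
        result.add(dep)
        result |= all_deps(dep, seen | {project})
    return result

print(sorted(all_deps("nta/engine")))  # → ['nta/os', 'nta/utils']
```

Raising an error on a cycle mirrors the static-linking constraint: with circular dependencies there is no valid link order, so failing early is the right behavior.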

Generating buildfiles from templates

The buildfile generator collects a list of files using the file collector (e.g. *.cpp and *.hpp) and uses this list to construct the file lists that are inserted into the template. The set of files chosen depends on the file collector:

  • The default file collector includes all .cpp, .hpp, .c and .h files, except those whose name contains "private".
  • The testeverything file collector first runs the default file collector for the current directory, and then finds and includes all C++ unit test code, and includes it directly in the testeverything project. It also generates header files for the testeverything project, so that all of these unit tests can be included in the application itself.
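The default collection rule above can be sketched as follows (illustrative, not the actual collector code):

```python
import fnmatch

# Include .cpp/.hpp/.c/.h files, excluding any whose name contains
# "private" (sketch of the default file collector behavior).
def default_file_collector(filenames):
    patterns = ("*.cpp", "*.hpp", "*.c", "*.h")
    return [
        name for name in filenames
        if any(fnmatch.fnmatch(name, p) for p in patterns)
        and "private" not in name
    ]

files = ["Region.cpp", "Region.hpp", "region_private.hpp", "notes.txt"]
print(default_file_collector(files))  # → ['Region.cpp', 'Region.hpp']
```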

Templates are located in build_system/buildfile_gen/templates, and there is a Unix and Windows template for each template type.

The buildfile generator uses a Unix and a Win32 "helper" that handle the platform-specific aspects of generating a build file. These helpers mostly handle details such as line-ending and line-continuation characters, and how to refer to another project. The Unix helper excludes files whose names contain "win_" -- these are windows-specific files used in python bindings.

How the build system works (Unix)

The previous section discussed the metabuild system. After the metabuild has generated the build files, we are left with a regular build system, described in this section and the next.

The build system is based on a fairly standard GNU autotools setup. The most complex parts have to do with creating python modules and libraries that can be used in python modules. Configuring GNU autotools is an arcane art, and for the most part you shouldn’t have to change anything. All of the Numenta special configuration is handled in build_system/unix/configure.ac (copied to the root of the trunk by setup.py) and build_system/unix/Includes.am. This section describes what you should know as a user of the system.

Makefile structure

The "Makefile.am" files that you see in each directory of the build tree govern the overall build process. These files, created by setup.py, are high level descriptions that are converted by the GNU autotools into real Makefiles. Makefile.am lists the files that are a part of the package (e.g., an executable or a library).

The Makefile structure is hierarchical. The hierarchy is generated by the "SUBDIRS" entry in a Makefile.am. A "make" at the top level therefore turns into a "make" in each of the subdirectories of the top level, which turns into a "make" in each of the sub-sub-directories.
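The recursive descent driven by SUBDIRS can be modeled with a small sketch (a toy directory tree, not the real NuPIC layout; real automake also handles ordering details such as "." entries):

```python
# Toy model: each directory lists its SUBDIRS; "make" recurses into
# them before finishing the directory itself.
subdirs = {
    ".": ["nta", "qa"],
    "nta": ["engine", "utils"],
    "nta/engine": [],
    "nta/utils": [],
    "qa": [],
}

def make(directory="."):
    order = []
    for sub in subdirs.get(directory, []):
        child = sub if directory == "." else directory + "/" + sub
        order.extend(make(child))
    order.append(directory)  # build here after the subdirectories
    return order

print(make())  # → ['nta/engine', 'nta/utils', 'nta', 'qa', '.']
```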

Autogen

The script "build_system/unix/autogen.sh" (automatically invoked if you use the --autogen option to setup.py) performs most of the initialization to prepare the source tree for a build. The script is completely Numenta-written, but the name of the script is a standard autotool name. (Obsolete: Source code that is shipped to customers is "post-autogen" code, and includes files automatically generated by autogen.) The output of autogen is machine-independent: it is appropriate for darwin, linux32, or any other architecture. The architecture-specific configuration happens later. See section 6.6 for information on when autogen.sh needs to be rerun.

Autogen runs the GNU autotools "libtoolize", "aclocal", "autoconf" and "automake". These tools do a lot of crazy things, most of which are not understood even by Bill and Charlie. All you need to know is that they generate the following files:

  • Makefile.in. In each directory where there was a "Makefile.am" there is now a "Makefile.in". This is a very long and complicated Makefile. It is not quite a makefile because it has to be processed for architecture- and configuration-dependent substitutions.
  • configure. This script is at the top level of the source tree. It is a very long shell script that will do the final setup of the build process, and convert all of the "Makefile.in" files to "Makefile" files.

configure

The "configure" script generates all of your Makefiles and is the last preparation step. It needs to be rerun any time autogen.sh is run. What configure does is governed entirely by the "configure.ac" configuration file.

Configure takes the following options:

  • --enable-debugsymbols={yes,no}
  • "yes" means -g (or -gfull on darwin) is used
    • "no" means do not use -g (results in smaller executable)
    • The default for engineers is "yes"
    • Releases are built with "no"
  • --enable-optimization={yes,no}
  • "yes" means -O3 optimization is used (but only -O1 for python bindings)
    • "no" means -O0 optimization is used
    • The default for engineers and releases is "yes"
  • --enable-assertions={yes,no}
    • "yes" means -DNTA_ASSERTIONS_ON is used, which enables NTA_ASSERT
    • "no" means assertions are disabled
    • The default for engineers is "yes"
    • Releases are built with "no" (except on darwin86)
  • --enable-profiling={yes,no}
    • "yes" means all code is compiled with -pg, so that it generates profiling output suitable for gprof
    • "no" means all code is compiled without profiling
    • The default for engineers and releases is "no"
    • This feature has not been used or tested in a long time. YMMV.
  • --prefix=
    • Changes the installation location. By default, the installation directory ("prefix" directory) is ~/nta/eng. You can relocate this directory anywhere.

Configure automatically figures out what type of machine it is being run on, and makes the appropriate substitutions to generate the Makefiles.

Note that configure is capable of doing architecture-specific configuration on any file, not just Makefile.in. We do not currently use this capability, but if you see any files with a ".in" suffix in subversion, these are intended to be processed by configure.

VPATH builds

Compilation is done outside of the source directory; we do not support building in the source directory directly. Compiling in a separate build directory reduces clutter, reduces the risk of error in a subversion checkin, and allows you to have more than one build at the same time (e.g. an optimized and a non-optimized build).

A build that is outside of the source directory is called a VPATH build. To enable a VPATH build, simply run configure from the build directory. For example, if your source is in ~/nta/trunk, you can build in ~/nta/build by doing:

```sh
mkdir ~/nta/build
cd ~/nta/build
~/nta/trunk/configure
```

All generated files (including Makefiles, and object files) will be put in ~/nta/build instead of ~/nta/trunk. The directory structure of ~/nta/build will mirror the directory structure of ~/nta/trunk. Wherever your build directory is relative to your trunk directory, you should always use an absolute pathname to run configure. If you use a relative pathname, strange errors will occur late in the build process.

It is possible to break the ability to do VPATH builds if you have Makefile rules that refer to the source directory instead of the build directory – be careful if you make any nontrivial changes to a Makefile.am.

Building and installing the software

To build the software, simply type make in the top level of the build directory. This will build all software. It is possible to type make in a subdirectory. For example, if you are working on a specific library you can build only that library. However, in order to test the software, you generally need to build an application.

To install the software into the prefix directory, type make install. You can omit the initial make; make install will build everything that is not already built.

make install copies all built software into the prefix directory.

Copies from source tree to install tree

Many files are simply copied from the source tree to the install tree, including all of the examples and quite a bit of test data. Copying is performed by the "post_build" project, which copies all of the directories/files listed in build_system/post_build/files_to_copy.txt. Because copying every file on every build would be time consuming, the copy is optimized.

The copy itself is handled by the routine smartCopy() in build_system/pybuild/install.py. smartCopy() skips .pyc and .pyo files. Normally, all other files are copied. If the environment variable NTAX_DEVELOPER_BUILD is specified, then smartCopy() creates a link when possible, and if not (e.g. on windows) will copy the file only if the modification time of the source is newer than the modification time of the destination.
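The behavior described above can be sketched as follows. This is an illustration only, not the actual implementation in build_system/pybuild/install.py; the function and parameter names here are ours:

```python
import os
import shutil

def smart_copy(src, dst, developer_build=False):
    """Illustrative sketch of the smartCopy() behavior described above."""
    # Compiled python files are never copied.
    if src.endswith(('.pyc', '.pyo')):
        return
    if developer_build:
        try:
            # In developer builds, prefer a link when possible.
            os.link(src, dst)
            return
        except OSError:
            # Linking unavailable (e.g. on windows): copy only if the
            # source is newer than the destination.
            if (os.path.exists(dst)
                    and os.path.getmtime(src) <= os.path.getmtime(dst)):
                return
    shutil.copy2(src, dst)
```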

Running the standard regression test (Unix and Windows)

You can run the standard regression test against any installation directory as long as the source directory is available. For example, if the installation directory is ~/nta/current and the source directory is ~/nupic, you can run the standard tests with:

```sh
cd ~/nupic/config
python test_release.py --short
```

This command is the same on Unix and Windows. See section 9.4 for more details about test_release.py.

Frequently asked questions on the build system (Unix)

Q: When do I need to rerun configure and setup.py?

A: The rule of thumb is to rerun setup.py --autogen if any files are added or deleted, or if configure.ac is changed.

Adding Directories To The Source Tree

Adding CPP directory

A C++ library can live anywhere in nupic. Some live in py, some live in nta. The reason for the distinction is now mostly gone, but it may still be useful to organize things this way.

What you need: dir/buildinfo.py. This file tells the metabuild system how to set up the build for this directory, i.e. how to generate the makefiles (on Unix) or the Visual Studio project (on Windows).

For example, nta/engine is one of our libraries. Its buildinfo.py specifies:

  • templateType = static_lib -- what we are generating is a static library, as opposed to a shared library. The available templates live in build_system/buildfilegen/templates.
  • the list of dependencies
  • the name of the library
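A buildinfo.py for a static library might look roughly like this. This is a hypothetical sketch: templateType = static_lib comes from the notes above, but the other field names are illustrative, so check an existing buildinfo.py against the templates before copying this.

```python
# Hypothetical buildinfo.py sketch -- field names other than
# templateType are illustrative only.
templateType = 'static_lib'       # static library, as opposed to a shared library
name = 'engine'                   # name of the library being built
dependencies = ['utils', 'math']  # libraries this one depends on
```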

As another example, the algorithms library is a static library that is wrapped by the nupic.algorithms.bindings module. Its build is described by py/bindings/algorithms/buildinfo.py, which uses the python_binding template and depends on the math and algorithms libraries. The build links those libraries into the module, producing a .so file and a .py file.

You also need an entry in configure.ac, in build_system/unix. If you have a new directory you MUST add it there.

In addition, the Makefile.ami files are part of the recursive make structure: each one lists the subdirectories that must be built. When you add a new directory, add it to the Makefile.ami of the parent directory.

As long as you use an existing template, the generated build files go to the right place.

Adding Python directories

There are no build files for python directories. Instead, for every bit of python code that you want installed in the release, you need to add an entry for that directory (or file) in:

/build_system/post_build/files_to_copy.txt

When you add a directory in nupic, you also need to create __init__.py in that directory.

Changing Compiler Settings

Compile directives and flags (including platform dependent flags) are in:

/build_system/unix/configure.ac

Non-platform dependent optimization and debugging flags are in:

/build_system/unix/Includes.am

For example, if we wanted to use LLVM for the darwin build, we would change the configure options in these files.

Build System Details

Treatment of shared libraries

Our approach is to statically link libraries whenever possible, to minimize dependencies on the runtime environment. This is complicated by the fact that Python requires shared libraries in several cases.

nta libraries – engine, utils, algorithms, etc.

Numenta libraries are built as static libraries (.a). But they are compiled with -fPIC so that they can be statically linked into python modules in addition to being statically linked into executables. Libtool will issue warnings when linking .a files into .so files, saying that it is "nonportable". It is nonportable if you can’t guarantee that your .a files are compiled with -fPIC, but we do guarantee this, so it’s fine.

One downside of static libraries is that two python modules may contain the same underlying C++ library, a duplication that could be avoided by using shared libraries. We prevent name collisions by compiling with -fvisibility=hidden by default, so that any symbols that are to be exported outside of a .so must be explicitly exported (e.g. with NTA_EXPORT).

Why don’t we use shared libraries? Basically for reasons that made sense for NuPIC 1 but may not be important for a future product.

(This is historical information, for reference, in case we decide to move to shared libraries.)

First, we would have to put them in $NTA/lib and somehow set LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH) to be able to find them. Everything including python would have to be wrapped. Second, since developers with a source license can modify our libraries, they might make incompatible changes that work with their plugins or tools, but cause the runtime engine or Numenta plugins to fail. System shared libraries are not normally modified, but may not be available on the system, and may be incompatible, even with an identical version number (e.g. libstdc++). By statically linking wherever possible, we mostly immunize ourselves against all of these issues.

Note: we no longer statically link against libstdc++. (See NPC-59).

Python modules

Python modules based on C extensions must always be in shared libraries -- we have no choice there. Python has its own mechanism for finding shared libraries, so we don’t have to do any special wrapping.

Python Interpreter

We originally built NuPIC with a statically linked Python interpreter. However, we relied on system-provided python modules. This ran into problems because our statically linked interpreter sometimes couldn’t load these system-provided modules. So instead, we now rely on a system-provided python shared library for our embedded interpreter. This interpreter is "guaranteed" to work with the system-provided modules. The only interoperability issue we have come across (except for python version) is whether python is compiled with 4-byte or 2-byte unicode. When we had a statically linked python, there were several additional compatibility problems.

C library

The C library is intimately connected to the system. It basically isn’t possible to use a C library different from the system-provided C library. Because the C ABI isn’t changing, we don’t have problems with this. The only compatibility issue is that we require libc 2.3 or greater.

Creating and Validating Customer Releases

This section is obsolete. It is retained for reference, because functionality is still present in the build system.

This section gives a high level picture of how customer releases are created and validated. See the section on the autobuild system below for information on how to automatically build and test.

Creating Customer Releases

The NPP and Binary/Platform releases are created directly from a release build as shown in Figure 1.

Figure 1 -- Creation of Binary Releases

The NPP is just a copy of the release build. The binary release is a subset of the release build, created using manifest files. The manifest files specify the specific files in the release build that are copied into the binary release.

Figure 2 -- Creation of Source Release

The source releases are created directly from the source tree as shown in Figure 2, using manifest files to specify exactly which files should be copied. In a few cases, instead of copying the original file from the source tree, a substitute file is used. For example, a Makefile.am in the original source tree may refer to a subdirectory that does not exist in one of the customer source releases.

The code that creates a release from a manifest file is in trunk/build_system/pybuild/manifest.py.

There are three source releases:

  • basicplugin-source
  • learningplugin-source
  • tools-source

The tools source release builds the full tools, including the ability to invoke the embedded runtime (SP-NuPIC). Since runtime source is not included, a precompiled runtime library must be used to build the tools source. This library is provided in the binary release, and must be copied into the source release prior to building the source release. It would be more convenient to include the runtime library with the source release, but this would greatly complicate and lengthen the autobuild process, as the source release is currently created on a single system and then tested separately on each architecture.

Validating Customer Releases

Customer releases are ultimately validated by running tests against them. For the binary release, the set of tests that is run is a subset of the standard set of tests that are run against an engineering build (many of the tests can’t be run because the needed files aren’t there. For example, the testeverything test program is not part of the binary release).

For source releases, the release is untarred (or unzipped), configured, and built. The output of the build (everything installed by make install, on Unix) is compared to a list of files specified in a manifest file. The build output must exactly match the manifest, or the validation fails. (The comparison mechanism is to run manifest.py using the build output directory and the manifest file. If any files are missing from the build output, install_from_manifest.py will fail. Then the python equivalent of diff -r is run to compare the two directories; if any files were placed in the build directory but were not included in the manifest, the two directories will differ.)
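The comparison logic amounts to something like the following sketch. This is illustrative only; the real checks are split between the manifest-driven install and the recursive diff described above:

```python
import os

def compare_to_manifest(build_dir, manifest):
    """Sketch of source-release validation: every file in the manifest
    must appear in the build output, and the build output must contain
    no files beyond the manifest. Not the real implementation."""
    actual = set()
    for root, _, files in os.walk(build_dir):
        for name in files:
            path = os.path.join(root, name)
            actual.add(os.path.relpath(path, build_dir))
    missing = set(manifest) - actual   # validation fails if non-empty
    extra = actual - set(manifest)     # the "diff -r" step catches these
    return missing, extra
```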

After the build output of a source release is validated, the build output is copied into the binary release and the binary release tests are run.

The Autobuild system

This is outdated. For the open source NuPIC project we are using Travis CI as our continuous integration system.

The autobuild system continuously builds and tests the entire platform on all architectures. It sends notification on failure. All passed autobuild releases are made available on a web server, and are installed on internal servers. During the research release phase of Numenta (starting at 1.3) autobuilds are also used for customer releases.

The script build_system/autobuild/autobuild.py is the main autobuild driver. It is invoked every five minutes on all build systems. Detailed output is logged using the python logging module.

The pybuild package

Most of the details are handled by methods in the pybuild package (build_system/pybuild) as described here:

  • arch.py
    • Figures out the NuPIC architecture.
  • build.py
    • Performs the build itself (using the make-based system on Unix and the VS-based system on windows)
    • Performs pre- and post-build steps (e.g. copying extra files in the post-build on windows and running configure on Unix)
  • clicker.py
    • Some tests intentionally segfault or return another error. These failures may be trapped on windows and cause a dialog to be displayed (e.g. "Do you want to start the JIT debugger?") The clicker runs in a separate thread during autobuild tests and clicks on a button to dismiss the dialog. Otherwise, these tests would hang the autobuild.
  • install.py
    • smart copying of files on a local host
  • mail.py
    • sending mail
    • Used by the autobuild but not part of customer releases.
    • Depends on dnspython
  • make_release.py
    • Contains most of the infrastructure for actually building a release.
    • Creates engineering releases and customer releases
    • Builds from a clean (empty) build tree into a clean (empty) installation directory. If either is non-empty, the build fails.
    • Produces a self-describing reproducible build --- the build contains the information (e.g. revision number, subversion branch, optimization level) needed to recreate it
    • Clearly labels an incomplete build or a build that fails to pass as unsuitable for distribution to customers. Labels the release with a file named ".buildinfo" in the top level directory.
    • Detects and prevents easy mistakes, such as trying to build twice into the same location, trying to restart a failed build that has not been completely cleaned up, etc.
  • qa.py
    • Creates "qa" releases. A qa release is everything needed for testing a release. Mostly this is the trunk/qa directory, but it contains a few other things as well.
  • test_release.py
    • Contains most of the infrastructure for running tests
    • Running autotests using arbitrary set of testlists, individual tests and excluded tests
    • Running autotests in a separate directory (to avoid hardcoded path dependencies)
    • Logging test results
    • Has method to perform tests in a location different from the initial installation location, and makes the installation location unavailable while the tests are being run (to ensure that there are no hardcoded pathnames).
  • utils.py
    • filesystem operations (change directory, create directory, etc). These are similar to python native methods but are logged
    • copying files to remote hosts (uses scp on unix, pscp on windows)
    • creating and extracting archives (zip files on windows, tgz on unix)
    • intra-process locking
    • running external commands
    • setting up a clean environment for NuPIC testing

Windows installer

The windows installer is created from the .zip file of the binary release. The autobuild uses the method createInstaller() from build_system/win32/gui_installer/create_win32_installer. This script may also be invoked from the command line as described below under "Release renaming"

Autobuild status files

The autobuild creates and uses status files that contain:

  • what svn revision was built (or attempted)
  • the result of the autobuild
  • time of the build
  • list of tests that failed, if the result is not PASS

These status files are used locally to remember what was last built, and what the result was, as well as stored on the web server. Slave autobuilds use the master status file to decide whether to build.

Status files are represented by the BuildStatusInfo class defined in autobuild.py
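The contents of a status file can be pictured as a simple record. This is a hypothetical sketch; the real class is BuildStatusInfo in autobuild.py, and its attribute names may differ:

```python
# Illustrative stand-in for the BuildStatusInfo class in autobuild.py.
class BuildStatus:
    def __init__(self, revision, result, build_time, failed_tests=()):
        self.revision = revision      # svn revision built (or attempted)
        self.result = result          # result of the autobuild, e.g. "PASS"
        self.build_time = build_time  # time of the build
        # List of failed tests; empty when the result is PASS.
        self.failed_tests = list(failed_tests)
```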

High level view of the autobuild

The autobuild script is run every five minutes from a cron job (Unix) or task scheduler (Windows) on each build system. The autobuild script uses the directory ~buildaccount/autobuild (on both Unix and Windows). This directory contains log files, releases, lock files, source, and other files.

The darwin86 build system is a "master build system", primarily because it can compile the software and run the tests faster than any other platform. This system runs continuously in the following loop.

  1. Grab a lock. If the lock is already held by another process, exit. If the lock is held by another running process but has expired, send notification about a hung build (the hung build is not cleaned up automatically).
  2. Check out the most recent revision of the trunk.
  3. Build and test, including all customer releases.
  4. Copy log files to a web server.
  5. If build/test was successful, copy the build output to a web server also, and mark the revision number as passed on the master build system.
  6. If the list of failures has changed since the last attempt, send mail to a notify-on-change list. Otherwise send mail to the always-notify list.

The other build systems run in the following loop:

  1. Grab a lock.
  2. Has there been successful build on the master server since the last time I checked? If no, wait for a bit and try again. If yes, update the source tree to the same revision number.
  3. The remaining steps are the same as for the master build system.

Mid-level view of the autobuild

Here is what goes on inside the build/test cycle after a revision has been selected.

  • Create a "stamp" based on the revision number (e.g. rXXXXX)
  • Create a directory for building and for installing.
  • Perform a svn checkout into ~/autobuild/src. This is done incrementally, using a method that has proven to be extremely robust. For releases we like to do a checkout from scratch. This is not done automatically, but can be done by stopping the autobuild, removing ~/autobuild/src, doing a full checkout on each build system, and then restarting the autobuild.
  • Create a .buildinfo build description file
  • Build and install. This includes running autogen.sh on Unix.
  • Copy the installation to a temporary directory, rename the original (to make sure there are no hardcoded paths, and to make sure we don’t add any files to the installation, such as .pyc files), and then run the standard testlist on the copy.
  • Run regression tests (also in a temporary copy) [currently disabled]
  • Create the npp release. This is just a tgz or zip archive of the original installation directory.
  • Create the binary release from the original installation. This creates a tgz or zip file. [currently commented out]
  • Extract the binary release and test it. [currently disabled]
  • Create the windows NPP and binary installers if we’re running on Windows.
  • Create the source release if we’re on the master system, or copy it from the master system. [currently commented out]
  • Build the source release and make sure that the build output exactly matches the build output manifest. [currently disabled]
  • Extract the binary release, copy the build output into it, and test this combined binary release + source release output. [currently disabled]
  • Create qa releases. These are just an archive of the qa subdirectory of the trunk and a few other directories necessary for running tests. Having a qa release makes it easier for QA to test a release.
  • Copy build results to the web server

Administrative details

To disable autobuilds, comment out the cron job on the darwin86 master system. This will shut down everything cleanly. There is no need to remove the cron job on the slave systems, as they will not perform a build if there are no new builds from the master.

The autobuild is actually run from a separate script -- run_autobuild.py. This script updates ~buildaccount/autobuild/scripts, which contains the trunk/config directory. It then launches autobuild.py from the scripts directory. Doing it this way ensures that the latest autobuild software is always used to build the system. This is necessary because autobuild.py is launched before the main subversion repository is updated.

Autobuild configuration is mostly contained in a file autobuild.cfg in the trunk/config directory. Configuration includes the definition of mailing lists for notify-on-change, always-notify and admin mail. (Mail sent to the admin list includes notification of lock expiration and internal autobuild problems). Configuration also includes the names of the master server, the web server (where build logs and releases are copied), and directory locations for copying files.

The autobuild can be run in test mode. Edit autobuild.py and change the testAutobuild flag to True. Autobuild will then use an alternate configuration file -- autobuild_test.cfg. Instead of building from the trunk, it will build from the repository /tmp/autobuild_test. An entire autobuild cycle on darwin86 takes about a minute in test mode.

Autobuild logs are contained in ~buildaccount/autobuild/status. The short log (sent in email) is in slog.r-arch.txt and the long log is in log.rarch.txt. There is a special log, syslog, that records every time an autobuild starts up. The monitoring system checks this file and alarms if its modification date is too old, meaning that the autobuild hasn’t been running.

Writing automated tests

This section describes at a high level how to write and run automated tests. Our goal is to have 100% automation of the test infrastructure.

Writing C++ Unit Tests

All C++ code should have unit tests. Unit tests directly test the interface to a class and don’t require a full runtime engine to execute. Unit tests are executed by one of three test executables -- testruntime, testplugins or testtools. In the context of NuPIC, the word "unit test" only applies to C++ code. We do not have a unit test infrastructure for python code. To write a unit test, you must define a unit test class in a file in a directory named unittests which is a subdirectory of the directory containing the class to be tested.

Include the Tester header file. The unit test class must inherit from Tester.

Implement the method RunTests().

As long as you have added the unit test file to an existing unittests directory, it will be picked up automatically and run as part of the appropriate test executable.

Writing other tests

Most other tests are standalone programs that return exit status 0 on success and nonzero on failure. These tests usually get their own subdirectory under qa.

On failure, it is best to return exit status 1. Do not use the return status to indicate the number of failures, as an exit status cannot be larger than 255.
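A minimal standalone test following this convention might look like the sketch below (the checks themselves are illustrative placeholders):

```python
import sys

def run_tests():
    """Return the number of failures (illustrative placeholder checks)."""
    failures = 0
    if sorted([3, 1, 2]) != [1, 2, 3]:
        failures += 1
    return failures

if __name__ == "__main__":
    # Exit with 1 on any failure, 0 on success. Never exit with the
    # failure count itself: an exit status cannot be larger than 255,
    # so e.g. 256 failures would wrap around to a "passing" status of 0.
    sys.exit(1 if run_tests() else 0)
```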

Adding tests to the autotest framework

The autotest framework is our mechanism for running tests automatically. An autotest may be a self-contained test that is run only from the autotest framework, but it usually wraps a standalone test as described in section 9.2. With few exceptions, all tests should be runnable either as standalone tests or as autotests.

Autotests live in the directory qa/autotest, which contains:

  • QATest.py is a base class for automated tests. It sets up the test environment and makes available functionality that is used by many tests. This class has extensive internal documentation.
  • Individual automated tests (all other .py files in the autotest dir) subclass QATest and implement at least the "test" method, which should throw an exception on failure.
  • Testlist files (e.g. "standard.testlist") define sets of tests that are run in automated testing.

Each autotest is a python file that defines a class named "Test" that extends the class "QATest". To implement an autotest you must override the "test" method. This method should raise an exception upon any failure. The test method typically invokes self.runCommand() to run an external command. Use the built-in help for QATest to learn about all the features of this class.
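An autotest file therefore follows this general shape. This is a sketch: QATest below is a minimal stand-in for the real base class in qa/autotest/QATest.py, whose actual runCommand() signature and behavior may differ.

```python
import subprocess

class QATest(object):
    """Minimal stand-in for qa/autotest/QATest.py, for illustration only."""
    def runCommand(self, cmd):
        # Run an external command and return its exit status.
        return subprocess.call(cmd, shell=True)

class Test(QATest):
    def test(self):
        # The test method must raise an exception on any failure.
        status = self.runCommand("echo running an external command")
        if status != 0:
            raise RuntimeError("command failed with status %d" % status)
```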

Autotests are not run automatically simply by being in the autotest directory. They must be part of one or more testlists. (This needs to be changed.) A testlist determines when a test is run. All testlists live in the qa/autotest directory and have a ".testlist" extension. For example, "standard.testlist" currently defines what tests are run by the autobuild system against the NPP release. It is also the testlist that is run by default by test_release.py.

Disabling autotests

It frequently happens that we want an autotest not to run for some reason. A test can be kept from running in one of three ways:

  • Overriding the "disabled" method to return a string instead of None
  • Commenting out the test in a testlist file
  • Removing the test from a testlist file

A disabled test should be an indication of a temporary change that needs to be undone and that we shouldn't forget about.

  • Tests should normally be disabled with the disable() autotest method, rather than by commenting them out of the testlist file. This is because 1) we can report the disabled test in the test summary and 2) it disables the test in all testlists -- you don't have to comment it out everywhere, which is error-prone.
  • The disable() method should return a string that indicates why the test was disabled. The helper methods disableUntil() and noshort() can be used when a test is expected to be fixed by a certain date and when a test should not be run if the --short flag is specified, respectively.

If you comment out a test in a testlist file, it should be with a comment explaining why the test is not appropriate to run with that testlist. For example, a very long running test might be commented out of the standard testlist with the comment that it runs too long for the autobuild. The comment is just a reminder about why it wasn't included. If the test has been removed entirely, remove it from the testlist file.
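For example, a disabled test might look like this sketch (QATest is again a minimal stand-in for the real base class, and the reason string is illustrative):

```python
class QATest(object):
    """Minimal stand-in for the real base class in qa/autotest."""
    def disabled(self):
        return None  # None means: run the test

class Test(QATest):
    def disabled(self):
        # Returning a string disables the test in every testlist and
        # lets the framework report the reason in the test summary.
        return "temporarily disabled while the underlying bug is fixed"
```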

Running automated tests

See [Developer Setup] for instructions on running tests.

Upgrading Python version

This page includes instructions for [upgrading NuPIC's python version].
