A Database for Neural Simulations

Introduction

This document seeks to design a database suitable for the internal data structures of NEURON and of neural simulators in general. A database is a software tool for organizing and managing data.

The State of the Art

Currently NEURON and all other state-of-the-art neural simulators are written in an object-oriented programming style. Data is held in arrays of structures or classes which are connected together by pointers. They use the programming language's built-in data structures and algorithms. Their data structures are specified by their source code, which works well for an immediate implementation but also ties them to the syntax and semantics of the programming language in which they are written.

The Case for Using a Database

One benefit of using a database is that it gives you a way to think about data in the abstract. By collecting all of your data into a single place you can see all of the commonalities between the different pieces of data in your simulation. To that end, this document seeks to gather all requirements and constraints for the data structures inside of your simulations.
Another benefit of using a database is that you can consolidate duplicate pieces of software into a shared code-base. You can write software tools which do more things with fewer lines of code and with almost no duplicated code (at least related to memory management). Fixes and new features can be enabled and applied to every datum from a central location.
Finally, if you make your own database then you can tailor it to the needs of your programs.

Innovations

Most aspects of this design are not novel. Rather, they came about as solutions to common problems that arose in the course of implementing neural simulators. What is novel is the combination of all of these pieces to form a cohesive set of tools, where all of the pieces work together but still remain flexible.

User Profiles

The database will be used by two groups of people: first by programmers and then by the end users of their programs. The database will have two APIs, one for each user group.

Profile of a Computer Programmer

Has technical experience and ability:
- Reads the documentation.
- Learns the details of a given computer system.
- Debugs unexpected issues, can read a stack backtrace.
- Trusted with your program's private variables.
Wants computer speed and memory performance.

Profile of the End User

Minimal technical experience:
- Wants software that is easy to use.
- Not good at debugging software.
- Unwilling to learn new ‘programming paradigms’. Expects computer systems to conform to their existing mental models of how things should work.
Wants high quality results that can be directly published.

Two APIs

The programmer's API allows access to the data arrays.

The end user's API presents an easy-to-use interface:

Python. It should use Python's built-in features:
- Classes
- Docstrings
- Properties for setter/getter methods

Terminology

This database is tailored for running physics simulations, and to that end it has special words for dealing with the physical things which exist in your simulation.

An Entity is a specific object that exists in your simulation.
An Archetype defines a type of Entity, and each Entity is an instance of exactly one Archetype.
A Component is a piece of data which is attached to an Entity.

There is a clear correspondence between these terms and the more familiar object-oriented terminology of C, C++, Python, etc. For example, consider the following Python program:

class Neuron:
    def __init__(self):
        self.voltage = -70 # mV

my_neuron = Neuron()

Now let’s describe this program using the database terminology:

“Neuron” is an Archetype.
“voltage” is a Component.
“my_neuron” is an Entity.

Requirements

Creating Schema

Programmers can create Archetypes and Components. Although typically these will only be created at the start of the program, it would be convenient to add things to the database while the program is running.

Components

There are many different types of components for representing different and specialized data structures and algorithms.

Attributes

An attribute is a simple piece of data attached to an Entity.
The data type can be anything, including arbitrary user defined data types. The most common data types will be floating-point and pointer types.
Each attribute must keep all of its data in a single contiguous array, and each array holds exactly one attribute. This is essential for fast processing of data.
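
A minimal sketch of this layout, using Python's standard `array` module for the contiguous storage. The `Attribute` class and its method names are hypothetical and only meant to show the idea of one array per attribute.

```python
from array import array


class Attribute:
    """Hypothetical attribute: one contiguous array, one value per entity."""

    def __init__(self, name, typecode="d", initial=0.0):
        self.name = name
        self.initial = initial
        self.data = array(typecode)  # contiguous storage for this attribute only

    def append_entity(self):
        """Add a slot for a new entity and return its index."""
        self.data.append(self.initial)
        return len(self.data) - 1


voltage = Attribute("voltage", "d", -70.0)
a = voltage.append_entity()
b = voltage.append_entity()
voltage.data[b] = -55.0

# Processing an attribute is a single sweep over one array.
mean_voltage = sum(voltage.data) / len(voltage.data)
```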

Global Constants

This is a special case of an attribute: it behaves like an attribute except that all Entities see the same constant value.

Sparse Matrixes

Compressed Sparse Row (CSR) matrixes are a way for Entities to have lists of pointers to other Entities.
- For example, you could have a sparse matrix of “synapse_weights” which gives each presynaptic neuron a list of postsynaptic dendrites and associated weights.
- Another example is storing the electrical resistivity between adjacent segments in a sparse matrix.
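
To make the CSR layout concrete, here is a small sketch of the “synapse_weights” example with three presynaptic rows. The array names `row_start`, `targets` and `weights` and the helper `row` are assumptions for illustration; the values are made up.

```python
# Row r occupies the slice row_start[r]:row_start[r + 1] of the other two arrays.
row_start = [0, 2, 2, 3]    # one entry per row, plus a final end marker
targets = [4, 7, 1]         # pointers (indexes) to postsynaptic entities
weights = [0.5, 0.25, 1.0]  # value stored alongside each pointer


def row(r):
    """Return the (target, weight) pairs belonging to presynaptic entity r."""
    lo, hi = row_start[r], row_start[r + 1]
    return list(zip(targets[lo:hi], weights[lo:hi]))
```

Here entity 0 has two synapses, entity 1 has none, and entity 2 has one, yet all of the data lives in three flat arrays.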

Error Checking

Each component can be configured to check for common issues.

All components that can contain floating-point numbers will have the following optional checks on their data:
- Check for NaN.
- Bounds check: min <= value <= max.
All components that can contain a pointer will have an optional check for NULL pointers.
By default, all checks should be disabled.
Checks should be configured when the component is first specified/created.
The database will provide an API method for running all of the checks as well as checking specific components.
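
The checks described above could be sketched as follows. The function names, keyword arguments and the 32-bit `NULL` sentinel are assumptions for this example. Every check is off unless it is explicitly enabled, matching the default stated above.

```python
import math

NULL = 0xFFFFFFFF  # assumed sentinel: the maximum 32-bit index


def check_floats(values, check_nan=False, minimum=None, maximum=None):
    """Return the indexes of values that fail any enabled check."""
    bad = []
    for i, v in enumerate(values):
        if check_nan and math.isnan(v):
            bad.append(i)
        elif minimum is not None and v < minimum:
            bad.append(i)
        elif maximum is not None and v > maximum:
            bad.append(i)
    return bad


def check_pointers(values, allow_null=True):
    """Return the indexes of NULL pointers, if NULL is disallowed."""
    if allow_null:
        return []
    return [i for i, p in enumerate(values) if p == NULL]


voltages = [-70.0, float("nan"), 500.0]
segments = [0, NULL, 3]
```

Returning the failing indexes, rather than raising on the first failure, lets the caller report every bad entity in one pass.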

Entities

Any user may create and destroy entities at any time.
Any user may access the component data at any time.

Entity Handles

EntityHandles are special code objects that the database uses to represent a specific entity.

When the end user creates an entity they get an EntityHandle in return.
EntityHandles can destroy their entity.
EntityHandles can access data associated with their entity.
EntityHandles are persistent. They are valid until the user destroys either the underlying entity or the handle.
EntityHandles should be easy to use. For this API it is acceptable to trade computer performance for a more polished end-user experience.
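
A minimal sketch of these handle semantics, assuming a toy database with a single archetype and a single `voltage` attribute. The `Database` and `EntityHandle` classes below are hypothetical. The database keeps a list of live handles so that it can rewrite a handle's index if its entity is moved in the underlying array.

```python
class Database:
    """Hypothetical single-archetype database holding one attribute."""

    def __init__(self):
        self.voltage = []  # attribute data, one slot per entity
        self.handles = []  # every live handle, so indexes can be rewritten

    def create_entity(self, voltage=-70.0):
        self.voltage.append(voltage)
        handle = EntityHandle(self, len(self.voltage) - 1)
        self.handles.append(handle)
        return handle


class EntityHandle:
    """Persistent, end-user reference to one entity."""

    def __init__(self, db, index):
        self._db = db
        self._index = index  # rewritten by the database when the entity moves

    def _require_alive(self):
        if self._index is None:
            raise ValueError("entity was destroyed")
        return self._index

    @property
    def voltage(self):
        return self._db.voltage[self._require_alive()]

    @voltage.setter
    def voltage(self, value):
        self._db.voltage[self._require_alive()] = value

    def destroy(self):
        db, hole = self._db, self._require_alive()
        last = len(db.voltage) - 1
        db.voltage[hole] = db.voltage[last]  # move the last entity into the hole
        db.voltage.pop()
        db.handles.remove(self)
        self._index = None
        for handle in db.handles:  # repoint the handle of the moved entity
            if handle._index == last:
                handle._index = hole


db = Database()
a = db.create_entity(-70.0)
b = db.create_entity(-60.0)
c = db.create_entity(-50.0)
a.destroy()  # entity c is moved into slot 0; its handle still works
```

This sketch favors simplicity over speed: each destroy scans every handle, which is the kind of trade-off this API is allowed to make.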

Entity Storage and Movement

Entities are stored in contiguous zero-indexed arrays, where each array contains all entities of one archetype.
Holes are not allowed in the entity arrays. Therefore, when the user deletes an entity from the middle of the array, a different entity is moved to fill in the hole and all pointers to the moved entity are updated to point to the new address.
- This is possible because the database knows the memory location of every pointer and every EntityHandle.
- This operation can be done efficiently if all entities are destroyed in batches, as opposed to one at a time. The algorithm to move entities reads and writes every pointer in the database exactly once, and can move any number of entities simultaneously.
  - Programmers need to batch up their operations.
  - Linear run time.
- The alternative implementation is a dead/alive mask over the entity arrays.
  - Programmers need to understand how to use the alive mask.
  - Linear memory usage.
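
The batched move could be sketched as below. This is one possible approach, not a definitive algorithm: surviving entities from the tail of the array are moved into the holes, and each pointer is then visited exactly once. The function names, the `NULL` sentinel value, and the choice to replace pointers to destroyed entities with NULL are assumptions for this example.

```python
NULL = 0xFFFFFFFF  # assumed sentinel: the maximum 32-bit index


def plan_moves(size, dead):
    """Return ({old_index: new_index}, new_size); dead is a set of indexes."""
    new_size = size - len(dead)
    holes = [i for i in range(new_size) if i in dead]
    movers = [i for i in range(new_size, size) if i not in dead]
    return dict(zip(movers, holes)), new_size


def compact(data, moves, new_size):
    """Move surviving values into the holes, then drop the tail."""
    for old, new in moves.items():
        data[new] = data[old]
    del data[new_size:]


def rewrite_pointers(pointers, moves, dead):
    """Visit every pointer once: NULL out dead targets, redirect moved ones."""
    for i, target in enumerate(pointers):
        if target in dead:
            pointers[i] = NULL
        elif target in moves:
            pointers[i] = moves[target]


segment_diameter = [1.0, 2.0, 3.0, 4.0, 5.0]  # attribute of five segments
channel_segment = [0, 1, 3, 4, 2]             # channels pointing at segments
dead = {1, 4}                                 # destroy two segments in one batch

moves, new_size = plan_moves(len(segment_diameter), dead)
compact(segment_diameter, moves, new_size)
rewrite_pointers(channel_segment, moves, dead)
```

In this run only segment 3 has to move (into slot 1), and the work is proportional to the number of entities plus the number of pointers, regardless of how many entities are destroyed in the batch.
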
Entities can also be reordered for optimal data access patterns. This is helpful when data access patterns span multiple archetypes.
- For example, ion channels have a pointer to the neuron segment where they are inserted. The channels should be sorted by that pointer's literal value, because usually when you iterate through a list of ion channels you also want to access the data for the segments which they are inserted into. When the entities are sorted by their pointers and you iterate through the list of entities accessing each pointer's data, you access all of the data in a single sorted sweep through the underlying data.
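
A small sketch of that reordering, assuming a hypothetical channel archetype with a pointer attribute `channel_segment` and one data attribute `channel_conductance` (both names and values are made up). The same permutation is applied to every attribute of the archetype.

```python
channel_segment = [2, 0, 1, 0]              # pointer to the segment of each channel
channel_conductance = [0.3, 0.1, 0.2, 0.4]  # another attribute of the same channels

# Sort the channels by the literal value of their segment pointer.
order = sorted(range(len(channel_segment)), key=channel_segment.__getitem__)

# Apply the same permutation to every attribute of the archetype.
channel_segment = [channel_segment[i] for i in order]
channel_conductance = [channel_conductance[i] for i in order]
```

After sorting, iterating over the channels visits the segments in increasing index order. A full implementation would also have to update any pointers and EntityHandles that refer to the reordered channels.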

Pointers

A pointer is a way to refer to an Entity.

Pointers are for the programmer's API. Programmers can use pointers; end users may not. End users must use EntityHandles instead.
Pointers are represented as indexes into the entity arrays. This is important for using a structure-of-arrays memory layout. It also allows pointers to be represented in 32 bits instead of 64 bits.
- NULL is represented by the maximum value instead of zero, since zero is a valid index.
Programmers may use pointers that are stored in the database, with the limitation that pointers are only valid until the next time the database gains control of program execution (at which time the database may decide to move entities and update all pointers). The database will provide a way to make persistent EntityHandles out of the transient pointers.
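
As a sketch of this representation, the example below stores pointers as unsigned integers in a standard-library `array` and reserves the maximum value for NULL. The names `segment_diameter`, `pointers` and `dereference` are hypothetical. The `"I"` typecode is 32 bits wide on common platforms, so NULL is derived from the actual item size rather than hard-coded.

```python
from array import array

segment_diameter = array("d", [1.0, 2.5, 0.8])  # attribute of a "segment" archetype

pointers = array("I")  # unsigned int, 32 bits on common platforms
NULL = 2 ** (8 * pointers.itemsize) - 1  # maximum value, so zero stays a valid index
pointers.extend([0, NULL, 2])


def dereference(attribute, pointer):
    """Return the attribute value of the pointed-to entity, or None for NULL."""
    if pointer == NULL:
        return None
    return attribute[pointer]


diameters = [dereference(segment_diameter, p) for p in pointers]
```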

Destroying Entities

Pointers to destroyed entities are invalid and must be either overwritten or destroyed. Components containing pointer values can be configured to either allow or disallow NULL pointers.

If NULL pointers are allowed then pointers to destroyed entities are replaced with NULL pointers.
If NULL pointers are disallowed then the entities that contain them are automatically destroyed. This can trigger a chain reaction of destruction.
- For example, this is useful for attaching ion-channels to a segment of a neuron's membrane because then when you destroy the segment it automatically destroys all attached ion-channels.
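
The two policies and the resulting chain reaction might look like this in outline. The function `apply_destruction`, the `NULL` sentinel value, and the three example components (channels, probes and recorders) are assumptions made for illustration.

```python
NULL = 0xFFFFFFFF  # assumed sentinel: the maximum 32-bit index


def apply_destruction(dead_targets, pointers, allow_null):
    """Apply one pointer component's NULL policy after its targets are destroyed.

    Returns the indexes of the owning entities that must be destroyed in turn.
    """
    doomed = set()
    for owner, target in enumerate(pointers):
        if target in dead_targets:
            if allow_null:
                pointers[owner] = NULL
            else:
                doomed.add(owner)
    return doomed


# Hypothetical schema: channels must sit on a segment (NULL disallowed),
# probes may lose their segment (NULL allowed), recorders must watch a channel.
channel_segment = [0, 1, 1, 2]
probe_segment = [1, 2]
recorder_channel = [2, 3]

dead_segments = {1}
dead_probes = apply_destruction(dead_segments, probe_segment, allow_null=True)
dead_channels = apply_destruction(dead_segments, channel_segment, allow_null=False)
# The destroyed channels then take their recorders with them.
dead_recorders = apply_destruction(dead_channels, recorder_channel, allow_null=False)
```

Destroying segment 1 leaves the probe alive with a NULL pointer, destroys the two channels attached to it, and in turn destroys the recorder watching one of those channels.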

Sets of Entities

Programmers will need persistent references to large numbers of entities.

Using a large number of EntityHandles could be inefficient since each handle is its own discrete Python object.
Creating a new archetype with a pointer attribute is cumbersome, visible throughout the database, and it has different semantics than regular Python code.

To meet this need, the database could implement efficient collection types for Entities.

A set of entities must be of a single homogeneous Archetype.
Backed by either Arrays or Hash-Sets.
Applies operations to all contained elements.

However, I would recommend deferring the design and implementation of this idea until after the rest of the database is implemented and programmers start trying to use it, at which point it will become clearer what facilities the database lacks regarding collections of entities.

Documentation

Since the database is the central repository for the data, it makes sense to store any descriptions of the data there too.

The programmer can attach user-facing documentation to any Archetype or Component. The documentation can be as simple as a text string.
The database will render user-facing documentation of itself, showing its database schema, and including all attached documentation.
Physical units are a special type of documentation, and the database treats them with a special case.
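
A minimal sketch of attached documentation and its rendering, assuming the schema is described by two hypothetical dataclasses. The `doc` and `units` fields and the `render_docs` function are illustrative names, and the output format is only one possibility.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    doc: str = ""
    units: str = ""  # physical units get their own field rather than living in doc


@dataclass
class Archetype:
    name: str
    doc: str = ""
    components: list = field(default_factory=list)


def render_docs(archetypes):
    """Render the schema and all attached documentation as plain text."""
    lines = []
    for arch in archetypes:
        lines.append(f"{arch.name}: {arch.doc}")
        for comp in arch.components:
            units = f" [{comp.units}]" if comp.units else ""
            lines.append(f"  {comp.name}{units}: {comp.doc}")
    return "\n".join(lines)


schema = [
    Archetype("Neuron", "A point neuron.", [
        Component("voltage", "Membrane potential.", "mV"),
    ]),
]
text = render_docs(schema)
```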

Introspection

The database will provide an API for inspecting its database schema and all associated meta-data, at run time.

The primary motivation for introspection is creating user interfaces (UIs). UI programs need to know how the underlying database is organized in order to effectively present it to the user. By reading the database schema at run time, the UI can be much more flexible and easier to write and maintain.

Hardware Memory Spaces

The database can utilize multiple memory spaces.

CPU: host only.
GPU: connected to host, OpenCL or CUDA?
Other. The database should be implemented in such a way that it is possible to add other hardware platforms.

TODO: how is this specified? by default where do things live? what kind of controls does the programmer/user have over it?

Other Ideas

Grids of Entities

An Archetype could define a regular grid to place Entities on and provide tools for efficiently working with grids of Entities. For example, this could be used to implement extracellular diffusion at a coarse granularity.

Temporary Buffers

It might be nice to have your working data be managed by the database, but not persistently stored. The programmer would be responsible for freeing it when they are done with it, or else it would just sit around like permanent data.

Spatial Partitioning Structures

For performing fast nearest neighbor searches.
Requires valid/invalid bookkeeping.

Linear Systems

A different solver algorithm than NEURON's; it might be faster but it is not as accurate.
Requires valid/invalid bookkeeping.