What is Pyntacle?

Pyntacle is a Python package and command-line tool that eases the analysis of graphs. Its main goal is to search for important components of graphs using topological indices tailored to the concepts of reachability, fragmentation and centrality. It also provides ancillary methods for community finding, set operations between graphs and quick data-type conversion. Pyntacle relies on multi-core/multi-process programming paradigms to speed up the execution of complex analysis routines. In the current release, GPU computing is available experimentally through the APIs.

What is group centrality?

A group of nodes in a network is central if its nodes are important as a whole, not necessarily individually. Group centrality is assessed by computing topological metrics. These metrics either extend classical node-level indices, such as degree, closeness and betweenness centrality, to groups of nodes, or are based on new concepts of fragmentation and reachability.
A quick introduction to get acquainted with group centrality and with the strategies that Pyntacle implements to search for relevant groups in a network is available here.
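To make the idea of extending a node-level index to a group concrete, here is a minimal sketch of group degree centrality: the fraction of nodes outside the group that are adjacent to at least one group member. It is a plain-Python illustration on a dict-of-sets adjacency map (our assumption), not Pyntacle's implementation or API.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]

def group_degree_centrality(adjacency: Graph, group: Iterable[Hashable]) -> float:
    """Fraction of non-group nodes adjacent to at least one group member."""
    members = set(group)
    outside = set(adjacency) - members
    if not outside:
        return 0.0
    # Neighbours of any member that are not members themselves
    reached = {nb for u in members for nb in adjacency[u] if nb not in members}
    return len(reached) / len(outside)

# Undirected path a-b-c-d-e stored as a symmetric adjacency map
path = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
print(group_degree_centrality(path, ["b"]))       # 0.5
print(group_degree_centrality(path, ["b", "d"]))  # 1.0
```

On their own, b and d each touch only half of the remaining nodes, yet together they are adjacent to every node outside the group: the group is central even though neither member stands out individually.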

Installation

Install from Conda

The easiest way to install Pyntacle on any Linux, Mac or Windows system is through Conda. Installing Pyntacle and all its dependencies manually can be challenging for inexperienced users, and using Conda to install not only Pyntacle but also Python and other packages has several advantages: it is cross-platform (Linux, MacOS X, Windows); it does not require administrative rights (it installs in the user home directory); and it lets you work in virtual environments, which act as safe, sandbox-like sub-systems that can be created, used, exported or deleted at will. We recommend Miniconda, a lightweight version of Anaconda.

You can choose between the full Anaconda and its lite version, Miniconda. The difference between the two is that Anaconda comes with hundreds of packages and can be a bit heavier to install, while Miniconda allows you to create a minimal, self-contained Python installation, and then use the Conda command to install additional packages of your choice.

In any case, Conda is the package manager that the Anaconda and Miniconda distributions are built upon. It is both cross-platform and language agnostic (it can play a similar role to a pip and virtualenv combination), and you need to set it up by running either the Anaconda installer or the Miniconda installer, choosing the Python 3.7 version.

The next step is to create a new Conda environment (if you are familiar with virtual environments, this is analogous to a virtualenv).

Linux and MacOS X

Run the following commands from a terminal window:

conda create -n name_of_my_env python=3.7

This creates a minimal environment with only Python 3.7 installed in it. To activate the environment, run (with Conda versions older than 4.4, use "source activate name_of_my_env" instead):

conda activate name_of_my_env

And finally, install the latest version of Pyntacle:

conda install -y -c bfxcss -c conda-forge pyntacle

Windows

Warning: Windows users may experience issues when installing Anaconda or Miniconda in folders whose names contain whitespace (e.g. "%userprofile%\John Doe\Miniconda"). This is a known bug, as reported here and here. If this occurs, we recommend creating a new directory with no whitespace (e.g. "%userprofile%\John_Doe\") and installing Conda there.

Open a Windows prompt or (even better) an Anaconda prompt, and type:

conda create -y -n name_of_my_env python=3.7

Then, activate the newly created environment:

conda activate name_of_my_env

Finally, install the latest version of Pyntacle:

conda install -y -c bfxcss -c conda-forge pyntacle

Install from source

Alternatively, it is also possible to build Pyntacle from source (Linux and Mac only). The archive is available from the official Releases section of GitHub. For more detailed instructions, read the documentation on GitHub.

Docker

Pyntacle can also be executed in a ready-to-go Docker container. If you are familiar with Docker, a fully functional image is available for download from Docker Hub.
Alternatively, you can build your own Docker image using this Dockerfile.

Test the installation

We developed a series of unit tests to ensure that the Pyntacle command-line interface works properly. We recommend running these tests before using Pyntacle. In a shell, type:

pyntacle test

The expected output is similar to:

Ran 27 tests in 8.002s

OK
<pyntacle.pyntacle.App object at 0x7f7c22f8be10>

This message shows that all the Pyntacle tests ended successfully and that the command-line interface is ready to use. If any test fails, please contact us and specify your OS, its version, the command you used to install Pyntacle and the output of the tests. You can redirect the output of the tests to a file (testlog.txt) as follows:

pyntacle test >> testlog.txt 2>&1


Quick start and case studies

A quick start guide and three case studies are available to help inexperienced users become familiar with the basic Pyntacle commands and with how Pyntacle can be used effectively in common network analysis contexts.

Quick Start guide html | Python notebook
Case study 1: The search for key players in ecological food webs html | Python notebook
Case study 2: Community finding and group centrality analysis of the C. elegans PPI network html | Python notebook
Case study 3: A group-based analysis of the C. elegans connectome html | Python notebook


Documentation

This section provides detailed, up-to-date documentation of Pyntacle, including:

A guide to the file formats supported by Pyntacle (both network file formats and graph attribute files)

A list of the network minimum requirements and a detailed list of Pyntacle's reserved graph attributes

A manual for all command line functions, arguments and their description

A table listing all methods provided through Octopus

Caution on parallelism

Since Pyntacle's command-line interface was designed for non-experts, the fine-grained parallelism that deals with the enumeration of all shortest paths is hidden from the user. The coarse-grained parallelism, instead, is tunable through the -O/--nprocs argument of the brute-force search algorithm. Depending on the size of the graph, its sparseness and the number of processors employed, the computing mode is then chosen by this simple algorithm:

// auto-select the computing mode
if nprocs > 1 // n.b., nprocs is user-defined
    Let's enable multi-process and disable multi-threading
else if size(graph) < 250 or rho(graph)<0.5 //rho measures sparseness
    Let's disable multi-process and disable multi-threading
else
    Let's disable multi-process and enable multi-threading
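The selection rule in the pseudocode above can be written as a small Python function. This is a hypothetical sketch, not Pyntacle's code: it assumes that size(graph) is the number of nodes and that rho is the density of an undirected simple graph, 2|E| / (|V|(|V| - 1)).

```python
from enum import Enum

class ComputingMode(Enum):
    """Hypothetical labels for the three outcomes of the rule."""
    MULTIPROCESS = "multi-process, no multi-threading"
    SEQUENTIAL = "no multi-process, no multi-threading"
    MULTITHREAD = "multi-threading, no multi-process"

def density(n_nodes: int, n_edges: int) -> float:
    """Density of an undirected simple graph: 2|E| / (|V| (|V| - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))

def select_computing_mode(n_nodes: int, n_edges: int, nprocs: int) -> ComputingMode:
    if nprocs > 1:  # the user explicitly asked for several processes
        return ComputingMode.MULTIPROCESS
    if n_nodes < 250 or density(n_nodes, n_edges) < 0.5:  # small or sparse graph
        return ComputingMode.SEQUENTIAL
    return ComputingMode.MULTITHREAD  # large and dense graph

print(select_computing_mode(300, 40000, nprocs=1))  # ComputingMode.MULTITHREAD
```

Note that an explicit request for more than one process always wins, regardless of graph size or density.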

When multi-threading is enabled, the number of spawned threads equals the number of available cores minus one. This default can be changed by setting the environment variable NUMBA_NUM_THREADS to the desired number of computing cores.
Note, however, that Numba adjusts the number of active threads on the fly according to the current overhead and, hence, the efficiency of parallelism. This means that the value specified in the environment variable might not actually be respected.
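The thread-count rule described above can be sketched as follows. `requested_threads` is a hypothetical helper, not part of Pyntacle or Numba: it only computes how many threads would be requested (NUMBA_NUM_THREADS if set, otherwise the available cores minus one), while the number Numba actually runs may differ, as noted above. For the variable to take effect, it must be present in the environment before Numba is first imported, e.g. by exporting it in the shell before launching Pyntacle.

```python
import os
from typing import Mapping, Optional

def requested_threads(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: thread count requested under the documented rule.

    Uses NUMBA_NUM_THREADS when set, otherwise the number of available
    cores minus one (never fewer than one thread).
    """
    env = os.environ if env is None else env
    override = env.get("NUMBA_NUM_THREADS")
    if override is not None:
        return int(override)
    return max(1, (os.cpu_count() or 1) - 1)

print(requested_threads({"NUMBA_NUM_THREADS": "4"}))  # 4
print(requested_threads({}))  # e.g. 7 when os.cpu_count() returns 8
```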

Multi-threading can, however, be controlled through the APIs, as follows:

import time

from algorithms.bruteforce_search import BruteforceSearch
from io_stream.generator import PyntacleGenerator
from tools.enums import CmodeEnum, KpposEnum

if __name__ == '__main__':
    graph_rnd = PyntacleGenerator.Random([100, 0.6])
    start = time.perf_counter()

    # Multi-threaded m-reach search (m=2) on the CPU
    mreach_s = BruteforceSearch.reachability(
        graph_rnd,
        2,
        KpposEnum.mreach,
        None,
        m=2,
        cmode=CmodeEnum.cpu,  # CPU computing mode
        nprocs=1)  # single process: this is the default choice

    end = time.perf_counter()
    print("--- Elapsed time: {:.2f} seconds ---".format(end - start))
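For readers curious about what the call above computes, here is a deliberately naive, single-process sketch of a brute-force m-reach search: it enumerates every group of k nodes and keeps those that reach the most outside nodes within m steps. It assumes that the second positional argument of `BruteforceSearch.reachability` is the group size k and uses a plain dict-of-sets adjacency map; it is not Pyntacle's implementation, which relies on the parallel strategies described above.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]

def m_reach(adjacency: Graph, group: Iterable[Hashable], m: int) -> int:
    """Number of non-group nodes within distance m of at least one member."""
    members = set(group)
    seen = set(members)
    # Multi-source BFS: every member starts at distance 0
    queue = deque((node, 0) for node in members)
    while queue:
        node, dist = queue.popleft()
        if dist == m:
            continue  # do not expand beyond m steps
        for nb in adjacency[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return len(seen) - len(members)

def bruteforce_mreach(adjacency: Graph, k: int, m: int) -> Tuple[List[tuple], int]:
    """Return every k-node group with maximal m-reach, plus that m-reach value."""
    best_value, best_groups = -1, []
    for group in combinations(sorted(adjacency), k):
        value = m_reach(adjacency, group, m)
        if value > best_value:
            best_value, best_groups = value, [group]
        elif value == best_value:
            best_groups.append(group)  # keep ties
    return best_groups, best_value

# Undirected path a-b-c-d-e-f-g
nodes = "abcdefg"
path = {n: set() for n in nodes}
for u, v in zip(nodes, nodes[1:]):
    path[u].add(v)
    path[v].add(u)

groups, value = bruteforce_mreach(path, k=2, m=1)
print(value)                 # 4
print(("b", "f") in groups)  # True
```

The number of candidate groups grows as C(n, k), which is why exhaustive searches on real networks benefit from the multi-process and multi-threaded modes discussed in this section.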

GPU-based processing is an experimental feature in the current version and is therefore not covered by the command-line interface, because Numba shows unexpected behaviors with some hardware configurations that might puzzle the user. The GPU feature will become stable in release 2.0, when Pyntacle will be able to manage big matrices, for which replacing fine-grained parallelism with GPU computing would make sense.
However, GPU computing can be enabled through the APIs:

...
BruteforceSearch.reachability(
    graph_rnd,
    2,
    KpposEnum.mreach,
    None,
    m=2,
    cmode=CmodeEnum.gpu,
    nprocs=1)
...
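Because GPU support is experimental, it may be wise to request it only when a CUDA device is actually usable. The sketch below relies on Numba's `numba.cuda.is_available()`; `cuda_is_usable` and `choose_cmode_name` are hypothetical helpers, not part of Pyntacle, and mapping the returned string onto `CmodeEnum.gpu` or `CmodeEnum.cpu` is left to the calling code.

```python
def cuda_is_usable() -> bool:
    """True only if Numba is installed and reports a usable CUDA device."""
    try:
        from numba import cuda
    except ImportError:
        return False  # Numba missing: no GPU path at all
    try:
        return bool(cuda.is_available())
    except Exception:
        # Driver or toolkit problems can surface as exceptions on some setups
        return False

def choose_cmode_name(prefer_gpu: bool = True) -> str:
    """Hypothetical helper: 'gpu' when requested and usable, else 'cpu'."""
    return "gpu" if prefer_gpu and cuda_is_usable() else "cpu"

print(choose_cmode_name())
```

The result could then drive the call above, e.g. cmode=CmodeEnum.gpu if choose_cmode_name() == "gpu" else CmodeEnum.cpu, so that machines without a working GPU silently fall back to CPU computing.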

Development

Pyntacle is actively maintained, and new features are added over time, including on user request, through the GitHub repository. There, you can track its development and find information about issues, future plans and new features.

License

Pyntacle is available under the GNU General Public License v3.0.

Contacts

You can contact the development team at bioinformatics[at]css-mendel[dot]com or report a bug using GitHub issues.
