Pyntacle Quickstart Guide

*Pyntacle version 1.3.2*

- Introduction
- Setting Pyntacle for the first use
- Dataset description
- Command-line guide
- Pyntacle APIs quick guide
- Import a network using Pyntacle
`Octopus`

: A convenient wrapper of Pyntacle methods`Octopus`

: Computing key-player metrics for a given set of nodes`Octopus`

: Greedy-optimization search`Octopus`

: Brute-force search- Exporting the
`igraph.Graph`

object - Key-player search without
`Octopus`

: Brute-force search (case example)

This brief tutorial will show how to use Pyntacle from the command-line or its API interface to find relevant groups of nodes (also called here the *kp-set*) in a network using the *fragmentation* and *reachability* key-player metrics.

*Fragmentation* is the structural effect of removing nodes from a network.
*Reachability* is the property of nodes to reach their direct or indirect neighbors in a network with a few connections.
Although we will provide here a brief explanation of these concepts, we recommend reading the introduction to group centrality section.

All data used in this tutorial are available here for download.

After installing Pyntacle as described here and activating the Pyntacle's conda environment by typing

`conda activate pyntacle_env`

**Note:** `pyntacle_env`

is just a naming convention

You can then check the Pyntacle version by typing:

`pyntacle --version`

Installation is correct if the following command ends succesfully:

In [1]:

```
import pyntacle
```

Otherwise, contact us and send us a description of the error, along with all the necessary information to reproduce the error.

The toy dataset used here is the network of advice-seeking ties among global consulting companies (also called as the **AdvSeek network** in this tutorial). This network was already used in the original paper by Stephen P. Borgatti, in which the key-player metrics are introduced, Identifying sets of key players in a social network.

We made this network available here as an adjacency matrix (cf. the supported file formats). Nodes in the network are labeled with the initials of the consultants. Since some of the consultants share the same initials, we appended progressive numbers to their initials (e.g. BS, BS2, BS3, etc) to be compliant with the Pyntacle Minimum Requirements. Edges represent relationships among consultants.

To check that Pyntacle is properly installed and working, a set of unit tests can be run by typing:

`pyntacle test`

Several tests will be launched. If succesfull, this command will result in the following message:

`Ran 27 tests in 21.002s`

`OK`

Otherwise, please contact us specifying your operative system, its version, how you installed Pyntacle and the output of your tests in a text file. You can redirect the output of the tests this way:

`pyntacle test >> testlog.txt 2>&1`

Once verified the correct installation of Pyntacle, you can replicate some of the results of Borgatti’s original article in two steps.

`kp-info`

: Computing the key-player metrics for a given set of nodes ¶First, the Pyntacle `kp-info`

function will be used to measure the **dF** fragmentation index for a specific set of nodes of the network. Borgatti measured dF for the pair {HB, WD} and obtained a value of **0.817**. We recall here that dF ranges from 0 to 1, where 0 means maximum connectedness (i.e., the network is a clique) and 1 means maximum disconnection (all nodes are isolates). This result can be replicated by typing:

`pyntacle keyplayer kp-info -i AdvSeek.adjm -t dF --nodes HB,WD`

This command returns the following output (we report the final summary only for the sake of readability):

`****************************************** RUN SUMMARY *******************************************`

`Removing node set`

`(HB, WD)`

`gives a dF value of 0.81716`

`Starting graph dF was 0.64939`

`****************************************************************************************************`

The resulting fragmentation can be better assessed comparing the dF of the original network (before node removal) with that calculated after the node removal. For this reason, `kp-info`

returns also the dF value of the original network.

Results of `kp-info`

can be saved in several file formats (default is tab-separated).

**Note:** you can specify a custom target directory using the `--directory/-d`

argument

The representation of the AdvSeek network follows, with the removed nodes represented in purple. (Thicker edges around the kp-set highlight the {HB, WD} neighborhood)

`kp-finder`

: Greedy-optimization search ¶The set of nodes of a particular size (*k*) that maximally fragment a network or that reach the highest number of other nodes can be obtained using the greedy-optimization search algorithm. It will search for a **suboptimal** set of nodes of size *k* for a selected key-player metrics. Starting from a random set of nodes and then swapping nodes with others outside the set until the selected key-player metrics cannot be improved, one dramatically reduces the number of operations at the cost of suboptimal solutions. This algorithm is particularly useful when dealing with large networks.

The key-player metric chosen in this tutorial is the *m-reach*, which is a measure of reachability. It aims at counting the number of nodes reached by a vertex set of size *k* in *m* steps or less. In Borgatti’s paper, the m-reach metric is calculated on the AdvSeek network for two different *k* sets and setting *m* to 1.

k |
Nodes reached | % of network reached | kp-set |
---|---|---|---|

2 | 15 | 53 | {BM, BS3} |

3 | 20 | 72 | {BM, BS3, NP} |

**Note**: The node *BS* is here called *BS3* as there are 3 synonymous nodes in the original graph

To reproduce this result, type:

`pyntacle keyplayer kp-finder -i AdvSeek.adjm -k 2 -m 1 -t mreach`

`kp-finder`

enables the use of the greedy-optimization search algorithm by default. This behaviour can be changed using the `-I/--implementation`

argument and setting the `brute-force`

(See the brute-force section) or `sgd`

(stochastic-gradient descent) options.

Pyntacle will output (only the run summary is reported):

`****************************************** RUN SUMMARY *******************************************`

`Size of the node set: 2`

`Key-player set of size 2 for m-reach, using at most 1 step, is:`

`(BS3, BM)`

`with value 15 on 32 (number of nodes reached on the total number of nodes)`

`The total percentage of nodes, which includes the kp-set, is 53.12%`

`****************************************************************************************************`

**Note:** We report the number of nodes reached by the kp-set (m-reach) and the percentage of reached nodes, which includes the kp-set (as from Borgatti’s paper).

Results are stored in a *txt* file in the current working directory.

Similarly, the optimal set of size 3 can be sought typing:

`pyntacle keyplayer kp-finder -i AdvSeek.adjm -k 3 -m 1 -t mreach`

This will produce the following output:

`****************************************** RUN SUMMARY *******************************************`

`Size of the node set: 3`

`Key-player set of size 3 for m-reach, using at most 1 step, is:`

`(KR, BM, NP)`

`with value 20 on 32 (number of nodes reached on the total number of nodes)`

`The total percentage of nodes, which includes the kp-set, is 71.88%`

`****************************************************************************************************`

The {KR, BM, NP} kp-set is not the one reported by Borgatti {BM, BS3, NP}, despite their scores being identical. This means that this network holds more kp-sets of size 3 with equal fragmentation scores.

A way of getting all these sets would be to run the greedy-optimization search algorithm several times. But this will not guarantee to capture all existing kp-sets. Another, exact, way would be to switch to the Brute-force search algorithm, as we will discuss in the next section.

`kp-finder`

: Brute-force search ¶Contrary to the greedy-optimization search algorithm, the brute-force search algorithm seeks and finds the **best** solutions at the price of high demand of computing resources and running times. This task requires
operations, where *N* is the size of the graph.

However, it was implemented to calculate the desired metrics for all the combinations of size *k* of nodes, in parallel on multiple processors, if available. It can be enabled by setting the `-I/--implementation`

parameter with the `brute-force`

keyword. By default, Pyntacle will use 1 CPU only. However, this can be tuned by the `-O/--nprocs`

parameter.

Let's consider again the AdvSeek network and the case discussed in the previous chapter. The greedy-optimization search did not replicate the Borgatti's findings on the AdvSeek network. Searching again the best kp-sets of size 3 with the brute-force search algorithm:

`pyntacle keyplayer kp-finder -i AdvSeek.adjm -k 3 -m 1 -t mreach -I brute-force`

`****************************************** RUN SUMMARY *******************************************`

`Size of the node set: 3`

`Key player sets of size 3 for m-reach, using at most 1 step, are:`

`(KR, BM, NP)`

`(BS3, BM, NP)`

`with value 20 on 32 (number of nodes reached on the total number of nodes)`

`The total percentage of nodes, which include the kp-set, is 71.88%`

`****************************************************************************************************`

The two best solutions were finally found. This concludes the quick start guide of Pyntacle via command-line. The same problems will be tackled using the Pyntacle's API interface in two ways: using the `Octopus`

wrapper and by directly using the Pyntacle methods for kp-search.

The APIs are designed for intermediate-to-expert Python users, with a basic knowledge of object-oriented programming and some experience with the igraph package. If you are not acquainted with igraph basics, we recommend reading its official Python tutorial.

Pyntacle is built around `igraph`

and performs its calculations on instances of `igraph.Graph`

objects. Pyntacle provides several tools for importing/exporting from/to `igraph.Graph`

objects and to/from several textual network file formats. We refer to the File Formats Specifications page for more details on this regard. Moreover, we recommend reading the Minimum Network Requirements.

Networks can be imported from file via the `PyntacleImporter`

class of the `io_stream`

module. This class contains a series of handy methods to parse and store an input graph into an `igraph.Graph`

object and initialize all the Pyntacle reserved attributes (see the Minimum Network Requirements section).

Considering again the AdvSeek network, which is available as an adjacency matrix (`AdvSeek.adjm`

), it is imported with the following command:

In [2]:

```
from pyntacle.io_stream.importer import PyntacleImporter
adv = PyntacleImporter.AdjacencyMatrix("AdvSeek.adjm")
```

The imported `adv`

object is of type `igraph`

In [3]:

```
type(adv)
```

Out[3]:

The `adv`

object can thus be inspected by means of standard igraph methods:

In [4]:

```
adv.summary()
```

Out[4]:

`Octopus`

: A convenient wrapper of Pyntacle methods ¶`Octopus`

' methods are included in the `Octopus`

class of the `tools`

module. This is the list of all available `Octopus`

methods:

In [5]:

```
from tools.octopus import Octopus
dir(Octopus)
```

Out[5]:

When calling an `Octopus`

function, `Octopus`

will execute the corresponding Pyntacle function, will assign a new attribute to the target graph and will return the result of the called function. For example, let us suppose one wants to compute the *average degree* of the AdvSeek network.

In [6]:

```
Octopus.average_degree(adv)
```

This function will trigger the computation of the average degree, will create an `average_degree`

attribute for the graph `adv`

and will set the result (3.4375 for the AdvSeek Network) to it:

In [7]:

```
adv.attributes()
```

Out[7]:

In [8]:

```
adv["average_degree"]
```

Out[8]:

`Octopus`

: Compute key-player metrics for a given set of nodes ¶With the aim to reproduce Borgatti's results regarding the network fragmentation on the AdvSeek network, as already discussed in the Command-line section, we will use `Octopus`

.

In [9]:

```
Octopus.kp_dF(adv, nodes=["HB", "WD"])
```

where `nodes`

is a list of names of the nodes belonging to the kp-set.

`Octopus`

will add a new attribute, `dF_info`

, to the graph `adv`

. This attribute is actually a dictionary, where each item is a key-value pair made by a (sorted) tuple of nodes (the kp-set) and the corresponding dF value.

In [10]:

```
adv.attributes()
```

Out[10]:

and

In [11]:

```
adv["dF_info"]
```

Out[11]:

dF is a relative metrics, in the sense that the effect of removing a set can be appreciated if compared with the fragmentation level of the original network. The initial dF value of a network can be computed by using the `dF`

method:

In [12]:

```
Octopus.dF(adv)
```

The initial dF is stored in the corresponding `dF`

attribute

In [13]:

```
adv["dF"]
```

Out[13]:

Now, we can conclude that removing HB and WD will result in a fragmentation increase of 17%.

`Octopus`

: Greedy-optimization search ¶The previous greedy-optimization search method, which was performed by the `kp-finder`

command line command, can be replicated calling the `GO_mreach`

method.

In [14]:

```
Octopus.GO_mreach(adv,k=2, m=1)
```

Like in the previous section, `Octopus`

adds an attribute to the graph that stores a dictionary of node names (*kp-set*) and centrality measures. In this example, the attribute will be called *mreach_1_greedy*, since the centrality metrics is the *m-reach* with `m=1`

argument and obtained with a run of the greedy-optimization search algorithm.

In [15]:

```
adv.attributes()
```

Out[15]:

The name of the attribute says that the m-reach metric with *m = 1* was computed with the greedy-optimization algorithm. This will ensure that all the following greedy-optimization searches for m-reach of distance 1 will be stored in this variable, keeping tracks of all the runs.

As before, we can obtain the key-values:

In [16]:

```
adv["mreach_1_greedy"]
```

Out[16]:

`Octopus`

: Brute-force search ¶Finally, the brute-force search method performed using the command-line can be replicated with Octopus using $5$ CPU cores.

In [17]:

```
Octopus.BF_mreach(adv, k=3, m=1, nprocs=5)
```

Again, a new attribute storing the results is added to the `adv`

graph. This attribute is stored with the name `mreach_1_bruteforce`

. It will contain a tuple of $3$ elements storing the node names as `key`

and the corresponding m-reach values.

In [18]:

```
adv["mreach_1_bruteforce"]
```

Out[18]:

`igraph.Graph`

object ¶Any igraph.Graph object can be saved to text or binary files (we refer again to the File Formats Specifications page). Additionally, graphs can be exported with all their attributes. The export utilities are implemented in the `PyntacleExporter`

class of the `io_stream`

module.
Let us export the `adv`

network to a binary file.

In [19]:

```
from pyntacle.io_stream.exporter import PyntacleExporter
PyntacleExporter.Binary(adv, "AdvSeek.graph")
```

This will create a file named *advseek.graph* in the current directory.

`Octopus`

: Brute-force search ¶The same functions performed with `Octopus`

can also be performed using the Pyntacle's APIs (`algorithms`

and `tools`

modules). These methods rely on an array of enumerators by which specifying:

- the key-player metrics to be calculated;
- the computing modes for some of the Pyntacle’s methods (i.e., serial, parallel CPU, GPU);

These enumerators are implemented in the `tools`

module and can be imported as:

In [20]:

```
from tools.enums import KpposEnum #this enumerator stores all the reachability metrics
from tools.enums import CmodeEnum #this one handles all the computing modes
```

`KpposEnum`

currently includes two reachability metrics: **m-reach** and **dR**.

In [21]:

```
dir(KpposEnum)
```

Out[21]:

`CmodeEnum`

contains four values: `auto`

, `igraph`

, `cpu`

and `gpu`

; `auto`

is the default choice and commands Pyntacle to choose the best computing mode, according to the specific features of the graph is working on. The `igraph`

option is chosen to run the serial versions of the Pyntacle algorithms; `cpu`

is used to enable *fine-grained*, multi-threading, parallelism on the enumeration of shortest-paths by relying on just-in-time compilation of Pyntacle's methods by Numba. Finally, `gpu`

is used to defer computationally heavy functions to be executed on CUDA-enabled NVIDIA graphics cards, if available. The value of this enumerator can be passed to the implementation parameter of any key-player methods.

In [22]:

```
dir(CmodeEnum)
```

Out[22]:

For example, the brute-force search algorithm executed with `Octopus`

can be replicated using the `BruteForceSearch`

method:

In [23]:

```
from algorithms.bruteforce_search import BruteforceSearch as bfs
```

In [24]:

```
bfs.reachability(graph=adv, cmode=CmodeEnum.igraph, metric=KpposEnum.mreach, m=1, k=3)
```

Out[24]:

Please contact us to leave a feedback.