Everything you need to know about Conda in Python


Many people think of Conda as a distribution or, more specifically, as a Python package manager. Neither description is completely wrong, but neither captures the whole picture. Conda is first and foremost an open-source, cross-platform package and environment manager. It was originally built to solve difficult package management problems and is a popular manager for Python and R. It is released under the Berkeley Software Distribution (BSD) License by Anaconda Inc.

As a package manager, Conda helps you find and install packages. Because it is also an environment manager, you do not need to switch to a different tool when a package requires a different version of Python: you can create a separate environment for it. Now, let's look at how Conda works and at its basic terminology.

Installation

There are three ways of getting conda:

  1. Install Miniconda. Miniconda is a free, open-source, small bootstrap version of Anaconda that contains Python, conda and a few other packages.
  2. Install Anaconda. Anaconda is the most popular Python/R distribution. It ships with a few hundred packages preinstalled along with conda (the exact number varies by release), and more can be installed with the conda install command.
  3. If you have already installed Python or another package manager, you do not need to uninstall anything. Just install Miniconda or Anaconda and let the installer add the conda installation of Python to your PATH environment variable.

Conda commands cheatsheet

Conda provides a basic set of commands that act as the interface between the user and the software. These commands can be used for creating and modifying environments, querying available packages, installing and removing packages, defining dependencies, and so on. The table below (by OpenGenus) lists the most commonly used basic conda commands:

Task                                      Command
----------------------------------------  ------------------------------------------------------
Install a package                         conda install $PACKAGE_NAME
Update a package                          conda update --name $ENVIRONMENT_NAME $PACKAGE_NAME
Update the package manager                conda update conda
Uninstall a package                       conda remove --name $ENVIRONMENT_NAME $PACKAGE_NAME
Create an environment                     conda create --name $ENVIRONMENT_NAME python
Activate an environment                   conda activate $ENVIRONMENT_NAME
Deactivate an environment                 conda deactivate
Search available packages                 conda search $SEARCH_TERM
Install a package from a specific source  conda install --channel $URL $PACKAGE_NAME
List installed packages                   conda list --name $ENVIRONMENT_NAME
Create a requirements file                conda list --export
List all environments                     conda info --envs
Install another package manager           conda install pip
Install Python                            conda install python=x.x
Update Python                             conda update python

More specialized commands, such as those for managing R packages or dependencies, can be found through conda's built-in help: run conda --help, or add -h to any command.
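When conda is driven from a Python script, the commands in the table are usually assembled as argument lists. The following is a minimal sketch of that idea, not an official conda API: the helper names build_install_command and run_if_conda_available are made up for this example, and the channel name conda-forge is only an illustrative value. Nothing is executed unless you call the second helper yourself.

```python
import shutil
import subprocess


def build_install_command(package, env_name=None, channel=None):
    """Return the argument list for a `conda install` call (nothing is executed)."""
    cmd = ["conda", "install", "--yes"]
    if env_name:
        cmd += ["--name", env_name]
    if channel:
        cmd += ["--channel", channel]
    cmd.append(package)
    return cmd


def run_if_conda_available(cmd):
    """Run the command only when a conda executable is on PATH; otherwise return None."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


# Hypothetical example values: package "numpy", environment "myenv".
command = build_install_command("numpy", env_name="myenv", channel="conda-forge")
print(" ".join(command))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when package or environment names contain unusual characters.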

Conda Environments

A conda environment is a directory that contains a specific collection of packages along with their dependencies. It is the isolated workspace in which you install your tools and run your analysis. This means that if a project needs a different version of Python or of some package, you can create and use a new environment without changing or deleting the existing one. You can switch between environments by activating and deactivating them, and you can give someone else a copy of your environment by sharing its environment.yml file.

Creating and managing environments
To create an environment:

conda create --name myenv

To create an environment with a specific version of Python:

conda create -n myenv python=3.6

To create an environment from an environment.yml file:

conda env create -f environment.yml
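For reference, an environment file is a short YAML document with a name, a list of channels and a list of dependencies. The Python sketch below renders such a file with plain string formatting so that it needs no YAML library. The helper name render_environment_yml and the package pins are illustrative assumptions, not taken from any particular project.

```python
from pathlib import Path
import tempfile


def render_environment_yml(name, channels, dependencies):
    """Render a minimal environment file as text (plain strings, no YAML library)."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


# Hypothetical environment contents, chosen only for illustration.
text = render_environment_yml(
    name="myenv",
    channels=["defaults"],
    dependencies=["python=3.6", "numpy=1.16", "matplotlib=3.1"],
)

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "environment.yml"
    path.write_text(text)
    print(path.read_text())
```

A file with this shape is what conda env create -f environment.yml expects to read.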

You can control where a conda environment lives by providing a path to a target directory when creating the environment.

conda create --prefix ./envs jupyterlab=0.35 matplotlib=3.1 numpy=1.16

You may also need to update your conda environment. The most common reasons for updating an environment are:

  1. You have found a better package for data extraction and analysis and want to add it.
  2. A new version of one of your core dependencies has been released.
  3. A package is no longer needed.

In each case, edit environment.yml accordingly and then run the following command. The --prune option removes packages that are no longer listed in the environment file.

conda env update --prefix ./envs --file environment.yml --prune

To see a list of all of your environments, in your terminal window or an Anaconda Prompt, run:

conda info --envs

A list similar to the following is displayed:

conda environments:
myenv                 /home/username/miniconda/envs/myenv
snowflakes            /home/username/miniconda/envs/snowflakes
bunnies               /home/username/miniconda/envs/bunnies
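If a script needs this listing, the text can be turned into a name-to-path mapping. The sketch below parses the sample output shown above. It assumes that names and paths contain no spaces and that the active environment is flagged with a lone "*" column; parse_env_listing is a made-up helper name.

```python
def parse_env_listing(listing):
    """Map environment names to paths from text shaped like `conda info --envs` output."""
    envs = {}
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines and the header line.
        if not line or line.startswith("#") or line.endswith(":"):
            continue
        # Assumption: the active environment is marked with a separate "*" column.
        parts = [part for part in line.split() if part != "*"]
        if len(parts) >= 2:
            envs[parts[0]] = parts[-1]
    return envs


sample = """conda environments:
myenv                 /home/username/miniconda/envs/myenv
snowflakes            /home/username/miniconda/envs/snowflakes
bunnies               /home/username/miniconda/envs/bunnies
"""

environments = parse_env_listing(sample)
print(sorted(environments))
```

For anything beyond a quick script, asking conda for machine-readable output with its --json flag is likely the more robust choice, since it avoids parsing column-aligned text.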

Further instructions for creating similar environments, activating and deactivating environments, cloning, and so on can be found using help or -h.

Conda Packages

A conda package is a compressed tarball (.tar.bz2) or a .conda file. It contains system-level libraries, Python or other language modules, executable programs and other components, and metadata under the info/ directory. The files are installed directly into the install prefix of an environment.

The structure of a package is pretty much the same across platforms.

Structure of a Package:

.
├── bin
│   └── pyflakes
├── info
│   ├── LICENSE.txt
│   ├── files
│   ├── index.json
│   ├── paths.json
│   └── recipe
└── lib
    └── python3.5
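To make the layout concrete, the following sketch builds a tiny .tar.bz2 archive in memory that mimics the structure above and then reads info/index.json back out with Python's standard tarfile module. The archive contents are invented for the example (a real index.json carries more fields), and make_toy_package and read_index are hypothetical helper names.

```python
import io
import json
import tarfile


def make_toy_package():
    """Build a tiny .tar.bz2 archive in memory that mimics the layout shown above."""
    members = {
        "info/index.json": json.dumps({"name": "pyflakes", "version": "1.0"}),
        "info/files": "bin/pyflakes\n",
        "bin/pyflakes": "#!/bin/sh\n",
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
        for name, text in members.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def read_index(package_bytes):
    """Return the parsed info/index.json from a package archive."""
    with tarfile.open(fileobj=io.BytesIO(package_bytes), mode="r:bz2") as archive:
        with archive.extractfile("info/index.json") as handle:
            return json.load(handle)


index = read_index(make_toy_package())
print(index["name"], index["version"])
```

The same read_index function should work on a real .tar.bz2 package read from disk, since it relies only on the info/index.json path.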

Packages are classified into various types depending on the files they contain, their architecture and use.

  • Metapackages are packages that contain no files of their own, only metadata. They are used to capture metadata and to make complicated package specifications simpler. One example is the anaconda metapackage, which lists the packages that make up the Anaconda installer as its dependencies.
  • Noarch packages are packages that are not tied to a specific architecture, so they only need to be built once. They are either generic or Python: generic noarch packages are typically used to distribute source code, documentation and similar files, while Python noarch packages contain pure Python code.

Managing packages

  1. Check to see if a package you have not installed named "fatcat" is available from the Anaconda repository (must be connected to the Internet):
    conda search fatcat
  2. Install a package into the current environment:
    conda install [packagename]

Conda Channels

Conda channels are the locations where packages are stored. They serve as the base for hosting and managing packages. Conda packages are downloaded from remote channels, which are URLs to directories containing conda packages. The conda command searches a default set of channels, and packages are automatically downloaded and updated from them.
When the same package is available from more than one channel, the result is called a channel collision. Conda resolves it through channel priority: by default it prefers the package from the higher-priority channel over any version from a lower-priority channel, so that a core package is not overridden by accident.
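The priority rule can be modelled in a few lines of Python. This is a simplified sketch of the idea, not conda's actual implementation: it picks the highest-priority channel that offers the package and then the newest version within that channel. The channel name community and the version numbers are invented for the example, and versions are assumed to be purely numeric.

```python
def parse_version(text):
    """Turn '1.16.2' into (1, 16, 2) for comparison (numeric versions only)."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(candidates, channel_order):
    """Prefer the highest-priority channel first, then the newest version in it."""
    rank = {channel: position for position, channel in enumerate(channel_order)}
    best_rank = min(rank[c["channel"]] for c in candidates)
    in_best_channel = [c for c in candidates if rank[c["channel"]] == best_rank]
    return max(in_best_channel, key=lambda c: parse_version(c["version"]))


# Hypothetical builds of one package offered by two channels.
candidates = [
    {"channel": "defaults", "version": "1.16.4"},
    {"channel": "defaults", "version": "1.16.2"},
    {"channel": "community", "version": "1.17.0"},
]

chosen = pick_candidate(candidates, channel_order=["defaults", "community"])
print(chosen)
```

Note that the newer 1.17.0 build loses here because its channel is listed second; swapping the channel order would select it instead.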

How does Conda work internally

Now that we are familiar with the terms associated with conda, let us look at how it works internally.
Whether conda itself is being installed for the first time or a new package is being installed into an environment, every package consists of metadata plus a tarball of the files to be installed. "Tarball" is jargon for a .tar archive, which simply groups files together into one file; conda's .tar.bz2 packages are additionally compressed with bzip2.
For example, the basic structure of a conda package for Python is given below.

info/
    files
    index.json
    ...
bin/
    python
    ...
lib/
    libpython.so
    python2.7/
        ...
    ...
...

Conda extracts the package into the pkgs directory (the package cache) and then hard-links the files listed in the package's info/ metadata into the target environment. Before anything is downloaded, conda works out what needs to be installed by checking dependencies and conflicts against the current environment. Once the package is installed, you can set up environments however you choose and use conda to its full extent as a package manager.
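Hard-linking means that the file in the environment and the file in the package cache are the same data on disk, so several environments can share one copy. The sketch below illustrates this with temporary directories. The paths and the demo-1.0 package are invented, link_into_env is a hypothetical helper, and the fallback to copying when linking fails is a choice made for this sketch.

```python
import os
import shutil
import tempfile


def link_into_env(cache_file, env_file):
    """Hard-link a cached file into an environment, copying if linking is not possible."""
    os.makedirs(os.path.dirname(env_file), exist_ok=True)
    try:
        os.link(cache_file, env_file)
        return "hard link"
    except OSError:
        # Fallback for filesystems that do not support hard links.
        shutil.copy2(cache_file, env_file)
        return "copy"


with tempfile.TemporaryDirectory() as root:
    # A file as it would sit in the package cache after extraction.
    cached = os.path.join(root, "pkgs", "demo-1.0", "bin", "demo")
    os.makedirs(os.path.dirname(cached))
    with open(cached, "w") as handle:
        handle.write("echo demo\n")

    # Link it into a hypothetical environment directory.
    target = os.path.join(root, "envs", "myenv", "bin", "demo")
    method = link_into_env(cached, target)
    with open(target) as handle:
        linked_text = handle.read()
    same_inode = os.stat(cached).st_ino == os.stat(target).st_ino
    print(method, same_inode)
```

When a hard link is created, both paths report the same inode, which is why adding the package to another environment costs almost no extra disk space.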

Step-by-step flow:

  1. Downloading and processing index metadata.
  2. Reducing the index.
  3. Expressing the package data and constraints as a SAT problem.
  4. Running the solver.
  5. Downloading and extracting packages.
  6. Verifying package contents.
  7. Linking packages from package cache into environments.
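Steps 3 and 4 can be illustrated with a toy example. In the sketch below every package version becomes a boolean variable ("installed or not"), the requirements become clauses, and a brute-force search stands in for the solver. The package index is invented, and the search is far simpler than what conda does: a real solver also prefers newer versions and scales to many thousands of variables.

```python
from itertools import combinations, product

# Hypothetical index: each package version lists, per dependency,
# the alternatives that would satisfy it.
INDEX = {
    "app-1.0": [["lib-1.0"]],
    "app-2.0": [["lib-2.0"]],
    "tool-1.0": [["lib-1.0"]],
    "lib-1.0": [],
    "lib-2.0": [],
}
REQUESTED = ["app", "tool"]


def package_name(variable):
    """Strip the version suffix: 'app-1.0' -> 'app'."""
    return variable.rsplit("-", 1)[0]


def build_clauses(index, requested):
    """Encode the problem as CNF: a clause is a list of (variable, wanted_value) literals."""
    clauses = []
    # Each requested package needs at least one of its versions installed.
    for name in requested:
        clauses.append([(v, True) for v in index if package_name(v) == name])
    # Installing a package version implies one alternative of each dependency.
    for variable, dependencies in index.items():
        for alternatives in dependencies:
            clauses.append([(variable, False)] + [(alt, True) for alt in alternatives])
    # At most one version of any package can be installed.
    for first, second in combinations(index, 2):
        if package_name(first) == package_name(second):
            clauses.append([(first, False), (second, False)])
    return clauses


def solve(index, clauses):
    """Brute-force search over all assignments; real SAT solvers are far smarter."""
    variables = sorted(index)
    solutions = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[v] == wanted for v, wanted in clause) for clause in clauses):
            solutions.append({v for v in variables if assignment[v]})
    # Prefer the smallest install set among the satisfying assignments.
    return min(solutions, key=len) if solutions else None


solution = solve(INDEX, build_clauses(INDEX, REQUESTED))
print(sorted(solution))
```

In this toy index, tool-1.0 forces lib-1.0, which rules out app-2.0, so the only consistent choice is the older app-1.0. This is the kind of trade-off the solver has to discover automatically.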

Given below is a flowchart of how conda installs packages.
[Figure: flowchart of how conda installs packages]

Conda performance

A few practices can improve the performance of conda, namely:

  1. Creating fresh environments. The older and larger an environment grows, the harder it becomes to solve, so small, dedicated environments help reduce solve time.
  2. Requesting specific packages and versions instead of broad, loosely specified ones.
  3. Setting strict channel priority. This can significantly reduce solve time by ruling out solutions that mix packages from several channels.
  4. Disabling safety checks, since conda spends part of each installation verifying package contents. This is not recommended, however, because a corrupted package could go unnoticed and break your environment.

conda vs pip

Before comparing the pros and cons of conda and pip, let us understand the difference between them.

pip (the Python package installer) is the default package manager for Python, while venv is the default environment manager. Conda provides both of these utilities in a single tool.
The major advantage of conda over pip is that pip installs only Python packages, normally from PyPI, while conda can install packages written in any language. pip has no built-in support for managing environments and depends on tools like virtualenv for that, while conda lets packages be used in isolated environments, making it an extremely valuable tool for data analytics.

Also, pip has historically not checked the dependencies of all installed packages together (a backtracking resolver became the default only in pip 20.3), whereas conda uses a solver to make sure the requirements of all packages are met at once.

Pros and Cons of conda

Pros:-

  1. The most significant advantage of conda is that it manages isolated environments. As long as a package can be relocated, you can use it in multiple environments independently, at no cost.
  2. It can install packages from any available channel and can effectively maintain, manage and resolve conflicts among dependencies.
  3. It is open source, free, multi-platform and language agnostic.
  4. It installs prebuilt binary packages, so no compiler is needed.

Cons:-

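  1. Because conda installs its own binaries instead of using system packages, those binaries are not covered by your operating system's security updates; patches arrive only when the channel publishes a new build.
  2. Dependency resolution can be slow, especially in large or long-lived environments.
  3. Conda does not provide package provenance.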

Read other Python related topics so that you can strengthen your knowledge.

I hope this article gives you a basic understanding of conda and sparks your interest in it.