In this blog, we will explain how to store larger or multiple images in a single MongoDB document using the Blosc module. With the method discussed here, images keep their quality when they are stored in and retrieved from MongoDB. The entire implementation is in Python. So, can MongoDB store images? Let's find out.

The Standard Way: Can MongoDB store images

The standard way of storing images in MongoDB is very simple: convert the binary image to Base64 and store the resulting string in MongoDB. Base64 encoding, however, makes the data about one third larger, because every 3 bytes become 4 characters. If you want to store multiple or larger images in one MongoDB document, this quickly becomes a problem: the maximum size allowed for a MongoDB document is 16MB, and a document that exceeds it is rejected.
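
To make the Base64 overhead concrete, here is a minimal sketch that uses only the Python standard library. The 3 MB payload is an arbitrary stand-in for an image file, and the names base64_size and MAX_BSON_DOCUMENT_BYTES are made up for this illustration.

```python
import base64
import math

MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's 16 MB document limit


def base64_size(raw_size: int) -> int:
    """Length of the Base64 encoding (with padding) of raw_size bytes."""
    return 4 * math.ceil(raw_size / 3)


# Stand-in for the bytes of an image file (3 MB of arbitrary data).
raw = bytes(3 * 1024 * 1024)
encoded = base64.b64encode(raw)

overhead = len(encoded) / len(raw)  # roughly 4/3
# Upper bound on the raw bytes that fit in one document once Base64-encoded,
# ignoring the space taken by field names and other fields.
largest_raw_payload = MAX_BSON_DOCUMENT_BYTES * 3 // 4

print(f"raw: {len(raw)} bytes, Base64: {len(encoded)} bytes ({overhead:.2f}x)")
print(f"at most about {largest_raw_payload} raw bytes fit in one document")
```

In other words, Base64 alone uses up a quarter of the 16MB budget before any image data is counted.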

A Walkthrough of Blosc

Blosc is a high-performance compressor optimized for binary data, and it works very well for compressing numerical arrays.

Blosc uses a blocking technique to reduce memory traffic as much as possible. It divides datasets into blocks that are small enough to fit in the L1 cache of modern processors and performs compression and decompression there. It also makes use of the Single Instruction, Multiple Data (SIMD) and multi-threading capabilities present in multi-core processors to speed up the compression/decompression process.

python-blosc is a Python package that wraps Blosc. It supports only Python 3.6 or above.

Blosc can be used on any binary data, but in this blog we will use it to compress numpy arrays, as this serves two purposes:

  • Storing numpy arrays themselves in MongoDB.
  • Storing images (in numpy array format) in MongoDB.

Installing Blosc:

To install the python-blosc package using Conda, use the command given below.

$ conda install -c conda-forge python-blosc

If you want to install python-blosc via pip, use the following command:

$ pip install blosc

Important: When pip has to build python-blosc from source (for example, when no prebuilt wheel is available for your platform), the build depends on the cmake and scikit-build packages. In that case, install these packages first using the commands given below.

$ pip install cmake
$ pip install scikit-build

Compressing Numpy Array using Blosc

The first thing you need to do is convert the image into a numpy array. After that, use the pack_array function in blosc to compress the numpy array. The steps are as follows.

The modules to import are given below:

import numpy as np
from numpy import asarray
from PIL import Image
import blosc

1: Open the image using the Pillow module.

img = Image.open("path/to/image")

2: Convert the image into a numpy array using the asarray function.

image_array = asarray(img)

3: Compress the numpy array using the pack_array function of the blosc module.

compressed_bytes = blosc.pack_array(image_array)

Once you execute this function, the numpy array is converted into compressed bytes, which you can store directly in a MongoDB document. Blosc compression is lossless, so the image data is not distorted. The packed bytes are usually smaller than the raw pixel array, although how much smaller depends on the image content.

Decompressing Numpy Array using Blosc

After retrieving the compressed bytes from the MongoDB document, you need to turn them back into an image.

1: Use the unpack_array function in the blosc module to convert the compressed bytes back into a numpy array.

decompressed_array = blosc.unpack_array(compressed_bytes)

2: From the numpy array obtained in the previous step, recreate the image object using the fromarray function.

im = Image.fromarray(decompressed_array)

3: Save the object as an image file with the desired extension using the save function.

im.save("filename.png")  # PNG is lossless; quality only applies to lossy formats such as JPEG

After running these steps, you have retrieved the compressed bytes, converted them back into a numpy array, and restored the image. Because the compression is lossless, the restored image has the same dimensions and pixel data as the image before compression.

Compression and decompression are also very fast. Behind the scenes, the pack_array and unpack_array methods use pickle and unpickle respectively. Because unpickling can execute arbitrary code, only unpack bytes that come from a source you trust.

Conclusion:

The blosc module supports several compression codecs and settings. Here we have covered one method, which shows that MongoDB can store images, even larger or multiple ones in a single document. We hope you find this information useful. Thank you for reading.