Scripting and automation

Image analysis using PMA.start

The PMA.core.lite API allows any REST- or SOAP-enabled client to fetch image information as well as pixel data. This makes it possible to run image analysis algorithms, for example with Python and OpenCV.
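Because the API is also available over REST, image information can be requested without a SOAP client. The following is a minimal Python 3 sketch. It assumes the JSON GetImageInfo route that the Octave example on this page uses, and it assumes that the reply's fields sit under a d key, as that example's imInfo.d.Width suggests. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Address of the local PMA.core.lite / PMA.start instance, as given on this page.
BASE_URL = "http://localhost:54001"


def image_info_url(path_or_uid):
    """Build the JSON GetImageInfo URL for a slide (route taken from the Octave example)."""
    return BASE_URL + "/api/json/GetImageInfo?" + urlencode({"pathOrUid": path_or_uid})


def parse_image_info(text):
    """Decode the JSON reply; the image fields are assumed to sit under a "d" key."""
    return json.loads(text)["d"]


def fetch_image_info(path_or_uid):
    """Download and decode the image info (requires a running PMA.start instance)."""
    with urlopen(image_info_url(path_or_uid)) as response:
        return parse_image_info(response.read().decode("utf-8"))
```

For example, fetch_image_info("C:/Slides/slide1.svs")["Width"] would return the base-level width of that (hypothetical) slide.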

Fetching image snapshots from PMA.core.lite using Python

The following example uses Python 2.7 to connect to the locally installed instance of PMA.core.lite and download snapshots of all the images in a directory on the system. Assuming that Python 2.7 is already installed on your system, the only other requirement is a SOAP client. Here we use suds, which can be installed (if it is not already) by issuing the following command at the command prompt:

pip install suds

PMA.core.lite runs at http://localhost:54001/. The WSDL definition of the SOAP API can be found at http://localhost:54001/api?WSDL

In order to fetch snapshots for all the images in a specific directory, we first have to get the list of images in that directory. Assuming that our slides reside in the directory C:\Whole Slide Images, we can get the list by calling the GetFiles API method as follows:

from suds import WebFault
from suds.client import Client
import urllib
client = Client("http://localhost:54001/API?singleWsdl")
client.set_options(cache = None)
# all backslashes have to be replaced by forward slashes and the colon (:) has to be removed from the path
files = client.service.GetFiles(path = "C/Whole Slide Images")
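The path conversion described in the GetFiles comment can be wrapped in a small helper so that ordinary Windows paths can be passed in. This is a minimal sketch. to_pma_path is a hypothetical name, and the helper assumes the drive colon is the only colon in the path. It runs under both Python 2.7 and Python 3.

```python
def to_pma_path(windows_path):
    """Convert a Windows path to the form used with GetFiles (hypothetical helper).

    Backslashes become forward slashes and the drive colon is removed,
    e.g. C:\\Whole Slide Images -> C/Whole Slide Images.
    """
    return windows_path.replace("\\", "/").replace(":", "", 1)
```

Usage: files = client.service.GetFiles(path = to_pma_path(r"C:\Whole Slide Images"))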

Then, to fetch image information (such as width, height and resolution) for an image, we call the GetImageInfo API method. Finally, we can download a snapshot from the /region route, providing the required parameters. Here’s an example:

for i, filename in enumerate(files):
	info = client.service.GetImageInfo(filename)

	# To get the actual snapshot we have to decide the size of the output image. Here we want a snapshot that is at most 1000 pixels wide or tall.
	width = info.Width
	height = info.Height
	scale = 1.0

	# find the appropriate scale but also maintain aspect ratio
	# 1000.0 forces float division; in Python 2.7, 1000 / width would be 0
	if width >= height:
		scale = 1000.0 / width
	else:
		scale = 1000.0 / height

	# build the url to get the image snapshot
	# parameters:
	# pathOrUid: the path of the image we want to retrieve
	# timeframe: the index of the timeframe we wish to load. Usually 0, unless we are dealing with images that contain time series.
	# layer: the index of the layer we wish to load. Usually 0, unless we are dealing with images that contain Z-stacks.
	# x: the top left pixel x coordinate of the image at which we want the snapshot to start. This value is expressed in base level scale.
	# y: the top left pixel y coordinate of the image at which we want the snapshot to start. This value is expressed in base level scale.
	# width: the width of the image we want to read. This value is expressed in base level scale.
	# height: the height of the image we want to read. This value is expressed in base level scale.
	# scale: the factor by which we want the width and height parameters to be scaled by. The resulting image will be width * scale pixels wide and height * scale pixels tall
	parameters = { 'pathOrUid' : filename, 'timeframe': 0, 'layer': 0 ,'x': 0,'y': 0, 'width': width, 'height': height, 'scale': scale }

	# we call the /region route with the appropriate parameters to fetch our image
	url = "http://localhost:54001/region?" + urllib.urlencode(parameters)

	# finally we can save it locally (the C:\snapshots directory must already exist)
	urllib.urlretrieve(url, r"C:\snapshots\{0}.jpg".format(filename.split('/')[-1]))
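urllib.urlencode and urllib.urlretrieve exist only in Python 2. In Python 3 they live in urllib.parse and urllib.request. The following is a minimal Python 3 sketch of the same /region request. It computes the scale from the longer side, so the division is always a float division. The function names are hypothetical.

```python
from urllib.parse import urlencode

MAX_SIZE = 1000  # longest side of the snapshot, in pixels


def snapshot_scale(width, height, max_size=MAX_SIZE):
    """Scale factor that fits the whole slide into max_size pixels, keeping the aspect ratio."""
    return max_size / float(max(width, height))


def region_url(path_or_uid, width, height, max_size=MAX_SIZE):
    """Build a /region URL that returns the whole slide as a single snapshot."""
    parameters = {
        "pathOrUid": path_or_uid,
        "timeframe": 0,
        "layer": 0,
        "x": 0,
        "y": 0,
        "width": width,    # base-level width of the region to read
        "height": height,  # base-level height of the region to read
        "scale": snapshot_scale(width, height, max_size),
    }
    return "http://localhost:54001/region?" + urlencode(parameters)
```

In the loop above, the download would then become urllib.request.urlretrieve(region_url(filename, info.Width, info.Height), target_path), where target_path is any local file name.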

Using OpenCV to calculate the edges of the tissue on a slide

As shown in the example above, we can fetch snapshots of any size from PMA.core.lite. Once we have a snapshot, we can process it further, for example with OpenCV to calculate the boundaries of the tissue.
In order to run this example, the following libraries need to be installed along with Python 2.7:

  • numpy
  • matplotlib
  • opencv

Windows users can download these binaries as .whl files from here:

Once the correct .whl files have been downloaded, they can be installed using pip.

The steps presented in this example to calculate the contours of the tissue in a whole slide image are:

  1. Fetch an image snapshot from PMA.core.lite (as illustrated in the snapshot example above)
  2. Convert the image’s color space from RGB to HSV
  3. Perform a Gaussian blur
  4. Calculate a threshold value to convert the image to black & white using Otsu’s method
  5. Find the contours using OpenCV
  6. Plot the contours on top of the image
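Step 4 is the least self-explanatory. In OpenCV it is normally a single call on an 8-bit single-channel image, cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU). The following pure-Python sketch shows what Otsu's method computes: it tries every threshold and keeps the one that maximizes the between-class variance of the histogram. It is a hypothetical helper for illustration, not the code from the downloadable example.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a sequence of 8-bit grey values (0-255).

    Pixels with value <= threshold form one class, the rest the other.
    """
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += histogram[t]        # number of pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg    # number of pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / float(weight_bg)
        mean_fg = (sum_all - sum_bg) / float(weight_fg)
        # between-class variance, up to a constant factor
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold
```

Like the OpenCV version, pixels above the returned threshold would be set to white and the others to black.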

Example output

Digital slide image analysis

Here you can download the source code in Python for the image analysis examples.

To run the examples, modify the code to point to a directory on your system that contains whole slide images, and issue:


Using PMA.start with CellProfiler

Building on the first example, which showed how to fetch a snapshot of an image from PMA.start using Python, we can extend the code slightly to feed the snapshot to CellProfiler for further analysis.

CellProfiler can be invoked from the command line and instructed to execute a pipeline against a particular image. To achieve this, we first have to download an image from PMA.start and create an accompanying CSV file that contains the required metadata. In the simplest case, for an RGB image, the CSV only needs to contain the path to the image file.

For example, if we want to feed the image at c:\slides\slide1.svs to CellProfiler, we first fetch a snapshot using the attached Python program by issuing:

python c:/slides/slide1.svs

This will fetch a snapshot of the image and save it as slide1.jpg in the current directory. It will also create a CSV file named input.csv, again in the current directory.
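This page does not show the exact CSV that the attached script writes. One layout that CellProfiler's LoadData module understands is a header row with Image_FileName_<name> and Image_PathName_<name> columns, followed by one row per image. The following minimal Python 3 sketch assumes that layout. It also assumes the pipeline starts with a LoadData module. The helper names and the image name OrigRGB are hypothetical, and the image name must match the one your pipeline uses.

```python
import csv
import os


def load_data_rows(image_path, image_name="OrigRGB"):
    """Header and data row describing one image for CellProfiler's LoadData module."""
    folder, filename = os.path.split(os.path.abspath(image_path))
    return [
        ["Image_FileName_" + image_name, "Image_PathName_" + image_name],
        [filename, folder],
    ]


def write_load_data_csv(image_path, csv_path="input.csv", image_name="OrigRGB"):
    """Write the rows to csv_path (input.csv in the current directory by default)."""
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows(load_data_rows(image_path, image_name))
```

After saving the snapshot as slide1.jpg, calling write_load_data_csv("slide1.jpg") would produce an input.csv that can be passed to CellProfiler with --data-file.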

Next, provided that our pipeline is ready, we invoke CellProfiler as follows:

CellProfiler.exe -c -r --data-file input.csv -o /output_directory -p my_pipe_line.cppipe

This will start CellProfiler in headless mode and execute “my_pipe_line.cppipe” against the image we fetched. The results will be stored in a folder named “output_directory”.
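If the snapshot script should also start CellProfiler, the command above can be launched from Python with the standard subprocess module. The following is a minimal sketch. The helper names are hypothetical. CellProfiler.exe is assumed to be on the PATH; otherwise, pass its full path as executable.

```python
import subprocess


def cellprofiler_command(pipeline, data_file="input.csv", output_dir="output_directory",
                         executable="CellProfiler.exe"):
    """Assemble the headless CellProfiler command line shown above."""
    return [
        executable,
        "-c",                      # run headless, without the GUI
        "-r",                      # run the pipeline on startup
        "--data-file", data_file,  # CSV used by LoadData instead of the one set in the pipeline
        "-o", output_dir,          # directory where results are written
        "-p", pipeline,            # the .cppipe pipeline to execute
    ]


def run_cellprofiler(pipeline, **kwargs):
    """Run CellProfiler; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(cellprofiler_command(pipeline, **kwargs), check=True)
```

Usage: run_cellprofiler("my_pipe_line.cppipe")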

Click here to download the script to load images from PMA.start into CellProfiler

Image segmentation using GNU Octave

With PMA.start, it is easy to load whole slide images into GNU Octave and run image analysis algorithms on them.

To work with images, we first need to install the ‘image’ package from Octave Forge and then load it by issuing:

pkg install -forge image
pkg load image

In order to interact with the web API of PMA.start, we need a JSON parser for Octave. An easy-to-use one can be found here. Extract it locally and load it in Octave by issuing:


Now we are ready to do some image analysis:

% define the path to a WSI. You can download a sample WSI here
imagePath='C:/Slides/sample.svs'; % hypothetical path: replace it with a slide on your system

% build the image info URL
infoUrl=['http://localhost:54001/api/json/GetImageInfo?pathOrUid=' imagePath];

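% fetch image info as JSON
json=urlread(infoUrl);

% parse it (jsondecode is built into Octave 7 and later; with the JSON parser installed above, call its parsing function instead)
imInfo=jsondecode(json);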

% calculate a scale factor that produces an image at most 1500 pixels wide or tall, but don't scale up the image
scale=min(1500 / (max(imInfo.d.Width, imInfo.d.Height)), 1)

% download the image from PMA.start
imgUrl=['http://localhost:54001/region?pathOrUid=' imagePath '&width=' num2str(imInfo.d.Width) '&height=' num2str(imInfo.d.Height) '&x=0&y=0&scale=' num2str(scale) '&timeframe=0&layer=0'];

% read it
urlwrite(imgUrl, 'wsi.jpg');
wsi=imread('wsi.jpg');
figure,imshow(wsi); title('WSI');

% convert the image to grayscale
gray=rgb2gray(wsi);
figure,imshow(gray); title('Grayscale');

% blur it - the actual blur factor depends on the type of the image
blur=imsmooth(gray, 'Gaussian', 5);
figure,imshow(blur); title('Blur');

% run Otsu threshold (graythresh uses Otsu's method by default)
th=graythresh(blur);

% convert the image to black and white
bw=im2bw(blur, th);
figure,imshow(bw); title('Black - white');

% run edge detection - multiple methods are available
edges=edge(mat2gray(bw), 'Canny');
figure,imshow(edges); title('Edges');

Input image
Click here to download the complete Octave example.

Converting to Hierarchical Data Format (HDF5)

HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.

Using the API of PMA.start, one can convert a digital slide to an HDF5 file, where each pyramid level is stored as a three-dimensional dataset of RGB values.
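This page does not describe how the example program lays out the file. The following rough sketch assumes that each pyramid level halves the previous one until the longest side is at most 256 pixels. For each level it computes a dataset name and a (rows, columns, 3) shape, with one RGB triplet per pixel. The dataset names level_0, level_1, ... and the halving rule are hypothetical.

```python
def pyramid_layout(width, height, min_side=256):
    """List (dataset_name, shape) pairs for a slide pyramid, base level first.

    Assumes (hypothetically) that every level halves the previous one.
    Each shape is (rows, columns, 3): one RGB triplet per pixel.
    """
    levels = []
    level = 0
    while True:
        levels.append(("level_{0}".format(level), (height, width, 3)))
        if max(width, height) <= min_side:
            break
        width, height = max(1, width // 2), max(1, height // 2)
        level += 1
    return levels
```

With h5py, each pair could then become h5file.create_dataset(name, shape=shape, dtype="uint8"), filled with pixel data fetched from the /region route at the matching scale.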

You can try this yourself by downloading an example Python program that converts a slide to an HDF5 file.

Before you run the program, make sure that the following packages have been installed using the pip command:

- numpy
- requests
- h5py

To run the script, issue the following command:
python "D:/MySlides/slide.mrxs"
Once conversion is complete, you can even visualize the dataset using Panoply:
Panoply output