Scripting and automation

Image analysis using PMA.start

The PMA.core.lite API allows any REST- or SOAP-enabled client to fetch image information as well as pixel data. This enables us to run image analysis algorithms using Python and OpenCV.

Fetching image snapshots from PMA.core.lite using Python

The following example uses Python 2.7 to connect to the locally installed instance of PMA.core.lite and download snapshots for all the images in a directory on the system. Assuming that Python 2.7 is already installed on your system, the only other requirement is a SOAP client. Here we use suds, which can be installed, if it’s not already, by issuing the following command at the command prompt:

pip install suds

PMA.core.lite runs at http://localhost:54001/. The WSDL definition of the SOAP API can be found at http://localhost:54001/api?WSDL

In order to fetch snapshots for all the images in a specific directory, we first have to get the list of images in that directory. Assuming that our slides reside in the directory C:\Whole Slide Images, we can get the list by calling the GetFiles API method as follows:

from suds import WebFault
from suds.client import Client
import urllib
client = Client("http://localhost:54001/API?singleWsdl")
client.set_options(cache = None)
# note that all backslashes have to be replaced by forward slashes and the colon (:) has to be removed from the path
files = client.service.GetFiles(path = "C/Whole Slide Images")
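
The path conversion described in the comment above can be wrapped in a small helper. This is only a sketch: the function name to_pma_path is our own and is not part of the PMA.core.lite API, and it only covers the two rules mentioned above (backslashes become forward slashes, the drive colon is dropped).

```python
def to_pma_path(windows_path):
    # Hypothetical helper (not part of PMA.core.lite): converts a Windows
    # path into the form used in the GetFiles call above.
    # Replace every backslash with a forward slash
    path = windows_path.replace("\\", "/")
    # Drop the colon that follows the drive letter, e.g. "C:/x" -> "C/x"
    if len(path) >= 2 and path[1] == ":":
        path = path[0] + path[2:]
    return path


print(to_pma_path(r"C:\Whole Slide Images"))  # C/Whole Slide Images
```

The result can then be passed as the path argument of GetFiles instead of a hand-edited string.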

Then, to fetch image information (such as width, height, resolution etc.) for any image, we have to call the GetImageInfo API method. Finally we can download a snapshot from the /region route, providing the required parameters. Here’s an example:

for i, filename in enumerate(files):
	info = client.service.GetImageInfo(filename)

	# To get the actual snapshot we have to decide the size of the output image. Suppose that we want a snapshot that is at most 1000 pixels wide or tall:
	width = info.Width
	height = info.Height
	scale = 1.0

	# find the appropriate scale but also maintain aspect ratio
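	# (1000.0 forces floating point division; in Python 2.7, 1000 / width with an integer width would be truncated to 0)
	if (width >= height):
		scale = 1000.0 / width
	else:
		scale = 1000.0 / height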

	# build the url to get the image snapshot
	# parameters:
	# pathOrUid: the path of the image we want to retrieve
	# timeframe: the index of the timeframe we wish to load. Usually 0, unless we are dealing with images that contain time series.
	# layer: the index of the layer we wish to load. Usually 0, unless we are dealing with images that contain Z-stacks.
	# x: the x coordinate of the top left pixel at which we want the snapshot to start. This value is expressed in base level scale.
	# y: the y coordinate of the top left pixel at which we want the snapshot to start. This value is expressed in base level scale.
	# width: the width of the image we want to read. This value is expressed in base level scale.
	# height: the height of the image we want to read. This value is expressed in base level scale.
	# scale: the factor by which we want the width and height parameters to be scaled. The resulting image will be width * scale pixels wide and height * scale pixels tall
	parameters = { 'pathOrUid' : filename, 'timeframe': 0, 'layer': 0 ,'x': 0,'y': 0, 'width': width, 'height': height, 'scale': scale }

	# we call the /region route with the appropriate parameters to fetch our image
	url = "http://localhost:54001/region?" + urllib.urlencode(parameters)

	# finally we can save it locally
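	# (raw string so that the backslashes are taken literally; the target directory must already exist)
	urllib.urlretrieve(url, r"C:\snapshots\{0}.jpg".format(filename.split('/')[-1]))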
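
To make the scale arithmetic concrete, here is a minimal standalone sketch of the same calculation and URL construction. It does not contact the server. The helper names snapshot_scale and region_url, the image size of 40000 x 20000 pixels and the path C/slides/slide1.svs are made up for illustration; the /region route and its parameters are the ones described above.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7


def snapshot_scale(width, height, max_side=1000):
    # Scale factor that makes the longer side max_side pixels long.
    # float() avoids integer division under Python 2.7.
    return float(max_side) / max(width, height)


def region_url(path, width, height, scale, base_url="http://localhost:54001"):
    # Same parameters as in the example above; a list of pairs keeps
    # the order of the query string predictable.
    parameters = [
        ("pathOrUid", path), ("timeframe", 0), ("layer", 0),
        ("x", 0), ("y", 0),
        ("width", width), ("height", height), ("scale", scale),
    ]
    return base_url + "/region?" + urlencode(parameters)


# Hypothetical image of 40000 x 20000 pixels at base level
scale = snapshot_scale(40000, 20000)
print(scale)  # 0.025
print(region_url("C/slides/slide1.svs", 40000, 20000, scale))
```

With these numbers the requested snapshot is about 1000 pixels wide and 500 pixels tall, so the aspect ratio of the slide is preserved.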

Using OpenCV to calculate the edges of the tissue on a slide

As shown in the example above, we can fetch snapshots of any size from PMA.core.lite. Once we have a snapshot, we can process it further with OpenCV, for example to calculate the boundaries of the tissue.

To run this example, the following libraries need to be installed along with Python 2.7:

  • numpy
  • matplotlib
  • opencv

Windows users can download these binaries as .whl files from here: www.lfd.uci.edu/~gohlke/pythonlibs/

Once the correct whl files have been downloaded they can be installed using pip.

The steps presented in this example to calculate the contours of the tissue in a whole slide image are:

  1. Fetch an image snapshot from PMA.core.lite (as illustrated in the snapshot example above)
  2. Convert the image’s color space from RGB to HSV
  3. Perform a Gaussian blur
  4. Calculate a threshold value to convert the image to black & white using Otsu’s method
  5. Find the contours using OpenCV
  6. Plot the contours on top of the image

Example output

[Figure: digital slide image analysis]

The Python source code for the image analysis examples is available for download.

To run the examples, modify the code to point to a directory on your system that contains whole slide images and issue:

python SnapShooter.lite.py
python AutoAnnotator.lite.py

Using PMA.start with CellProfiler

The first example showed how to fetch a snapshot of an image from PMA.start using Python. We can extend that example slightly to feed the snapshot to CellProfiler for further analysis.

CellProfiler can be invoked from the command line and instructed to execute a pipeline against a particular image. To achieve this, we first have to download an image from PMA.start and create an accompanying CSV file that contains the required metadata. In the simple case of an RGB image, it is enough to write the path to the image file into the CSV.

For example, if we want to feed the image at c:\slides\slide1.svs to CellProfiler, we first fetch a snapshot using the attached Python program by issuing:

python CellProfiler.py c:/slides/slide1.svs

This will fetch a snapshot of the image and save it as slide1.jpg in the current directory. It will also create a CSV file named input.csv, again in the current directory.
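
As an illustration of what such a CSV file might look like, the sketch below writes a header row and one data row for a single image. It is not taken from the attached script. We assume the column naming convention of CellProfiler's LoadData module (Image_FileName_<name> and Image_PathName_<name>), and the image name Slide is hypothetical; it has to match the name your pipeline expects.

```python
import csv
import os


def write_cellprofiler_csv(image_path, csv_path="input.csv", image_name="Slide"):
    # Hypothetical helper: writes one row pointing at a single RGB image.
    # Assumption: the pipeline reads the file with the LoadData module and
    # refers to the image as image_name.
    folder, filename = os.path.split(os.path.abspath(image_path))
    with open(csv_path, "w") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["Image_FileName_" + image_name,
                         "Image_PathName_" + image_name])
        writer.writerow([filename, folder])
    return csv_path
```

Calling write_cellprofiler_csv("slide1.jpg") in the directory that holds the snapshot would produce an input.csv next to it.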

Next we invoke CellProfiler, provided that we have our pipeline ready, in the following manner:

CellProfiler.exe -c -r --data-file input.csv -o /output_directory -p my_pipe_line.cppipe

This will start CellProfiler in headless mode and execute “my_pipe_line.cppipe” against the image we fetched. The results will be stored in a folder named “output_directory”.
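
If we want to trigger this step from Python as well, for instance right after fetching the snapshot, the same command line can be assembled and launched with the subprocess module. This is a sketch under the assumption that CellProfiler.exe is on the PATH; the function names are our own, and the flags are exactly the ones shown in the command above.

```python
import subprocess


def cellprofiler_command(pipeline, data_file, output_dir,
                         executable="CellProfiler.exe"):
    # Same flags as in the command shown above
    return [executable, "-c", "-r",
            "--data-file", data_file,
            "-o", output_dir,
            "-p", pipeline]


def run_cellprofiler(pipeline, data_file, output_dir,
                     executable="CellProfiler.exe"):
    # Blocks until CellProfiler exits and returns its exit code
    return subprocess.call(
        cellprofiler_command(pipeline, data_file, output_dir, executable))


print(" ".join(cellprofiler_command(
    "my_pipe_line.cppipe", "input.csv", "/output_directory")))
```

Passing the arguments as a list rather than a single string avoids quoting problems when paths contain spaces.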

The script that loads images from PMA.start into CellProfiler is available for download.