Derivative calculation is carried out by applying the Savitzky-Golay algorithm. In this method, n-th order derivatives are obtained while the data are smoothed at the same time to reduce the noise. First or second order derivatives can be calculated using 5 to 25 smoothing points. Please note that derivatives are taken in the spectral domain only. Details of the Savitzky-Golay algorithm can be found in the literature:
A. Savitzky and M. Golay. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964 Vol 36(8):1627.
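The smoothed derivative described above can be sketched in Python with SciPy's `savgol_filter`. This is only an illustration, not CytoSpec's own code: the array layout `(ny, nx, nz)`, the function name `sg_derivative` and the polynomial degree `polyorder=3` are assumptions, and the wavenumber axis is assumed to be equally spaced and ascending.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_derivative(cube, wavenumbers, order=2, points=9, polyorder=3):
    """Smoothed derivative of every spectrum along the last (spectral) axis.

    cube        : array of shape (ny, nx, nz) (hypothetical data layout)
    wavenumbers : 1D array of nz equally spaced, ascending positions
    order       : derivative order, 1 or 2
    points      : odd number of smoothing points, 5 to 25
    polyorder   : degree of the fitted polynomial (an assumption)
    """
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")
    if points % 2 == 0 or not 5 <= points <= 25:
        raise ValueError("points must be odd and between 5 and 25")
    spacing = wavenumbers[1] - wavenumbers[0]
    # deriv > 0 returns the derivative of the locally fitted polynomial,
    # so smoothing and differentiation happen in one step.
    return savgol_filter(cube, window_length=points, polyorder=polyorder,
                         deriv=order, delta=spacing, axis=-1)
```

Because the filter acts along `axis=-1` only, the image (x, y) dimensions are left untouched, which mirrors the statement that derivatives are taken in the spectral domain only.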
Any type of data block can be handled (including derivatives). Derivative spectra are stored in a data block reserved exclusively for derivative spectra. If this block is not empty, the data are overwritten without warning when derivatives are obtained again (see also Internal Data Organization, Table II). Derivative calculation is always carried out on one complete 3D spectral data block.
Select the source data block by clicking the appropriate radio button, then select the number of smoothing points and the order of the derivative. To finally obtain the derivatives click on the 'derive' button.
Parameters used for obtaining derivative spectra are stored within the program workspace and are accessible in the File Info menu (File Info --> File Manipulations --> derivatives). These parameters are also shown in the command line window.
The CytoSpec 'normalization' subroutine offers four different methods of spectra normalization:
1. Offset correction
2. Min-Max normalization
3. Vector normalization
4. SNV (standard normal variate)
Offset correction performs a linear correction of the complete spectrum such that at least one point of the spectral region indicated equals zero. Spectra are not scaled in this mode.
Min-Max normalization indicates that all spectra of the source data block are scaled between zero and one, that is the maximal absorbance value of the spectrum in the selected spectral region equals one, the minimum 0. You can use this normalization method to perform a simple band normalization (e.g. for the amide I band).
Vector normalization is carried out in the following way: spectra are first mean-centred, i.e. the average value of the absorbances is calculated for the spectral region indicated. This value is then subtracted from the spectrum. Then, the spectra are scaled such that the sum of the squared deviations over the indicated wavenumbers equals one.
Standard Normal Variate (SNV): A standard normal variate is a normal variate with mean μ=0 and standard deviation sigma=1. SNV normalization is achieved by dividing mean-centred spectra by the standard deviation over the spectral intensities, giving the resulting spectra a standard deviation of one.
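The four normalization methods can be illustrated with a short NumPy sketch. It is not CytoSpec's implementation. The function names and the `(ny, nx, nz)` layout are hypothetical, and several details are assumptions: statistics are computed inside the selected region and applied to the complete spectrum, offset correction subtracts the regional minimum, and SNV uses the population standard deviation.

```python
import numpy as np


def _region(wavenumbers, lo, hi):
    """Boolean mask selecting lo <= wavenumber <= hi."""
    return (wavenumbers >= lo) & (wavenumbers <= hi)


def offset_correction(cube, wavenumbers, lo, hi):
    # Shift each spectrum so that its minimum inside the region is zero
    # (assumed reading of "at least one point equals zero"); no scaling.
    m = _region(wavenumbers, lo, hi)
    return cube - cube[..., m].min(axis=-1, keepdims=True)


def minmax_normalization(cube, wavenumbers, lo, hi):
    # Regional minimum becomes 0, regional maximum becomes 1.
    m = _region(wavenumbers, lo, hi)
    mn = cube[..., m].min(axis=-1, keepdims=True)
    mx = cube[..., m].max(axis=-1, keepdims=True)
    return (cube - mn) / (mx - mn)


def vector_normalization(cube, wavenumbers, lo, hi):
    # Mean-centre, then scale so the sum of squared deviations is 1.
    m = _region(wavenumbers, lo, hi)
    centred = cube - cube[..., m].mean(axis=-1, keepdims=True)
    norm = np.sqrt((centred[..., m] ** 2).sum(axis=-1, keepdims=True))
    return centred / norm


def snv_normalization(cube, wavenumbers, lo, hi):
    # Mean-centre, then divide by the standard deviation (ddof=0 assumed).
    m = _region(wavenumbers, lo, hi)
    centred = cube - cube[..., m].mean(axis=-1, keepdims=True)
    return centred / cube[..., m].std(axis=-1, keepdims=True)
```

For a simple band normalization on the amide I band, one might call `minmax_normalization(cube, wavenumbers, 1600, 1700)`; the limits here are only an example.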
To perform normalization, first select the source data block by activating the appropriate radio button (see also Internal Data Organization, Table I). The target data block will be specified in the gray field of the normalization window. Please note that the data of the target data block will be overwritten without warning! Then, select the type of normalization and use the keyboard to enter the wavenumber values between which the spectra will be normalized. To start normalization click the 'norm' button. If you wish to cancel the operation press 'cancel'.
Parameters used for normalization such as spectral range are stored within the program workspace and are accessible through the File Info menu (File Info --> File Manipulations --> type of data block). These parameters are also shown in the command line window.
CytoSpec's 'cut' subroutines offer two different methods to cut hyperspectral data sets:
- Cutting in the spectral domain, and
- Cropping images in the spatial domains.
Cutting in the spectral (z)- dimension can be used to narrow the frequency range of spectral data files. This may be useful to free some memory before memory-consuming calculations such as 3D Fourier self deconvolution are carried out. Define the frequency range to be kept, then click on the 'cut' button to start the function. By pressing the 'cancel' button you can cancel the operation.
Note that the 'cut/crop' function overwrites all existing data blocks (see also Internal Data Organization, Table IV).
The parameters used for 'cut' are stored within the program workspace and are accessible through the File Info pull down menu (File Info --> File Manipulations --> type of data block). These parameters are also shown in the command line window.
CytoSpec's 'interpolation' routines offer two different methods of interpolation:
- interpolation in the spectral domain, and
- interpolation in the spatial domains.
Interpolation in the spectral (z)- dimension changes the spacing between spectral data points. The spacing can be increased or decreased by the 'interpolation factor', which can vary between 1/32 and 32. For example, if a factor of 4 is chosen, the number of data points is increased by a factor of 4, i.e. one frequency interval is filled with (4-1) additional data points. In this case the program performs a one-dimensional interpolation of the spectra. With a large interpolation factor (e.g. 32) the number of data points of the new spectrum may become rather large. The actual number of data points depends on the start and end frequency and the frequency interval of the original spectrum.
If a factor smaller than 1 is chosen, the number of data points is reduced, i.e. the data point spacing becomes wider. For example, if a factor of 0.25 is chosen, the number of data points is decreased by a factor of 4, i.e. four frequency intervals are merged into one frequency interval. Consequently, spectral information is lost. Reducing the number of data points is useful to reduce the noise or to free some memory before memory-consuming calculations such as 3D-Fourier self deconvolution (3D-FSD) are carried out.
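The effect of the interpolation factor on the number of data points can be sketched as follows. This is a minimal illustration under assumptions: linear interpolation via SciPy's `interp1d` is used (the text only says "one-dimensional interpolation"), the function name and the `(ny, nx, nz)` layout are hypothetical, and for factors below 1 the sketch simply resamples on a coarser grid rather than averaging the merged intervals.

```python
import numpy as np
from scipy.interpolate import interp1d


def interpolate_spectral(cube, wavenumbers, factor):
    """Resample all spectra along the last axis by an interpolation factor.

    A factor of 4 fills every frequency interval with 3 additional points;
    a factor of 0.25 keeps one interval where there were four.
    """
    if not 1 / 32 <= factor <= 32:
        raise ValueError("factor must lie between 1/32 and 32")
    n_old = wavenumbers.size
    # Number of intervals scales with the factor; points = intervals + 1.
    n_new = int(round((n_old - 1) * factor)) + 1
    new_wn = np.linspace(wavenumbers[0], wavenumbers[-1], n_new)
    resample = interp1d(wavenumbers, cube, kind="linear", axis=-1)
    return resample(new_wn), new_wn
```

With 5 original points, a factor of 4 yields 17 points and a factor of 0.25 yields 2 points, which shows why the actual point count depends on the original frequency range and interval.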
The 'interpolate' function overwrites all existing data blocks (see also Internal Data Organization, Table III).
The parameters used for interpolation are stored within the program workspace and are accessible through the File Info menu (File Info --> File Manipulations --> type of data block). These parameters are also shown in the command line window.
Smoothing: This function is used to smooth spectra, using either the Savitzky-Golay or the average smoothing algorithm. Possible values for smoothing points are 5 to 25. Select the source data block as usual, choose the number of smoothing points and click the 'smooth' button to start the operation. Smoothing has a mostly cosmetic effect on the spectrum, reducing the noise at the expense of distorting the signals.
Details of the Savitzky-Golay algorithm can be found in the literature:
A. Savitzky and M. Golay. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964 Vol 36(8):1627.
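The two smoothing options can be compared with a short SciPy sketch. It is an illustration only: the function name, the polynomial degree `polyorder=2` and the edge handling of the moving average (`mode="nearest"`) are assumptions, not documented CytoSpec behaviour.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.signal import savgol_filter


def smooth(cube, points=9, method="savgol", polyorder=2):
    """Smooth all spectra along the last (spectral) axis."""
    if points % 2 == 0 or not 5 <= points <= 25:
        raise ValueError("points must be odd and between 5 and 25")
    data = np.asarray(cube, dtype=float)
    if method == "savgol":
        # Local least-squares polynomial fit (Savitzky-Golay).
        return savgol_filter(data, window_length=points,
                             polyorder=polyorder, axis=-1)
    if method == "average":
        # Plain moving average over `points` neighbouring data points.
        return uniform_filter1d(data, size=points, axis=-1, mode="nearest")
    raise ValueError("method must be 'savgol' or 'average'")
```

For the same number of points, the moving average usually flattens narrow bands more than the Savitzky-Golay filter, which is one way to see the trade-off between noise reduction and signal distortion mentioned above.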
The parameters used for smoothing are stored within the program workspace and are accessible through the File Info menu (File Info --> File Manipulations --> type of data block). These parameters are also shown in the command line window.
TR <--> ABS conversion: This function performs the conversion from transmission spectra to absorbance spectra and vice versa. Note: The function ABS <--> TR acts on the complete data block of original spectra and overwrites the data block of original spectra. Furthermore, all other types of data will be deleted.
For the conversion of absorbance spectra to transmission spectra the following formula is used:
Formula, used to obtain absorbance spectra from transmission spectra:
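As an illustration, the conventional relations between transmittance T and absorbance A are A = -log10(T) and T = 10^(-A). The sketch below assumes that transmittance is expressed as a fraction between 0 and 1 (spectra stored in percent would have to be divided by 100 first); whether CytoSpec uses exactly this convention is an assumption, and the function names are hypothetical.

```python
import numpy as np


def transmittance_to_absorbance(t):
    """A = -log10(T), with T given as a fraction (0 < T <= 1)."""
    return -np.log10(np.asarray(t, dtype=float))


def absorbance_to_transmittance(a):
    """T = 10**(-A), the inverse of the relation above."""
    return 10.0 ** (-np.asarray(a, dtype=float))
```

Applying one function after the other returns the input spectrum, apart from floating point rounding.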
Dispersion correction: This function performs the correction of dispersion artefacts from transmittance spectra (function written by Dr. Melissa Romeo).
The correction of dispersion artefacts is carried out in the following way:
- Transmittance spectra are linearly interpolated such that they contain 2^n data points (256, 512, ... , 16384)
- These spectra are then Fourier-transformed.
- The first half of the Fourier transformed function is inversely Fourier-transformed. This operation produces a complex function containing a real and an imaginary part.
- Negative terms in the imaginary component are compensated.
- Squares of the real and imaginary terms are taken.
- The squared terms are co-added.
- The root of the sum yields a phase-corrected spectrum.
- Transmittance spectra are back-interpolated and converted into absorbance spectra. They are stored in the data block of preprocessed data.
Quality Test: The function 'quality test' implemented in the CytoSpec software comprises five distinct checks for spectral quality:
- a test for spectral signs of water vapor
- a check for sample thickness (integrated intensity)
- a test of the spectral signal-to-noise ratio
- a check called 'test for an additional band'
- a 'bad pixel' test (a tool to eliminate spectra from dead pixels of focal plane array detectors)
Data organisation: the quality tests are performed exclusively on the data block of original spectra. Spectra that have passed the tests are copied without modifications into the data block of preprocessed spectra. Note that existing data of this block are overwritten without warning. If the quality test of a given spectrum is negative, the respective field in the preprocessed data block is replaced by NaN (Not a Number). In this way, spectra of poor quality are excluded from further evaluations and will appear in the hyperspectral images as black areas.
If you wish to perform a quality test on preprocessed spectra, for example a sample thickness test after baseline correction, you have to use the Swap Data Block function of the 'tools' pull down menu. This function enables you to overwrite the data block of original spectra by preprocessed spectra.
To enable a test, check the appropriate checkbox and specify the quality test parameters such as absorbance thresholds. Press the 'test' button to start the quality test or hit 'cancel' if the test should be canceled. The parameters of the test for spectral quality and details of the test results can be found in the File Info menu (File Info --> File Manipulations --> preprocessed). These parameters are also displayed in the command line window.
1. Test for water vapor:
Sharp water vapor absorption bands can be found in the spectral region between 1300 and 1800 1/cm, a region where many biomaterials also exhibit strong absorption bands. It is therefore recommended to use water vapor bands above 1750 1/cm for testing. Indicate the precise positions of the two water vapor bands to be used for testing and define an absorption threshold criterion. If the absorption of one of the bands is higher than the specified criterion, the test result for the given spectrum will be negative, and the spectrum will be eliminated.
2. Integral absorption as a measure for sample thickness:
The absorbance, integrated over a large spectral region, can be used as a rough measure of sample thickness in transmission type measurements. As many multivariate imaging techniques such as HCA or ANN imaging require a consistent level of the SNR throughout the map, spectra with too low absorptions have to be excluded from further multivariate analysis. On the other hand you may want to eliminate also spectra showing intense signals. This could be the case where the Beer-Lambert law is not obeyed (total absorption, non-linear detector response, etc.)
In order to apply the 'sample thickness' criterion, indicate the spectral region to be used for obtaining the integral. Next, define an upper and a lower threshold for the integral (edit field lower/upper limit). Check the appropriate checkbox to enable the test. A spectrum has failed the sample thickness test if its integration value is higher than the upper or lower than the lower threshold.
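The sample thickness criterion can be sketched as an integration followed by a threshold comparison. The sketch is an illustration under assumptions: trapezoidal integration is used, wavenumbers are ascending, and the function name and the `(ny, nx, nz)` layout are hypothetical. Failed spectra are replaced by NaN, as described for the preprocessed data block.

```python
import numpy as np
from scipy.integrate import trapezoid


def thickness_test(cube, wavenumbers, lo, hi, lower, upper):
    """Reject spectra whose integral over [lo, hi] is outside [lower, upper]."""
    m = (wavenumbers >= lo) & (wavenumbers <= hi)
    # Integrated absorbance of every spectrum in the selected region.
    integral = trapezoid(cube[..., m], x=wavenumbers[m], axis=-1)
    passed = (integral >= lower) & (integral <= upper)
    out = np.array(cube, dtype=float)  # copy, the input stays untouched
    out[~passed] = np.nan              # failed spectra become NaN
    return out, integral
```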
3. Signal/noise ratio (SNR):
This test calculates the signal-to-noise ratio (SNR) of individual spectra and eliminates those that do not reach a threshold SNR. Indicate the spectral regions to be used for defining the noise and signal, respectively. For biomedical samples, it is recommended to obtain the signal in the amide I region (1600 - 1700 1/cm) and the noise in the region between 1800 and 1900 1/cm. Also indicate the SNR threshold and check the checkbox for the SNR test. Spectra are rejected if the SNR is lower than the threshold.
Noise: the standard deviation in the defined spectral range.
Signal: the maximum ordinate value in the defined wavenumber range.
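Using the definitions of signal and noise given above, the SNR test can be sketched as follows. The default regions are the ones recommended in the text; the function name, the `(ny, nx, nz)` layout and the use of the population standard deviation are assumptions.

```python
import numpy as np


def snr_test(cube, wavenumbers, signal_region=(1600.0, 1700.0),
             noise_region=(1800.0, 1900.0), threshold=100.0):
    """Replace spectra with SNR below the threshold by NaN."""
    s = (wavenumbers >= signal_region[0]) & (wavenumbers <= signal_region[1])
    n = (wavenumbers >= noise_region[0]) & (wavenumbers <= noise_region[1])
    signal = cube[..., s].max(axis=-1)  # maximum ordinate value
    noise = cube[..., n].std(axis=-1)   # standard deviation
    with np.errstate(divide="ignore", invalid="ignore"):
        snr = signal / noise
    passed = snr >= threshold
    out = np.array(cube, dtype=float)   # copy, the input stays untouched
    out[~passed] = np.nan
    return out, snr
```

The returned `snr` array has one value per pixel and can be displayed as an image to choose a sensible threshold before rejecting spectra.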
4. Test for an additional band:
This test is useful to exclude spectra from the data set that contain an artifact band (example: regions of a tissue section contaminated by tissue embedding medium). Indicate a typical band position (carbonyl esters of tissue freezing medium: 1746 1/cm) and an absorbance threshold (edit field criterion). Spectra with a higher absorbance at this frequency will be eliminated.
5. Elimination of 'bad' pixels from FPA data:
Most of the focal plane array (FPA) detectors have so-called 'dead pixels', i.e. detector elements with zero response to IR radiation. The spectral information at these FPA elements is usually replaced by the camera software with interpolated data from pixel neighbors. If you wish to remove interpolated spectra from the data set, you have to create a simple text file, which should contain the dead pixel (x,y) positions. The text file can be loaded by activating the appropriate check box. Spectra at the given positions are then replaced by NaNs (not a number), i.e. excluded from all subsequent calculations.
Please note: Use the function Define Spectral Regions to define sample areas in which spectra should be excluded from further analyses.
Water vapor compensation: This function automatically subtracts a water vapor spectrum from the measurement data such that the spectral effects of water vapor are minimized.
The water vapor correction routine works as follows:
- A second derivative spectrum of a pure water vapor absorbance spectrum is obtained.
- Then, a second derivative spectrum is calculated from the sample spectrum.
- Depending on your selection, up to 4 separate y-values at defined spectral positions are obtained for both derivative spectra.
- The water vapor correction factor is calculated by dividing the respective y-values of the water vapor and the sample spectrum. If more than one y-value was selected, the final water correction factor is the average of the ratios.
- Finally, the sample data are corrected by subtracting the original water vapor spectrum, which was weighted by the water vapor correction factor.
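The steps above can be sketched in a few lines of Python. This is a hedged illustration, not the CytoSpec routine: the ratio is assumed to be sample value divided by water vapor value, band positions are mapped to the nearest data point, second derivatives use a Savitzky-Golay filter with 5 points and an assumed polynomial degree of 2, and the vapor spectrum is assumed to share the wavenumber axis of the sample data. All names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def water_vapor_correct(cube, vapor, wavenumbers, band_positions,
                        points=5, polyorder=2):
    """Subtract a scaled water vapor spectrum from every sample spectrum."""
    if not 1 <= len(band_positions) <= 4:
        raise ValueError("between 1 and 4 band positions are expected")
    # Index of the data point closest to each requested band position.
    idx = [int(np.argmin(np.abs(wavenumbers - p))) for p in band_positions]
    # Second derivatives of the vapor spectrum and of the sample spectra.
    d2_vapor = savgol_filter(vapor, points, polyorder, deriv=2)
    d2_sample = savgol_filter(cube, points, polyorder, deriv=2, axis=-1)
    # One ratio per band; the correction factor is their average.
    ratios = d2_sample[..., idx] / d2_vapor[idx]
    factor = ratios.mean(axis=-1, keepdims=True)
    # Subtract the original vapor spectrum weighted by the factor.
    return cube - factor * vapor, factor[..., 0]
```

Working on second derivatives suppresses broad sample bands, so the ratio at a sharp vapor band mainly reflects the vapor contribution.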
water vapor correction of derivative spectra: If you wish to perform water vapor compensation on derivative spectra, you have to make sure that spectra are 2nd derivative spectra and that derivative calculations are carried out by choosing 5 smoothing points in the Savitzky-Golay algorithm. The algorithm described above will not work if these two preconditions are not fulfilled.
number of vapor bands: Please choose the number of water vapor bands on which the spectral compensation for water vapor bands should be carried out.
edit fields 1-4: Enter the correct positions (in wavenumbers) of water vapor bands. Please note that the band positions may slightly differ from instrument to instrument (calibration) and also as a function of the temperature.
Source block: Here you can choose the type of the source block for water vapor compensation.
load vapor file: Loads a double column ASCII water vapor spectrum. If the file has been loaded successfully, the directory and the file name are displayed and the button 'correct' becomes activated.
correct: Starts the spectral water vapor correction routine.
cancel: The routine is aborted.
data organization (source and target data blocks): Any type of data block (except 3D-FSD data) can be handled (including derivatives). If the source block is of the type original spectra or preprocessed spectra, the data are stored in the data block of preprocessed spectra. If this block is not empty, the data are overwritten. Water vapor compensated derivative spectra are stored in the data block of derivative spectra (existing data are also overwritten without warning, see also Internal Data Organization, Table II). The water compensation is always carried out on the complete 3D spectral data block.
note: In order to spectrally compensate for water vapor, one first has to produce a double column ASCII spectrum of water vapor (for details of the data format see the spectra vap_cut.dat or vap_full.dat; both spectra can be found in the directory CytoSpecRootDir/Testdata/watervap/).
Upon loading, the external spectrum is automatically adapted such that its data point spacing and its frequency range fit those of the sample data:
- It will be interpolated (if the point spacing is different), cut (broader frequency range), and/or extrapolated (narrower range).
- Extrapolation is achieved by using the closest absorbance value to fill missing data points.
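The adaptation of the external spectrum can be mimicked with NumPy's `np.interp`, which interpolates linearly inside the known range and returns the nearest edge value outside of it. The function name is hypothetical, and the use of linear interpolation is an assumption.

```python
import numpy as np


def adapt_vapor_spectrum(vapor_wn, vapor_abs, sample_wn):
    """Resample a water vapor spectrum onto the sample wavenumber axis."""
    vapor_wn = np.asarray(vapor_wn, dtype=float)
    vapor_abs = np.asarray(vapor_abs, dtype=float)
    order = np.argsort(vapor_wn)  # np.interp needs ascending x values
    # Inside the range: linear interpolation (this also covers cutting).
    # Outside the range: the closest absorbance value is repeated.
    return np.interp(sample_wn, vapor_wn[order], vapor_abs[order])
```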
In the water vapor testdata directory (CytoSpecRootDir/Testdata/watervap/) you can find a test file named 'watervap.mat'. The first data block of this file (original data) contains the original absorbance spectra. Water vapor corrected IR absorbance spectra are found in the second data block of preprocessed spectra. Original spectra are corrected by using the file 'vap_full.dat'.
PCA based noise reduction: This function can be used to reduce spectral noise. Noise is eliminated by performing principal component analysis (PCA) of the image data and re-assembling the spectra on the basis of a selection of principal components (low order PCs). In this way, higher-order principal components that are supposed to contain mainly 'noise' are omitted.
PCA based noise reduction can be carried out on the basis of original or preprocessed data sets. The target data block will be always the data block of preprocessed data.
Important: Please use this preprocessing routine with care! The decision which of the PCs can be omitted is highly subjective and may cause spectral artifacts.
The algorithm has been adapted from a suggestion of Dr. R. Spragg (PerkinElmer): "Addressing Problems in Data Reduction for FT-IR Images of Biological Samples" (Oral Contribution). RISBM - Raman and IR spectroscopy in Biological Medicine. Feb 29-Mar 02, 2004. Friedrich-Schiller-University, Jena, Germany.
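The idea of re-assembling spectra from low order PCs can be sketched with a singular value decomposition. This is a generic PCA reconstruction written for illustration, not the adapted algorithm itself; the function name and the `(ny, nx, nz)` layout are hypothetical, and the sketch assumes that the data contain no NaN values.

```python
import numpy as np


def pca_denoise(cube, n_components):
    """Rebuild all spectra from the first n_components principal components."""
    ny, nx, nz = cube.shape
    spectra = cube.reshape(-1, nz)      # one row per pixel
    mean = spectra.mean(axis=0)
    # PCA via SVD of the mean-centred data matrix.
    u, s, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    k = n_components
    # Keep only the k low order PCs; higher-order PCs are discarded.
    rebuilt = (u[:, :k] * s[:k]) @ vt[:k] + mean
    return rebuilt.reshape(ny, nx, nz)
```

Inspecting the singular values `s` (or the loadings in `vt`) before choosing `n_components` helps to judge which PCs still carry spectral information, which is exactly the subjective decision the warning above refers to.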
Cosmic spike removal: The cosmic spike removal tool allows the user to remove cosmic ray features from the spectral (Raman) data. The function can be chosen from the preprocessing pulldown menu. When this function is activated a dialog box shows up which allows the user to change parameters of the cosmic spike removal filter.
With this filter the removal of cosmic spikes is carried out in the following way:
- Smoothing of the spectral data of choice (original or preprocessed) in the spectral dimension by a 7-point Savitzky-Golay algorithm.
- A 3D-array of difference values between the un-smoothed and smoothed data is obtained.
- This array of difference values is normalized by dividing it by the mean standard deviation of the complete array.
- Cosmic spikes are now obtained by a systematic analysis in the spatial domains. For this purpose, maxima are obtained in each of the image planes of the normalized array of spectral differences. In this context, the parameter 'sensitivity' of the cosmic spike removal dialog box is used to define a threshold above which Raman intensity difference values supposedly indicate the presence of cosmic rays. The higher the sensitivity, the lower this threshold.
- In the next step, the spatial coordinates of spectra with cosmic spike candidates are determined. Cosmic spikes are excised by replacing them with Raman intensities from neighboring frequencies (spectral domain). The parameter 'spikes width' of the dialog box defines the width of the excised spike in points.
- The cosmic spike removal tool can be applied to the data block of original or pre-processed data. In both cases the spike-corrected Raman data are written into the data block of pre-processed spectra. Note that existing pre-processed data are overwritten without warning.
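The filter steps listed above can be approximated by the following sketch. It simplifies several points and should be read as an illustration only: the difference array is divided by its overall standard deviation, every value above the threshold is treated as a spike (instead of a plane-by-plane search for maxima), the threshold is passed directly because the mapping from 'sensitivity' to a threshold is not documented here, and excised points are replaced by linear interpolation between neighboring frequencies. All names and the `(ny, nx, nz)` layout are hypothetical.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d
from scipy.signal import savgol_filter


def remove_cosmic_spikes(cube, threshold=8.0, width=5, polyorder=2):
    """Excise narrow positive spikes along the spectral (last) axis."""
    data = np.asarray(cube, dtype=float)
    # 7-point Savitzky-Golay smoothing in the spectral dimension.
    smoothed = savgol_filter(data, window_length=7, polyorder=polyorder,
                             axis=-1)
    diff = data - smoothed
    z = diff / diff.std()              # normalized difference array
    flagged = z > threshold            # spike candidates
    # Widen every candidate to `width` points along the spectral axis.
    bad = maximum_filter1d(flagged.astype(np.uint8), size=width,
                           axis=-1).astype(bool)
    out = data.copy()
    channels = np.arange(data.shape[-1])
    for iy, ix in np.argwhere(bad.any(axis=-1)):
        b = bad[iy, ix]
        if b.all():
            continue                   # nothing left to interpolate from
        # Replace excised points using the neighboring frequencies.
        out[iy, ix, b] = np.interp(channels[b], channels[~b],
                                   data[iy, ix, ~b])
    return out
```

A lower `threshold` flags more points, which corresponds to a higher sensitivity in the dialog box.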
source block: please select the type of data block you wish to correct
sensitivity: used to define a threshold above which Raman intensity difference values supposedly indicate cosmic spikes. The higher the sensitivity the lower the threshold.
spikes width: defines the width of the excised spike in data points.
verbose mode: displays more details of spike removal function.
remove: starts the cosmic spike filter on the data block of choice.
cancel: closes the dialog box.
Batch preprocessing: This function permits automated preprocessing of hyperspectral data. When this option is chosen, you will be asked to indicate a predefined macro file (*.cbt, CytoSpec batch), which should have been generated (and tested) beforehand. CytoSpec batch files can be prepared with simple text editors such as WordPad or Notepad. It is important to store the *.cbt file in a plain text format. Do not use special characters or format tags!
CytoSpec's batch processing files contain different sections, also called blocks. Each block starts with one of the following (capitalized) three-letter codes:
DER - Derivative
NRM - Normalize
CUT - Cut / Crop
INT - Interpolate
SMO - Smoothing
ATR - ABS --> TR conversion
TRA - TR --> ABS conversion
QAL - Quality tests
BAS - Baseline correction (SavGol)
BMI - Baseline correction from minima
ALS - Baseline correction by asymmetric least squares
LBS - Subtract linear baseline
WVC - Water vapor correction
SWA - Swap data blocks
CSR - Cosmic ray correction
PNR - PCA-based noise reduction
EPD - Edge-preserving denoising
FLT - Filter images
FSD - 3D Fourier self-deconvolution
Most of the blocks contain a number of parameters required for pre-processing hyperspectral imaging data (such as type of source or target data block, wavenumber regions, etc.). These parameters are mandatory and must be given as a three-letter code and a numeric value, separated by a space character. It is also important to note the comments given after the '#' character on each line. These comments contain descriptions of the pre-processing parameters and list the allowed values of these parameters (usually in the following format: [5-7-9-11-13-15-17-19-21-23-25]).
Note also that each of the blocks must be terminated by a line containing the code 'END'.
IMPORTANT: The sequence of preprocessing steps is given by the sequence of blocks in the batch file. To omit a pre-processing function, it is sufficient to comment out the respective block by placing the '#' character (number sign, or hash sign) at the first position of the line containing the three-letter block code. Please refer also to the online help or to the example file that comes with CytoSpec's installation CD / USB drive.
Example of the block 'CUT' in a CytoSpec batch (*.cbt) file:
# --------------- CUT --------------------------------------------------
TYP 1 # type of cutting (1-spectral, 2 spatial dimension)
WV1 1000 # first wavenumber for cut in spectral dimension
WV2 1800 # last wavenumber (WV2 larger than WV1!)
XD1 1 # cut, spatial dimension x : first pixel to keep
XD2 10 # cut, spatial dimension x : last pixel to keep
YD1 1 # cut, spatial dimension y : first pixel to keep
YD2 10 # cut, spatial dimension y : last pixel to keep
# some lines with comments may follow
next block ...
A detailed example of a CytoSpec batch file is given here: preproc.cbt
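To check a batch block outside of CytoSpec, the parameter lines of the 'CUT' example can be read with a few lines of Python. The parser is a hypothetical helper and not part of CytoSpec; it assumes only what the example shows, namely a three-character code, a space, a numeric value and an optional comment after '#'. Lines that do not match this pattern (comment lines, block codes, 'END') are skipped.

```python
import re

# Three-character code (e.g. TYP, WV1, XD2) followed by a numeric value.
PARAM_LINE = re.compile(r"^([A-Z][A-Z0-9]{2})\s+(-?\d+(?:\.\d+)?)\b")


def parse_parameters(block_text):
    """Return {code: value} for all parameter lines of one batch block."""
    params = {}
    for raw in block_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the comment part
        match = PARAM_LINE.match(line)
        if match:
            params[match.group(1)] = float(match.group(2))
    return params


example = """\
# --------------- CUT ---------------
TYP 1 # type of cutting (1-spectral, 2 spatial dimension)
WV1 1000 # first wavenumber for cut in spectral dimension
WV2 1800 # last wavenumber (WV2 larger than WV1!)
"""
cut = parse_parameters(example)
# A simple consistency check mirroring the comment on WV2.
assert cut["WV2"] > cut["WV1"]
```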