This tutorial will present and explain the code behind the paper titled Scribble Based Interactive Page Layout Segmentation Using Gabor Filter which was presented in ICFHR 2016. Feel free to download the paper from the link here. It is found under publications section.
Note: The code below uses Qt and OpenCV libraries. Please refer to their API in case of some functions and data-structures are unfamiliar.
This method focuses on special manuscripts pages containing skewed notes. These notes are mainly written by students studying these manuscripts a thousand years ago. Due to this, the manuscripts now contain several texts written by different people, and the notes part is mostly written around the main text resulting in skewed lines.
You can download the data from here.
Instead of feeding the grab-cut algorithm the original image, we provide it, as input, an RGB image created from the Gabor filter bank.
The two-dimensional Gabor transform is defined as follows: \begin{equation*} \begin{split} g(x,y) = \exp(-\frac{x'+\gamma^2y'^2}{2\sigma^2 })cos(2\pi\frac{x'}{\lambda}+\psi) \\ \\ Where: x' = xcos(\theta) + ysin(\theta) \\ y' = ycos(\theta) - xsin(\theta) \\ \end{split} \end{equation*}
The Gabor transform parameters, and values used:
The wavelength of the cosine factor is determined as follows:
\begin{equation*} \begin{split} F_{L}(k) = \frac{1}{4} - \frac{2^{k-\frac{1}{2}}}{w} \\ \\ F_{H}(k) = \frac{1}{4} + \frac{2^{k-\frac{1}{2}}}{w} \\ \end{split} \end{equation*}
Where:
On the output of each application of Gabor transform, we apply a non-linear function as follows:
\begin{equation*} tanh(\alpha t) = \frac{1 - e^{-2 \alpha t}}{1 + e^{-2 \alpha t}} \end{equation*} Where:
Finally, we apply Gaussian smoothing on the results, where $a = 3$, and $b = 1$.
After the mathematical introduction, now we will look at the code of these equations. The code implementation can be found in marginalnotesdetection.cpp file, function createGaborBank().
We will begin with $\theta$ values and create an array for them. Please note that their quantity can be easily changed following your liking. We've used three values of theta in our case to generate an RGB matrix (channel for each theta).
ORIENTATION_COUNT = 3; // number of orientations (theta values)
QVector<double> thetas; //the orientation of the filter
for (size_t i = 0; i <= ORIENTATION_COUNT; i++)
thetas.push_back(i * M_PI / 4); //M_PI is pi
Next, we will calculate the values of the sinusoidal factor wavelength.
//calculating lambda's values // wavelength of sinusoidal factor
QVector<double> J;
for (size_t i = 0; i <= std::log2(fImage.cols / 8); i++) {
J.push_back(0.25 - ((qPow(2, i) - 0.5) / fImage.cols));
J.push_back(0.25 + ((qPow(2, i) - 0.5) / fImage.cols));
}
QVector<double> F;
for (size_t i = 0; i < J.size(); i++) {
double min = DBL_MAX;
int minIdx = -1;
for (size_t o = 0; o < J.size(); o++) {
if (min > J[o]) {
minIdx = o;
min = J[o];
}
}
F.push_back(J[minIdx]);
J[minIdx] = DBL_MAX;
}
QVector<double> lambdas;
for (size_t i = 0; i < F.size(); i++) {
lambdas.push_back(1.0 / F[i]);
}
Using three orientations, and 14 different wavelengths, we end up with 42 Gabor responses which will denote our filter bank. The array fGaborFilterBank will store the 42 Gabor responses. For each orientation, we combine the response matrices received from the filter, resulting in one matrix per orientation. By merging them together we end up with an RGB image showing the result.
int kernel_size = 9; // size of filter
double sigma = 1; // standard deviation of gaussian envelope
double gamma = 1; // spatial aspect ratio
double psi = 0; // phase offset
// 0 is the real parts of the gabor filter
// pi/2 is the imaginary parts
//Step 1: Gabor Filter Bank Generation
//now we generate the gabor filter bank from the parameters
for (double theta : thetas) {
cv::Mat matToPush;
int idx = 0;
for (double lambda : lambdas) {
cv::Mat dest;
cv::Mat kernel = cv::getGaborKernel(cv::Size(kernel_size, kernel_size),
sigma, theta, lambda, gamma, psi);
cv::filter2D(src_f, dest, CV_32FC1, kernel);
cv::Mat dest2;
if (idx == 0)
matToPush = dest;
else
matToPush += dest;
idx++;
}
cv::Mat dst;
fGaborFilterBank.push_back(matToPush);
}
Two steps left to do after generating the Gabor responses. We need to apply a non-linearity function, and afterwards a smoothing function. For non-linearity we apply tanh, and for smoothing, we use Gaussian filter.
//Nonlinearity
for (cv::Mat entry : fGaborFilterBank) {
for (int i = 0; i < entry.rows; i++)
for (int j = 0; j < entry.cols; j++)
entry.at<float>(i, j) = std::tanh(0.25*entry.at<float>(i, j));
}
//Smoothing
double b = 1;
for (size_t i = 0; i < fGaborFilterBank.size(); i++) {
cv::Mat dest;
cv::Mat dst;
double lambda = lambdas[std::floor(i / thetas.size())];
sigma = 3 * ((1 / M_PI) *
std::sqrt(std::log(2) / 2) * (qPow(2, b) + 1) / (qPow(2, b) - 1) * lambda);
cv::GaussianBlur(fGaborFilterBank[i], dest, cv::Size(2 * std::floor(sigma) + 1,
2 * std::floor(sigma) + 1), sigma);
fGaborFilterBank[i] = dest;
}
The first row of the images illustrates what the user sees when executing an interactive layout segmentation of a manuscript. Starting with left most image showing the manuscript page, and moving right each time shows one segmentation of page. Each time it segments and removes one paragraph of the manuscript page.
The second row illustrates what happens behind the scenes. Starting with the left most image showcasing the RGB image of superimposing the Gabor responses. Each image to its right shows the generated mask, where each stage is colored differently. The colors denote the paragraphs detected in the interactive segmentation process.
I hope you have enjoyed reading this work. If you have any question or comment feel free to email me at majeek @ cs bgu ac il.
The complete code is found on-line, and can be downloaded freely from by github.
More information regarding my work and me can be found at my webpage, and my linkedin.