Variational Bayesian Matrix Factorization
and Certain Post Classifiers for Classification of Epilepsy from EEG Signals
Harikumar Rajaguru, Sunil Kumar Prabhakar
Department of ECE, Bannari Amman Institute of Technology, India
*Corresponding Author E-mail: sunilprabhakar22@gmail.com, harikumarrajaguru@gmail.com
ABSTRACT:
The main aim of this paper is to employ Variational Bayesian Matrix Factorization (VBMF) as a dimensionality reduction technique, followed by the Gaussian Mixture Model (GMM), Genetic Algorithm (GA) and Naïve Bayes Classifier (NBC) as post classifiers, for the classification of epilepsy risk levels from Electroencephalography (EEG) signals. Epilepsy is a serious disorder of the brain characterized by frequent and recurrent seizures, so its detection and classification are very important. Epileptic seizures can be analyzed using EEG signals, which aid in recording, diagnosing and treating epilepsy as well as other neurological disorders. In this paper, the results are analyzed and compared in terms of sensitivity, specificity, time delay, quality values, performance index and accuracy.
KEYWORDS: VBMF, GMM, GA, NBC, EEG.
INTRODUCTION:
Epilepsy is characterized by unpredictable interruptions in the normal functioning of the brain [1]. Epileptic seizures occur due to the continuous synchronization of electrical discharges across a particular group of neurons. A seizure is therefore considered a transient and abnormal behaviour of neurons in the brain [2]. Epileptic seizures severely affect the consciousness, memory and behaviour of patients, thereby disturbing their physical and mental activities. Diagnosing epilepsy is very important, but it is critical and challenging for clinicians. EEG is widely used for the diagnosis of epilepsy, and several automated methods for its analysis and diagnosis are available in the literature [3]. Since EEG recordings are very long, automated detection systems have been developed. Moreover, the obtained EEG dataset has a huge dimension and is therefore difficult to process [4].
Hence, the dimensionality of the EEG dataset should be reduced greatly, and so in this paper VBMF is employed to reduce the dimension of the data set. In recent years, new machine learning methods have attracted great interest among biomedical scientists and have been well investigated [4]. In this paper, three different types of post classifiers, namely GMM, GA and NBC, are used for the analysis. The paper is organized as follows. Section II discusses the materials and methods, Section III describes the use of VBMF as a dimensionality reduction technique, Section IV discusses the post classifiers, and Section V presents the results and conclusion, followed by the references.
For the performance analysis of the epilepsy risk levels using VBMF as the dimensionality reduction technique followed by GMM, GA and NBC as post classifiers, the raw EEG data of 20 epileptic patients who were under treatment in the Neurology Department of Sri Ramakrishna Hospital, Coimbatore, Tamil Nadu, India were taken in European Data Format (EDF) for detailed examination. High priority is given to the preprocessing stage of the EEG signals, as it is vital to use the best technique available in the literature to extract all the necessary information embedded in these non-stationary biomedical signals [4]. The EEG records obtained were continuous for about 30 seconds, and each of them was divided into epochs of two seconds duration in our experiment. A two second epoch is generally short enough to avoid unnecessary redundancy in the signal and long enough to detect any significant changes in activity and the presence of artifacts in the signal. Each patient's record has 16 channels and is taken over three different epochs. With the signal frequency considered as 50 Hz, the sampling frequency is taken as about 200 Hz. Each sample is an instantaneous amplitude value of the signal, which gives around 400 values (2 s × 200 Hz) for each epoch. Four different types of artifacts are present in the data, namely chewing artifacts, eye blinks, electromyography (EMG) and motion artifacts, and they make up only about 1% of the data. No attempt was made to select artifacts of a more specific nature. Artifacts are included in order to differentiate spike categories of waveforms from non-spike categories. Figure 1 shows the block diagram of the procedure.
Figure 1 Block Diagram of the Procedure
Initially, the sampling process is applied to the raw EEG signals. The dimensions of the samples are then reduced with the help of Variational Bayesian Matrix Factorization. Finally, the dimensionally reduced values are given as inputs to the post classifiers, namely GMM, GA and NBC, and their performances are compared.
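The epoch arithmetic described above (30 s records, 16 channels, 200 Hz sampling, 2 s epochs of 400 samples each) can be sketched as follows. This is a minimal illustration, assuming each record has already been read from EDF into a NumPy array of shape (channels, samples); the EDF reading step is not shown, and the constant and function names are hypothetical. The text does not say which three of the resulting epochs were used per patient.

```python
import numpy as np

FS_HZ = 200       # sampling frequency stated in the text
EPOCH_S = 2       # epoch duration in seconds
N_CHANNELS = 16   # channels per patient

def split_into_epochs(record, fs=FS_HZ, epoch_s=EPOCH_S):
    """Split a (channels, samples) array into non-overlapping epochs.

    Returns an array of shape (n_epochs, channels, samples_per_epoch);
    trailing samples that do not fill a whole epoch are dropped.
    """
    samples_per_epoch = fs * epoch_s                     # 2 s * 200 Hz = 400
    n_epochs = record.shape[1] // samples_per_epoch
    trimmed = record[:, : n_epochs * samples_per_epoch]
    # (channels, n_epochs, samples) -> (n_epochs, channels, samples)
    return trimmed.reshape(record.shape[0], n_epochs, samples_per_epoch).transpose(1, 0, 2)

# Usage on a synthetic 30 s, 16-channel record
record = np.random.default_rng(0).standard_normal((N_CHANNELS, 30 * FS_HZ))
epochs = split_into_epochs(record)  # shape (15, 16, 400)
```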
Dimensionality reduction is a basic preprocessing step that transforms the data from a higher-dimensional to a lower-dimensional representation. The dimensionality reduction technique employed here is Variational Bayesian Matrix Factorization [6]. Consider $U \in \mathbb{R}^{L \times M}$ as a low-rank matrix. The matrix is decomposed into the product of $A \in \mathbb{R}^{M \times H}$ and $B \in \mathbb{R}^{L \times H}$, where $H$ is the rank. Therefore the equation can be written as
$$U = BA^{T}$$
If the observed matrix is denoted as $V \in \mathbb{R}^{L \times M}$ and it is subject to the additive noise model, then
$$V = U + E$$
where $E \in \mathbb{R}^{L \times M}$ is a noise matrix.
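For Bayesian matrix factorization, Gaussian priors on the parameters $A$ and $B$ are used, together with a Gaussian likelihood of noise variance $\sigma^2$, as follows [6]:
$$p(A) \propto \exp\left(-\tfrac{1}{2}\operatorname{tr}\left(A C_A^{-1} A^{T}\right)\right), \qquad p(B) \propto \exp\left(-\tfrac{1}{2}\operatorname{tr}\left(B C_B^{-1} B^{T}\right)\right)$$
$$p(V \mid A, B) \propto \exp\left(-\frac{1}{2\sigma^{2}}\lVert V - BA^{T}\rVert_{\mathrm{Fro}}^{2}\right)$$
where $C_A$ and $C_B$ are the prior covariance matrices. The Bayes posterior can be written as follows:
$$p(A, B \mid V) = \frac{p(V \mid A, B)\,p(A)\,p(B)}{p(V)}, \qquad p(V) = \left\langle p(V \mid A, B) \right\rangle_{p(A)p(B)}$$
where $\langle \cdot \rangle_{p}$ denotes the expectation over $p$.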
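For VBMF, the trial distribution is constrained to factorize over $A$ and $B$ [6]:
$$r(A, B) = r_A(A)\,r_B(B)$$
and it is chosen to minimize the free energy
$$F(r) = \left\langle \log \frac{r_A(A)\,r_B(B)}{p(V \mid A, B)\,p(A)\,p(B)} \right\rangle_{r_A(A)\,r_B(B)}$$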
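The factorized trial distribution for the model $V = BA^{T} + E$ (Gaussian noise of variance $\sigma^2$, Gaussian priors with covariances $C_A$, $C_B$) can be fitted by alternating mean-field updates of $r_A(A)$ and $r_B(B)$. The sketch below implements those standard fully observed Gaussian VBMF updates; it is not the bounded-support variant of reference [6]. It assumes a fixed $\sigma^2$ and isotropic priors $C_A = c_a^2 I$, $C_B = c_b^2 I$. The function name `vbmf`, the random initialization and all hyperparameter values are hypothetical, since the paper does not state its rank $H$ or settings.

```python
import numpy as np

def vbmf(V, H, sigma2=1.0, ca2=1.0, cb2=1.0, n_iter=200, seed=0):
    """Mean-field VBMF for V (L x M) ~ B @ A.T with A: (M x H), B: (L x H).

    Hypothetical settings: fixed noise variance sigma2 and isotropic priors
    C_A = ca2 * I, C_B = cb2 * I. Returns the posterior means A_hat, B_hat
    and the row covariances Sigma_A, Sigma_B of r_A(A) and r_B(B).
    """
    L, M = V.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((M, H))
    B = rng.standard_normal((L, H))
    Sigma_A = np.eye(H)
    Sigma_B = np.eye(H)
    I = np.eye(H)
    for _ in range(n_iter):
        # Update r_A(A) given the current moments of r_B(B);
        # E[B^T B] = B_hat^T B_hat + L * Sigma_B
        Sigma_A = sigma2 * np.linalg.inv(B.T @ B + L * Sigma_B + (sigma2 / ca2) * I)
        A = V.T @ B @ Sigma_A / sigma2
        # Update r_B(B) given the refreshed moments of r_A(A)
        Sigma_B = sigma2 * np.linalg.inv(A.T @ A + M * Sigma_A + (sigma2 / cb2) * I)
        B = V @ A @ Sigma_B / sigma2
    return A, B, Sigma_A, Sigma_B
```

Either posterior mean (`A` with one row per column of $V$, or `B` with one row per row of $V$) could then serve as the $H$-dimensional reduced representation; which one the authors fed to the post classifiers is not stated.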
The post classifiers used in this paper are the Genetic Algorithm, the Gaussian Mixture Model and the Naïve Bayesian Classifier.
Genetic Algorithms are very powerful optimization techniques that derive their idea from the principle of evolution [5], and they are generally used to find near-optimal solutions. In this paper, GA is used as a post classifier, and the simulation parameters are shown in Table 1.
TABLE 1 SIMULATION PARAMETERS
Parameters | Value
Population Size | 320
Maximum Number of Evaluated Individuals | 4
Total Number of Generations | 512
Type of Selection | Roulette-wheel
Type of Crossover | One-point
Replacement Type | Elitist
Generation Gap | 0.9
Probability of Crossover | 0.5
Probability of Mutation | 0.5
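The operators listed in Table 1 (roulette-wheel selection, one-point crossover, elitist replacement, a generation gap of 0.9) can be illustrated with a generic bit-string GA. This is a minimal sketch only: the paper does not describe its chromosome encoding or the fitness used for classification, so the bit-string representation, the function name `run_ga` and the toy fitness are hypothetical. The mutation probability is interpreted here, as an assumption, as the chance that an offspring flips one random bit, and the "Maximum Number of Evaluated Individuals" parameter is not modelled.

```python
import numpy as np

def run_ga(fitness, n_bits, pop_size=320, n_generations=512, gen_gap=0.9,
           p_crossover=0.5, p_mutation=0.5, seed=0):
    """Bit-string GA with roulette-wheel selection, one-point crossover,
    single-bit mutation and elitist replacement (defaults from Table 1;
    encoding and fitness are hypothetical)."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    n_offspring = int(round(gen_gap * pop_size))  # generation gap = fraction replaced
    for _ in range(n_generations):
        fit = np.array([fitness(ind) for ind in pop], dtype=float)
        # Roulette-wheel selection: probability proportional to (non-negative) fitness
        total = fit.sum()
        probs = fit / total if total > 0 else np.full(pop_size, 1.0 / pop_size)
        parents = pop[rng.choice(pop_size, size=n_offspring, p=probs)]
        children = parents.copy()
        # One-point crossover on consecutive pairs of parents
        for i in range(0, n_offspring - 1, 2):
            if rng.random() < p_crossover:
                cut = rng.integers(1, n_bits)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Mutation (assumed form): each child flips one random bit with probability p_mutation
        for child in children:
            if rng.random() < p_mutation:
                child[rng.integers(n_bits)] ^= 1
        # Elitist replacement: the best (pop_size - n_offspring) individuals survive unchanged
        elite = pop[np.argsort(fit)[::-1][: pop_size - n_offspring]]
        pop = np.vstack([elite, children])
    fit = np.array([fitness(ind) for ind in pop], dtype=float)
    best = int(np.argmax(fit))
    return pop[best], fit[best]
```

For example, `run_ga(lambda bits: bits.sum(), n_bits=20)` maximizes the number of ones in the string (the toy "OneMax" fitness); elitism guarantees that the best fitness never decreases from one generation to the next.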
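The k-nearest neighbour (k-NN) algorithm is used initially to compute the Euclidean distance between the testing data set and the training data set [10]. The nearest point is found using the following formula:
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^{2}}$$
where $\mathbf{x}$ is a testing sample, $\mathbf{y}$ is a training sample and $n$ is the number of features; the training sample with the smallest $d$ is the nearest point.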
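A minimal NumPy sketch of the Euclidean nearest-neighbour search follows. The function names are hypothetical, the text does not state which k is used, and the majority vote for k > 1 assumes non-negative integer class labels.

```python
import numpy as np

def euclidean_distances(x, X_train):
    """Euclidean distance from one test vector x to every row of X_train."""
    X_train = np.asarray(X_train, dtype=float)
    return np.sqrt(((X_train - np.asarray(x, dtype=float)) ** 2).sum(axis=1))

def nearest_point(x, X_train):
    """Index of the closest training sample and its distance."""
    d = euclidean_distances(x, X_train)
    i = int(np.argmin(d))
    return i, float(d[i])

def knn_predict(X_test, X_train, y_train, k=1):
    """Label each test vector by majority vote among its k nearest neighbours."""
    y_train = np.asarray(y_train)
    preds = []
    for x in np.asarray(X_test, dtype=float):
        idx = np.argsort(euclidean_distances(x, X_train))[:k]
        preds.append(np.bincount(y_train[idx]).argmax())
    return np.array(preds)
```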
In the data preprocessing stage of the NBC, attribute grouping has to be done: each column of the EEG measurements is divided into a number of groups [9]. The number of groups can be chosen arbitrarily. The grouping is done in three phases as follows:
1. First, the minimum ($\min$) and maximum ($\max$) values in each column of the EEG data set are found [9].
2. Secondly, the number of groups $n$ in every column is decided, and the group width is calculated as $\Delta = (\max - \min)/n$.
3. Thirdly, the threshold values are calculated as $T_j = \min + j\Delta$, where $j = 1, 2, \dots, n-1$.
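A minimal sketch of the three-phase attribute grouping follows. It assumes the groups have equal width, with thresholds evenly spaced between each column's minimum and maximum; the function name `group_columns` is hypothetical.

```python
import numpy as np

def group_columns(X, n_groups):
    """Map each column of X to integer groups 0..n_groups-1 (equal-width bins)."""
    X = np.asarray(X, dtype=float)
    groups = np.empty(X.shape, dtype=int)
    for c in range(X.shape[1]):
        # Phase 1: minimum and maximum of the column
        lo, hi = X[:, c].min(), X[:, c].max()
        # Phase 2: group width for the chosen number of groups
        width = (hi - lo) / n_groups
        # Phase 3: thresholds T_j = lo + j * width, j = 1..n_groups-1
        thresholds = lo + width * np.arange(1, n_groups)
        # Values below T_1 fall in group 0; values >= T_{n-1} fall in the last group
        groups[:, c] = np.digitize(X[:, c], thresholds)
    return groups
```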
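The NBC is then developed as follows [9]:
$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)$$
where $v_{NB}$ is the target value output by the Naïve Bayesian Classifier, $P(v_j)$ is the frequency with which each target value $v_j$ occurs in the training data, $x = \langle a_1, a_2, \dots, a_n \rangle$ is an instance with attributes $a_1, \dots, a_n$, $v_j \in V$ are the classes for instances from the set $V$ of all possible classes, and $\prod_{i} P(a_i \mid v_j)$ is the joint product of the probabilities for the individual attributes.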
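The decision rule $v_{NB} = \arg\max_{v_j} P(v_j) \prod_i P(a_i \mid v_j)$ on grouped (integer-valued) attributes can be sketched as below. The probabilities are estimated as training-set frequencies; the Laplace-smoothing constant `alpha` is an added assumption (set `alpha=0` for raw frequencies), and the product is evaluated as a sum of logarithms, which selects the same class. The function names are hypothetical.

```python
import numpy as np

def fit_naive_bayes(X, y, n_groups, alpha=1.0):
    """Estimate P(v_j) and P(a_i = g | v_j) from grouped attributes.

    X: (n_samples, n_attributes) integers in 0..n_groups-1; y: class labels.
    alpha is a hypothetical Laplace-smoothing constant.
    """
    X = np.asarray(X, dtype=int)
    y = np.asarray(y)
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    # cond[k, i, g] = P(a_i = g | v_k)
    cond = np.empty((len(classes), X.shape[1], n_groups))
    for k, c in enumerate(classes):
        Xc = X[y == c]
        for i in range(X.shape[1]):
            counts = np.bincount(Xc[:, i], minlength=n_groups)
            cond[k, i] = (counts + alpha) / (len(Xc) + alpha * n_groups)
    return classes, priors, cond

def predict_naive_bayes(x, classes, priors, cond):
    """argmax_j P(v_j) * prod_i P(a_i | v_j), computed in log space."""
    x = np.asarray(x, dtype=int)
    log_lik = np.log(cond[:, np.arange(x.size), x]).sum(axis=1)
    return classes[np.argmax(np.log(priors) + log_lik)]
```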
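Consider a dataset $X = \{x_1, x_2, \dots, x_N\}$, where $x_n$ is the value of the signal model and $N$ is the total number of signals in the dataset. The GMM assumes in general that the data follow a mixture model [7]: it comprises $K$ Gaussian density components, which are combined through the mixing parameters of the components. For a GMM, the probability density of $x_n$ is determined as follows [8]:
$$p(x_n \mid \Theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1$$
where $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$ denotes the parameters of all the components and $\pi_k$ is the mixing weight of the $k$-th component. The Gaussian is denoted as follows:
$$\mathcal{N}(x_n \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} \lvert \Sigma_k \rvert^{1/2}} \exp\left(-\tfrac{1}{2}(x_n - \mu_k)^{T} \Sigma_k^{-1} (x_n - \mu_k)\right)$$
where $\mu_k$ and $\Sigma_k$ are the mean and covariance matrix of the $k$-th component, respectively, and $d$ is the dimension of $x_n$. Expectation maximization (EM) is used to estimate the parameters iteratively.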
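The EM iteration for the mixture density $p(x_n \mid \Theta) = \sum_k \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$ can be sketched for one-dimensional data as follows. This is a minimal illustration, not the authors' implementation: the 1-D restriction, the quantile-based initialization, the small variance floor and the function name `gmm_em_1d` are all assumptions, and the text does not detail how the fitted mixture is turned into a risk-level decision.

```python
import numpy as np

def gmm_em_1d(x, K, n_iter=100, var_floor=1e-6):
    """Fit a K-component 1-D Gaussian mixture to samples x with EM."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Hypothetical initialization: equal weights, means at spread-out quantiles
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)
    var = np.full(K, x.var())
    for _ in range(n_iter):
        # E-step: responsibility gamma[n, k] of component k for sample n
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        weighted = pi * dens
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means and variances
        nk = gamma.sum(axis=0)
        pi = nk / n
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk + var_floor
    return pi, mu, var
```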
With VBMF as the dimensionality reduction technique and GMM, GA and NBC as post classifiers, the results are computed in terms of quality values, time delay and accuracy and are listed in Table 2. The formulae for the Performance Index (PI), Sensitivity, Specificity and Accuracy are given as follows:
$$PI = \frac{PC - MC - FA}{PC} \times 100$$
where PC is the Perfect Classification, MC the Missed Classification and FA the False Alarm, all expressed in percent.
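The Sensitivity, Specificity and Accuracy measures are defined as follows:
$$\text{Sensitivity} = \frac{PC}{PC + FA} \times 100$$
$$\text{Specificity} = \frac{PC}{PC + MC} \times 100$$
$$\text{Accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2}$$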
The Quality Value (QV) is defined as
$$QV = \frac{C}{(R_{fa} + 0.2)\,(T_{dly} \cdot P_{dct} + 6 \cdot P_{msd})}$$
where C is a scaling constant, $R_{fa}$ is the number of false alarms per set, $T_{dly}$ is the average delay of the onset classification in seconds, $P_{dct}$ is the percentage of perfect classification, and $P_{msd}$ is the percentage of perfect risk level missed.
The time delay is given as follows:
$$\text{Time Delay} = \left(2 \times \frac{PC}{100} + 6 \times \frac{MC}{100}\right) \text{ s}$$
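The performance measures can be computed directly from PC, MC and FA (in percent) using the definitions PI = (PC − MC − FA)/PC × 100, Sensitivity = PC/(PC + FA) × 100, Specificity = PC/(PC + MC) × 100, Accuracy = (Sensitivity + Specificity)/2, Time Delay = 2·PC/100 + 6·MC/100 s and QV = C/((R_fa + 0.2)(T_dly·P_dct + 6·P_msd)). The function names below are hypothetical, and the QV scaling constant C must be supplied by the user because its value is not given in the text. The usage lines reproduce the VBMF-NBC time delay in Table 2 from its PC and MC values, and its accuracy from its tabulated sensitivity and specificity.

```python
def performance_index(pc, mc, fa):
    """PI (%) = (PC - MC - FA) / PC * 100, inputs in percent."""
    return (pc - mc - fa) / pc * 100

def sensitivity(pc, fa):
    """Sensitivity (%) = PC / (PC + FA) * 100."""
    return pc / (pc + fa) * 100

def specificity(pc, mc):
    """Specificity (%) = PC / (PC + MC) * 100."""
    return pc / (pc + mc) * 100

def accuracy(sens, spec):
    """Accuracy (%) = mean of sensitivity and specificity."""
    return (sens + spec) / 2

def time_delay(pc, mc):
    """Time delay in seconds = 2 * PC/100 + 6 * MC/100."""
    return 2 * pc / 100 + 6 * mc / 100

def quality_value(c, r_fa, t_dly, p_dct, p_msd):
    """QV = C / ((R_fa + 0.2) * (T_dly * P_dct + 6 * P_msd)); C is user-chosen."""
    return c / ((r_fa + 0.2) * (t_dly * p_dct + 6 * p_msd))

# VBMF-NBC column of Table 2
print(round(time_delay(80.48, 13.19), 2))   # 2.4
print(round(accuracy(93.68, 86.80), 2))     # 90.24
```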
The specificity and sensitivity analysis for VBMF as the dimensionality reduction technique followed by GMM, GA and NBC as post classifiers is shown in Figure 2. The corresponding time delay and quality value analysis is shown in Figure 3, and the performance index and accuracy analysis is shown in Figure 4.
Figure 2 Sensitivity and Specificity Measures
It is inferred from Figure 2 that the specificity is highest (99.23%) when VBMF is combined with GA, compared with the other two combinations, whereas the sensitivity is highest (98.05%) for the VBMF-GMM combination.
Figure 3 Time Delay and Quality Value Measures
It is inferred from Figure 3 that the time delay is highest for the VBMF-NBC combination (2.40 s) compared with the other two combinations. In terms of quality values, the VBMF-GMM combination produces the highest value of 22.50, followed by the VBMF-GA combination with 17.90.
Figure 4 Performance Index and Accuracy Measures
It is inferred from Figure 4 that the highest Performance Index, 94.65%, is obtained for the VBMF-GMM combination, which also gives the highest accuracy of 96.97%.
TABLE 2 PERFORMANCE COMPARISON ANALYSIS
Parameters | VBMF with NBC | VBMF with GMM | VBMF with GA
PC (%) | 80.48 | 95 | 85.13
MC (%) | 13.19 | 4.09 | 0.76
FA (%) | 6.31 | 0.90 | 14.09
PI (%) | 75.26 | 94.65 | 81.34
Sensitivity (%) | 93.68 | 98.05 | 85.90
Specificity (%) | 86.80 | 95.90 | 99.23
Time Delay (s) | 2.40 | 2.14 | 1.74
Quality Values | 17.35 | 22.50 | 17.90
Accuracy (%) | 90.24 | 96.97 | 92.56
It is thus inferred that the perfect classification is about 95% when VBMF is combined with GMM, the highest of the three combinations. If the quality values are analyzed, the highest is for the VBMF-GMM combination (22.50), followed by the VBMF-GA combination (17.90). It is thus concluded that the accuracy is highest for the VBMF-GMM combination (96.97%), making it the best of the three methods, followed by the VBMF-GA combination (92.56%) and the VBMF-NBC combination (90.24%). Future work may modify the Variational Bayesian Matrix Factorization to provide better accuracy.
REFERENCES:
1. Chi-Chou Kao, ‘E-Health Design of EEG Signal Classification for Epilepsy Diagnosis’, International Symposium on Biometrics and Security Technologies, 2013, pp. 67-71.
2. Min Han and Leilei Sun, ‘EEG Signal Classification for Epilepsy Diagnosis based on AR Model and RVM’, International Conference on Intelligent Control and Information Processing, Dalian, China, August 13-15, 2010.
3. Mormann F, Kreuz T, Rieke C, Andrzejak RG, Kraskov A, David P, Elger CE and Lehnertz K, ‘On the predictability of epileptic seizures’, Clinical Neurophysiology, vol. 116, 2005, pp. 569-581.
4. Harikumar R and Sunil Kumar P, ‘Dimensionality Reduction Techniques for Processing Epileptic Encephalographic Signals’, Biomedical and Pharmacology Journal, vol. 8, no. 1, 2015, pp. 103-106.
5. Durga Prasad Muni, Nikhil R Pal and Jyotirmoy Das, ‘A Novel Approach to Design Classifiers Using Genetic Programming’, IEEE Transactions on Evolutionary Computation, vol. 8, no. 2, 2004, pp. 183-196.
6. Zhanyu Ma et al., ‘Variational Bayesian Matrix Factorization for Bounded Support Data’, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 4, April 2015.
7. Weiling Cai, Lei Lei and Ming Yang, ‘A Gaussian Mixture Model-based Clustering Algorithm for Image Segmentation using Dependable Spatial Constraints’, Third International Congress on Image and Signal Processing (CISP), 2010, pp. 1268-1272.
8. Permuter H, Francos J and Jermyn IH, ‘Gaussian Mixture Models of Texture and Colour for Image Database Retrieval’, ICASSP, 2003, pp. 569-572.
9. Cuiping Leng, Shuangcheng Wang and Hui Wang, ‘Learning Naïve Bayes Classifiers with Incomplete Data’, International Conference on Artificial Intelligence and Computational Intelligence, 2007, pp. 350-353.
10. Guo L, Rivero D, Dorado J, Munteanu A and Pazos A, ‘Automatic feature extraction using genetic programming: an application to epileptic EEG classification’, Expert Systems with Applications, vol. 38, no. 8, 2011, pp. 10425-10436.
Received on 26.03.2016
Modified on 06.04.2016
Accepted on 13.05.2016 ©
RJPT All right reserved
Research J. Pharm. and Tech. 2016; 9(6):750-754
DOI: 10.5958/0974-360X.2016.00142.6