Variational Bayesian Matrix Factorization and Certain Post Classifiers for Classification of Epilepsy from EEG Signals

 

Harikumar Rajaguru, Sunil Kumar Prabhakar

Department of ECE, Bannari Amman Institute of Technology, India

*Corresponding Author E-mail: sunilprabhakar22@gmail.com, harikumarrajaguru@gmail.com

 

ABSTRACT:

The main aim of this paper is to employ Variational Bayesian Matrix Factorization (VBMF) as a dimensionality reduction technique, followed by the Gaussian Mixture Model (GMM), Genetic Algorithm (GA) and Naïve Bayes Classifier (NBC) as post classifiers, for the classification of epilepsy risk levels from Electroencephalography (EEG) signals. Epilepsy is a serious disorder of the brain characterized by frequent and recurrent seizures, so its detection and classification are very important. Epileptic seizures can be analyzed using EEG signals, which aid in recording and diagnosing epilepsy and in treating other neurological disorders. In this paper, the results are analyzed and compared in terms of sensitivity, specificity, time delay, quality values, performance index and accuracy.

 

KEYWORDS: VBMF, GMM, GA, NBC, EEG.

 

 


INTRODUCTION:

Epilepsy causes unpredictable interruptions in the functioning of the brain [1]. Epileptic seizures occur due to the continuous synchronization of electrical discharges across a particular group of neurons. A seizure is therefore considered a transient and abnormal characteristic of neurons in the brain [2]. Epileptic seizures strongly affect the consciousness, memory and behaviour of the patient, thereby disturbing the patient's physical and mental activities. Diagnosing epilepsy is very important, but it is critical and challenging for clinicians. EEG is widely used for the diagnosis of epilepsy. Several automated methods are available in the literature for the analysis and diagnosis of epilepsy [3]. Because EEG recordings are very long, automated detection systems have been developed. Moreover, the obtained EEG dataset has a huge dimension and is therefore quite difficult to process [4].

 

Hence the dimensionality of the EEG dataset should be reduced greatly, and in this paper VBMF is employed to reduce the dimension of the data set. In recent years, new machine learning methods have attracted great interest among biomedical scientists [4]. In this paper, three different post classifiers, namely GMM, GA and NBC, are used for the analysis. The paper is organized as follows. Section II discusses the materials and methods, and Section III describes the use of VBMF as a dimensionality reduction technique. Section IV discusses the post classifiers, and Section V presents the results and conclusion, followed by the references.

 

MATERIALS AND METHODS:

For the performance analysis of the epilepsy risk levels using VBMF as the dimensionality reduction technique followed by GMM, GA and NBC as post classifiers, the raw EEG data of 20 epileptic patients who were under treatment in the Neurology Department of Sri Ramakrishna Hospital, Coimbatore, Tamil Nadu, India were taken in European Data Format (EDF) for detailed examination. High priority is given to the pre-processing stage of the EEG signals, as it is vital to use the best technique available in the literature to extract all the necessary information embedded in the non-stationary biomedical signals [4]. The EEG records obtained were continuous for about 30 seconds, and each of them was divided into epochs of two-second duration in our experiment. Generally, a two-second epoch is short enough to avoid unnecessary redundancy in the signal, yet long enough to detect significant changes in activity and the presence of artifacts in the signal. There are 16 channels for each patient, taken over three different epochs. Considering the maximum signal frequency as 50 Hz, the sampling frequency is taken to be about 200 Hz. Each sample corresponds to an instantaneous amplitude value of the signal, which totals around 400 values per epoch. Four different types of artifacts are present in the data, namely chewing artifacts, eye blinks, electromyography (EMG) and motion artifacts, and artifacts make up only approximately 1% of the data. No attempt was made to select artifacts of a more specific nature. The artifacts are included in order to differentiate the spike categories of waveforms from the non-spike categories. Figure 1 shows the block diagram of the procedure.

 

 

 

Figure 1 Block Diagram of the Procedure

 

Initially the sampling process is applied to the raw EEG signals. The dimensions of the samples are then reduced with the help of Variational Bayesian Matrix Factorization. The dimensionally reduced values are given as inputs to the post classifiers GMM, GA and NBC, and their performance is compared.
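As an illustration of the sampling and epoching step described in the Materials and Methods, the following minimal sketch splits a multi-channel record into non-overlapping two-second epochs. The 200 Hz sampling rate, 16 channels and 30-second record length come from the text; the function name `split_into_epochs` and the random stand-in data are hypothetical and only serve to show the array shapes involved.

```python
import numpy as np

FS_HZ = 200          # sampling frequency stated in the text
EPOCH_SECONDS = 2    # epoch duration stated in the text
N_CHANNELS = 16      # channels per patient stated in the text


def split_into_epochs(recording, fs_hz=FS_HZ, epoch_seconds=EPOCH_SECONDS):
    """Split a (channels, samples) array into non-overlapping epochs.

    Returns an array of shape (channels, n_epochs, samples_per_epoch).
    Trailing samples that do not fill a whole epoch are dropped.
    """
    samples_per_epoch = fs_hz * epoch_seconds
    n_channels, n_samples = recording.shape
    n_epochs = n_samples // samples_per_epoch
    usable = recording[:, : n_epochs * samples_per_epoch]
    return usable.reshape(n_channels, n_epochs, samples_per_epoch)


# Synthetic stand-in for one 30-second, 16-channel record (not real EEG).
rng = np.random.default_rng(0)
record = rng.standard_normal((N_CHANNELS, 30 * FS_HZ))
epochs = split_into_epochs(record)
print(epochs.shape)  # (16, 15, 400)
```

Each epoch holds 400 amplitude values per channel, which matches the figure quoted for a two-second window at 200 Hz.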

 

VBMF as a Dimensionality Reduction Technique:

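Dimensionality reduction is a basic preprocessing step that maps the data from a higher dimension to a lower dimension. The dimensionality reduction technique employed here is Variational Bayesian Matrix Factorization [6]. Consider $U \in \mathbb{R}^{L \times M}$ as a low-rank matrix. The matrix is then decomposed into the product of $B \in \mathbb{R}^{L \times H}$ and $A \in \mathbb{R}^{M \times H}$.

Therefore the equation can be written as

$$U = B A^{\top}$$

If the observed matrix is denoted as $V \in \mathbb{R}^{L \times M}$ and if it follows the additive noise model, then

$$V = U + E$$

where $E \in \mathbb{R}^{L \times M}$ is a noise matrix.

For the Bayesian matrix factorization, Gaussian priors on the parameters $A$ and $B$ are used, and the model is represented as follows [6]

$$p(V \mid A, B) \propto \exp\left(-\frac{1}{2\sigma^{2}} \lVert V - B A^{\top} \rVert_{\mathrm{Fro}}^{2}\right)$$

$$p(A) \propto \exp\left(-\frac{1}{2}\,\mathrm{tr}\left(A C_A^{-1} A^{\top}\right)\right), \qquad p(B) \propto \exp\left(-\frac{1}{2}\,\mathrm{tr}\left(B C_B^{-1} B^{\top}\right)\right)$$

where $\sigma^{2}$ is the noise variance, $C_A$ and $C_B$ are the prior covariance matrices and $\lVert \cdot \rVert_{\mathrm{Fro}}$ is the Frobenius norm.

The Bayes posterior can be written as follows

$$p(A, B \mid V) = \frac{p(V \mid A, B)\, p(A)\, p(B)}{p(V)}, \qquad p(V) = \langle p(V \mid A, B) \rangle_{p(A) p(B)}$$

where $\langle \cdot \rangle_{p}$ denotes the expectation over $p$.

For the VBMF trial distribution $r(A, B)$ [6], the free energy is expressed as follows

$$F(r) = \left\langle \log \frac{r(A, B)}{p(V \mid A, B)\, p(A)\, p(B)} \right\rangle_{r(A, B)}$$

which is minimized under the constraint $r(A, B) = r_A(A)\, r_B(B)$.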
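As a hedged illustration of how such a factorization can be computed, the sketch below iterates the standard variational Bayes stationary conditions for a fully observed matrix modelled as V ≈ B Aᵀ with Gaussian priors on A and B. It assumes a known noise variance and isotropic priors, and the rank is fixed by hand. The function name `vbmf_iterate`, its parameters and the synthetic data are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def vbmf_iterate(V, rank, noise_var, prior_var=1.0, n_iter=100, seed=0):
    """Iterative VB updates for V ~ B @ A.T with Gaussian priors.

    V is the (L, M) observed matrix. Returns the posterior means
    A (M, rank) and B (L, rank). noise_var and prior_var are treated
    as known constants here, which is a simplifying assumption.
    """
    L, M = V.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((M, rank))
    B = rng.standard_normal((L, rank))
    cov_B = np.eye(rank)                    # posterior row covariance of B
    prior_prec = np.eye(rank) / prior_var   # C_A^{-1} = C_B^{-1} (assumed equal)
    for _ in range(n_iter):
        # Update the posterior covariance and mean of A given B.
        cov_A = noise_var * np.linalg.inv(
            B.T @ B + L * cov_B + noise_var * prior_prec)
        A = V.T @ B @ cov_A / noise_var
        # Update the posterior covariance and mean of B given A.
        cov_B = noise_var * np.linalg.inv(
            A.T @ A + M * cov_A + noise_var * prior_prec)
        B = V @ A @ cov_B / noise_var
    return A, B


# Synthetic rank-3 matrix plus noise (stand-in data, not EEG).
rng = np.random.default_rng(1)
U = rng.standard_normal((30, 3)) @ rng.standard_normal((20, 3)).T
V = U + 0.1 * rng.standard_normal(U.shape)
A_hat, B_hat = vbmf_iterate(V, rank=3, noise_var=0.01)
rel_err = np.linalg.norm(B_hat @ A_hat.T - U) / np.linalg.norm(U)
print(B_hat.shape, rel_err)
```

In a pipeline like the one in this paper, the rows of `B_hat` could serve as the reduced, rank-dimensional features passed to a post classifier; that mapping is an assumption of this sketch.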

 

Post Classifiers Used Here:

The post classifiers used in this paper are Genetic Algorithm, Gaussian Mixture Model and Naïve Bayesian Classifier.

 

Genetic Algorithm:

Genetic Algorithms are very powerful optimization techniques that take their idea from the principle of evolution [5]. They are generally used to find a near-optimal solution. In this paper, GA is used as a post classifier, and the simulation parameters are shown in Table 1.

TABLE 1 SIMULATION PARAMETERS

| Parameters | Description |
|---|---|
| Population Size | 320 |
| Maximum Number of Evaluated Individuals | 4 |
| Total Number of Generations | 512 |
| Type of Selection | Roulette-wheel |
| Type of Crossover | One-point |
| Replacement Type | Elitist |
| Generation Gap | 0.9 |
| Probability of Crossover | 0.5 |
| Probability of Mutation | 0.5 |
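The operators named in Table 1 can be sketched on a toy problem as follows. This is a minimal illustration only: the fitness function (counting ones in a bit string), the bit-string encoding, the reduced population size and generation count, and the reading of the mutation probability as "flip one random bit per child" are all assumptions, since the paper does not state how the GA is encoded for classification.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]


def one_point_crossover(a, b, rng):
    """Child takes the head of parent a and the tail of parent b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def run_ga(fitness, n_bits=16, pop_size=40, generations=30,
           p_cross=0.5, p_mut=0.5, gen_gap=0.9, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    # Generation gap: fraction replaced each generation; the rest are elites.
    n_keep = max(1, pop_size - round(gen_gap * pop_size))
    n_new = pop_size - n_keep
    best_history = []
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        best_history.append(max(fits))
        ranked = [ind for _, ind in
                  sorted(zip(fits, pop), key=lambda t: t[0], reverse=True)]
        children = []
        while len(children) < n_new:
            p1 = roulette_select(pop, fits, rng)
            p2 = roulette_select(pop, fits, rng)
            if rng.random() < p_cross:
                child = one_point_crossover(p1, p2, rng)
            else:
                child = list(p1)
            if rng.random() < p_mut:           # flip one random bit
                i = rng.randrange(n_bits)
                child[i] = 1 - child[i]
            children.append(child)
        pop = ranked[:n_keep] + children        # elitist replacement
    fits = [fitness(ind) for ind in pop]
    best_history.append(max(fits))
    return pop[fits.index(max(fits))], best_history


best, history = run_ga(fitness=sum)
print(sum(best), history[0], history[-1])
```

Because the elite individuals are copied unchanged into the next generation, the best fitness in `history` never decreases, which is the practical effect of the elitist replacement listed in Table 1.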

 

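The k-nearest neighbour (k-NN) algorithm is used initially to compute the Euclidean distance between the testing data set and the training data set [10]. The nearest point is found using the following formula

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^{2}}$$

where $\mathbf{x}$ is a testing sample, $\mathbf{y}$ is a training sample and $n$ is the number of features.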
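The nearest-point search can be sketched as below, using the usual Euclidean distance between a testing sample and every training sample. The function names, the two-dimensional toy points and the risk-level labels are hypothetical and chosen only for illustration.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def nearest_neighbor_label(test_point, train_points, train_labels):
    """Return the label and distance of the closest training point."""
    distances = [euclidean_distance(test_point, p) for p in train_points]
    nearest = min(range(len(distances)), key=distances.__getitem__)
    return train_labels[nearest], distances[nearest]


# Toy training set with hypothetical risk-level labels.
train = [[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]]
labels = ["low", "medium", "high"]
label, dist = nearest_neighbor_label([2.0, 3.0], train, labels)
print(label, round(dist, 3))  # medium 1.414
```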

 

Naïve Bayes Classifier:

In the data preprocessing stage of the NBC, attribute grouping has to be done. Each column of the EEG measurements is divided into a number of groups [9]. The number of groups can then be chosen at random. The grouping is done in three phases as follows:

Initially, the minimum ($\min_j$) and maximum ($\max_j$) values in each column $j$ of the EEG data set are found [9]. Secondly, the number of groups $n_j$ in every column is decided, and then the group interval $\Delta_j = (\max_j - \min_j)/n_j$ is calculated.

Thirdly, the threshold values are calculated as

$$T_{j,k} = \min_j + k\,\Delta_j$$

where $k = 0, 1, \ldots, n_j$.
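The three grouping phases can be read as equal-width binning of each column, which the following sketch illustrates under that assumption. The function names and the sample column are hypothetical; only the interior thresholds are returned, since the minimum and maximum themselves do not separate any groups.

```python
def column_thresholds(values, n_groups):
    """Equal-width group boundaries between the column minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_groups
    return [lo + k * width for k in range(1, n_groups)]


def assign_group(value, thresholds):
    """Index of the group a value falls into (0 .. len(thresholds))."""
    return sum(value >= t for t in thresholds)


column = [0.0, 1.0, 2.5, 4.0, 7.5, 10.0]
thresholds = column_thresholds(column, n_groups=4)
groups = [assign_group(v, thresholds) for v in column]
print(thresholds)  # [2.5, 5.0, 7.5]
print(groups)      # [0, 0, 1, 1, 3, 3]
```

The resulting group indices are the discrete attribute values on which a Naïve Bayes classifier can then be trained.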

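Now the NBC is developed as follows [9]

$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)$$

where $v_{NB}$ = target value output by the Naïve Bayesian Classifier,

$P(v_j)$ = frequency with which each target value $v_j$ occurs in the training data,

$x$ = instance with attributes $a_i$, where $i = 1, 2, \ldots, n$,

$v_j$ are the classes for instances, from the set of all possible classes in $V$, and

$\prod_{i} P(a_i \mid v_j)$ is the joint product of probabilities for the individual attributes.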
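A minimal sketch of this decision rule on discretized attributes is given below: the predicted class maximizes the class prior times the product of per-attribute conditional probabilities. Working in log space and adding Laplace smoothing (`alpha`) are implementation choices of this sketch, not details from the paper, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
import math


def train_nbc(rows, labels, alpha=1.0):
    """Count class frequencies and per-class attribute value frequencies.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    class_counts = Counter(labels)
    n_attrs = len(rows[0])
    value_counts = defaultdict(Counter)   # (attribute index, class) -> counts
    attr_values = [set() for _ in range(n_attrs)]
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            attr_values[i].add(value)
    return class_counts, value_counts, attr_values, alpha


def predict_nbc(model, row):
    """Return the class with the largest log prior plus log likelihoods."""
    class_counts, value_counts, attr_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)                 # log P(v_j)
        for i, value in enumerate(row):
            num = value_counts[(i, label)][value] + alpha
            den = count + alpha * len(attr_values[i])
            score += math.log(num / den)                # log P(a_i | v_j)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy discretized rows (group indices) with hypothetical risk labels.
rows = [[0, 1], [0, 0], [1, 1], [3, 2], [3, 3], [2, 3]]
labels = ["low", "low", "low", "high", "high", "high"]
model = train_nbc(rows, labels)
print(predict_nbc(model, [0, 1]), predict_nbc(model, [3, 3]))  # low high
```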

 

Gaussian Mixture Model:

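Consider a dataset $X = \{x_1, x_2, \ldots, x_N\}$, where $x_i$ is the value of the $i$-th signal and $N$ is the total number of signals in the dataset. A mixture model is assumed for the data [7]. It comprises $K$ Gaussian density components, with parameters $\theta_k$ in the $k$-th component.

For GMM, the probability density of $x_i$ is determined as follows [8]

$$p(x_i \mid \Theta) = \sum_{k=1}^{K} \pi_k\, p(x_i \mid \theta_k),$$

where $\Theta = \{\pi_k, \theta_k\}_{k=1}^{K}$ is the set of parameters of all the components and $\pi_k$ is the mixing proportion of the $k$-th component.

The $k$-th Gaussian is denoted as follows

$$p(x_i \mid \theta_k) = \frac{1}{(2\pi)^{d/2} \lvert \Sigma_k \rvert^{1/2}} \exp\left(-\frac{1}{2}(x_i - \mu_k)^{\top} \Sigma_k^{-1} (x_i - \mu_k)\right)$$

where $\mu_k$ and $\Sigma_k$ are the mean and covariance matrix respectively, and $d$ is the dimension of $x_i$. Expectation maximization is used to estimate the parameters in an iterative fashion.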
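The expectation maximization loop for a Gaussian mixture can be sketched as follows. The E-step computes each component's responsibility for every sample, and the M-step re-estimates the mixing proportions, means and covariance matrices from those responsibilities. The farthest-point initialization, the small `reg` term added to each covariance, the function names and the synthetic two-cluster data are assumptions of this sketch.

```python
import numpy as np


def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff)) / norm


def fit_gmm(X, n_components, n_iter=50, reg=1e-6):
    n, d = X.shape
    # Deterministic farthest-point initialization of the means.
    means = [X[0]]
    for _ in range(1, n_components):
        dist = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[np.argmax(dist)])
    means = np.array(means)
    base_cov = np.cov(X.T).reshape(d, d) + reg * np.eye(d)
    covs = np.array([base_cov] * n_components)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        dens = np.stack([w * gaussian_pdf(X, m, c)
                         for w, m, c in zip(weights, means, covs)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing proportions, means and covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for k in range(n_components):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    return weights, means, covs


# Two synthetic clusters centred near (0, 0) and (5, 5); stand-in data only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(5.0, 1.0, (100, 2))])
weights, means, covs = fit_gmm(X, n_components=2)
print(np.round(weights, 2))
print(np.round(means[np.argsort(means[:, 0])], 1))
```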

 

RESULTS AND CONCLUSION:

With VBMF as the dimensionality reduction technique and GMM, GA and NBC as post classifiers, the results in terms of quality values, time delay and accuracy are given in Table 2. The formulae for the Performance Index (PI), Sensitivity, Specificity and Accuracy are given as follows

$$PI = \frac{PC - MC - FA}{PC} \times 100$$

where PC – Perfect Classification, MC – Missed Classification, FA – False Alarm.

The Sensitivity, Specificity and Accuracy measures are stated by the following

$$Sensitivity = \frac{PC}{PC + FA} \times 100$$

$$Specificity = \frac{PC}{PC + MC} \times 100$$

$$Accuracy = \frac{Sensitivity + Specificity}{2}$$

The Quality Value QV is defined as

$$QV = \frac{C}{(R_{fa} + 0.2)\,(T_{dly} \cdot P_{dct} + 6 \cdot P_{msd})}$$

where $C$ is the scaling constant,

$R_{fa}$ is the number of false alarms per set,

$T_{dly}$ is the average delay of the onset classification in seconds,

$P_{dct}$ is the percentage of perfect classification and

$P_{msd}$ is the percentage of perfect risk level missed.

The time delay is given as follows

$$Time\ Delay = 2 \times \frac{PC}{100} + 6 \times \frac{MC}{100}$$
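The measures above can be computed with a few small functions. The definitions coded here are assumptions chosen to agree with Table 2: PI = (PC − MC − FA)/PC × 100, Sensitivity = PC/(PC + FA) × 100, Specificity = PC/(PC + MC) × 100, Accuracy as the mean of sensitivity and specificity, Time Delay = 2·PC/100 + 6·MC/100, and QV = C/((Rfa + 0.2)(Tdly·Pdct + 6·Pmsd)). The function names are hypothetical, and the scaling constant C has to be supplied by the caller because the text does not state its value.

```python
def performance_index(pc, mc, fa):
    """PI in percent from PC, MC and FA given in percent."""
    return (pc - mc - fa) / pc * 100.0


def sensitivity(pc, fa):
    return pc / (pc + fa) * 100.0


def specificity(pc, mc):
    return pc / (pc + mc) * 100.0


def accuracy(sens, spec):
    """Accuracy as the mean of sensitivity and specificity."""
    return (sens + spec) / 2.0


def time_delay(pc, mc):
    """Time delay in seconds from PC and MC given in percent."""
    return 2.0 * pc / 100.0 + 6.0 * mc / 100.0


def quality_value(c, r_fa, t_dly, p_dct, p_msd):
    """Quality value for a caller-supplied scaling constant c."""
    return c / ((r_fa + 0.2) * (t_dly * p_dct + 6.0 * p_msd))


# VBMF with GMM column of Table 2: PC = 95, MC = 4.09, FA = 0.90.
print(round(time_delay(95.0, 4.09), 3))     # 2.145
print(round(accuracy(98.05, 95.90), 3))     # 96.975
print(round(performance_index(95.0, 4.09, 0.90), 2))
```

The time delay and accuracy computed this way reproduce the tabulated 2.14 s and 96.97% up to rounding. The other tabulated measures differ slightly from what the column values give, presumably because the table reports averages over patients.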

 

Figure 2 shows the specificity and sensitivity analysis for VBMF as the dimensionality reduction technique followed by GMM, GA and NBC as post classifiers. Figure 3 shows the corresponding time delay and quality value analysis, and Figure 4 shows the performance index and accuracy analysis.

 

 

Figure 2 Sensitivity and Specificity Measures

 

It is inferred from Figure 2 that the specificity is highest, at 99.23%, when VBMF is combined with GA rather than with the other two classifiers. The sensitivity is highest for the VBMF-GMM combination.

 

 

Figure 3 Time Delay and Quality Value Measures

It is inferred from Figure 3 that the time delay is highest for the VBMF-NBC combination when compared to the other two combinations. If the quality values are considered, the VBMF-GMM combination produces the highest quality value of 22.50, followed by the VBMF-GA combination with 17.90.

 

 

Figure 4 Performance Index and Accuracy Measures

 

It is inferred from Figure 4 that the highest Performance Index value, 94.65%, is obtained for the VBMF-GMM combination. The highest accuracy, 96.97%, is also obtained when VBMF-GMM is employed.

 

TABLE 2 PERFORMANCE COMPARISON ANALYSIS

| Parameters | VBMF with NBC | VBMF with GMM | VBMF with GA |
|---|---|---|---|
| PC (%) | 80.48 | 95 | 85.13 |
| MC (%) | 13.19 | 4.09 | 0.76 |
| FA (%) | 6.31 | 0.90 | 14.09 |
| PI (%) | 75.26 | 94.65 | 81.34 |
| Sensitivity (%) | 93.68 | 98.05 | 85.90 |
| Specificity (%) | 86.80 | 95.90 | 99.23 |
| Time Delay (sec) | 2.40 | 2.14 | 1.74 |
| Quality Values | 17.35 | 22.50 | 17.90 |
| Accuracy (%) | 90.24 | 96.97 | 92.56 |

 

It is thus inferred that the perfect classification is about 95% when VBMF is used with GMM, which is the highest of the three techniques. If the quality values are analyzed, the highest is 22.50 for the VBMF-GMM combination, followed by 17.90 for the VBMF-GA combination. It is thus concluded that the accuracy is highest for the VBMF-GMM combination at 96.97%, making it the better method, followed by the VBMF-GA combination at 92.56% and the VBMF-NBC combination at 90.24%. Future work may incorporate modifications of Variational Bayesian Matrix Factorization to provide better accuracy.

 

 

REFERENCES:

1. Chi-Chou Kao, ‘E-Health Design of EEG Signal Classification for Epilepsy Diagnosis’, International Symposium on Biometrics and Security Technologies, 2013, pp. 67-71.

2. Min Han and Leilei Sun, ‘EEG Signal Classification for Epilepsy Diagnosis based on AR Model and RVM’, International Conference on Intelligent Control and Information Processing, August 13-15, 2010, Dalian, China.

3. Mormann F, Kreuz T, Rieke C, Andrzejak RG, Kraskov A, David P, Elger CE and Lehnertz K, ‘On the predictability of epileptic seizures’, Clin Neurophysiol, vol. 116, 2005, pp. 569-581.

4. Harikumar R, Sunil Kumar P, ‘Dimensionality Reduction Techniques for Processing Epileptic Encephalographic Signals’, Biomedical and Pharmacology Journal, vol. 8, no. 1, 2015, pp. 103-106.

5. Durga Prasad Muni, Nikhil R Pal and Jyotirmoy Das, ‘A Novel Approach to Design Classifiers Using Genetic Programming’, IEEE Transactions on Evolutionary Computation, vol. 8, no. 2, 2004, pp. 183-196.

6. Zhanyu Ma et al, ‘Variational Bayesian Matrix Factorization for Bounded Support Data’, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 4, April 2015.

7. Weiling Cai, Lei Lei, Ming Yang, ‘A Gaussian Mixture Model-based Clustering Algorithm for Image Segmentation using Dependable Spatial Constraints’, Third International Congress on Image and Signal Processing (CISP), 2010, pp. 1268-1272.

8. Permuter H, Francos J, Jermyn IH, ‘Gaussian Mixture Models of Texture and Colour for Image Database Retrieval’, ICASSP, 2013, pp. 569-572.

9. Cuiping Leng, Shuangcheng Wang, Hui Wang, ‘Learning Naïve Bayes Classifiers with Incomplete Data’, International Conference on Artificial Intelligence and Computational Intelligence, 2007, pp. 350-353.

10. Guo L, Rivero D, Dorado J, Munteanu A and Pazos A, ‘Automatic feature extraction using genetic programming: an application to epileptic EEG classification’, Expert Syst. Appl., vol. 38, no. 8, 2011, pp. 10425-10436.

 

 

 

 

 

Received on 26.03.2016          Modified on 06.04.2016

Accepted on 13.05.2016        © RJPT All right reserved

Research J. Pharm. and Tech. 2016; 9(6):750-754

DOI: 10.5958/0974-360X.2016.00142.6