Analysis of Body Mass Index Based on Correlation and Regression

 

Dr. S. Kavitha¹*, T. Sabhanayagham², R. Thenmozhi³

¹Assistant Professor, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India

²Assistant Professor (SG), Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India

³Assistant Professor, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India

*Corresponding Author E-mail: kavitha.s@ktr.srmuniv.ac.in

 

ABSTRACT:

Data mining extracts useful knowledge from big data and supports decision making. Analysis of height and its correlation with weight allows the diagnosis and anticipation of many health problems and helps to preserve life. With the help of a computer-based decision support system, decision makers can make their decisions accurately. This study is based on a sampled dataset. To analyze the sampled big dataset, we have adopted correlation and regression analysis techniques, which are used to investigate the association or relationship between quantitative variables.

 

KEYWORDS: Body Mass Index (BMI), Big Data Mining, Analysis of Correlation and Regression.

 

 

 


1. INTRODUCTION:

Nowadays, many children are not growing up normally; this leads to a number of physical changes and makes them susceptible to many diseases. The Body Mass Index (BMI) is an estimate of the amount of fat in the human body. Based on it, a person can decide whether or not to lose weight, and it helps everyone to maintain a healthy weight throughout life. This paper describes how to maintain a healthy body weight relative to height. Data mining is an important concept for extracting knowledge from big data and, with the help of correlation and regression analysis, for obtaining accurate and reliable predictions from it. Big data is growing rapidly in the engineering and science domains. Correlation and regression analyses are used for prediction, validation, reliability assessment and verification.

 

This paper is organized as follows. Section 2 reviews big data mining, correlation analysis and regression analysis. Section 3 describes the methodology used in this paper. Section 4 presents the correlation analysis of the big dataset, and Section 5 presents its regression analysis. Section 6 discusses the results, and Section 7 concludes the paper.

 

2. REVIEW OF LITERATURE:

2.1 Big Data Mining:

Data mining is the process of finding interesting patterns in datasets, whereas big data concerns the storage and processing of large-scale datasets. Big data is high-volume, high-velocity and/or high-variety information that requires new forms of processing to enable enhanced discovery, optimization and decision making [1].

 

Mining tools are used to discover valuable knowledge from big data stores. Such knowledge gives insight into, for example, defect prevention and quality improvement [2].

 

Big data consists of large-volume, complex and growing datasets from autonomous sources. Mining interesting knowledge from such huge datasets provides the most relevant and accurate social-sensing feedback for a better understanding of society in real time [3].

 

 

2.2 Correlation Analysis:

Correlation analysis is a statistical procedure for evaluating the association between two quantitative variables. The correlation coefficient (r) is calculated as follows:

 

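$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad (1) $$

where $x_i$ and $y_i$ are the $n$ paired observations of the two variables and $\bar{x}$, $\bar{y}$ are their means.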

 

 

The correlation coefficient (r) quantifies the strength of the relationship between the two variables. Its value ranges from -1 to +1: the sign of r gives the direction of the relationship, and its magnitude gives its strength.
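As a minimal sketch of equation (1), the Python function below computes r from deviations about the means. The name `pearson_r` and the toy values are illustrative assumptions; the paper itself obtained its coefficients from SPSS.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator of (1): sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator of (1): square roots of the sums of squared deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    # r is undefined (ZeroDivisionError here) if either sequence is constant
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: the second list rises exactly with x, the third falls
x = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(x, [2.0, 4.0, 6.0, 8.0]))  # 1.0
print(pearson_r(x, [8.0, 6.0, 4.0, 2.0]))  # -1.0
```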

 

2.2.1 Strength of the Relationships:

The raw dataset is first pre-processed, and then the value of r is calculated. Based on the value of r, one of the following relationships is assigned to the dataset [4].

1.       If r > 0, the linear relationship is positive, i.e., as one variable increases, the other variable also increases.

2.       If r < 0, the linear relationship is negative, i.e., as one variable increases, the other variable decreases.

3.       If r = 0, there is no linear relationship, i.e., linear correlation is completely absent.
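The three rules above translate directly into code. The sketch below, with the hypothetical helper `linear_direction`, applies them to the three coefficients reported in Table 4. An estimated r is almost never exactly 0 in real data, so in practice the third case would be judged with a significance test rather than an equality check.

```python
def linear_direction(r):
    """Classify the direction of a linear relationship from r (rules 1-3)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "no linear relationship"

# Pearson coefficients reported in Table 4
for pair, r in [("Height-Weight", 0.503), ("Weight-BMI", 0.795), ("Height-BMI", -0.122)]:
    print(pair, linear_direction(r))
```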

 

 

 

Figure 1. Data Flow Diagram for Correlation Analysis

 

2.2.2 General Guidelines of Strength:

The magnitude of the correlation coefficient determines the strength of the correlation, as given in Table 1.

 

Table 1. Guidelines for the Strength of the Correlation

| S. No. | Strength | Correlation |
|---|---|---|
| 1. | Small / Weak | |
| 2. | Medium / Moderate | |
| 3. | Large / Strong | |

 

2.3 Regression Analysis:

Linear regression is a type of regression analysis that examines the relationship between two quantitative variables, i.e., one independent (explanatory) variable and one dependent variable [5].

 

The linear regression equation is written as

Y = a + bX

where

X – independent (explanatory) variable

Y – dependent variable

b – slope of the line

a – intercept
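The text does not state how a and b are estimated. Assuming ordinary least squares (the usual choice), the slope is b = Sxy / Sxx and the intercept is a = ȳ - b·x̄, where Sxy and Sxx are the sums of cross-products and of squared deviations about the means. A minimal sketch, with the hypothetical helper `fit_line` and made-up points:

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)  # zero (error) if X is constant
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical points lying exactly on Y = 2 + 3X
print(fit_line([0.0, 1.0, 2.0, 3.0], [2.0, 5.0, 8.0, 11.0]))  # (2.0, 3.0)
```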

 

3. METHODOLOGY:

The dataset is taken from SOCR Data [9]. It consists of 25,000 records of human heights (inches) and weights (pounds); a sample of these records is shown in Table 2. From the dataset, we estimate the Body Mass Index and find the relationship of height and weight with BMI. Here, height (in inches) and weight (in pounds) are the independent variables and BMI is the dependent variable.

 

Table 2. Dataset for Measurement of Health

| S. No. | Height (Inches) | Weight (Pounds) | BMI | Measure of Health |
|---|---|---|---|---|
| 1. | 71.51521 | 136.4879 | 18.4873 | Healthy Weight |
| 2. | 69.39874 | 153.0269 | 22.33675 | Healthy Weight |
| 3. | 68.2166 | 142.3354 | 21.50246 | Healthy Weight |
| 4. | 67.78781 | 144.2971 | 22.07546 | Healthy Weight |
| 5. | 69.80204 | 141.4947 | 20.41546 | Healthy Weight |
| 6. | 70.01472 | 136.4623 | 19.56993 | Healthy Weight |
| 7. | 66.78236 | 120.6672 | 19.02046 | Healthy Weight |
| 8. | 66.48769 | 127.4516 | 20.26834 | Healthy Weight |
| 9. | 68.30248 | 125.6107 | 18.92819 | Healthy Weight |
| 10. | 67.11656 | 122.4618 | 19.11158 | Healthy Weight |
| 11. | 71.0916 | 136.9975 | 19.47328 | Healthy Weight |
| 12. | 66.461 | 129.5023 | 20.611 | Healthy Weight |
| 13. | 68.64927 | 142.9733 | 21.32742 | Healthy Weight |
| 14. | 71.23033 | 137.9025 | 19.10722 | Healthy Weight |
| 15. | 67.13118 | 124.0449 | 19.35021 | Healthy Weight |
| 16. | 67.83379 | 141.2807 | 21.5847 | Healthy Weight |
| 17. | 68.87881 | 143.5392 | 21.26937 | Healthy Weight |
| 18. | 68.42187 | 129.5027 | 19.44663 | Healthy Weight |
| 19. | 67.62804 | 141.8501 | 21.80376 | Healthy Weight |
| 20. | 67.20864 | 129.7244 | 20.18956 | Healthy Weight |
| 21. | 70.84235 | 142.4235 | 19.95037 | Healthy Weight |
| 22. | 67.49434 | 131.5502 | 20.30075 | Healthy Weight |
| 23. | 65.44098 | 113.8922 | 18.69604 | Healthy Weight |
| 24. | 65.8132 | 120.7536 | 19.5988 | Healthy Weight |
| 25. | 61.8163 | 125.7886 | 19.22775 | Healthy Weight |
| 26. | 70.59505 | 136.2225 | 19.21568 | Healthy Weight |

 

3.1 Motivation of the Study:

Many researchers have approached the healthy life of human beings in different ways. However, the most important factor is the body mass index, since many of the diseases that attack the human body are related to body mass. This study therefore supports the analysis of healthy living based on BMI.

 

 

3.2 Measurement of BMI:

Three numeric variables, Height, Weight and BMI, are analyzed using the SPSS package. BMI (kg/m²) is calculated from the following formula [6]:

 

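$$ \mathrm{BMI} = \frac{\text{weight (kg)}}{[\text{height (m)}]^2} = 703 \times \frac{\text{weight (lb)}}{[\text{height (in)}]^2} \qquad (1) $$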
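Because the SOCR records are in inches and pounds, the kilogram-metre formula has to be rescaled; the conventional factor is 703. The sketch below assumes that factor (the helper names `bmi_from_metric` and `bmi_from_imperial` are hypothetical) and checks it against record 2 of Table 2.

```python
def bmi_from_metric(weight_kg, height_m):
    """BMI in kg/m^2 from weight in kilograms and height in metres."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

def bmi_from_imperial(weight_lb, height_in):
    """BMI in kg/m^2 from weight in pounds and height in inches.

    Assumption: the conventional factor 703, which rounds
    0.45359237 / 0.0254**2 (kg per lb over square metres per square inch).
    """
    if height_in <= 0:
        raise ValueError("height must be positive")
    return 703.0 * weight_lb / height_in ** 2

# Record 2 of Table 2: 69.39874 in and 153.0269 lb
print(round(bmi_from_imperial(153.0269, 69.39874), 5))  # 22.33675
```

For record 2 the unrounded result is about 22.336747, which rounds to the tabulated 22.33675.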

 

The BMI categories are listed in Table 3.

 

Table 3. BMI Categories

| S. No. | Category | BMI (kg/m²) |
|---|---|---|
| 1. | Underweight | < 18.50 |
| 2. | Normal or healthy weight | 18.50 to 24.90 |
| 3. | Overweight | 25.0 to 29.90 |
| 4. | Obese | ≥ 30.0 |
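Table 3 can be applied mechanically to a computed BMI. In the sketch below (hypothetical helper `bmi_category`), each upper bound is treated as exclusive so that values falling in the small gaps of the table, such as 24.95, still receive a category; that boundary handling is an assumption, not something the table states.

```python
def bmi_category(bmi):
    """Return the Table 3 category for a BMI value in kg/m^2."""
    # Assumption: upper bounds are exclusive, which also covers the gaps
    # in the table (24.95 counts as normal, 29.95 as overweight).
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Normal or healthy weight"
    if bmi < 30.0:
        return "Overweight"
    return "Obese"

# BMI values of records 2 and 23 of Table 2
for value in (22.33675, 18.69604):
    print(value, bmi_category(value))
```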

 

A BMI test is essential for estimating body fat. It gives a general idea of disease risk, and it is a relatively simple way to measure and investigate the rate of obesity in a population. A better understanding of the correlation among height, weight and BMI can help to control and prevent disease through proper dietary patterns [7]. A high BMI value indicates high body fatness and an increased relative risk of disease.

 

4. CORRELATION ANALYSIS OF DATASET:

The Pearson correlation coefficients were obtained using the SPSS package. In Table 4, N is the number of records in the big dataset (25,000). The correlation between Height and Weight is 0.503, with a significance value of 0.000 (i.e., p < 0.001). Table 4 is a Pearson correlation matrix: every variable correlates perfectly with itself, so each diagonal value is 1, and the two halves on either side of the diagonal are identical. Thus, the correlation between Height and Weight equals the correlation between Weight and Height, and the correlation between Weight and BMI equals the correlation between BMI and Weight; both of these correlations are positive. The correlation between Height and BMI likewise equals the correlation between BMI and Height, but this correlation is negative (-0.122). The value r = 0.795 indicates a strong positive relationship between weight and BMI. Table 4 therefore shows the correlations for all pairs of variables, and each correlation appears twice in the matrix.


 

Table 4. Correlation Analysis for Two Independent Variables and One Dependent Variable

| | | Height (Inches) | Weight (Pounds) | BMI |
|---|---|---|---|---|
| Height (Inches) | Pearson Correlation | 1 | 0.503** | -0.122** |
| | Sig. (2-tailed) | | 0.000 | 0.000 |
| | N | 25000 | 25000 | 25000 |
| Weight (Pounds) | Pearson Correlation | 0.503** | 1 | 0.795** |
| | Sig. (2-tailed) | 0.000 | | 0.000 |
| | N | 25000 | 25000 | 25000 |
| BMI | Pearson Correlation | -0.122** | 0.795** | 1 |
| | Sig. (2-tailed) | 0.000 | 0.000 | |
| | N | 25000 | 25000 | 25000 |

**. Correlation is significant at the 0.01 level (2-tailed).
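Table 4 reports two-tailed significance values of 0.000. The sketch below shows where such values come from, assuming the standard test of H0: ρ = 0, whose statistic is t = r·√((n - 2)/(1 - r²)) with n - 2 degrees of freedom. To stay within the standard library it approximates the t distribution by the normal distribution, which is reasonable for n = 25000 but is an assumption; `correlation_t` and `two_tailed_p_normal` are hypothetical names.

```python
import math

def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if not -1.0 < r < 1.0 or n < 3:
        raise ValueError("need -1 < r < 1 and n >= 3")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def two_tailed_p_normal(t):
    """Two-tailed p-value, approximating the t distribution by the standard normal.

    Assumption: with n - 2 = 24998 degrees of freedom the difference is negligible;
    an exact p-value would use the CDF of the t distribution instead.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))

n = 25000  # N in Table 4
for pair, r in [("Height-Weight", 0.503), ("Weight-BMI", 0.795), ("Height-BMI", -0.122)]:
    t = correlation_t(r, n)
    print(f"{pair}: t = {t:.1f}, two-tailed p = {two_tailed_p_normal(t):.3g}")
```

With the rounded coefficients of Table 4 this gives t of about 92.0, 207.2 and -19.4, with p-values far below 0.01. The last two are close to the slope t values in Tables 8 and 10 (207.549 and -19.356), as expected, because in a simple regression the slope test and the correlation test coincide; the small gaps come from r being rounded to three decimals.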

 

 


5. REGRESSION ANALYSIS OF DATASET:

The following results were obtained from the SPSS package using the dataset [9].

 

5.1 Analysis of Weight, Height and BMI:

Table 5 shows the model summary: the multiple correlation coefficient R is 0.998, the coefficient of determination (R²) is 0.997, i.e., 99.70%, and the standard error of the estimate is 0.087. The unstandardized and standardized coefficients are displayed in Table 6.

 

Table 5. Model Summary (Two Independent Variables and One Dependent Variable)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | 0.998ᵃ | 0.997 | 0.997 | 0.087 |

a. Predictors: (Constant), Height (Inches), Weight (Pounds)

 


 

Table 6. Coefficients (Two Independent Variables and One Dependent Variable)

| Model | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| 1 (Constant) | 38.660 | 0.020 | | 1908.260 | 0.000 |
| Weight (Pounds) | 0.153 | 0.000 | 1.146 | 2785.277 | 0.000 |
| Height (Inches) | -0.570 | 0.000 | -0.698 | -1695.816 | 0.000 |
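Tables 5 to 10 are ordinary-least-squares regression output. As a hedged sketch of how R², adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) and the standard error of the estimate √(SS_res/(n - p - 1)) follow from a fit with p predictors, the function below uses `numpy.linalg.lstsq`. The helper name `fit_ols` and the ten-record sample are assumptions, and such a small sample will not reproduce the 25,000-record estimates in Tables 5 and 6.

```python
import numpy as np

def fit_ols(predictors, y):
    """OLS fit of y on the given predictor columns plus a constant.

    Returns (coefficients, r2, adjusted_r2, std_error_of_estimate), with the
    coefficients ordered [constant, predictor_1, predictor_2, ...].
    """
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(y)] + [np.asarray(p, dtype=float) for p in predictors]
    X = np.column_stack(columns)
    n, k = X.shape                      # k = number of predictors + 1
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    see = (ss_res / (n - k)) ** 0.5     # "Std. Error of the Estimate"
    return coef, r2, adj_r2, see

# Usage with the first ten records of Table 2 (BMI recomputed with the factor 703)
height = [71.51521, 69.39874, 68.2166, 67.78781, 69.80204,
          70.01472, 66.78236, 66.48769, 68.30248, 67.11656]
weight = [136.4879, 153.0269, 142.3354, 144.2971, 141.4947,
          136.4623, 120.6672, 127.4516, 125.6107, 122.4618]
bmi = [703.0 * w / h ** 2 for h, w in zip(height, weight)]
coef, r2, adj_r2, see = fit_ols([weight, height], bmi)
print("constant, weight, height:", np.round(coef, 3))
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}, SEE = {see:.3f}")
```

Passing a single predictor list, for example `fit_ols([weight], bmi)`, gives the one-variable models of Tables 7 to 10 in the same way.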

 


5.2 Analysis of Weight and BMI Variables:

Table 7 shows the model summary: R is 0.795, the coefficient of determination (R²) is 0.633, i.e., 63.30%, and the standard error of the estimate is 0.940. The unstandardized and standardized coefficients are displayed in Table 8.


Table 7. Model Summary (One Independent Variable and One Dependent Variable)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | 0.795ᵃ | 0.633 | 0.633 | 0.940 |

a. Predictors: (Constant), Weight (Pounds)

 

Table 8. Coefficients (One Independent Variable and One Dependent Variable)

| Model | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| 1 (Constant) | 5.867 | 0.065 | | 90.148 | 0.000 |
| Weight (Pounds) | 0.106 | 0.001 | 0.795 | 207.549 | 0.000 |

 


5.3 Analysis of Height and BMI Variables:

Table 9 shows the model summary: R is 0.122, the coefficient of determination (R²) is 0.015, i.e., 1.50%, and the standard error of the estimate is 1.540. The unstandardized and standardized coefficients are displayed in Table 10.

 

 

 

Table 9. Model Summary (One Independent Variable and One Dependent Variable)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | 0.122ᵃ | 0.015 | | 1.540 |

a. Predictors: (Constant), Height (Inches)


Table 10. Coefficients (One Independent Variable and One Dependent Variable)

| Model | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| 1 (Constant) | 26.062 | 0.348 | | 74.794 | 0.000 |
| Height (Inches) | -0.099 | 0.005 | -0.122 | -19.356 | 0.000 |

a. Dependent Variable: BMI

 


5.4 DISCUSSION:

5.4.1 Analysis of Two Independent Variables with One Independent Variable (Weight):

The following formulas give the estimated BMI based on two independent variables (Height and Weight) and on one independent variable (Weight), using Tables 5, 6, 7 and 8.

 

Estimated BMI = (0.153) Weight + (-0.570) Height + 38.660          (2)

Estimated BMI = (0.106) Weight + 5.867                                          (3)

 

The R square value of the model with two independent variables (Weight and Height) is 0.997, whereas the R square value of the model with one independent variable (Weight) is 0.633. The two-variable model therefore explains a larger share of the variation in BMI, and its standard error of the estimate (0.087) is lower than that of the one-variable model (0.940). Hence, equation (2) estimates BMI better than equation (3).

 

5.4.2 Analysis of Two Independent Variables with One Independent Variable (Height):

The following formulas give the estimated BMI based on two independent variables (Height and Weight) and on one independent variable (Height), using Tables 5, 6, 9 and 10.

 

Estimated BMI = (0.153) Weight + (-0.570) Height + 38.660          (4)

Estimated BMI = (-0.099) Height + 26.062                              (5)

 

The R square value of the model with two independent variables (Weight and Height) is 0.997, whereas the R square value of the model with one independent variable (Height) is 0.015. The two-variable model therefore explains a far larger share of the variation in BMI, and its standard error of the estimate (0.087) is lower than that of the one-variable model (1.540). Hence, equation (4) estimates BMI better than equation (5).

 

5.4.3 Analysis of One Independent Variable (Weight) with One Independent Variable (Height):

The following formulas give the estimated BMI based on one independent variable, either Weight or Height (Tables 7, 8, 9 and 10).

 

Estimated BMI = (0.106) Weight + 5.867                                         (6)

Estimated BMI = (-0.099) Height + 26.062                                       (7)

 

The R square value of the model with Weight as the only independent variable is 0.633, whereas that of the model with Height as the only independent variable is 0.015. Likewise, the standard error of the estimate of the Height model (1.540) is higher than that of the Weight model (0.940). So, Weight is a better predictor of BMI than Height.
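To make equations (2), (3) and (5) concrete, the sketch below evaluates them for record 2 of Table 2 (69.39874 in, 153.0269 lb, tabulated BMI 22.33675). It simply encodes the rounded coefficients reported in Tables 6, 8 and 10, with the constant 38.660 from Table 6; the function names are hypothetical.

```python
# Fitted equations from Section 5.4 (rounded coefficients of Tables 6, 8 and 10)
def bmi_weight_height(weight_lb, height_in):
    """Equations (2)/(4): Weight and Height as independent variables."""
    return 0.153 * weight_lb - 0.570 * height_in + 38.660

def bmi_weight_only(weight_lb):
    """Equations (3)/(6): Weight as the only independent variable."""
    return 0.106 * weight_lb + 5.867

def bmi_height_only(height_in):
    """Equations (5)/(7): Height as the only independent variable."""
    return -0.099 * height_in + 26.062

# Record 2 of Table 2
w, h, observed = 153.0269, 69.39874, 22.33675
for label, estimate in [("Weight and Height", bmi_weight_height(w, h)),
                        ("Weight only", bmi_weight_only(w)),
                        ("Height only", bmi_height_only(h))]:
    print(f"{label:>17}: {estimate:.3f} (residual {observed - estimate:+.3f})")
```

For this record the estimates are about 22.52, 22.09 and 19.19, so the two-variable equation lands closest to the tabulated BMI and the height-only equation furthest away, in line with the R square comparison above. A single record is only an illustration, not a validation.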

 

6. RESULT AND DISCUSSION:

A bivariate dataset is displayed graphically as a scatter plot (scatter diagram), with the independent variable (e.g., height) on the X (horizontal) axis and the dependent variable, BMI, on the Y (vertical) axis. Figures 3 and 4 show positive correlations, because BMI and Weight, and BMI and the unstandardized predicted value, move in the same direction. Figure 2 shows a weak negative correlation, because BMI and Height tend to move in opposite directions; the corresponding R² value is only 0.015 [8].

 

 

Figure 2. Scatter Diagram for BMI and Height

 

 

Figure 3. Scatter Diagram for BMI and Weight

 

 

Figure 4. Scatter Diagram for BMI and Unstandardized Predicted Value

7. CONCLUSION:

Maintaining a healthy weight is very important for the overall health of the body, and it also helps to prevent and control many diseases. A person who is overweight or obese has a higher risk of serious health problems such as high blood pressure, gallstones, breathing problems, certain cancers, heart disease and type 2 diabetes. Although many overweight people never develop diabetes, obesity has been shown statistically to increase the risk of type 2 diabetes and of sleep-disordered breathing [10]. This paper therefore concludes that weight should be maintained according to height to prevent the higher risk of serious health problems.

 

8. REFERENCES:

1. Natalija Koseleva and Guoda Ropaite, "Big data in building energy efficiency: understanding of big data and main challenges", Modern Building Materials, Structures and Techniques (MBMST 2016), Procedia Engineering, 172, 544-549, 2017. DOI: 10.1016/j.proeng.2017.02.064.

2. Ying Cheng, Ken Chen, Hemeng Sun, Youngping Zhang and Fei Tao, "Data and knowledge mining with big data towards smart production", Journal of Industrial Information Integration, 1-13, 2017. DOI: 10.1016/j.jii.2017.08.001.

3. Xindong Wu, Xingquan Zhu, Gong-Qing Wu and Wei Ding, "Data Mining with Big Data", IEEE Transactions on Knowledge and Data Engineering, 26(1), 97-107, January 2014.

4. Research Methods Knowledge Base, https://www.socialresearchmethods.net/kb/statcorr.php.

5. Regression Analysis, http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_multivariable/BS704_Multivariable6.html.

6. What Health, Healthy Living for Every Body, http://www.whathealth.com/bmi/formula.html.

7. Body Mass Index: Considerations for Practitioners, U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, https://www.cdc.gov/obesity/downloads/bmiforpactitioners.pdf.

8. Scatterplot and Correlation: Definition, Example and Analysis, http://study.com/academy/lesson/scatter-plot-and-correlation-definition-example-analysis.html.

9. SOCR Data Dinov 020108 HeightsWeights, http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights.

10. Diabetes.co.uk, The global diabetes community, https://www.diabetes.co.uk/bmi.html.

 

 

 

 

 

 

Received on 01.11.2017; Modified on 02.12.2017

Accepted on 07.03.2018; © RJPT All rights reserved

Research J. Pharm. and Tech 2018; 11(6): 2243-2247.

DOI: 10.5958/0974-360X.2018.00415.8