Analysis of Body Mass Index Based on Correlation and Regression
Dr. S. Kavitha1*, T. Sabhanayagham2, R. Thenmozhi3
1Assistant Professor, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
2Assistant Professor- SG, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
3Assistant Professor, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
*Corresponding Author E-mail: kavitha.s@ktr.srmuniv.ac.in
ABSTRACT:
Data mining extracts useful knowledge from big data. Decisions based on the analysis of height and its correlation with weight allow the diagnosis and anticipation of many health problems and can help preserve life. With the help of a computer-based decision support system, decision makers can make accurate decisions. This study is based on a sampled data set. For the sampled big data set, we have adopted correlation and regression analysis techniques, which investigate the association or relationship between quantitative variables.
KEYWORDS: Body Mass Index (BMI), Big Data Mining, Analysis of Correlation and Regression.
1. INTRODUCTION:
Nowadays, children are growing abnormally, which leads them to undergo a number of physical changes and makes them prone to many diseases. The Body Mass Index (BMI) is an estimate of the amount of fat in the human body. Based on it, a person can decide whether or not to lose weight, and it helps everyone maintain a healthy weight throughout life. This paper describes how to maintain a healthy body weight according to height. Data mining is an important concept for extracting knowledge from big data and for predicting accurate and reliable results from it with the help of correlation and regression analysis. Big data is growing rapidly in the engineering and science domains. Correlation and regression analysis are used for prediction, validity, reliability and verification.
This paper is organized as follows. Section 2 defines big data mining, correlation analysis and regression analysis. Section 3 presents the important concepts of the methodology used in this paper. Section 4 analyzes the correlation of the big data sets. Section 5 presents the regression analysis of the big data sets. Section 6 discusses the results for the big data sets. Section 7 concludes the paper.
2. REVIEW OF LITERATURE:
2.1 Big Data Mining:
Data mining finds interesting patterns in datasets, whereas big data concerns the storage and processing of large-scale datasets. Big data is high-volume, high-velocity and/or high-variety information that requires new forms of processing to enable enhanced discovery, optimization and decision making [1].
Mining tools are used for discovering valuable knowledge from big data storage. This knowledge gives ideas about the prevention of defects, the improvement of quality, etc. [2].
Big data are large-volume, complex, growing datasets from autonomous sources. Mining interesting knowledge from these huge datasets provides the most relevant and accurate social-sensing feedback for a better understanding of society in real time [3].
2.2 Correlation Analysis:
Correlation analysis is a statistical procedure for evaluating the association between two quantitative variables. The correlation coefficient (r) is calculated as follows.
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (1) $$
where $x_i$ and $y_i$ are the paired observations, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of pairs.
The correlation coefficient (r) quantifies the strength of the relationship between the variables. Its value ranges from -1 to +1. The sign and the magnitude represent the direction and the strength of the relationship, respectively.
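The calculation of r can be illustrated with a minimal Python sketch. It assumes the standard Pearson product-moment definition; the function name `pearson_r` and the toy data are illustrative only and are not taken from the SOCR dataset.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of the deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a sample has zero variance")
    return cov / math.sqrt(ss_x * ss_y)


# Toy data (hypothetical): y rises with x, z falls with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
z = [10.0, 8.0, 6.0, 4.0, 2.0]
r_xy = pearson_r(x, y)  # points on an increasing line, so r is +1
r_xz = pearson_r(x, z)  # points on a decreasing line, so r is -1
```

The two toy cases sit at the ends of the range of r; real data such as heights and weights fall somewhere in between.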
2.2.1 Strength of the Relationships:
The raw dataset is sent for pre-processing. Then the r value is calculated. Based on the r value, one of the following relationships is assigned to the dataset [4].
1. If r > 0, the linear relationship is positive, i.e., when one variable increases, the other variable also increases.
2. If r < 0, the linear relationship is negative, i.e., when one variable increases, the other variable decreases.
3. If r = 0, there is no linear relationship, i.e., there is a complete absence of linear correlation.
Figure 1. Data Flow Diagram for Correlation Analysis
2.2.2 General Guidelines of Strength:
The magnitude of the correlation coefficient determines the strength of the correlation, as shown in Table 1.
Table 1. Guidelines for the Strength of the Correlation
| S. No. | Strength | Correlation |
|---|---|---|
| 1. | | Small / Weak |
| 2. | | Medium / Moderate |
| 3. | | Large / Strong |
2.3 Regression Analysis:
Linear regression is a type of regression analysis that examines the relationship between two quantitative variables, i.e., one independent (explanatory) variable and one dependent variable [5].
The equation of linear regression is written as
Y = a + bX
where
X – independent (explanatory) variable
Y – dependent variable
b – slope of the line
a – intercept
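As an illustration of how a and b can be obtained, the following Python sketch estimates them by ordinary least squares. This is an assumption about the fitting method (it is the usual one for linear regression); the function name `fit_line` and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of X and sum of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("the slope is undefined when X has zero variance")
    b = sxy / sxx            # slope of the line
    a = mean_y - b * mean_x  # intercept
    return a, b


# Toy data (hypothetical) lying exactly on Y = 1 + 2X.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
```

For points that lie exactly on a line, the fit recovers that line's intercept and slope; for noisy data it returns the line that minimizes the sum of squared vertical distances.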
3. METHODOLOGY:
The dataset is taken from the SOCR Data [9]. It consists of 25000 records of human heights (inches) and weights (pounds); sample records are shown in Table 2. From the dataset, we estimate the Body Mass Index and find the relationship of height and weight with BMI. Here, height (measured in inches) and weight (measured in pounds) are the independent variables and BMI is the dependent variable.
Table 2. Dataset for Measurement of Health
| S. No. | Height (Inches) | Weight (Pounds) | BMI | Measure of health |
|---|---|---|---|---|
| 1. | 71.51521 | 136.4879 | 18.4873 | Healthy Weight |
| 2. | 69.39874 | 153.0269 | 22.33675 | Healthy Weight |
| 3. | 68.2166 | 142.3354 | 21.50246 | Healthy Weight |
| 4. | 67.78781 | 144.2971 | 22.07546 | Healthy Weight |
| 5. | 69.80204 | 141.4947 | 20.41546 | Healthy Weight |
| 6. | 70.01472 | 136.4623 | 19.56993 | Healthy Weight |
| 7. | 66.78236 | 120.6672 | 19.02046 | Healthy Weight |
| 8. | 66.48769 | 127.4516 | 20.26834 | Healthy Weight |
| 9. | 68.30248 | 125.6107 | 18.92819 | Healthy Weight |
| 10. | 67.11656 | 122.4618 | 19.11158 | Healthy Weight |
| 11. | 71.0916 | 136.9975 | 19.47328 | Healthy Weight |
| 12. | 66.461 | 129.5023 | 20.611 | Healthy Weight |
| 13. | 68.64927 | 142.9733 | 21.32742 | Healthy Weight |
| 14. | 71.23033 | 137.9025 | 19.10722 | Healthy Weight |
| 15. | 67.13118 | 124.0449 | 19.35021 | Healthy Weight |
| 16. | 67.83379 | 141.2807 | 21.5847 | Healthy Weight |
| 17. | 68.87881 | 143.5392 | 21.26937 | Healthy Weight |
| 18. | 68.42187 | 129.5027 | 19.44663 | Healthy Weight |
| 19. | 67.62804 | 141.8501 | 21.80376 | Healthy Weight |
| 20. | 67.20864 | 129.7244 | 20.18956 | Healthy Weight |
| 21. | 70.84235 | 142.4235 | 19.95037 | Healthy Weight |
| 22. | 67.49434 | 131.5502 | 20.30075 | Healthy Weight |
| 23. | 65.44098 | 113.8922 | 18.69604 | Healthy Weight |
| 24. | 65.8132 | 120.7536 | 19.5988 | Healthy Weight |
| 25. | 61.8163 | 125.7886 | 19.22775 | Healthy Weight |
| 26. | 70.59505 | 136.2225 | 19.21568 | Healthy Weight |
3.1 Motivation of the Study:
Many researchers have studied, in different ways, the healthy life of human beings. But the foremost factor is the body mass index, since many diseases that attack the human body are related to body mass. So this study supports the analysis of healthy life based on BMI.
3.2 Measurement of BMI:
There are three numeric variables, Height, Weight and BMI, which are analyzed using the SPSS package. The BMI (kg/m2) is calculated from the formula below [6].
$$ \mathrm{BMI} = \frac{\text{weight (kg)}}{[\text{height (m)}]^2} \qquad (1) $$
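Since the dataset records height in inches and weight in pounds, the values must be converted to metric units before the kg/m2 definition is applied. The following Python sketch shows one way to do this; the function name `bmi_from_imperial` is hypothetical, and the exact conversion used to produce Table 2 is not stated in this paper, so the result is expected to agree with the table only up to small rounding differences.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (exact by definition)
IN_TO_M = 0.0254       # metres per inch (exact by definition)


def bmi_from_imperial(weight_lb, height_in):
    """Return BMI in kg/m^2 for a weight in pounds and a height in inches."""
    if weight_lb <= 0 or height_in <= 0:
        raise ValueError("weight and height must be positive")
    weight_kg = weight_lb * LB_TO_KG
    height_m = height_in * IN_TO_M
    return weight_kg / height_m ** 2


# Record 2 of Table 2: height 69.39874 in, weight 153.0269 lb, tabulated BMI 22.33675.
bmi_row2 = bmi_from_imperial(153.0269, 69.39874)
```

For this record the computed value is within about 0.01 of the tabulated BMI.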
The BMI categories are given in Table 3.
Table 3. BMI Categories
| S. No. | Categories | BMI (kg/m2) |
|---|---|---|
| 1. | Underweight | < 18.50 |
| 2. | Normal or Healthy weight | 18.50 to 24.90 |
| 3. | Overweight | 25.0 to 29.90 |
| 4. | Obese | > 30.0 |
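The categories of Table 3 can be applied to a computed BMI with a small lookup, sketched below in Python. Table 3 leaves the values between 24.90 and 25.0, between 29.90 and 30.0, and exactly 30.0 unassigned; the sketch assumes half-open intervals with cut-offs at 18.5, 25.0 and 30.0, which is an assumption and not something stated in the table. The function name `bmi_category` is hypothetical.

```python
def bmi_category(bmi):
    """Map a BMI value (kg/m^2) to a category from Table 3.

    Assumed cut-offs: [0, 18.5), [18.5, 25.0), [25.0, 30.0), [30.0, inf).
    """
    if bmi <= 0:
        raise ValueError("BMI must be positive")
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Normal or Healthy weight"
    if bmi < 30.0:
        return "Overweight"
    return "Obese"


# Example values (hypothetical), one per category.
labels = [bmi_category(v) for v in (17.9, 22.3, 27.0, 31.5)]
```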
The BMI test is essential for estimating body fat. It gives a general idea of disease risk. It is a relatively simple way to measure and investigate obesity rates in a population. A better understanding of the correlation of height, weight and BMI gives an idea of how to control and prevent disease through a proper dietary pattern [7]. A high BMI value indicates high body fatness and an increased relative risk of disease.
4. CORRELATION ANALYSIS OF DATASET:
The Pearson correlation coefficients for these variables are obtained using the SPSS package. In Table 4, N indicates the number of records in the big dataset. The table is a Pearson correlation matrix. Every variable correlates perfectly with itself, so each diagonal value is 1, and the two halves on either side of the diagonal are identical, so each correlation appears twice in the matrix. The correlation between Height and Weight is 0.503, with a significance value of 0.000, and is the same as the correlation between Weight and Height. The correlation between Weight and BMI is the same as the correlation between BMI and Weight; r is 0.795, so there is a strong positive relationship between weight and BMI. The correlation between Height and BMI is the same as the correlation between BMI and Height, but this correlation is negative (r = -0.122).
Table 4. Correlation Analysis for Two Independent Variables and One Dependent Variable
Correlations
| | | Height (Inches) | Weight (Pounds) | BMI |
|---|---|---|---|---|
| Height (Inches) | Pearson Correlation | 1 | 0.503** | -0.122** |
| | Sig. (2-tailed) | | 0.000 | 0.000 |
| | N | 25000 | 25000 | 25000 |
| Weight (Pounds) | Pearson Correlation | 0.503** | 1 | 0.795** |
| | Sig. (2-tailed) | 0.000 | | 0.000 |
| | N | 25000 | 25000 | 25000 |
| BMI | Pearson Correlation | -0.122** | 0.795** | 1 |
| | Sig. (2-tailed) | 0.000 | 0.000 | |
| | N | 25000 | 25000 | 25000 |
**. Correlation is significant at the 0.01 level (2-tailed)
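The structure of such a correlation matrix (unit diagonal, two identical halves) can be reproduced with a short Python sketch. It is not the SPSS procedure used in this study; it applies the standard Pearson formula to the first five records as printed in Table 2, so its coefficients will differ from those of the full 25000-record dataset. The names `pearson_r` and `correlation_matrix` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Return a nested dict m[a][b] = r between columns a and b."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# First five records as printed in Table 2 (a tiny sample, for illustration only).
sample = {
    "Height": [71.51521, 69.39874, 68.2166, 67.78781, 69.80204],
    "Weight": [136.4879, 153.0269, 142.3354, 144.2971, 141.4947],
    "BMI": [18.4873, 22.33675, 21.50246, 22.07546, 20.41546],
}
matrix = correlation_matrix(sample)

# Print the matrix row by row, rounded to three decimals as in SPSS output.
for row_name, row in matrix.items():
    print(row_name, [round(value, 3) for value in row.values()])
```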
5. REGRESSION ANALYSIS OF DATASET:
The following results are taken from the SPSS package using dataset [9].
5.1 Analysis of Weight, Height and BMI:
The model summary shows that the value of R is 0.998, the coefficient of determination (R²) is 0.997, i.e., 99.70%, and the standard error of the estimate is 0.087 (Table 5). The unstandardized and standardized coefficient values are displayed in Table 6.
Table 5. Model Summary (Two Independent Variables and One Dependent Variable)
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | 0.998a | 0.997 | 0.997 | 0.087 |
a. Predictors: (Constant), Height (Inches), Weight (Pounds)
Table 6. Coefficients (Two Independent Variables and One Dependent Variable)
Coefficientsa
| Model | | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1. | (Constant) | 38.660 | 0.020 | | 1908.260 | 0.000 |
| | Weight (Pounds) | 0.153 | 0.000 | 1.146 | 2785.277 | 0.000 |
| | Height (Inches) | -0.570 | 0.000 | -0.698 | -1695.816 | 0.000 |
5.2 Analysis of Weight and BMI Variables:
The model summary shows that the value of R is 0.795, the coefficient of determination (R²) is 0.633, i.e., 63.30%, and the standard error of the estimate is 0.940 (Table 7). The unstandardized and standardized coefficient values are displayed in Table 8.
Table 7. One Independent Variable and One Dependent Variable in the Model Summary
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1. | 0.795a | 0.633 | 0.633 | 0.940 |
a. Predictors: (Constant), Weight (Pounds)
Table 8. Coefficients of One Independent Variable and One Dependent Variable
Coefficientsa
| Model | | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1. | (Constant) | 5.867 | 0.065 | | 90.148 | 0.000 |
| | Weight (Pounds) | 0.106 | 0.001 | 0.795 | 207.549 | 0.000 |
5.3 Analysis of Height and BMI Variables:
The model summary shows that the value of R is 0.122, the coefficient of determination (R²) is 0.015, i.e., 1.50%, and the standard error of the estimate is 1.540 (Table 9). The unstandardized and standardized coefficient values are displayed in Table 10.
Table 9. One Independent Variable with One Dependent Variable in Model Summary
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1. | 0.122a | 0.015 | 0.015 | 1.540 |
a. Predictors: (Constant), Height (Inches)
Table 10. Coefficients of One Independent Variable and One Dependent Variable
Coefficientsa
| Model | | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1. | (Constant) | 26.062 | 0.348 | | 74.794 | 0.000 |
| | Height (Inches) | -0.099 | 0.005 | -0.122 | -19.356 | 0.000 |
a. Dependent Variable: BMI
5.4 DISCUSSION:
5.4.1 Analysis of Two Independent Variables with One Independent Variable (Weight):
The following formulas give the estimated BMI based on the two independent variables (Height and Weight) and on the one independent variable (Weight), from Tables 5, 6, 7 and 8.
Estimated BMI = (0.153) Weight + (-0.570) Height + 38.660 (2)
Estimated BMI = (0.106) Weight + 5.867 (3)
The R square value of the model with two independent variables, Weight and Height, is 0.997. The R square value of the model with one independent variable, Weight, is 0.633. The R square value of the two-variable model is therefore higher than that of the one-variable model. The standard error of the estimate of the two-variable model (0.087) is lower than that of the one-variable model (0.940). So equation (2) fits the data better than equation (3).
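To make the comparison concrete, the two estimates can be evaluated for a single record, as in the Python sketch below. The coefficients are those reported in Tables 6 and 8; the function names are hypothetical, and one record is only an illustration, not evidence about the whole dataset.

```python
def estimated_bmi_two_predictors(weight_lb, height_in):
    """Two-predictor estimate: coefficients from Table 6 (constant 38.660)."""
    return 0.153 * weight_lb + (-0.570) * height_in + 38.660


def estimated_bmi_weight_only(weight_lb):
    """Weight-only estimate: coefficients from Table 8."""
    return 0.106 * weight_lb + 5.867


# Record 2 of Table 2: weight 153.0269 lb, height 69.39874 in, tabulated BMI 22.33675.
weight_lb, height_in, tabulated_bmi = 153.0269, 69.39874, 22.33675

est_two = estimated_bmi_two_predictors(weight_lb, height_in)
est_one = estimated_bmi_weight_only(weight_lb)

# Absolute difference between each estimate and the tabulated BMI.
err_two = abs(est_two - tabulated_bmi)
err_one = abs(est_one - tabulated_bmi)
print(round(est_two, 3), round(est_one, 3))
```

For this record the two-predictor estimate lies closer to the tabulated BMI than the weight-only estimate.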
5.4.2 Analysis of Two Independent Variables with One Independent Variable (Height):
The following formulas give the estimated BMI based on the two independent variables (Height and Weight) and on the one independent variable (Height), from Tables 5, 6, 9 and 10.
Estimated BMI = (0.153) Weight + (-0.570) Height + 38.660 (4)
Estimated BMI = (-0.099) Height + 26.062 (5)
The R square value of the model with two independent variables, Weight and Height, is 0.997. The R square value of the model with one independent variable, Height, is 0.015. The R square value of the two-variable model is therefore higher than that of the one-variable model. The standard error of the estimate of the two-variable model (0.087) is lower than that of the one-variable model (1.540). So equation (4) fits the data better than equation (5).
5.4.3 Analysis of One Independent Variable (Weight) with One Independent Variable (Height):
The following formulas give the estimated BMI based on the one independent variable (Weight) and on the one independent variable (Height), from Tables 7, 8, 9 and 10.
Estimated BMI = (0.106) Weight + 5.867 (6)
Estimated BMI = (-0.099) Height + 26.062 (7)
The R square value of the model with one independent variable, Weight, is 0.633. The R square value of the model with one independent variable, Height, is 0.015. The standard error of the estimate for Height (1.540) is higher than the standard error of the estimate for Weight (0.940). So Weight is a better predictor of BMI than Height.
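The quantities compared above, R square and the standard error of the estimate, can be computed for any one-predictor model as sketched below in Python. The sketch assumes an ordinary least-squares fit and the usual definition of the standard error with n - 2 residual degrees of freedom; the function name `fit_summary` is hypothetical, and the five records as printed in Table 2 are used only as a small illustration, so the numbers will not match Tables 7 and 9.

```python
import math


def fit_summary(xs, ys):
    """Fit Y = a + bX by least squares; return (a, b, r, r_square, std_error)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must have non-zero variance")
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares of the fitted line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    r_square = 1.0 - sse / syy
    # One predictor leaves n - 2 residual degrees of freedom.
    std_error = math.sqrt(sse / (n - 2))
    return a, b, r, r_square, std_error


# First five records as printed in Table 2 (illustration only).
height = [71.51521, 69.39874, 68.2166, 67.78781, 69.80204]
weight = [136.4879, 153.0269, 142.3354, 144.2971, 141.4947]
bmi = [18.4873, 22.33675, 21.50246, 22.07546, 20.41546]

weight_fit = fit_summary(weight, bmi)
height_fit = fit_summary(height, bmi)
```

With a single predictor, R square equals the square of the Pearson correlation between the predictor and BMI, which is why the R values in Tables 7 and 9 have the same magnitude as the corresponding correlations in Table 4.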
6. RESULT AND DISCUSSION:
The bivariate dataset is displayed in graphical form by a scatter plot or scatter diagram. In Figure 2, the independent variable, height, is on the X-axis (horizontal axis) and the dependent variable, BMI, is on the Y-axis (vertical axis). As height increases, weight also increases, and a linear relationship exists between the height and BMI variables in the big dataset. Figures 3 and 4 show positive correlation because the variables in each pair (BMI and Weight; BMI and Unstandardized Predicted Value) move in the same direction. Figure 2 shows negative correlation because BMI and Height move in opposite directions; the R² value is 0.015 [8].
Figure 2. Scatter Diagram for BMI and Height
Figure 3. Scatter Diagram for BMI and Weight
Figure 4. Scatter Diagram for BMI and Unstandardized Predicted Value
7. CONCLUSION:
Maintaining a healthy weight is important for the overall health of the body and also helps to prevent and control many diseases. A person who is overweight or obese has a higher risk of serious health problems such as high blood pressure, gallstones, breathing problems, certain cancers, heart disease and type 2 diabetes. Many overweight people never develop diabetes. Statistically, however, obesity has been shown to increase the risk of type 2 diabetes and of sleep-disordered breathing [10]. So this paper concludes that weight should be maintained according to height to prevent the higher risk of serious health problems.
8. REFERENCES:
1. Natalija Koseleva and Guoda Ropaite, “Big data in building energy efficiency: understanding of big data and main challenges”, Modern Building Materials, Structures and Techniques, MBMST 2016, Procedia Engineering 172 (2017) 544-549, 1877-7058, 2017 Elsevier Ltd, DOI: 10.1016/j.proeng.2017.02.064.
2. Ying Cheng, Ken Chen, Hemeng Sun, Youngping Zhang and Fei Tao, “Data and knowledge mining with big data towards smart production”, Journal of Industrial Information Integration, 1-13, 2017 Elsevier Ltd, DOI: 10.1016/j.jii.2017.08.001.
3. Xindong Wu, Xingquan Zhu, Gong-Qing Wu and Wei Ding, “Data Mining with Big Data”, IEEE Transactions on Knowledge and Data Engineering, Vol. 26, No. 1, 97-107, Jan 2014.
4. Research Methods Knowledge Base https://www.socialresearchmethods.net/kb/statcorr.php
5. Regression Analysis, http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_multivariable/BS704_Multivariable6.html1
6. What Health, Healthy Living for Ever Body, http://www.whathealth.com/bmi/formula.html.
7. Body Mass Index: Considerations for Practitioners, Department of Health and Human Services Centers for Disease Control and Prevention. https://www.cdc.gov/obesity/downloads/bmiforpactitioners.pdf
8. Scatterplot and Correlation: Definition, Example and Analysis http://study.com/academy/lesson/scatter-plot-and-correlation-definition-example-analysis.html
9. SOCR Data Dinov 020108 HeightsWeights, http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights
10. Diabetes.co.uk, The global diabetes community, https://www.diabetes.co.uk/bmi.html.
Received on 01.11.2017 Modified on 02.12.2017
Accepted on 07.03.2018 © RJPT All right reserved
Research J. Pharm. and Tech 2018; 11(6): 2243-2247.
DOI: 10.5958/0974-360X.2018.00415.8