1. INTRODUCTION:

Nowadays many children grow abnormally, which leads to a number of physical changes and makes them vulnerable to many diseases. The Body Mass Index (BMI) is an estimate of the amount of fat in the human body. Based on it, we can decide whether or not a person needs to lose weight, and it helps everyone maintain a healthy weight throughout life. This paper describes how to maintain a healthy body weight according to height. Data mining is an important approach for extracting knowledge from Big Data and for predicting accurate and reliable results from it with the help of correlation and regression analysis. Big data is growing rapidly in the engineering and science domains. Correlation and regression analysis are used for prediction, validity, reliability and verification.

This paper is organized as follows. Section 2 defines big data mining, correlation analysis and regression analysis. Section 3 presents the methodology used in this paper. Section 4 analyzes the correlation of the big data sets. Section 5 presents the regression analysis of the big data sets. Section 6 discusses the results obtained from the big data sets. Section 7 concludes the paper.

2. REVIEW OF LITERATURE:

2.1 Big Data Mining:

Data mining finds interesting patterns in datasets, whereas big data concerns the storage and processing of large-scale datasets. Big Data is high-volume, high-velocity and/or high-variety information that requires new forms of processing to enable enhanced discovery, optimization and decision making^[1].

Mining tools are used to discover valuable knowledge from big data storage. This knowledge gives ideas for preventing defects, improving quality, and so on^[2].

Big data are large-volume, complex, growing datasets drawn from multiple autonomous sources. Mining interesting knowledge from these huge data sets provides the most relevant and accurate social-sensing feedback for a better understanding of society in real time^[3].

2.2 Correlation Analysis:

Correlation analysis is a statistical procedure used to evaluate the association between two quantitative variables. The correlation coefficient (r) is calculated as follows.

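$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad (1)$$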

The correlation coefficient (r) quantifies the strength of the relationship between the two variables. Its value ranges from -1 to +1. The sign and the magnitude represent the direction and the strength of the relationship, respectively.

2.2.1 Strength of the Relationships:

The raw dataset is first sent for pre-processing. Then the r value is calculated. Based on the r value, one of the following relationships is assigned to the dataset ^[4].

1. If r > 0, the linear relationship is positive, i.e., when one variable increases, the other variable also increases.

2. If r < 0, the linear relationship is negative, i.e., when one variable increases, the other variable decreases.

3. If r = 0, there is no linear relationship, i.e., there is a complete absence of linear correlation.
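
The calculation of r and the three cases above can be sketched in a few lines of Python. This is only an illustration: it assumes the Pearson product-moment form of the coefficient (the form reported by SPSS in Section 4), and the function names `pearson_r` and `direction` as well as the toy values are hypothetical, not part of the original analysis.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def direction(r):
    """Map the sign of r to the three cases listed above."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "no linear relationship"

# Toy values (illustrative only, not taken from the dataset).
x = [1.0, 2.0, 3.0, 4.0]
r_pos = pearson_r(x, [2.0, 4.0, 6.0, 8.0])  # y rises with x
r_neg = pearson_r(x, [8.0, 6.0, 4.0, 2.0])  # y falls as x rises

print(r_pos, direction(r_pos))
print(r_neg, direction(r_neg))
```

In practice r is rarely exactly 0, so the third case is usually judged by how close the magnitude is to 0 rather than by strict equality.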

Figure 1. Data Flow Diagram for Correlation Analysis

2.2.2 General Guidelines of Strength:

The magnitude of the correlation coefficient determines the strength of the correlation, as shown in Table 1.

Table 1. Guidelines for the Strength of the Correlation

S.NO	Strength	Correlation
1.	3	Small / Weak
2.		Medium / Moderate
3.		Large / Strong

2.3 Regression Analysis:

Linear regression is a type of regression analysis that examines the relationship between two quantitative variables, i.e., one independent (explanatory) variable and one dependent variable^[5].

The equation of linear regression is written as

Y = a + bX

where

X - independent (explanatory) variable

Y - dependent variable

b - slope of the line

a - intercept
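
As an illustration of how a and b can be obtained, the sketch below fits the line by ordinary least squares, which is assumed here as the fitting criterion (the text above does not name one). The function name `fit_line` and the toy values are hypothetical.

```python
def fit_line(x, y):
    """Return (a, b) of the least-squares line Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: cross-deviation sum divided by squared-deviation sum of X.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Toy values lying exactly on Y = 1 + 2X (illustrative only).
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(f"Y = {a:.3f} + {b:.3f} X")
```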

3. METHODOLOGY:

The dataset is taken from the SOCR Data ^[9]. It consists of 25000 records of human heights (inches) and weights (pounds); an excerpt is shown in Table 2. From the dataset, we estimate the Body Mass Index and find the relationship of height and weight with BMI. Here, height (measured in inches) and weight (measured in pounds) are the independent variables and BMI is the dependent variable.

Table 2. Dataset for Measurement of Health

S. No.	Height (Inches)	Weight (Pounds)	BMI	Measure of health
1.	71.51521	136.4879	18.4873	Healthy Weight
2.	69.39874	153.0269	22.33675	Healthy Weight
3.	68.2166	142.3354	21.50246	Healthy Weight
4.	67.78781	144.2971	22.07546	Healthy Weight
5.	69.80204	141.4947	20.41546	Healthy Weight
6.	70.01472	136.4623	19.56993	Healthy Weight
7.	66.78236	120.6672	19.02046	Healthy Weight
8.	66.48769	127.4516	20.26834	Healthy Weight
9.	68.30248	125.6107	18.92819	Healthy Weight
10.	67.11656	122.4618	19.11158	Healthy Weight
11.	71.0916	136.9975	19.47328	Healthy Weight
12.	66.461	129.5023	20.611	Healthy Weight
13.	68.64927	142.9733	21.32742	Healthy Weight
14.	71.23033	137.9025	19.10722	Healthy Weight
15.	67.13118	124.0449	19.35021	Healthy Weight
16.	67.83379	141.2807	21.5847	Healthy Weight
17.	68.87881	143.5392	21.26937	Healthy Weight
18.	68.42187	129.5027	19.44663	Healthy Weight
19.	67.62804	141.8501	21.80376	Healthy Weight
20.	67.20864	129.7244	20.18956	Healthy Weight
21.	70.84235	142.4235	19.95037	Healthy Weight
22.	67.49434	131.5502	20.30075	Healthy Weight
23.	65.44098	113.8922	18.69604	Healthy Weight
24.	65.8132	120.7536	19.5988	Healthy Weight
25.	61.8163	125.7886	19.22775	Healthy Weight
26.	70.59505	136.2225	19.21568	Healthy Weight

3.1 Motivation of the Study:

Many researchers have studied healthy living from different angles. One of the foremost factors, however, is the body mass index, since many diseases that affect the human body are related to body mass. This study therefore supports the analysis of healthy living based on BMI.

3.2 Measurement of BMI:

The three numeric variables Height, Weight and BMI are analyzed using the SPSS package. The BMI (kg/m²) is calculated from the formula below ^[6].

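$$\mathrm{BMI} = \frac{\text{weight (kg)}}{[\text{height (m)}]^2} = 703 \times \frac{\text{weight (lb)}}{[\text{height (in)}]^2} \qquad (1)$$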

The BMI categories are given in Table 3.

Table 3. BMI Categories

S. No.	Categories	BMI (kg/m²)
1.	Underweight	< 18.50
2.	Normal or Healthy weight	18.50 to 24.90
3.	Overweight	25.0 to 29.90
4.	Obese	≥ 30.0
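
The BMI calculation and the categories of Table 3 can be sketched as follows. The sketch assumes the conventional factor 703 for converting pounds per square inch to kg/m², and it closes the small gaps between the printed ranges by treating 25.0 and 30.0 as the category boundaries; the function names are hypothetical.

```python
def bmi_from_imperial(weight_lb, height_in):
    """BMI in kg/m^2 from weight in pounds and height in inches.

    The factor 703 is the conventional pounds/inches conversion (assumed here).
    """
    if height_in <= 0:
        raise ValueError("height must be positive")
    return 703.0 * weight_lb / height_in ** 2

def bmi_category(bmi):
    """Category labels of Table 3; boundaries at 18.5, 25.0 and 30.0 are assumed."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Normal or Healthy weight"
    if bmi < 30.0:
        return "Overweight"
    return "Obese"

# Record 2 of Table 2: height 69.39874 in, weight 153.0269 lb.
bmi = bmi_from_imperial(153.0269, 69.39874)
category = bmi_category(bmi)
print(round(bmi, 3), category)
```

For this record the computed value agrees with the BMI listed in Table 2 (22.33675) to within rounding.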

The BMI test is useful for estimating body fat. It gives a general idea of disease risk, and it is a relatively simple way to measure and investigate obesity rates in a population. A better understanding of the correlation among height, weight and BMI gives an idea of how to control and prevent disease through a proper dietary pattern ^[7]. A high BMI value indicates high body fatness and an increased relative risk of disease.

4. CORRELATION ANALYSIS OF DATASET:

The Pearson correlation coefficients are obtained using the SPSS package and are shown in Table 4, where N indicates the number of records in the big dataset. The result shows the correlations for all pairs of variables, and each correlation appears twice in the matrix: the matrix is symmetric, so the halves on either side of the diagonal are identical, and every diagonal value is 1 because each variable correlates perfectly with itself. The correlation between Height and Weight (equivalently, between Weight and Height) is 0.503, and its significance value is 0.000. The correlation between Weight and BMI (equivalently, between BMI and Weight) is 0.795. Both of these correlations are positive, and r = 0.795 indicates a strong relationship between weight and BMI. The correlation between Height and BMI (equivalently, between BMI and Height) is -0.122, which is a negative correlation.
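
The structure of such a correlation matrix can be illustrated with a short Python sketch. It uses only records 2 to 9 of Table 2, so the coefficients it prints will not match the values reported for the full 25000-record dataset (with so few records even the sign of a weak correlation may differ); the helper name `pearson_r` is hypothetical and SPSS is not involved.

```python
import math

# Records 2-9 of Table 2: (height in inches, weight in pounds, BMI).
rows = [
    (69.39874, 153.0269, 22.33675),
    (68.2166, 142.3354, 21.50246),
    (67.78781, 144.2971, 22.07546),
    (69.80204, 141.4947, 20.41546),
    (70.01472, 136.4623, 19.56993),
    (66.78236, 120.6672, 19.02046),
    (66.48769, 127.4516, 20.26834),
    (68.30248, 125.6107, 18.92819),
]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

names = ["Height", "Weight", "BMI"]
columns = [list(col) for col in zip(*rows)]  # one list per variable
# Every variable is correlated with every variable, including itself.
matrix = [[pearson_r(a, b) for b in columns] for a in columns]

for name, row in zip(names, matrix):
    print(f"{name:>6}", "  ".join(f"{value:+.3f}" for value in row))
```

The printed matrix has 1 on the diagonal and mirrors itself across it, which is why each pairwise correlation is reported twice in Table 4.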

5.4 DISCUSSION:

5.4.1 Analysis of Two Independent Variables with One Independent Variable (Weight):

The following formulas give the estimated BMI based on the two independent variables (Height and Weight) and on one independent variable (Weight), using the coefficients from Tables 5, 6, 7 and 8.

Estimated BMI = (0.153) Weight + (-0.570) Height + 38.660 (2)

Estimated BMI = (0.106) Weight + 5.867 (3)

The R square value of the model with two independent variables (Weight and Height) is 0.997, whereas the R square value of the model with one independent variable (Weight) is 0.633. The R square value of the two-variable model is therefore higher than that of the one-variable model. The standard error of the estimate also decreases from 0.940 for the one-variable model to 0.087 for the two-variable model. So the two-variable equation gives the better estimate of BMI.
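
As a worked example, the two equations can be applied to a single record. The sketch below uses the rounded coefficients exactly as printed above, with the intercept of the two-variable model taken as 38.660 as listed in the coefficient table; the function names are hypothetical. Because the coefficients are rounded to three decimals, the estimates are approximate.

```python
def estimated_bmi_two_vars(weight_lb, height_in):
    """Equation (2): estimate from Weight and Height (rounded coefficients)."""
    return 0.153 * weight_lb + (-0.570) * height_in + 38.660

def estimated_bmi_weight_only(weight_lb):
    """Equation (3): estimate from Weight alone (rounded coefficients)."""
    return 0.106 * weight_lb + 5.867

# Record 2 of Table 2: height 69.39874 in, weight 153.0269 lb, BMI 22.33675.
height, weight, listed_bmi = 69.39874, 153.0269, 22.33675

est_two = estimated_bmi_two_vars(weight, height)
est_one = estimated_bmi_weight_only(weight)

print(f"two-variable estimate: {est_two:.3f} (error {est_two - listed_bmi:+.3f})")
print(f"weight-only estimate:  {est_one:.3f} (error {est_one - listed_bmi:+.3f})")
```

For this one record the two-variable estimate happens to be the closer of the two, but a single record cannot confirm the overall comparison, which rests on the R square and standard error values reported above.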

5.4.2 Analysis of Two Independent Variables with One Independent Variable (Height):

The following formulas give the estimated BMI based on the two independent variables (Height and Weight) and on one independent variable (Height), using the coefficients from Tables 5, 6, 9 and 10.

Estimated BMI = (0.153) Weight + (-0.570) Height + 38.660 (4)

Estimated BMI = (-0.099) Height + 26.062 (5)

The R square value of the model with two independent variables (Weight and Height) is 0.997, whereas the R square value of the model with one independent variable (Height) is 0.015. The R square value of the two-variable model is therefore higher than that of the one-variable model. The standard error of the estimate also decreases from 1.540 for the one-variable model to 0.087 for the two-variable model. So the two-variable equation gives the better estimate of BMI.

5.4.3 Analysis of One Independent Variable (Weight) with One Independent Variable (Height):

The following formulas give the estimated BMI based on one independent variable (Weight) and on one independent variable (Height), using the coefficients from Tables 7, 8, 9 and 10.

Estimated BMI = (0.106) Weight + 5.867 (6)

Estimated BMI = (-0.099) Height + 26.062 (7)

The R square value of the model with Weight as the independent variable is 0.633, whereas the R square value of the model with Height as the independent variable is 0.015. The standard error of the estimate for Height (1.540) is higher than that for Weight (0.940). So Weight is a better predictor of BMI than Height.
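
The two quantities compared in this section, R square and the standard error of the estimate, can be computed for a one-variable model as sketched below. The sketch assumes an ordinary least-squares fit and uses only records 2 to 9 of Table 2, so its numbers differ from the SPSS values reported for the full dataset; the function name `simple_fit_stats` is hypothetical.

```python
import math

# Records 2-9 of Table 2: (height in inches, weight in pounds, BMI).
rows = [
    (69.39874, 153.0269, 22.33675),
    (68.2166, 142.3354, 21.50246),
    (67.78781, 144.2971, 22.07546),
    (69.80204, 141.4947, 20.41546),
    (70.01472, 136.4623, 19.56993),
    (66.78236, 120.6672, 19.02046),
    (66.48769, 127.4516, 20.26834),
    (68.30248, 125.6107, 18.92819),
]
heights = [r[0] for r in rows]
weights = [r[1] for r in rows]
bmis = [r[2] for r in rows]

def simple_fit_stats(x, y):
    """Fit Y = a + bX by least squares; return (R square, std. error of estimate)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual (unexplained) and total sums of squares.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1.0 - sse / sst
    # One predictor plus the intercept use two degrees of freedom.
    std_error = math.sqrt(sse / (n - 2))
    return r_square, std_error

r2_weight, se_weight = simple_fit_stats(weights, bmis)
r2_height, se_height = simple_fit_stats(heights, bmis)

print(f"Weight model: R square = {r2_weight:.3f}, std. error = {se_weight:.3f}")
print(f"Height model: R square = {r2_height:.3f}, std. error = {se_height:.3f}")
```

Even on this small excerpt the Weight model shows the higher R square and the lower standard error, in line with the conclusion above.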

6. RESULT AND DISCUSSION:

The bivariate dataset is displayed graphically by a scatter plot or scatter diagram. The independent variable (e.g., height) is on the X-axis, or horizontal axis, and the dependent variable, BMI, is on the Y-axis, or vertical axis. In the dataset, as height increases, weight also increases, and a linear relationship also exists between the height and BMI variables. Figure 2 shows a negative correlation because BMI and Height move in opposite directions; the corresponding R square value is 0.015. Figures 3 and 4 show positive correlations because the paired variables (BMI and Weight; BMI and Unstandardized Predicted Value) move in the same direction ^[8].

Figure 2. Scatter Diagram for BMI and Height

Figure 3. Scatter Diagram for BMI and Weight

Figure 4. Scatter Diagram for BMI and Unstandardized Predicted Value

7. CONCLUSION:

Maintaining a healthy weight is important for the overall health of the body and helps to prevent and control many diseases. A person who is overweight or obese has a higher risk of serious health problems such as high blood pressure, gallstones, breathing problems, certain cancers, heart disease and type 2 diabetes. Many overweight people never develop diabetes. Statistically, however, obesity has been shown to increase the risk of type 2 diabetes and of sleep-disordered breathing ^[10]. This paper therefore concludes that weight should be maintained according to height in order to prevent a higher risk of serious health problems.

8. REFERENCES:

1. Natalija Koseleva and Guoda Ropaite, “Big data in building energy efficiency: understanding of big data and main challenges”, Modern Building Materials, Structures and Techniques, MBMST 2016, Procedia Engineering 172 (2017) 544-549, 1877-7058, 2017 Elsevier Ltd, DOI: 10.1016/j.proeng.2017.02.064.

2. Ying Cheng, Ken Chen, Hemeng Sun, Youngping Zhang and Fei Tao, “Data and knowledge mining with big data towards smart production”, Journal of Industrial Information Integration, 1-13, 2017 Elsevier Ltd, DOI: 10.1016/j.jii.2017.08.001.

3. Xingdong Wu, Gong-Qing Wu and Wei Ding, “Data Mining with Big Data”, IEEE Transactions on Knowledge and Data Engineering, Vol. 26, No.1, 97- 107, Jan 2014.

4. Research Methods Knowledge Base https://www.socialresearchmethods.net/kb/statcorr.php

5. Regression Analysis, http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_multivariable/BS704_Multivariable6.html1

6. What Health, Healthy Living for Ever Body, http://www.whathealth.com/bmi/formula.html.

7. Body Mass Index: Considerations for Practitioners, Department of Health and Human Services Centers for Disease Control and Prevention. https://www.cdc.gov/obesity/downloads/bmiforpactitioners.pdf

8. Scatterplot and Correlation: Definition, Example and Analysis http://study.com/academy/lesson/scatter-plot-and-correlation-definition-example-analysis.html

9. SOCR Data Dinov 020108 HeightsWeights, http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights

10. Diabetes.co.uk, The global diabetes community, https://www.diabetes.co.uk/bmi.html.

Received on 01.11.2017 Modified on 02.12.2017

Research J. Pharm. and Tech 2018; 11(6): 2243-2247.

DOI: 10.5958/0974-360X.2018.00415.8

Table 5. Model Summary (Weight and Height)

Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1.	0.998^a	0.997	0.997	0.087

Table 6. Coefficients (Weight and Height)

Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
		B	Std. Error	Beta
1.	(Constant)	38.660	0.020		1908.260	0.000
	Weight (Pounds)	0.153	0.000	1.146	2785.277	0.000
	Height (Inches)	-0.570	0.000	-0.698	-1695.816	0.000

Table 7. Model Summary (Weight)

Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1.	0.795^a	0.633	0.633	0.940

Table 8. Coefficients (Weight)

Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
		B	Std. Error	Beta
1.	(Constant)	5.867	0.065		90.148	0.000
	Weight (Pounds)	0.106	0.001	0.795	207.549	0.000

Table 9. Model Summary (Height)

Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1.	-0.122^a	0.015	0.015	1.540

Table 4. Pearson Correlation Matrix

		Height (Inches)	Weight (Pounds)	BMI
Height (Inches)	Pearson Correlation	1	0.503**	-0.122**
	Sig. (2-tailed)		0.000	0.000
	N	25000	25000	25000
Weight (Pounds)	Pearson Correlation	0.503**	1	0.795**
	Sig. (2-tailed)	0.000		0.000
	N	25000	25000	25000
BMI	Pearson Correlation	-0.122**	0.795**	1
	Sig. (2-tailed)	0.000	0.000
	N	25000	25000	25000