lab 6_LRM inference

Simple Linear Regression Model using R
UNIFE 
Spring Semester
Mini V. 22-02-2019

RESEARCH QUESTION:
Does the linear relationship between x and y observed in our sample also hold in the population as a whole?

#Analysis: step by step
0. Prepare the dataset
1. Visualize the relationship: the scatter plot
2. Identify the estimated model
3. The model on a graph
4. Prediction: the expected Y value given an X value
5. The model's goodness of fit
6. Graphical analysis of the Linear Regression Model's assumptions
7. What about inference?
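We have already performed our regression analysis (steps 0-6):
getwd()                              #check the working directory
cake=read.csv2("cake_reg lin.csv")   #read the semicolon-separated dataset
attach(cake)
View(cake)
y=sold_cakes                         #dependent variable
x=unit_price                         #explanatory variable
reg.lin=lm(y~x)                      #estimated simple linear regression model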





#7. Making inference: testing the linear relationship between the variables in the population (and not only in our sample)#
We may follow two approaches:
	we compute the t-statistic and the critical value, then compare them
	we look at the p-value associated with the b1 coefficient in the summary of our regression analysis

	We state the system of hypotheses
# H0 : B1 = 0 (there is no linear relationship between Y and X in the population)
# H1 : B1 ≠ 0 (we do not exclude a linear relationship between Y and X in the population)

#if |Tstat| > critical value --> we reject H0 --> there is evidence of a linear relationship between Y and X in the population

#Tstat = (b1 - B1)/Sb1 --> where Sb1 = sqrt((SSE/(n-2))/dev X), and B1 = 0 under H0

# let's compute the elements using R:
dev.disp=sum(reg.lin$residuals^2)           #sum of squared residuals (SSE)
dev.disp

x.difference=x-mean(x)                       #deviation of each xi from the mean of x#
dev.x=sum(x.difference^2)          #deviance of x: sum of (xi - mean of x)^2#
dev.x

b1=47.577       #slope estimated in step 2; also available as coef(reg.lin)[2]
b1

#thus, the t-statistic is (32 = n-2 degrees of freedom):

tstat= b1/sqrt(dev.disp/(32*dev.x))
tstat           # equals 5.208#
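The same computation can be sketched without typing b1 or the degrees of freedom by hand. This is only an illustration on simulated data: sim_x, sim_y and sim_fit are hypothetical names and values, not the cake dataset. It checks that the hand-made t-statistic matches the one reported by summary().

```r
# Simulated data standing in for a real dataset (hypothetical values)
set.seed(1)
n <- 34
sim_x <- runif(n, 1, 5)
sim_y <- 10 + 3 * sim_x + rnorm(n, sd = 2)
sim_fit <- lm(sim_y ~ sim_x)

sse    <- sum(residuals(sim_fit)^2)        # sum of squared residuals (SSE)
dev_x  <- sum((sim_x - mean(sim_x))^2)     # deviance of x
b1_hat <- unname(coef(sim_fit)[2])         # estimated slope
df_res <- df.residual(sim_fit)             # residual degrees of freedom, n - 2

se_b1    <- sqrt((sse / df_res) / dev_x)   # standard error of the slope
t_manual <- b1_hat / se_b1                 # t-statistic under H0: B1 = 0

# t value for the slope as reported by summary()
t_summary <- summary(sim_fit)$coefficients[2, "t value"]
all.equal(t_manual, t_summary)
```

Reading the slope with coef() and the degrees of freedom with df.residual() avoids rounding and copy errors when the dataset changes.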

#let's now compute the critical value t(a/2), assuming a significance level a = 0.05 (so a confidence level of 1-0.05 = 95%)#
#critical value = t(a/2) --> alpha = 0.05 --> alpha/2 = 0.025; degrees of freedom: n-2 = 34-2 = 32#
#the command for the critical value t(a/2) is: qt(a/2, degrees of freedom)

qt(0.025,32)      #critical value: -2.037 (lower tail; we use its absolute value, 2.037)
#let's compare the results (in absolute value) to make a final decision:
#|tstat| > t(a/2): 5.208 > 2.037 --> we reject H0, thus
#there is empirical evidence, at the 95% confidence level, of a linear relationship between the number of cakes sold and the unit price of a cake#
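As a small sketch of this decision rule, the comparison can be written directly in R using the values reported above (t-statistic 5.208, 32 degrees of freedom). The names t_obs, df_res, t_crit, reject_h0 and p_value are chosen here for illustration only. The last line also computes the two-sided p-value of the observed t-statistic.

```r
t_obs  <- 5.208     # t-statistic reported above
df_res <- 32        # n - 2 = 34 - 2
alpha  <- 0.05      # significance level

# Upper-tail critical value, so no absolute value is needed on this side
t_crit <- qt(1 - alpha / 2, df_res)

# Two-sided test: reject H0 when |t| exceeds the critical value
reject_h0 <- abs(t_obs) > t_crit

# Two-sided p-value: probability of a |t| at least this large under H0
p_value <- 2 * pt(abs(t_obs), df_res, lower.tail = FALSE)

t_crit
reject_h0
p_value
```

Using qt(1 - alpha/2, ...) returns the positive critical value directly, which is the one compared with |tstat|.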
	Please look at the summary to find the p-value associated with b1, which tells us at which significance level H0 can be rejected:
summary(reg.lin)
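The slope's row can also be read from the summary programmatically instead of by eye. The sketch below again uses simulated data (sim_x, sim_y and sim_fit are hypothetical, not the cake dataset). It extracts the p-value of the slope and its 95% confidence interval, which excludes zero exactly when the p-value is below 0.05.

```r
# Simulated data standing in for a real dataset (hypothetical values)
set.seed(1)
sim_x <- runif(34, 1, 5)
sim_y <- 10 + 3 * sim_x + rnorm(34, sd = 2)
sim_fit <- lm(sim_y ~ sim_x)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(sim_fit)$coefficients
slope_row  <- coef_table["sim_x", ]
p_slope    <- coef_table["sim_x", "Pr(>|t|)"]    # two-sided p-value for H0: B1 = 0

# 95% confidence interval for the slope
ci_slope <- confint(sim_fit, "sim_x", level = 0.95)
excludes_zero <- ci_slope[1] > 0 || ci_slope[2] < 0

slope_row
ci_slope
```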
#_____________#
IN-CLASS LAB:
Please download the dataset called "eating" from the course website.
We want to investigate the causal linear relationship between alcohol consumption and sugar consumption.
0. Prepare the dataset and describe the variables of interest
1. Visualize the relationship (the scatter plot between x and y)
2. Identify the estimated model and write out the equation you find: interpret the coefficients
3. Show the model on a graph
4. Prediction: the expected Y value given an X value equal to 32
5. Assess and comment on the model's goodness of fit
6. Perform the graphical analysis of the Linear Regression Model's assumptions
7. Evaluate whether inference is possible: does the linear relationship exist in the population? (assume a confidence level of 95%)