Saturday, 30 March 2013

3D-Data Plotting In R


Assignment 1: Create three vectors x, y, and z, choose random values for them ensuring they are of equal length, combine them with T <- cbind(x, y, z), and create a 3-dimensional plot of the result.

Solution:
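The vector creation was shown only as a screenshot in the original post; below is a minimal sketch with assumed rnorm parameters (any random values of equal length will do). Note that naming an object T masks R's built-in shorthand for TRUE, so a different name is safer in practice.

> library(rgl)               # provides plot3d()
> x <- rnorm(1000, 100, 10)  # 1000 values, mean 100, sd 10 (assumed)
> y <- rnorm(1000, 85, 5)
> z <- rnorm(1000, 50, 8)
> T <- cbind(x, y, z)        # bind into a 1000 x 3 matrix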




> plot3d(T)

> plot3d(T, col = rainbow(1000), type = 's')   # type = 's' draws a sphere at each point



Assignment 2: Create two random variables and create three plots:
X-Y, X-Z, and X-Y|Z (introducing a variable z with 5 different categories and cbind-ing it to x and y)

> library(ggplot2)                        # provides qplot()
> x <- rnorm(1500, 100, 10)               # 1500 values, mean 100, sd 10
> y <- rnorm(1500, 85, 5)
> z1 <- sample(letters, 5)                # draw 5 category labels
> z2 <- sample(z1, 1500, replace = TRUE)  # assign a category to each observation
> z <- as.factor(z2)
> t <- cbind(x, y, z)                     # note: cbind coerces the factor to integer codes

> qplot(x, y)



> qplot(x, z)

> qplot(x, y, geom = c("point", "smooth"))




> qplot(x, y, colour = z)

> qplot(log(x), log(y), colour = z)


Sunday, 24 March 2013

Assignment: Review of a tool


Gliffy: a Visio-like tool in your browser

The Gliffy tool lets a user draw diagrams online that can be used in business processes and in design documents.



Gliffy provides everything required for business diagramming, from high-level business designs down to detailed business-process diagrams.

It offers many predefined templates for quick creation, and each design is stored as an online document that can be accessed and modified at any time.



You can also create a new document for custom designs, built from the variety of drawing tools and symbols available in the package itself.



The tool also comes with a library of predefined templates and shapes useful for creating various kinds of diagrams.


The library itself can be modified to suit the user's requirements.


It also supports add-on options such as inviting collaborators, and a design can be shared on a blog, printed, and revised as many times as needed.



A simple user-manual video:




Conclusion: The tool is very useful for creating business-requirement designs and comes in handy for a variety of purposes. Its one disadvantage is that it works only in a web browser with Flash Player installed. Overall, it is a valuable tool to have available online.

Friday, 15 March 2013

IT LAB: Assignment Session #8


Problem:

Perform Panel Data Analysis of "Produc" data

Solution:
There are three types of models:
      Pooled effects model
      Fixed effects model
      Random effects model

We determine which model is best using the following functions:
       pFtest: to choose between fixed effects and pooled
       plmtest: to choose between pooled and random effects
       phtest: the Hausman test, to choose between random and fixed effects

The data can be loaded using the following commands:

> library(plm)                     # provides plm(), pFtest(), plmtest(), phtest()
> data(Produc, package = "plm")    # US states production panel data
> head(Produc)

Pooled Effects Model
 
> pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
+             data = Produc, model = "pooling", index = c("state", "year"))
> summary(pool)

Fixed Effects Model:

> fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
+              data = Produc, model = "within", index = c("state", "year"))
> summary(fixed)

Random Effects Model:

> random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
+               data = Produc, model = "random", index = c("state", "year"))
> summary(random)

Testing the Models

This is done through hypothesis testing between pairs of models:

H0 (null hypothesis): the individual (index) and time-based effects are all zero
H1 (alternative hypothesis): at least one of the individual or time-based effects is non-zero

Pooled vs Fixed

Null hypothesis: Pooled Effects Model
Alternative hypothesis: Fixed Effects Model

Command:

> pFtest(fixed, pool)


Result:

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects
Since the p-value is negligible, we reject the null hypothesis and accept the alternative, i.e. the Fixed Effects Model.

Pooled vs Random

Null hypothesis: Pooled Effects Model
Alternative hypothesis: Random Effects Model

Command :
> plmtest(pool)

Result:

  Lagrange Multiplier Test - (Honda)
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible, we reject the null hypothesis and accept the alternative, i.e. the Random Effects Model.

Random vs Fixed

Null hypothesis: no correlation between the regressors and the individual effects (the Random Effects Model is consistent)
Alternative hypothesis: Fixed Effects Model

Command:
> phtest(fixed, random)

Result:

 Hausman Test
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent

Since the p-value is negligible, we reject the null hypothesis and accept the alternative, i.e. the Fixed Effects Model.

Conclusion:

After running all three tests, we conclude that the Fixed Effects Model is best suited for panel data analysis of the "Produc" data set.

In other words, each individual id (each "state") has its own fixed intercept, so the variation across states is captured by these state-specific effects.


Wednesday, 13 February 2013

IT LAB: Assignment 6

The class on 12 February focused on how to compute the historical volatility of a data series and its autocorrelation function (ACF) plot.

Assignment 1: Create log-returns data (from 01.01.2012 to 01.01.2013) and calculate the historical volatility.

Syntax:

> stockprice <- read.csv(file.choose(), header = TRUE)
> head(stockprice)
> closingprice <- stockprice[, 5]                        # column 5 holds the closing price
> closingprice.ts <- ts(closingprice, frequency = 252)   # 252 trading days per year
> returns <- (closingprice.ts - lag(closingprice.ts, k = -1)) / lag(closingprice.ts, k = -1)
> z <- scale(returns) + 10    # shift the standardized returns so all values are positive
> logreturns <- log(z)        # so that the log is defined
> logreturns
> acf(logreturns)






From the above graph, we can see that the autocorrelations lie within the 95% confidence bounds. Therefore, the time series is stationary.

Assignment 2: Create the ACF plot for the log-returns data, perform the ADF test, and interpret the result.

Syntax:

> library(tseries)                            # provides adf.test()
> T <- 252^0.5                                # annualization factor: sqrt(252 trading days)
> historicalvolatility <- sd(logreturns) * T
> historicalvolatility
> adf.test(logreturns)




From the test results, we can see that the p-value = 0.01 (< 0.05).
Therefore, we reject the null hypothesis (presence of a unit root) and accept the alternative hypothesis, which states that the time series is stationary.











Thursday, 7 February 2013

IT LAB: ASSIGNMENT 5


Assignment 1:


The data set is the CNX Mid-cap Index, downloaded from NSE for August 2012 to January 2013. The data should be read so that the series starts at the 10th reading and ends at the 95th reading.

Syntax:

> z <- read.csv(file.choose(), header = TRUE)
> close.ts <- z$Close
> close.ts
> close.time <- ts(close.ts[10:95], deltat = 1/252)   # readings 10 through 95
> z.diff <- diff(close.time)                          # day-to-day price changes
> returns <- cbind(close.time, z.diff, lag(close.time, k = -1), z.diff / lag(close.time, k = -1))
> returns
> plot(returns)

> returns <- z.diff / lag(close.time, k = -1)         # simple returns alone
> plot(returns)




Assignment 2:


Predict new data by fitting a regression (a logistic regression, via glm with a binomial family) on the already available data from rows 1-700, then scoring rows 701-850.

Syntax:


> z <- read.csv(file.choose(), header = TRUE)
> z1 <- z[1:700, 1:9]                    # training set: rows 1-700
> head(z1)
> z1$ed <- factor(z1$ed)                 # treat education level as categorical
> z1.est <- glm(default ~ age + ed + employ + address + income, data = z1, family = "binomial")
> summary(z1.est)
> forecast <- z[701:850, 1:8]            # hold-out rows to score
> forecast$ed <- factor(forecast$ed)
> forecast$probability <- predict(z1.est, newdata = forecast, type = "response")
> head(forecast)






Wednesday, 23 January 2013

IT Labs: Assignment 3



ASSIGNMENT 1(a): Mileage-Groove Data

Fit ‘lm’ and comment on the applicability of ‘lm’
Plot 1: residuals vs the independent variable
Plot 2: standardized residuals vs the independent variable

> file<-read.csv(file.choose(),header=T)
> file
  mileage groove
1       0 394.33
2       4 329.50
3       8 291.00
4      12 255.17
5      16 229.33
6      20 204.83
7      24 179.00
8      28 163.83
9      32 150.33
> x<-file$groove
> x
[1] 394.33 329.50 291.00 255.17 229.33 204.83 179.00 163.83 150.33
> y<-file$mileage
> y
[1]  0  4  8 12 16 20 24 28 32
> reg1<-lm(y~x)
> res<-resid(reg1)
> res
         1          2          3          4          5          6          7          8          9
 3.6502499 -0.8322206 -1.8696280 -2.5576878 -1.9386386 -1.1442614 -0.5239038  1.4912269  3.7248633
> plot(x,res)

Output:



As the residual plot is parabolic rather than a random scatter, a simple linear regression is not appropriate for this data.
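The assignment also asks for a standardized-residual plot, which the transcript does not show; a minimal sketch using base R's rstandard() (the same call applies to the alpha-pluto fit in 1(b)):

> sres <- rstandard(reg1)    # residuals scaled by their estimated standard deviation
> plot(x, sres)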


Assignment 1(b): Alpha-Pluto Data

Fit ‘lm’ and comment on the applicability of ‘lm’
Plot 1: residuals vs the independent variable
Plot 2: standardized residuals vs the independent variable
Also plot the following:
Q-Q plot
Q-Q line

> file<-read.csv(file.choose(),header=T)
> file
   alpha pluto
1  0.150    20
2  0.004     0
3  0.069    10
4  0.030     5
5  0.011     0
6  0.004     0
7  0.041     5
8  0.109    20
9  0.068    10
10 0.009     0
11 0.009     0
12 0.048    10
13 0.006     0
14 0.083    20
15 0.037     5
16 0.039     5
17 0.132    20
18 0.004     0
19 0.006     0
20 0.059    10
21 0.051    10
22 0.002     0
23 0.049     5
> x<-file$alpha
> y<-file$pluto
> x
 [1] 0.150 0.004 0.069 0.030 0.011 0.004 0.041 0.109 0.068 0.009 0.009 0.048
[13] 0.006 0.083 0.037 0.039 0.132 0.004 0.006 0.059 0.051 0.002 0.049
> y
 [1] 20  0 10  5  0  0  5 20 10  0  0 10  0 20  5  5 20  0  0 10 10  0  5
> reg1<-lm(y~x)
> res<-resid(reg1)
> res
         1          2          3          4          5          6          7
-4.2173758 -0.0643108 -0.8173877  0.6344584 -1.2223345 -0.0643108 -1.1852930
         8          9         10         11         12         13         14
 2.5653342 -0.6519557 -0.8914706 -0.8914706  2.6566833 -0.3951747  6.8665650
        15         16         17         18         19         20         21
-0.5235652 -0.8544291 -1.2396007 -0.0643108 -0.3951747  0.8369318  2.1603874
        22         23
 0.2665531 -2.5087486
> plot(x,res)

Output: 



> qqnorm(res)

Output:



 > qqline(res)

Output:





Assignment 2: Test the null hypothesis using ANOVA

> file<-read.csv(file.choose(),header=T)
> file

   Chair Comfort.Level Chair1
1      I             2      a
2      I             3      a
3      I             5      a
4      I             3      a
5      I             2      a
6      I             3      a
7     II             5      b
8     II             4      b
9     II             5      b
10    II             4      b
11    II             1      b
12    II             3      b
13   III             3      c
14   III             4      c
15   III             4      c
16   III             5      c
17   III             1      c
18   III             2      c
> file.anova<-aov(file$Comfort.Level~file$Chair1)
> summary(file.anova)

            Df Sum Sq Mean Sq F value Pr(>F)
file$Chair1  2  1.444  0.7222   0.385  0.687
Residuals   15 28.167  1.8778

Since the p-value (0.687) is greater than 5%, we cannot reject the null hypothesis: there is no significant difference in mean comfort level across the three chairs.

Wednesday, 16 January 2013

IT Labs: Assignment Set 2

Today we learnt about the creation, inverse, transpose, and multiplication of matrices. Then we moved on to regression and residual analysis using NSE historical data for the NIFTY index over a certain period. Finally, we got an introductory idea of how to plot a normally distributed curve.

Question 1:

Create two matrices, say of size 3 x 3, and select column 1 from the first matrix and column 3 from the second. Store the selected columns in objects, say x and y, and merge these two columns using cbind to create a new matrix.

Syntax & Output: 
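The original syntax and output were posted as a screenshot; below is a minimal sketch with assumed matrix values:

> z1 <- matrix(1:9, nrow = 3)    # first 3 x 3 matrix
> z2 <- matrix(9:1, nrow = 3)    # second 3 x 3 matrix
> x <- z1[, 1]                   # column 1 of the first matrix
> y <- z2[, 3]                   # column 3 of the second matrix
> cbind(x, y)                    # merge the two columns into a new matrix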



Question 2:

Multiply two matrices. The two matrices created in the previous question are multiplied and the result inspected.

Syntax: z1 %*% z2
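With the z1 and z2 assumed in the Question 1 sketch, the product works out as below; each entry is the dot product of a row of z1 with a column of z2:

> z1 %*% z2
     [,1] [,2] [,3]
[1,]   90   54   18
[2,]  114   69   24
[3,]  138   84   30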

Output:





Question 3:

Read historical data for the NIFTY index from NSE for the period 1 December 2012 to 31 December 2012. Fit a regression and find the residuals.

Syntax:
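The original commands were posted as a screenshot; below is a minimal sketch that regresses the closing price on a trading-day index (the Close column name and the use of a simple time index are assumptions):

> nifty <- read.csv(file.choose(), header = TRUE)
> y <- nifty$Close          # closing prices
> x <- seq_along(y)         # trading-day index
> reg <- lm(y ~ x)          # linear trend regression
> summary(reg)
> res <- resid(reg)         # residuals for residual analysis
> plot(x, res)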




Output:

Question 4: Generate normally distributed data and plot it.

Syntax:
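The original commands were posted as a screenshot; below is a minimal sketch (sample size and parameters assumed):

> x <- rnorm(1000, mean = 0, sd = 1)    # 1000 standard-normal draws
> hist(x, breaks = 30, freq = FALSE)    # density-scaled histogram
> curve(dnorm(x), add = TRUE)           # overlay the theoretical normal curve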




Output: