Sulthan's Monologue: R

27 November 2017

Multivariate normality Tests with R - Mardia's Test, Henze-Zirkler, Royston

Most multivariate techniques, such as Linear Discriminant Analysis (LDA), Factor Analysis, MANOVA and Multivariate Regression, are based on an assumption of multivariate normality, so they should be applied to a sample only when its variables follow a multivariate normal distribution. In this post, I am going to show you how to assess multivariate normality for the variables in your sample.

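For this, you need to install a package called MVN. Type install.packages("MVN") and then load the package using the R command library("MVN"). The function names used below are the ones MVN offered when this post was written; newer releases may expose the tests differently, so check the package documentation if a function is not found.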

There are 3 different multivariate normality tests available in this package:

1. Mardia's Multivariate Normality Test

2. Henze-Zirkler's Multivariate Normality Test

3. Royston's Multivariate Normality Test

Let's discuss these tests briefly. I am using the built-in trees data here, loaded with data("trees"). This data consists of 3 variables, i.e. Girth, Height and Volume.

First, we use Mardia's test to check the normality of the above data. Type mardiaTest(trees). This will return the results of the normality test for all 3 variables. The data is treated as not multivariate normal when the p-value is less than 0.05. When you want to check the multivariate normality of selected variables only, create a subset. Let's create a subset named trees1 that includes the 1st and 3rd variables using the command trees1 <- trees[c(1,3)]. R is case-sensitive, so the name must be typed exactly as trees1 in the later commands.
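To make the idea behind Mardia's test more concrete, here is a minimal base R sketch that computes the usual Mardia skewness and kurtosis statistics by hand. The function name mardia_sketch is hypothetical and this is not the MVN package's own code; it uses the textbook large-sample formulas without any small-sample correction, so its numbers may differ slightly from what a package reports.

```r
# Hypothetical helper: textbook Mardia statistics in base R (no packages needed).
mardia_sketch <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)                                   # number of observations
  p <- ncol(x)                                   # number of variables
  centered <- scale(x, center = TRUE, scale = FALSE)
  S <- crossprod(centered) / n                   # covariance with divisor n
  D <- centered %*% solve(S) %*% t(centered)     # Mahalanobis cross-products
  skew <- sum(D^3) / n^2                         # multivariate skewness
  kurt <- sum(diag(D)^2) / n                     # multivariate kurtosis
  skew_stat <- n * skew / 6                      # approx. chi-square under normality
  skew_df <- p * (p + 1) * (p + 2) / 6
  kurt_z <- (kurt - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)  # approx. standard normal
  list(
    skewness_stat = skew_stat,
    skewness_p = pchisq(skew_stat, df = skew_df, lower.tail = FALSE),
    kurtosis_z = kurt_z,
    kurtosis_p = 2 * pnorm(abs(kurt_z), lower.tail = FALSE)
  )
}

result <- mardia_sketch(trees)   # trees ships with base R
print(result)
```

Small p-values for either the skewness or the kurtosis part point away from multivariate normality.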

Now let's check the normality of trees1 using Henze-Zirkler's test. Type hzTest(trees1).

To use Royston's Multivariate Normality Test, type roystonTest(trees1). That is how you can test the multivariate normality of variables using R. Leave your queries and suggestions in the comment section below.

Subscribe to Sulthan's Monologue and the YouTube channel for more posts and videos. Follow me on Twitter @sulthankhan

22 April 2017

Calculating Indian Income Tax using R (IndianTaxCalc) - My first R package

R is a powerful programming environment that can perform many wonders with data analytics. Its open platform gives everyone the opportunity to develop packages and share them with the world. Recently I created such a package for calculating Income Tax in India. The name of the package is IndianTaxCalc. Using this package, you can calculate the Income Tax liability for a financial year for each of the following categories:

Individual resident aged below 60 years,
Senior Citizen,
Super Senior Citizen,
Firm,
Local Authority,
Any Non-Resident Individual / Hindu Undivided Family / Association of Persons / Body of Individuals / Artificial Juridical Person,
Co-operative Society.

The main advantage of using this package is that you can calculate tax for any number of persons, groups, or a mixture of both in a single command, with the data coming from a spreadsheet. You can download the package from CRAN. This is the first package I have developed for R; the process of submitting it and getting it listed on CRAN was very comfortable, and the reviewers were very friendly.

Example workings on IndianTaxCalc

install.packages("IndianTaxCalc")
library("IndianTaxCalc")
##Income Tax calculation for individual
ITI2017(330000)
[1] 3090
##Income Tax calculation for Senior Citizen
ITI2017(480000,2)
[1] 13390
##Income Tax calculation for group of individuals
employees<-c(250000,350000,200000,7500000)
ITI2017(employees,1)
[1] 0 5150 0 2137250
##Income Tax calculation for dataframe with mixed category of data
sdata <- data.frame(income = c(300000, 400000, 5000000,15000000), category = c(1, 2, 3, 4))
ITI2017(sdata$income,sdata$category)
[1] 0 5150 1339000 5122963
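The sample outputs for ordinary individuals can be reproduced with a small slab calculation. The sketch below is only an illustration: the function name slab_tax_sketch is hypothetical, and the slabs (10%, 20%, 30% above 250000, 500000 and 1000000), the rebate of up to 5000 for incomes up to 500000, and the 3% cess are assumptions inferred from the example figures above, not taken from the package source. It ignores surcharge and the other taxpayer categories.

```r
# Hypothetical slab calculation for an individual below 60 years.
# All rates and thresholds here are assumptions inferred from the sample output.
slab_tax_sketch <- function(income) {
  lower <- c(250000, 500000, 1000000)   # assumed slab lower limits
  upper <- c(500000, 1000000, Inf)      # assumed slab upper limits
  rate  <- c(0.10, 0.20, 0.30)          # assumed slab rates
  tax <- numeric(length(income))
  for (i in seq_along(lower)) {
    # Tax the part of each income that falls inside slab i
    tax <- tax + rate[i] * pmax(pmin(income, upper[i]) - lower[i], 0)
  }
  # Assumed rebate: up to 5000 when income does not exceed 500000
  rebate <- ifelse(income <= 500000, pmin(tax, 5000), 0)
  (tax - rebate) * 1.03                 # assumed 3% cess on the tax payable
}

employees <- c(250000, 350000, 200000, 7500000)
slab_tax_sketch(330000)
slab_tax_sketch(employees)
```

For 330000, for instance, the taxed part is 80000, giving 8000 at 10%, 3000 after the rebate, and 3090 after the cess, which matches the first example.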

Subscribe to Sulthan Academy, stay connected and receive updates.

19 October 2016

List of useful R commands

Here I present some simple commands that can be used in R. You can also download the list as a PDF. Share it with your friends and subscribe to Sulthan Academy via email for more interesting updates. Thank you.

Commands	Purpose
help()	Obtain documentation for a given R command
example()	View some examples on the use of a command
c(), scan()	Enter data manually to a vector in R
seq()	Make arithmetic progression vector
rep()	Make vector of repeated values
data()	Load (often into a data.frame) built-in dataset
View()	View dataset in a spreadsheet-type format
str()	Display internal structure of an R object
read.csv(), read.table()	Load into a data.frame an existing data file
library(), require()	Make available an R add-on package
dim()	See dimensions (# of rows/cols) of data.frame
length()	Give length of a vector
ls()	Lists the objects in the current workspace
rm()	Removes an item from memory
names()	Lists names of variables in a data.frame
hist()	Command for producing a histogram
histogram()	Lattice command for producing a histogram
stem()	Make a stem plot
table()	List all values of a variable with frequencies
xtabs()	Cross-tabulation tables using formulas
mosaicplot()	Make a mosaic plot
cut()	Groups values of a variable into larger bins
mean(), median()	Identify “center” of distribution
by()	Apply function to a column split by factors
summary()	Display 5-number summary and mean
var(), sd()	Find variance, sd of values in vector
sum()	Add up all values in a vector
quantile()	Find the value of a quantile in a dataset
barplot()	Produces a bar graph
barchart()	Lattice command for producing bar graphs
boxplot()	Produces a boxplot
bwplot()	Lattice command for producing boxplots
plot()	Produces a scatterplot
xyplot()	Lattice command for producing a scatterplot
lm()	Determine the least-squares regression line
anova()	Analysis of variance (can use on results of lm())
predict()	Obtain predicted values from linear model
nls()	Estimate parameters of a nonlinear model
residuals()	Gives (observed - predicted) for a model fit to data
sample()	Take a sample from a vector of data
replicate()	Repeat some process a set number of times
cumsum()	Produce running total of values for input vector
ecdf()	Builds empirical cumulative distribution function
dbinom(), etc.	Tools for binomial distributions
dpois(), etc.	Tools for Poisson distributions
pnorm(), etc.	Tools for normal distributions
qt(), etc.	Tools for Student t distributions
pchisq(), etc.	Tools for chi-square distributions
binom.test()	Hypothesis test and confidence interval for 1 proportion
prop.test()	Inference for 1 proportion using normal approx.
chisq.test()	Carries out a chi-square test
fisher.test()	Fisher test for contingency table
t.test()	Student t test for inference on population mean
qqnorm(), qqline()	Tools for checking normality
addmargins()	Adds marginal sums to an existing table
prop.table()	Compute proportions from a contingency table
par()	Query and edit graphical settings
power.t.test()	Power calculations for 1- and 2-sample t
anova()	Compute analysis of variance table for fitted model
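As a quick illustration, a few of the commands from the table can be combined on a small made-up vector. The values and the bin limits below are arbitrary choices for the example.

```r
# A small made-up vector to try several commands from the table
x <- seq(2, 20, by = 2)          # arithmetic progression: 2, 4, ..., 20
grp <- rep(c("low", "high"), each = 5)   # repeated labels, 5 of each

length(x)                        # number of values
mean(x)                          # centre of the distribution
median(x)
quantile(x, probs = 0.5)         # value of the 50% quantile
cumsum(x)                        # running total

bins <- cut(x, breaks = c(0, 10, 20))   # group values into two bins
table(bins)                      # frequency of each bin
table(grp)                       # frequency of each label
str(x)                           # internal structure of the object
```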

12 October 2016

How to Calculate Descriptive Statistics using R

In this post I will explain how to calculate basic descriptive statistics using R. The following R script helps to calculate descriptive statistics such as mean, median, mode, standard deviation, standard error of the mean, five-number summary, quartiles, percentiles, skewness and kurtosis.

The packages used are psych and DescTools.

##Descriptive statistics using R## (www.iamsulthan.in)
##Packages used: psych and DescTools
##The following commands install each package only if it is not already installed, then load it
if (!require(psych)) { install.packages("psych"); library(psych) }
if (!require(DescTools)) { install.packages("DescTools"); library(DescTools) }
##Load your data here
data("trees") ##loads the built-in trees data from the datasets package (Girth, Height and Volume of black cherry trees)
##structure of the data frame
str(trees)
##summary
summary(trees)
##mean
mean(trees$Height)
##median
median(trees$Height)
##mode (Mode() comes from DescTools)
Mode(trees$Height)
##Standard deviation
sd(trees$Height)
##Standard error of the mean
sd(trees$Height) / sqrt(length(trees$Height))
##Five-number summary and quartiles
summary(trees$Height)
##Percentiles (here the 10th and 90th)
quantile(trees$Height, probs = c(0.10, 0.90))
##skewness and kurtosis (describe() comes from psych)
describe(trees$Height, type = 3) #type can be 1, 2 or 3; see the package documentation for details

You can find and download the script on GitHub under Descriptive Statistics in R. Share and subscribe to Sulthan Academy via email for more interesting updates.
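If you prefer not to install any package, the same kind of summary can be sketched in base R. The helper name moment_summary is hypothetical, and the skewness and kurtosis here are the plain moment-based versions, so they may differ slightly from what describe() reports, depending on the type chosen there.

```r
# Hypothetical helper: moment-based descriptive statistics in base R only.
moment_summary <- function(x) {
  x <- x[!is.na(x)]              # drop missing values
  n <- length(x)
  d <- x - mean(x)               # deviations from the mean
  m2 <- mean(d^2)                # second central moment
  m3 <- mean(d^3)                # third central moment
  m4 <- mean(d^4)                # fourth central moment
  c(
    mean = mean(x),
    sd = sd(x),
    se = sd(x) / sqrt(n),        # standard error of the mean
    skewness = m3 / m2^1.5,      # moment-based skewness
    excess_kurtosis = m4 / m2^2 - 3
  )
}

moment_summary(trees$Height)
```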

27 September 2016

How to get help in R ? R–Tips

R is very useful for data analysis, but it does need some programming skill. One solid reason why analysts prefer R is its availability and the flexibility its packages provide. The number of packages on CRAN is increasing day by day, and it already hosts more than 4000 packages, from biometrics to econometrics. It is really hard to remember the syntax and functions of all these packages. In this post I give you some important commands that can be very handy while working in R.

1. Using ?

If you want help on a function or a dataset that you know the name of, type ? followed by the name of the function. For example:
?mean opens the help page for the mean function
?"+" opens the help page for addition
?"if" opens the help page for if, used for branching code

2. Using ??

To find functions, type two question marks (??) followed by a keyword related to the problem you want to search for. Special characters, reserved words, and multiword search terms need enclosing in double or single quotes. For example:
??plotting searches for topics containing words like "plotting"
??"regression model" searches for topics containing phrases like this

3. help and help.search

The functions help and help.search do the same things as ? and ??, respectively, but with these you always need to enclose your arguments in quotes. The following commands are equivalent to the previous examples:
help("mean")
help("+")
help("if")
help.search("plotting")
help.search("regression model")
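When you remember only part of a function's name, base R's apropos() is one more option. It is not covered in the list above, so take this as a small side sketch: it searches the names of objects on the search path using a regular expression and returns them as a character vector.

```r
# Find objects whose names start with "mean"
matches <- apropos("^mean")
print(matches)

# Restrict the search to functions whose names start with "read."
reader_functions <- apropos("^read\\.", mode = "function")
print(reader_functions)
```

Once you have spotted the right name, you can pass it to ? or help() as shown earlier.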

4. example and demo functions

Most functions have examples that you can run to get a better idea of how they work. Use the example function to run these. There are also some longer demonstrations of concepts that are accessible with the demo function:
example(plot)
demo() #list all demonstrations
demo(Japanese)

5. vignettes

R is split into packages, some of which contain vignettes, which are short documents on how to use the packages. You can browse all the vignettes on your machine using browseVignettes:
browseVignettes()
You can also access a specific vignette using the vignette function (but if your memory is as bad as mine, using browseVignettes combined with a page search is easier than trying to remember the name of a vignette and which package it’s in):
vignette("Sweave", package = "utils")

6. RSiteSearch

The help search operator ?? and browseVignettes will only find things in packages that you have installed on your machine. If you want to look in any package, you can use RSiteSearch, which runs a query at http://search.r-project.org. Multiword terms need to be wrapped in braces:
RSiteSearch("{Clustering}")

Leave your queries in the comment section and subscribe to I'm Sulthan by email for more interesting updates. Share with your friends. Thank you.

23 February 2016

Simple way to download and export stock data using R - Method 3

You can read Method 1 here, if you have missed it: http://www.iamsulthan.in/2016/04/step-by-step-tutorial-to-download.html

In security analysis and portfolio management, the historical prices of a stock or index are the base data for analysis, so any researcher will want to download and organize that data. Mostly they prefer to save the data in a .csv file, which can be imported into almost any software. The difficult or time-consuming task for such researchers is organizing the data, for example checking for missing prices or arranging the closing prices of all companies in a single worksheet for a multivariate analysis.

Using the following code, you can download the data and export it to .csv format, which can later be used in any other data analysis software if you wish, although R can do almost everything other software does. It just needs the proper commands to carry out your research objectives.

How to download historical Share price - Method 1 using Yahoo Finance to Excel sheet

How to download historical share price automatically Method-2 Using Excel sheet

Github link for the code can be located here: download-historical-stock-data-in-R

Use the following code:

========================================================================

####Using quantmod####
####Single stock data####
#Install quantmod only if it is missing, then load it
if (!require(quantmod)) { install.packages("quantmod"); library(quantmod) }
getSymbols("WIPRO.NS", from = "2015-01-01", to = Sys.Date())

####Multiple stocks data Method-1 (each as separate dataset)####
getSymbols("WIPRO.NS;TCS.NS;INFY.NS", from = "2015-01-01", to = Sys.Date())

####Multiple stocks Method-2 (using a vector of tickers)####
stocklist <- c("WIPRO.NS", "TCS.NS", "INFY.NS", "AAPL")
getSymbols(stocklist, from = "2015-01-01", to = Sys.Date())

####Multiple stocks Method-3####
stocklist <- c("WIPRO.NS", "TCS.NS", "INFY.NS", "AAPL") #create vector of stock tickers
Adjclose <- NULL
for (Ticker in stocklist) {
  #get data for each company in the list
  prices <- getSymbols.yahoo(Ticker, from = "2015-01-01", to = Sys.Date(), verbose = FALSE, auto.assign = FALSE)
  Adjclose <- cbind(Adjclose, prices[, 6]) #[,6] = keep the adjusted prices
}
#Two alternative ways to handle missing prices:
FinalAdjclose <- na.locf(Adjclose) #copy last traded price into empty cells
FinalAdjclose1 <- Adjclose[apply(Adjclose, 1, function(x) all(!is.na(x))), ] #keep only dates on which every stock has a price
####Export the data to a .csv file####
if (!require(timeSeries)) { install.packages("timeSeries"); library(timeSeries) }
data <- as.timeSeries(FinalAdjclose1)
write.csv(data, "data.csv")

========================================================================
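The two ways of handling missing prices used in Method-3 can be seen on a tiny made-up table, without downloading anything. This is only a base R sketch: the tickers AAA and BBB, the prices, and the helper fill_forward are all hypothetical, and fill_forward is a simple stand-in for na.locf that leaves leading gaps as NA.

```r
# Made-up prices for two hypothetical tickers, with some missing values
prices <- data.frame(
  AAA = c(10, NA, 12, 13),
  BBB = c(NA, 21, 22, NA),
  row.names = c("2015-01-01", "2015-01-02", "2015-01-05", "2015-01-06")
)

# Hypothetical helper: carry the last known value forward
fill_forward <- function(x) {
  idx <- cumsum(!is.na(x))       # index of the latest non-missing value so far
  idx[idx == 0] <- NA            # nothing to carry before the first value
  x[!is.na(x)][idx]
}

# Option 1: copy the last traded price into empty cells
filled <- as.data.frame(lapply(prices, fill_forward), row.names = rownames(prices))
print(filled)

# Option 2: keep only dates on which every ticker has a price
complete_only <- prices[complete.cases(prices), , drop = FALSE]
print(complete_only)
```

Which option suits you depends on the analysis: filling keeps every date, while dropping incomplete dates keeps only actually observed prices.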

22 February 2016

Why R? Pros and Con

R is an open source programming environment for statistical computing and graphics. You can download and install R on any platform, whether Windows, Mac or Linux. The free, official download is available at https://cran.r-project.org/ .

Once R is installed, it is always recommended to use an IDE with it. There are a few good IDEs for R; most of them are open source and available at no cost. My own choice, and I hope most will agree with me, is RStudio, which I consider the best IDE for R programming. You can download RStudio using the link: https://www.rstudio.com/

Pros of using R:

1. Suited to statistical analysis: The environment in R is well suited to statistical analysis, and it handles large data sets quite quickly and easily.

2. Open source and cross-platform: R is open source and works across platforms, regardless of whether the OS is Windows, Mac or Linux (Debian, Red Hat, SUSE, Ubuntu).

3. Add-ons: This is the most important reason why I recommend using R. Researchers around the globe develop packages and distribute them on CRAN (Comprehensive R Archive Network). Two useful websites where you can find add-ons are:

Comprehensive R Archive Network
Crantastic

These networks grow very fast, and the number of packages is increasing day by day.

4. Flexibility: Because R is programmed by its users, it is very flexible, and making use of the available scripts will save you a huge amount of time.

Con of using R:

1. Minimal GUI: You cannot simply say that R is completely command-based, because there are a few GUIs that make analysis easier (for example, Rcmdr). Still, people who are used to working with a GUI will find it hard when a task has to be done in a command-based environment.

Overall, I would say that researchers who are interested in exploring data will find R very helpful. It can also help you to develop your ideas into a package and share it on CRAN, which makes your research more visible among users.