Use R in trading
Tagged: programming, R, Script, Statistics, trading
 This topic has 33 replies, 5 voices, and was last updated 8 years, 4 months ago by simplex.


December 7, 2015 at 11:23 am #10014
Hello traders,
this thread is dedicated to all of you who are interested in using the statistical programming language R. This powerful language is quite popular among scientists in many research fields, especially for running statistics on their data sets. There are also some applications of R in trading.
I initiate this thread as I have been asked to give a general introduction into R. I’m not a full-time trader and have a day job, so the time I can contribute here is limited. I’ll therefore cover only the most important functions, which should help everybody get started with learning R. You will have to do your homework (reading, practicing, contributing here, asking questions) if you want to grasp all the sweets.
To make it easier for everybody, I invite every member of penguintraders who is familiar with R to give further support and to write further introductory posts.
In the course of this thread the following topics will be discussed:
 Sources
 Basic commands
 Graphics
 Basic statistics (linear models)
 Scripting
If there’s enough interest, I think there are many other topics to focus on.
Table of contents
(will be complemented in due course)
 Installing R and RStudio
 Getting started with R
 Input of R commands
 Working with variables
 R as overgrown calculator – Simple mathematical operations and functions
 Superior data structures
 Indexing
 Import and export data
 Use of packages
 Help
 Obtain FX and stock data from internet sources
 Graphics
 Scatterplots
 Histogram
 Graphical representation of FX/stock data
 Testing hypotheses
 Fundamental statistical principles
 Test for normality (normal distribution)
 Test for variances
 Examining relationships between two or more variables
 Correlation
 Regression
 Simple linear regression
 Multiple linear regression
 Nonlinear regression
 Autocorrelation
 Crosscorrelation
 Writing scripts (examples)
 This topic was modified 8 years, 7 months ago by Anti.
December 7, 2015 at 11:42 am #10015
“I initiate this thread as I have been asked to give a general introduction into R.”
I’m glad you did it! Thanks a lot for taking the lead with introducing R.
Installed R and RStudio 5 minutes ago.
Subscribed!
s.
A good trader is a realist who wants to grab a chunk from the body of a trend, leaving top and bottomfishing to people on an ego trip. (Dr. Alexander Elder)
December 7, 2015 at 11:56 am #10016
INSTALLING R AND RSTUDIO
To start, please download and install R from https://cran.r-project.org/. There are versions for the common operating systems (Linux, Mac, Windows). Once installation has completed you can start the RGui (graphical user interface). It should look like this:
This is all you need.
But the RGui is not as handy as I would like, so I use another R editor called RStudio. Please also download the free open-source edition of RStudio Desktop. During installation you will be asked to link RStudio with the R software downloaded previously. When installation has finished, please open RStudio.
In comparison to the simple R console, you’ll now see a split screen with four areas:
 script editor/data set prompt
 console (simple editor)
 data import/storage
 prompt for function help files, plots, etc.
Please ask if you have any problems during installation. I will wait a few days before continuing …
 This reply was modified 8 years, 7 months ago by Anti.
December 7, 2015 at 12:05 pm #10019
Thank you, @simplex. I’ll wait one or two days, as I hope there are other interested people. But the time will be used to prepare the next sessions …
December 8, 2015 at 9:12 am #10057
GETTING STARTED WITH R
Input of R commands
As shown at the end of the last chapter, there are different ways to input commands into R. You can use either the prompt (console) or an R script to execute commands. For the exercises of the first lectures the prompt is all you need. Nevertheless, it does no harm to use scripts right from the start.
Now start RStudio. The prompt/console is ready for use if `>` is shown as the first symbol in front of the cursor. Any command typed at the prompt is executed by simply pressing `Enter`. Results of calculations and outputs of commands are shown in the following line(s). If you made a mistake (e.g. an undefined variable, a wrong function name, incorrect syntax), the output will be an error message that gives you some information on what went wrong. If an expression has not been completed, the next line starts with `+`. The most common cause of `+` is a forgotten closing bracket. If so, you can fix it by simply completing the expression after the `+`.
If you made such a mistake, you don’t have to type in all the previous code again. In the R console you can access earlier lines of code and navigate through them with the up/down arrow keys of your keyboard.
Compared to entering commands in the console, the advantage of R scripts is that you can save all lines of your code and access them in later sessions by loading the saved script. When you open RStudio for the first time, there is probably no R script subwindow (no. 1 in the first lecture) visible. In that case you can create one by pressing `Ctrl + Shift + N`. In an R script you can execute a single line of code by placing the cursor on it and pressing `Ctrl + Enter` or `Ctrl + R`. To execute multiple lines, you first have to select them.
Working with variables
You can store particular values in variables and access them later by calling the variable names. Assignment is done with `<-`. For instance, to save the value `1.53123` in a variable called “`open`”, we type
`> open <- 1.53123`
in the console. When we now call the variable name `open`, R returns
`> open
[1] 1.53123`
R is case sensitive. This means you can introduce two variable names that differ only in their use of lower and upper case. Thus, `open` and `Open` are treated as two different variables. Variable names can contain most combinations of letters, numbers, and periods (.), but must not start with a digit:
`> Open.1 <- 1.53123
> h1gh <- 1.53166
> l0W <- 1.52997
> Close.13.18 <- 1.53401`
are all valid variables, but assigning a value to a name such as `1` returns an error, because `1` is recognized as a number and not as the name of a variable.
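A minimal sketch of the assignment rules above, assuming a fresh R session. The values are the illustrative quotes from the text; `Open` with a capital O and the `make.names()` call are only added here to show case sensitivity and how R treats a name that starts with a digit.

```r
# Assign a value with the arrow operator and read it back
open <- 1.53123
open

# R is case sensitive: `Open` is a separate variable from `open`
Open <- 1.53166
identical(open, Open)      # FALSE, the two names hold different values

# Names may mix letters, digits and periods, but must not start with a digit
h1gh   <- 1.53166
Open.1 <- 1.53123

# make.names() shows how R would repair an invalid name (illustration only)
make.names("13.18Close")
```

Typing the last line in the console shows that R puts a letter in front of the name to make it valid.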
R as overgrown calculator – Simple mathematical operations and functions
In R, the basic arithmetic operations work in much the same way as in most other programming languages. They are
 addition (e.g. `2 + 3`),
 subtraction (e.g. `2 - 3`),
 multiplication (e.g. `2 * 3`), and
 division (e.g. `2/3`).
Other useful functions are listed below:
 roots
  square root: `sqrt(x)`
  other roots: `x^(1/n)` for the `n`th root of `x`
 exponentiation
  of any numbers: `a^b` or `a**b` with base `a` and exponent `b`
  with Euler’s number as base: `exp(b)` with Euler’s number `e ≈ 2.718` as base and `b` as exponent
 logarithms
  logarithm of `c` to base `a`: `log(c,a)`
  natural logarithm of `c` (base `e ≈ 2.718`): `log(c)`
  common logarithm of `c` (base 10): `log10(c)`
  binary logarithm of `c` (base 2): `log2(c)`
 trigonometric functions
  sine: `sin(x)`
  cosine: `cos(x)`
  tangent: `tan(x)`
 absolute value (`= |x|`): `abs(x)`
Please use brackets to give more complex calculations a clear structure. Furthermore, R evaluates all expressions according to the usual rules of arithmetic (operator precedence, commutative law, distributive law).
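The operations and functions listed above can be tried line by line in the console. This is just a sketch with arbitrary numbers chosen so that the results are easy to check by hand.

```r
# Basic arithmetic
2 + 3
2 - 3
2 * 3
2 / 3

# Roots and powers
sqrt(16)        # square root
27^(1/3)        # cube root via a fractional exponent
2^10            # exponentiation; 2**10 is accepted as well
exp(1)          # Euler's number e

# Logarithms
log(100, 10)    # logarithm of 100 to base 10
log(exp(2))     # natural logarithm
log10(1000)
log2(8)

# Trigonometric functions expect radians
sin(pi / 2)
cos(0)
tan(pi / 4)

# Absolute value
abs(-3.5)

# Brackets make the order of evaluation explicit
(2 + 3) * 4     # 20
2 + 3 * 4       # 14, multiplication binds tighter than addition
```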
 This reply was modified 8 years, 7 months ago by Anti.
December 8, 2015 at 7:55 pm #10097
Wow @Anti!
You’re really making some efforts to get us started with R – thanks a lot! Hopefully I won’t be your only student at last.
When doing my homework (see screenshot) I had one problem: after keying in `x<-(3,4)` according to your 1st screenshot I only got errors. I interpreted this as an array (comma separated). `x<-(3.4)` worked as a scalar.
When testing some simple cases, I was amazed to get `sin(pi) = 1.224606e-16` and `sin(0) = 0` and `256.000001 ^ (1/8) = 2`.
s.
December 8, 2015 at 8:20 pm #10099
“Hopefully I won’t be your only student at last.”
At the moment you are. But maybe in due course some others will have some questions, too.
“after keying in `x<-(3,4)` according to your 1st screenshot I only got errors.”
Whenever you want to save more than one value in a variable (so that it becomes a vector), you have to type `c` (concatenate) before the vector’s entries. Thus, you have to use `x <- c(3,4)`.
“I interpreted this as an array (comma separated). `x<-(3.4)` worked as a scalar.”
Yes, you’re right. I forgot to mention it.
“When testing some simple cases, I was amazed to get `sin(pi) = 1.224606e-16`”
Yes, R knows some predefined numbers.
“and `sin(0) = 0`”
Why not?!
 This reply was modified 8 years, 7 months ago by Anti.
December 8, 2015 at 10:39 pm #10106
“Why not?!”
`sin(0) = 0` is fine for me, but I expected `sin(pi) = 0` and `256.000001 ^ (1 / 8) = 2.000000001`. I was amazed that sin(0) is different from sin(pi).
December 9, 2015 at 8:24 am #10119
Mysterious R. AFAIK this is due to some rounding errors that are associated with use of floating point numbers in R.
December 9, 2015 at 10:05 am #10121
Yes, that’s just what I wanted to point out:
“rounding errors that are associated with use of floating point numbers in R”
Several years ago, during an IT project for a large company, one guy calculated a polynomial regression based on a large data sample with SPSS. I doubted his results, because the nature of his resulting curves just did not fit any expected behaviour.
So I ran a VBA coded regression in Excel (not Excel standard regression), with higher precision, and the curves were totally different, and they worked.
My VBA code just was based on a higher number of digits for the resulting polynomial coefficients. SPSS cut them after 6 digits or so. Default R behaviour looks similar to SPSS, imo.
So I would conclude that we always should be aware of possible rounding errors. I’m sure standard precision will be sufficient in most cases, but not all.
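December 9, 2015 at 10:35 am #10122
Well. I think the explanation for `sin(pi) != 0` is that pi, as an irrational number, can’t be stored exactly in numerical form. To see more of the stored digits, you can change the R setting from the default `options(digits = 7)` to `options(digits = 22)`. Note that this only changes how many significant digits are printed; the calculations themselves are always carried out in double precision.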
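The effect discussed in the last few posts can be reproduced with a few lines. This is only a sketch: comparing with `all.equal()` instead of `==` is a common way to deal with floating point results, not something prescribed earlier in this thread, and the variable name `r` is just a placeholder.

```r
# sin(pi) is tiny but not exactly zero, because pi is stored as a double
sin(pi)
sin(pi) == 0                      # FALSE

# Compare with a tolerance instead of testing for exact equality
isTRUE(all.equal(sin(pi), 0))     # TRUE

# The eighth root only looks like 2 at the default print precision
r <- 256.000001^(1/8)
print(r)                          # default: 7 significant digits
print(r, digits = 12)             # shows the digits hidden by rounding

# Spacing of double precision numbers around 1
.Machine$double.eps
```

The stored value of `r` is the same in both `print()` calls; only the number of displayed digits differs.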
 This reply was modified 8 years, 7 months ago by Anti.
December 10, 2015 at 5:01 pm #10172
Superior data structures
Using vectors
To store more than just one value within a variable, you can construct vector variables with the command `c()` (c: concatenate), in which the values are separated by commas:
`> x <- c(1.53123,1.53166,1.52997,1.53401)
> x
[1] 1.53123 1.53166 1.52997 1.53401`
You can also apply basic functions to the numerical values within vectors, like `exp(x)`, which returns a vector of the same length with the result of the calculation for each entry. And again, we can use basic arithmetic to add, subtract, multiply or divide two or more vectors, as long as the vectors have the same length.
Now that we have more than one numerical value to work with, I’ll introduce basic functions for the calculation of sample parameters like
 mean (`mean(x)`),
 variance (`var(x)`), and
 standard deviation (`sd(x)`).
If you plan to use numerical vectors that contain simple series like `1 2 3 4 5` or `0.0 0.1 0.2 0.3 0.4 0.5`, for instance to generate sequential numbers, you can just use `1:5`, `seq(1,5)` (cf. first example), or `seq(0,0.5,0.1)` (cf. second example).
Vectors can also be used to store character strings, for instance the names of different currency pairs. To do so we use the concatenate command again and put the strings in quotation marks:
`> c("EURUSD","GBPUSD","EURGBP")
[1] "EURUSD" "GBPUSD" "EURGBP"`
Another kind of vector that is sometimes useful is the logical vector, which contains `TRUE` and `FALSE`:
`> c(TRUE,FALSE,TRUE)
[1]  TRUE FALSE  TRUE`
Instead of writing `TRUE` and `FALSE` we can also use `T` and `F`:
`> c(T,F,T)
[1]  TRUE FALSE  TRUE`
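To practice, the vector commands from this section can be combined as in the following sketch. The numbers are the illustrative quotes used above; the names `fx_pairs` and `up` are only placeholders chosen for this example.

```r
# Two numeric vectors of equal length
x <- c(1.53123, 1.53166, 1.52997, 1.53401)
y <- c(1.53400, 1.53597, 1.32570, 1.53580)

# Functions and arithmetic are applied element by element
exp(x)
y - x

# Sample statistics
mean(x)
var(x)
sd(x)                    # the same as sqrt(var(x))

# Regular sequences
1:5
seq(0, 0.5, by = 0.1)

# Character and logical vectors
fx_pairs <- c("EURUSD", "GBPUSD", "EURGBP")
up       <- y > x        # TRUE where y exceeds x
fx_pairs[1]
up
```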
Two-dimensional arrays
In most cases we don’t analyze only one-dimensional data (vectors). Instead we usually work with multivariate data stored in two-dimensional arrays. The best-known representative of such arrays is the m x n matrix, in which m gives the number of rows and n the number of columns. We can create matrices simply by binding together two or more vectors. To do so we use the commands `rbind(x,y)` (bind as rows) or `cbind(x,y)` (bind as columns). Example:
`> x <- c(1.53123,1.53166,1.52997,1.53401)
> y <- c(1.53400,1.53597,1.32570,1.53580)
> X <- rbind(x,y)
> X
     [,1]    [,2]    [,3]    [,4]
x 1.53123 1.53166 1.52997 1.53401
y 1.53400 1.53597 1.32570 1.53580`
To check whether an object is a matrix we can type `is.matrix(x)`. If the output is `TRUE`, the object `x` is a matrix.
Besides matrices there are other important two-dimensional structures in R: data frames and tables. As some R functions can’t handle all of those types, we sometimes have to convert one into another. This can be done by applying the functions `as.table(x)`, `as.data.frame(x)`, or `as.matrix(x)` to the object `x`. And to check whether our “transformation” was successful, we can similarly apply `is.table(x)`, `is.data.frame(x)`, and `is.matrix(x)`.
Some data sets, like the exported price history from MT4 that we will use, don’t have column and/or row names. To avoid picking and working with the wrong data in big data sets, it is recommended to give each variable a clear and unique name. We can do this with the commands `rownames(x) <- rnames` and `colnames(x) <- cnames`. With these commands we can now label the matrix `X` from the previous example with “Open”, “High”, “Low”, and “Close”:
`> colnames(X) <- c("Open","High","Low","Close")`
December 11, 2015 at 11:54 am #10231
Great job, @Anti! I really appreciate your efforts in setting this tutorial up.
Different views on data are interesting. The table representation seems to be similar to what we’re used to from working with relational databases.
What I did not understand yet is why there’s a differentiation between matrices and data frames. Ok, there are certain functions that require a certain representation of data, maybe to be discussed later. But why was R built this way? Just remnants of version history that will be overcome in some future build, or some deeper sense behind it?
BTW: I like RStudio! A simple and clean GUI. Thanks for suggesting this one.
December 11, 2015 at 1:36 pm #10237
Hi simplex, unfortunately I don’t know either. Maybe I have to read up on it in more detail.
December 11, 2015 at 3:24 pm #10241
Hi Anti,
I don’t think it’s so important. Practice matters to get started, and your lessons are wonderful to provide an easy start! Background knowledge will grow as time passes by.
December 11, 2015 at 3:27 pm #10242
Thanks for the warm words …
December 12, 2015 at 3:52 am #10257
Thanks Anti. I got to know R last year through a good friend, a good coder and a good trader. He combines R with Python on Ninja (as far as I can remember). But I am a slow learner and still have some homework to do in MQL. I am interested in R but can’t contribute yet, maybe later. I will follow with interest … One question: can we really use R and combine it with MQL4? Tq
December 12, 2015 at 7:48 am #10260
Hi @smallcat,
I found this one. It’s a bit older and I don’t know whether it works on latest R / MT4 builds. Also some sample code in that thread.
s.
December 12, 2015 at 8:11 am #10261
And this one seems to be interesting.
s.
December 12, 2015 at 8:21 am #10263
December 12, 2015 at 9:40 am #10267
Welcome, Smallcat & GG!
@gg53: Would you later on please discuss it and write a small “chapter” on how to set up and use it?! Would be great!
December 12, 2015 at 12:43 pm #10270
December 12, 2015 at 3:31 pm #10274
Indexing
If we want to refer to or extract some elements of a vector, data frame, matrix, or table, we can use `[]` or the `$` sign. The application of both is discussed below.
First let’s assume that we work with a vector `x`. To obtain the `i`th element of that vector, we simply type `x[i]`:
`> x <- 1:10
> x[2]
[1] 2`
Now let’s assume that we want to know the `i`th to the `j`th values within `x`. To get them we use `x[i:j]`:
`> x[2:4]
[1] 2 3 4`
If we don’t need some entries within `x`, we can use square-bracket indexing for data manipulation, too. For instance, if we want to drop the `i`th to the `j`th element of `x`, we use `x[-(i:j)]`:
`> x[-(3:7)]
[1]  1  2  8  9 10`
In a similar way we can get the entry in the `m`th row and `n`th column of a two-dimensional array using `x[m,n]`.
To access a single row or column we can use `x[m,]` for the `m`th row or `x[,n]` for the `n`th column. To select values from more than one row/column at once, use `x[,5:7]` for the 5th, 6th, and 7th column or `x[c(1,3,5),]` for the 1st, 3rd, and 5th row of `x`.
If you work with data frames whose variables (columns) are named, you can get the values of one of these columns by typing `x$name`. This is shown below, where the previous example is continued. I started by transforming the previously created matrix into a data frame:
Sometimes we only need those data of one variable of a big data set for which another variable fulfills some condition. Those data are obtained with the help of the comparison operators `<` (less than), `>` (greater than), `==` (equal to), `<=` (less than or equal to), `>=` (greater than or equal to), and `!=` (not equal to). In R we can also combine such conditions with `&` (and), `|` (or), and `!` (not) to select for multiple conditions. To practice indexing on a more real-life example, I suggest using one of the built-in data sets in R. To see a list of available data sets use the function `data()`.
Let’s choose “EuStockMarkets”. We can simply call it using `EuStockMarkets`. To obtain compact information on the variables of this data set we can use `summary(EuStockMarkets)`. Now you can see that the data set contains four variables (columns) with the names DAX, SMI, CAC, and FTSE. To obtain the dimensions of the “EuStockMarkets” data set try `dim(EuStockMarkets)`.
`> dim(EuStockMarkets)
[1] 1860    4`
Here, the first value gives the number of rows and the second value the number of columns (variables).
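The indexing forms described in this post can be tried directly on this data set. This is only a sketch: the row and column numbers are arbitrary choices for illustration, and the copy named `eu` is a placeholder so that the built-in object stays untouched.

```r
# EuStockMarkets ships with R (package 'datasets')
dim(EuStockMarkets)            # 1860 rows, 4 columns
colnames(EuStockMarkets)       # "DAX" "SMI" "CAC" "FTSE"

# Single entry, whole row, whole column
EuStockMarkets[1, 2]           # row 1, column 2 (SMI)
EuStockMarkets[1, ]            # all four indices on the first day
head(EuStockMarkets[, 1])      # first DAX values

# Several rows or columns at once
EuStockMarkets[c(1, 3, 5), ]   # rows 1, 3 and 5
EuStockMarkets[1:3, 2:4]       # rows 1 to 3 of SMI, CAC and FTSE

# Negative indices drop entries
x <- 1:10
x[-(3:7)]                      # 1 2 8 9 10

# After conversion to a data frame, columns are available via $
eu <- as.data.frame(EuStockMarkets)
head(eu$DAX)
```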
Now let’s search for all DAX daily closing prices for which the corresponding SMI and CAC values are greater than 4,000 but where FTSE is smaller than 5,700. As we will use variable names for our search, we first have to transform the data set (a matrix) into a data frame:
`> EuStockMarkets <- as.data.frame(EuStockMarkets)
> EuStockMarkets$DAX[EuStockMarkets$SMI>4000 & EuStockMarkets$CAC>4000 & EuStockMarkets$FTSE<5700]
[1] 5598.32`
Thus, the “EuStockMarkets” data set contains only one DAX value for which the conditions are fulfilled.
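Here is a sketch of the same selection that works on a copy instead of overwriting the built-in data set (the names `eu` and `hit` are placeholders for this example). Storing the condition in its own variable makes it easy to count the matching rows and to look at them in full.

```r
# Work on a data-frame copy instead of overwriting EuStockMarkets
eu <- as.data.frame(EuStockMarkets)

# Logical vector: TRUE for every row that meets all three conditions
hit <- eu$SMI > 4000 & eu$CAC > 4000 & eu$FTSE < 5700

sum(hit)        # number of matching rows
eu$DAX[hit]     # the DAX values on those rows
eu[hit, ]       # the complete matching rows

# | means "or": rows where at least one of the conditions holds
sum(eu$SMI > 4000 | eu$CAC > 4000)
```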
If you work with just one data set, you can save much time by attaching it:
`> attach(EuStockMarkets)`
Now you can use the variable names directly, without writing the complete data set name and referencing with `$`. Thus, for the previous example you only have to type
`> DAX[SMI>4000 & CAC>4000 & FTSE<5700]
[1] 5598.32`
If you later work with another data set, you can simply detach the old data set with `detach(EuStockMarkets)`.
December 12, 2015 at 6:02 pm #10284
Just realized that some expressions written as code are not shown completely (e.g. x[m,n]). Fixed it by leaving the ‘ … ‘ away. But maybe there are other expressions that are not fixed and which I didn’t see. If so please post/ask here …
December 14, 2015 at 8:37 pm #10360
Thanks again, @Anti!
I had saved the workspace from this lesson, so I had the matrix, table and dataframe at hand. When accessing the table in the same way as the matrix and the dataframe, I expected an error message. But instead the same cell was accessed, obviously the table was implicitly pivoted.
The syntax used isn’t similar to anything I’ve used until now, but I think I’m getting used to it.
s.