Use R in trading

Viewing 25 posts - 1 through 25 (of 34 total)
  • #10014
    Anti
    Participant

      Hello traders,

      This thread is dedicated to all of you who are interested in using the statistical programming language R. This powerful language is quite popular among scientists in many research fields, especially for running statistics on their data sets. There are also some applications for R in trading.

      I started this thread because I was asked to give a general introduction to R. I'm not a full-time trader; I have a day job, so the time I can contribute here is limited. Therefore I'll only cover the most important functions that should help everybody get started with learning R. That means you have to do your homework (reading, practicing, contributing here, asking questions) if you want to get the most out of it.

      To make it easier for everybody, I invite every member of penguintraders who is familiar with R to give further support and to write further introductory posts.

      In the course of this thread, the following topics will be discussed:

      • Sources
      • Basic commands
      • Graphics
      • Basic statistics (linear models)
      • Scripting

      If there's enough interest, there are many other topics we could focus on.
                                                                                                                                                                    

      Table of contents

      (will be extended in due course)

                                                                                                                                                                     

      • This topic was modified 8 years, 7 months ago by Anti.
      #10015
      simplex
      Moderator

        @Anti:

        I initiate this thread as I have been asked to give a general introduction into R.

        I’m glad you did it! Thanks a lot for taking the lead with introducing R.

        Installed R and RStudio 5 minutes ago.

        Subscribed!

        s.

        A good trader is a realist who wants to grab a chunk from the body of a trend, leaving top- and bottom-fishing to people on an ego trip. (Dr. Alexander Elder)

        #10016
        Anti
        Participant

          INSTALLING R AND RSTUDIO

          To start, please download and install R from https://cran.r-project.org/. There are versions for the common operating systems (Linux, Mac, Windows). Once installation has completed, you can start the RGui (graphical user interface). It should look like this:

          R graphical user interface

          This is all you need.

          However, the RGui is not as handy as I'd like, so I use another R editor called RStudio. Please also download the free open-source edition of RStudio Desktop. During installation you will be asked to link RStudio with the R software you installed previously. When installation has finished, please open RStudio.

          RStudio

          Compared to the simple R console, you'll now see a split screen with four areas:

          1. script editor / data viewer
          2. console
          3. environment (imported data and stored objects)
          4. help files, plots, etc.

          Please ask if you have any problems during installation. I'll wait a few days before continuing …

           

          • This reply was modified 8 years, 7 months ago by Anti.
          #10019
          Anti
          Participant

            Thank you, @simplex. I'll wait one or two days, as I hope there are other interested people. In the meantime I'll prepare the next sessions …

            #10057
            Anti
            Participant

              GETTING STARTED WITH R

              Input of R commands

              As shown at the end of the last chapter, there are different ways to enter commands into R. You can use either the prompt (console) or an R script. For the exercises in the first lessons, the console is all you need. Nevertheless, there's nothing wrong with using scripts right from the start.

              Now start RStudio. The prompt/console is ready for use when > is shown as the first symbol in front of the cursor. Any command typed at the prompt is executed by simply pressing Enter. Results of calculations and the output of commands are shown in the console on the next line(s). If you made a mistake (e.g. you used an undefined variable, a wrong function name, or incorrect syntax), the output will be an error message that gives you some information on what went wrong. If an expression hasn't been completed, the next line starts with +. The most common cause of + is a forgotten closing bracket. If so, you can fix it by simply typing the rest of the expression (e.g. the missing bracket) after the +, or press Esc to cancel the input.

              incomplete expression causing the next line to start with +

              If you made such a mistake, you don't have to retype all the previous code. In the R console you can recall earlier lines of code and navigate through them with the up/down arrow keys.

              Compared to entering commands in the console, the advantage of R scripts is that you can save all lines of your code and access them in later sessions by loading the saved scripts. When you open RStudio for the first time, there's probably no script editor (area no. 1 in the first lesson) visible. In that case you can open one with Ctrl + Shift + N. In an R script you can execute a single line of code by moving the cursor to it and pressing Ctrl + Enter or Ctrl + R. To execute multiple lines, you first have to select them.
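Besides running single lines with Ctrl + Enter, a saved script can be run as a whole with base R's source() function. A minimal sketch, assuming a throwaway file created with tempfile() in place of a script you saved yourself:

```r
# Write a two-line script to a temporary file; in practice you would save
# your script from RStudio (e.g. as "my_script.R", a hypothetical name)
script <- tempfile(fileext = ".R")
writeLines(c(
  "open  <- 1.53123   # opening price",
  "close <- 1.53401   # closing price"
), script)

source(script)   # runs every line of the script in the current session
close - open     # difference between close and open: 0.00278
```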

                            

              Working with variables

              You can store values in variables and access them later by calling the variable names. Assignment is done with <-. For instance, to save the value 1.53123 in a variable called “open”, we type

              > open <- 1.53123

              in the console. When we now type the variable name open, R returns

              > open
              [1] 1.53123
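A short sketch of assignment. The alternatives = and -> are standard R as well, although many R style guides recommend sticking with <-; the names high and low are just illustrative:

```r
open <- 1.53123    # assignment prints nothing
open               # typing the name prints the value: [1] 1.53123

high = 1.53166     # "=" also assigns at the top level
1.52997 -> low     # "->" assigns to the right
c(open, high, low)
```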

              R is case sensitive. This means you can create two variable names that differ only in the use of lower and upper case letters; open and Open are treated as two different variables. Variable names can contain most combinations of letters, numbers, periods (.) and underscores (_), but they must not start with a number:

              `> Open.1 <- 1.53123
              > h1gh <- 1.53166
              > l0W <- 1.52997
              > Close13.18 <- 1.53401`

              are all valid variables, but trying to assign a value to a number, e.g.

              `> 1 <- 1.53123`

              returns the error

              `Error in 1 <- 1.53123 : invalid (do_set) left-hand side to assignment`

              as 1 is recognized as a number and not as the name of a variable.
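These naming rules can be checked directly in R. A small sketch; make.names() shows how R would turn a string into a valid name, and tryCatch() only captures the error message so the script keeps running (the exact wording of the message may differ between R versions):

```r
# Case sensitivity: open and Open are two different variables
open <- 1.53123
Open <- 1.53401
open == Open       # FALSE

# A leading digit is not allowed, so make.names() prepends an "X"
make.names(c("Open.1", "h1gh", "13.18Close", "1"))
# "Open.1" "h1gh" "X13.18Close" "X1"

# Assigning to a plain number fails; capture the message instead of stopping
msg <- tryCatch(eval(parse(text = "1 <- 1.53123")),
                error = function(e) conditionMessage(e))
msg
```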

                             

              R as an overgrown calculator – simple mathematical operations and functions

              In R, the basic arithmetic operations work in a similar way as in most other programming languages. They are

              • addition (e.g. 2 + 3),

              • subtraction (e.g. 2 - 3),
              • multiplication (e.g. 2 * 3), and
              • division (e.g. 2/3).
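A sketch of these operators, first on plain numbers and then on price variables. The pip conversion assumes a quote with four decimal places (1 pip = 0.0001), which is an assumption for this example:

```r
# The four basic operators
2 + 3    # 5
2 - 3    # -1
2 * 3    # 6
2 / 3    # 0.6666667

# The same operators work on variables, e.g. the range of one price bar
high <- 1.53166
low  <- 1.52997
range_price <- high - low            # 0.00169 in price units
range_pips  <- range_price * 10000   # assumes 1 pip = 0.0001
range_pips                           # 16.9
```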

              Other useful functions are listed below:

              • roots
                • square root: sqrt(x)
                • other roots: x^(1/n) for the n-th root of x
              • exponentiation
                • of any number: a^b or a**b with base a and exponent b
                • with Euler's number as base: exp(b), with e ≈ 2.718 as base and b as exponent
              • logarithms
                • logarithm of c to base a: log(c, a)
                • natural logarithm of c (base e ≈ 2.718): log(c)
                • logarithm to base 10: log10(c)
                • logarithm to base 2: log2(c)
              • trigonometric functions
                • sine: sin(x)
                • cosine: cos(x)
                • tangent: tan(x)
              • absolute value (|x|): abs(x)

              Please use brackets to give more complex calculations a clear structure. R evaluates expressions according to the usual rules of arithmetic: ^ is evaluated before * and /, which in turn are evaluated before + and -.
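A few of the functions listed above, plus how brackets change the result. A sketch; the values in the comments are what R prints with its default of 7 significant digits:

```r
sqrt(16)                # square root: 4
27^(1/3)                # cube root: 3
2^3; 2**3               # both 8
exp(1)                  # Euler's number: 2.718282
log(100, 10)            # logarithm of 100 to base 10: 2
log(exp(2))             # natural logarithm: 2
log10(1000); log2(8)    # 3 and 3
abs(-2.5)               # 2.5

# Brackets and operator precedence
2 + 3 * 4               # 14: * before +
(2 + 3) * 4             # 20: brackets first
-2^2                    # -4: ^ binds tighter than the unary minus
```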

              #10097
              simplex
              Moderator

                Wow @Anti!

                You're really making some efforts to get us started with R – thanks a lot! Hopefully I won't be your only student at last.

                When doing my homework (see screenshot) I had one problem: after keying in x<-(3,4) according to your 1st screenshot I only got errors. I interpreted this as an array (comma separated). x<-(3.4) worked as a scalar.

                When testing some simple cases, I was amazed to get sin(pi) = 1.224606e-16 and sin(0) = 0  and 256.000001 ^ (1/8) = 2.

                s.


                #10099
                Anti
                Participant

                  Hopefully I won’t be your only student at last.

                  At the moment you are. But maybe in due course others will have some questions, too.

                  after keying in x<-(3,4) according to your 1st screenshot I only got errors.

                  Whenever you want to store more than one value in a variable (so it becomes a vector), you have to type c (concatenate) in front of the bracketed entries. Thus, you have to use x <- c(3,4).

                  I interpreted this as an array (comma separated). x<-(3.4) worked as a scalar.

                  Yes, you’re right. I forgot to mention it.

                  When testing some simple cases, I was amazed to get sin(pi) = 1.224606e-16

                  Yes, R knows some predefined constants, such as pi.

                  and sin(0) = 0

                  Why not?!

                  #10106
                  simplex
                  Moderator

                    Why not?!

                    sin(0) = 0 is fine for me, but I expected sin(pi) = 0 and 256.000001^(1/8) = 2.000000001. I was amazed that sin(0) is different from sin(pi).


                    #10119
                    Anti
                    Participant

                      Mysterious R. AFAIK this is due to some rounding errors that are associated with use of floating point numbers in R.

                      #10121
                      simplex
                      Moderator

                        Yes, that’s just what I wanted to point out:

                        rounding errors that are associated with use of floating point numbers in R

                        Several years ago, during an IT project for a large company, one guy calculated a polynomial regression on a large data sample with SPSS. I doubted his results, because the shape of his resulting curves just did not fit any expected behaviour.

                        So I ran a regression coded in VBA in Excel (not Excel's standard regression) with higher precision, and the curves were totally different, and they worked.

                        My VBA code was simply based on a higher number of digits for the resulting polynomial coefficients. SPSS cut them off after 6 digits or so. Default R behaviour looks similar to SPSS, imo.

                        So I would conclude that we should always be aware of possible rounding errors. I'm sure standard precision will be sufficient in most cases, but not all.


                        #10122
                        Anti
                        Participant

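                          Well, I think the reason for sin(pi) != 0 is that pi, as an irrational number, can't be stored exactly in numerical form. Note that R always calculates with double-precision numbers; the digits option only controls how many significant digits are printed. To see more digits, you can change R's setting from the default

                          options(digits = 7)

                          to

                          options(digits = 22)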
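Since options(digits = ...) only affects printing, comparisons of calculated values should allow for a small tolerance. A sketch using base R's all.equal(), whose default tolerance is about 1.5e-8:

```r
# sin(pi) is a tiny number around 1e-16 instead of exactly 0,
# because pi itself can only be stored approximately
sin(pi)
sin(pi) == 0                        # FALSE: exact comparison fails
isTRUE(all.equal(sin(pi), 0))       # TRUE: equal within the tolerance

# digits only changes printing; the stored value stays the same
print(pi, digits = 22)              # trailing digits reflect binary rounding

0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE

# 256.000001^(1/8) is slightly above 2; the default 7 digits hide this
y <- 256.000001^(1/8)
y == 2                              # FALSE
print(y, digits = 12)
.Machine$double.eps                 # about 2.2e-16: gap between 1 and the next double
```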

                          #10172
                          Anti
                          Participant

                            Higher-level data structures

                            Using vectors

                            To store more than just one value in a variable, you can create a vector with the command c() (c: concatenate), in which the values are separated by commas:

                            `> x <- c(1.53123,1.53166,1.52997,1.53401)
                            > x
                            [1] 1.53123 1.53166 1.52997 1.53401`

                            You can also apply basic functions such as exp(x) to numerical vectors; they return a vector of the same length containing the result for each entry. Again, we can also use basic arithmetic to add, subtract, multiply or divide two or more vectors element by element, as long as the vectors have the same length (if they don't, R recycles the shorter vector, which is rarely what you want).
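A sketch of element-wise calculations and of the recycling rule; the vector y holds hypothetical price changes:

```r
x <- c(1.53123, 1.53166, 1.52997, 1.53401)

exp(x)                 # function applied to every element, result has length 4
round(x * 10000)       # 15312 15317 15300 15340

# Two vectors of equal length are combined element by element
y <- c(0.00010, -0.00020, 0.00030, 0.00000)   # hypothetical price changes
x + y

# Different lengths: R recycles the shorter vector (c(10, 20) is reused)
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
```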

                            Now that we have more than just one numerical value to work with, I'll introduce basic functions for calculating sample statistics like

                            • mean (mean(x)),
                            • variance (var(x)), and
                            • standard deviation (sd(x)).
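To see what these functions compute, here is a sketch that reproduces mean() and var() by hand. Note that var() and sd() use the sample formulas, dividing by n - 1:

```r
x <- c(1.53123, 1.53166, 1.52997, 1.53401)
mean(x)     # arithmetic mean, about 1.53172
var(x)      # sample variance
sd(x)       # standard deviation = sqrt(var(x))

# The same quantities computed by hand
n <- length(x)
sum(x) / n                        # equals mean(x)
sum((x - mean(x))^2) / (n - 1)    # equals var(x)
```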

                            If you need numerical vectors that contain simple series like 1 2 3 4 5 or 0.0 0.1 0.2 0.3 0.4 0.5, for instance to generate sequential numbers, you can just use 1:5 or seq(1, 5) (first example), or seq(0, 0.5, 0.1) (second example).
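A few ways to build such sequences; by and length.out are arguments of base R's seq(), and rep() (not covered above) repeats a vector:

```r
1:5                                 # 1 2 3 4 5
seq(1, 5)                           # same values
seq(0, 0.5, by = 0.1)               # 0.0 0.1 0.2 0.3 0.4 0.5
seq(0, 1, length.out = 5)           # 0.00 0.25 0.50 0.75 1.00: fixed number of points
rep(c("buy", "sell"), times = 2)    # "buy" "sell" "buy" "sell"
```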

                            Vectors can also be used to store character strings, for instance the names of different currency pairs. To do so we use the concatenate command again and put the strings in quotation marks:

                            > c("EURUSD","GBPUSD","EURGBP")
                            [1] "EURUSD" "GBPUSD" "EURGBP"

                            Another kind of vector that can sometimes be useful is the logical vector, which contains TRUE and FALSE:

                            > c(TRUE,FALSE,TRUE)
                            [1] TRUE FALSE TRUE

                            Instead of writing TRUE and FALSE we can also use T and F:

                            > c(T,F,T)
                            [1] TRUE FALSE TRUE
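Character and logical vectors in action. A sketch; the threshold 1.53 is arbitrary, and sum() counts TRUE values because R treats TRUE as 1 and FALSE as 0:

```r
pairs <- c("EURUSD", "GBPUSD", "EURGBP")
length(pairs)              # 3
substr(pairs, 1, 3)        # base currencies: "EUR" "GBP" "EUR"

x <- c(1.53123, 1.53166, 1.52997, 1.53401)
above <- x > 1.53          # logical vector: TRUE TRUE FALSE TRUE
sum(above)                 # 3 prices are above 1.53
```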

                                           

                            Two-dimensional arrays

                            In most cases we don't analyze only one-dimensional data (vectors). Instead we usually work with multivariate data stored in two-dimensional arrays. The best-known representative of such arrays is the m x n matrix, in which m gives the number of rows and n the number of columns. We can create matrices simply by binding two or more vectors together. To do so we use the commands rbind(x,y) (vectors become rows) or cbind(x,y) (vectors become columns). Example:

                            > x <- c(1.53123,1.53166,1.52997,1.53401)
                            > y <- c(1.53400,1.53597,1.32570,1.53580)
                            > X <- rbind(x,y)
                            > X
                                 [,1]    [,2]    [,3]    [,4]
                            x 1.53123 1.53166 1.52997 1.53401
                            y 1.53400 1.53597 1.32570 1.53580

                            To check whether an object is a matrix, we can type is.matrix(x). If the output is TRUE, the object x is a matrix.
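To see the difference between rbind() and cbind(), a sketch with the same two vectors:

```r
x <- c(1.53123, 1.53166, 1.52997, 1.53401)
y <- c(1.53400, 1.53597, 1.32570, 1.53580)

X_rows <- rbind(x, y)   # vectors become rows: 2 x 4 matrix
X_cols <- cbind(x, y)   # vectors become columns: 4 x 2 matrix
dim(X_rows)             # 2 4
dim(X_cols)             # 4 2
is.matrix(X_rows)       # TRUE
t(X_rows)               # transposing the rbind() result gives the cbind() result
```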

                            Besides matrices, there are other important two-dimensional arrays in R: data frames and tables. As some R functions can't handle all of these types, we sometimes have to convert one into another. This can be done by applying the functions as.table(x), as.data.frame(x), or as.matrix(x) to the object x. To check whether the conversion was successful, we can similarly apply is.table(x), is.data.frame(x), and is.matrix(x).
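A sketch of these conversions, rebuilding the matrix X from the rbind() example so the block runs on its own:

```r
X <- rbind(x = c(1.53123, 1.53166, 1.52997, 1.53401),
           y = c(1.53400, 1.53597, 1.32570, 1.53580))

df <- as.data.frame(X)   # matrix -> data frame
tb <- as.table(X)        # matrix -> table
m  <- as.matrix(df)      # data frame -> matrix again

is.data.frame(df)   # TRUE
is.table(tb)        # TRUE
is.matrix(m)        # TRUE
is.matrix(df)       # FALSE: a data frame is not a matrix
```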

                            Some data sets, like the price history exported from MT4 that we will use, don't have column and/or row names. To avoid picking and working with the wrong data in big data sets, it is recommended to give each variable a clear and unique name. We can do this with the commands rownames(x) <- rnames and colnames(x) <- cnames, where rnames and cnames are character vectors of names. With these commands we can now label the matrix X from the previous example with “Open”, “High”, “Low”, and “Close”:

                            R code to label matrix X
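The screenshot isn't reproduced here, so this is only a sketch of the same idea. It rebuilds X from the rbind() example; the row labels bar1 and bar2 are hypothetical:

```r
X <- rbind(x = c(1.53123, 1.53166, 1.52997, 1.53401),
           y = c(1.53400, 1.53597, 1.32570, 1.53580))

colnames(X) <- c("Open", "High", "Low", "Close")
rownames(X) <- c("bar1", "bar2")   # hypothetical row labels
X
X["bar2", "Close"]                 # names can be used for indexing: 1.5358
```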

                            #10231
                            simplex
                            Moderator

                              Great job, @Anti! I really appreciate your efforts in setting up this tutorial.

                              Different views on data are interesting. The table representation seems to be similar to what we’re used to from working with relational databases.

                              What I haven't understood yet is why there's a distinction between matrices and data frames. OK, there are certain functions that require a certain representation of data, maybe to be discussed later. But why was R built this way? Are these just remnants of version history that will be overcome in some future build, or is there some deeper sense behind it?

                              BTW: I like RStudio! A simple and clean GUI. Thanks for suggesting this one.


                              #10237
                              Anti
                              Participant

                                Hi simplex, unfortunately I don't know either. Maybe I have to read up on it in more detail.
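One commonly cited difference (not mentioned in the posts above): all elements of a matrix share one type, whereas each column of a data frame can have its own type. A sketch with hypothetical quotes:

```r
# Hypothetical quotes: one text column and one numeric column
quotes <- data.frame(pair  = c("EURUSD", "GBPUSD"),
                     close = c(1.0850, 1.2700),
                     stringsAsFactors = FALSE)
sapply(quotes, class)     # pair: "character", close: "numeric"

m <- as.matrix(quotes)    # a matrix can hold only one type ...
typeof(m)                 # "character": the numbers were turned into text
```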

                                #10241
                                simplex
                                Moderator

                                  Hi Anti,

                                  I don’t think it’s so important. Practice matters to get started, and your lessons are wonderful to provide an easy start! Background knowledge will grow as time passes by.


                                  #10242
                                  Anti
                                  Participant

                                    Thanks for the warm words …

                                    #10257
                                    smallcat
                                    Participant


                                      Thanks Anti. I got to know R last year through a good friend, who is a good coder and a good trader. As far as I can remember, he combines R with Python on Ninja. But I'm a slow learner and still have some homework to do in MQL. I'm interested in R but can't contribute yet, maybe later. I'll follow with interest … One question: can we really use R and combine it with MQL4? Thanks

                                      #10260
                                      simplex
                                      Moderator

                                        Hi @smallcat,

                                        I found this one. It's a bit older, and I don't know whether it works with the latest R / MT4 builds. There's also some sample code in that thread.

                                        s.


                                        #10261
                                        simplex
                                        Moderator

                                          And this one seems to be interesting.

                                          s.


                                          #10263
                                          gg53
                                          Participant

                                            And this one seems to be interesting. s.

                                            That's the one I use, with some mods.

                                             

                                            G.

                                            #10267
                                            Anti
                                            Participant

                                              Welcome, Smallcat & GG!


                                              @gg53: Would you please discuss it later on and write a small “chapter” on how to set up and use it? That would be great!

                                              #10270
                                              smallcat
                                              Participant

                                                Thanks to @simplex and @gg53. Will look at it …

                                                #10274
                                                Anti
                                                Participant

                                                  Indexing

                                                  If we want to refer to or extract some elements within a vector, data frame, matrix, or table, we can use [] or the $ sign. The application of both is discussed below.

                                                  First let's assume that we work with a vector x. To obtain the i-th element of that vector, we simply type x[i]:

                                                  `x <- 1:10
                                                  x[2]
                                                  [1] 2`

                                                  Now let's assume that we want to know the i-th to the j-th values of x. To get them we use x[i:j]:

                                                  `x[2:4]
                                                  [1] 2 3 4`

                                                  Square-bracket indexing can also be used to leave out entries of x we don't need. For instance, to get x without its i-th to j-th elements, we use x[-(i:j)] (x itself stays unchanged unless you assign the result):

                                                  `x[-(3:7)]
                                                  [1] 1 2 8 9 10`

                                                  In a similar way we can access the entry in the m-th row and n-th column of a two-dimensional array using x[m,n].

                                                  To access a single row or column, we can use x[m,] for the m-th row or x[,n] for the n-th column. To select values from more than one row/column at once, use x[,5:7] for the 5th, 6th, and 7th columns or x[c(1,3,5),] for the 1st, 3rd, and 5th rows of x.
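A sketch of row/column indexing on a small 3 x 4 matrix filled with the numbers 1 to 12 (column by column, R's default):

```r
X <- matrix(1:12, nrow = 3)   # columns: 1-3, 4-6, 7-9, 10-12
X[2, 3]        # row 2, column 3: 8
X[2, ]         # whole 2nd row: 2 5 8 11
X[, 3]         # whole 3rd column: 7 8 9
X[, 2:4]       # columns 2 to 4
X[c(1, 3), ]   # rows 1 and 3
```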

                                                  If you work with data frames whose variables (columns) are named, you can get the values of one of these columns by typing x$name. This is shown below, where the previous example is continued. I started by converting the previously created matrix into a data frame:
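Since the screenshot isn't shown here, a sketch of $ access, assuming the labelled matrix from the earlier examples (rebuilt here). Note that $ works on data frames but not on matrices:

```r
X <- rbind(x = c(1.53123, 1.53166, 1.52997, 1.53401),
           y = c(1.53400, 1.53597, 1.32570, 1.53580))
colnames(X) <- c("Open", "High", "Low", "Close")

prices <- as.data.frame(X)   # matrix -> data frame, column names are kept
prices$Close                 # 1.53401 1.53580
prices$High[2]               # $ and [] can be combined: 1.53597
prices[["Low"]]              # same as prices$Low
```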

                                                  Sometimes we only need those values of one variable of a big data set for which another variable fulfills some conditions. Such data are obtained with the help of the comparison operators < (less than), > (greater than), == (equal to), <= (less than or equal to), >= (greater than or equal to), and != (not equal to). In R we can also combine such conditions with & (and), | (or), and ! (not) to select on multiple conditions. To practice indexing with a more real-life example, I suggest using some of the built-in data sets in R. To see a list of the available data sets, use the function data().
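A sketch of these operators on a short price vector; which() (not mentioned above) returns the positions where a condition is TRUE:

```r
x <- c(1.53123, 1.53166, 1.52997, 1.53401)
x[x > 1.53 & x < 1.534]   # both conditions: 1.53123 1.53166
x[x < 1.53 | x > 1.534]   # either condition: 1.52997 1.53401
x[!(x > 1.53)]            # negation: 1.52997
x[x != 1.53123]           # all values except 1.53123
which(x > 1.531)          # positions where the condition holds: 1 2 4
```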

                                                  selection of built-in data sets in R

                                                  Let's choose “EuStockMarkets”. We can call it simply by typing EuStockMarkets. To obtain compact info on the variables of this data set, we can use summary(EuStockMarkets). You can now see that the data set contains four variables (columns) named DAX, SMI, CAC, and FTSE. To obtain the dimensions of the “EuStockMarkets” data set, try dim(EuStockMarkets).

                                                  `dim(EuStockMarkets)
                                                  [1] 1860     4`

                                                  Here, the first value gives the number of rows and the second value the number of columns (variables).
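Besides dim(), a few other base R functions describe the shape and content of a data set. A sketch:

```r
nrow(EuStockMarkets)       # number of rows: 1860
ncol(EuStockMarkets)       # number of columns: 4
colnames(EuStockMarkets)   # "DAX" "SMI" "CAC" "FTSE"
head(EuStockMarkets, 3)    # first three rows
```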

                                                  Now let’s search for all DAX daily closing prices for which corresponding SMI and CAC values are greater than 4,000 but where FTSE is smaller than 5,700. As we will use variable names for our search we first have to transform the data set (matrix) into a data frame:

                                                  `> EuStockMarkets <- as.data.frame(EuStockMarkets)
                                                  > EuStockMarkets$DAX[EuStockMarkets$SMI>4000 & EuStockMarkets$CAC>4000 & EuStockMarkets$FTSE<5700]
                                                  [1] 5598.32`

                                                  Thus, the “EuStockMarkets” data set contains only one DAX value for which the conditions are fulfilled.

                                                  If you work with just one data set, you can save a lot of typing by attaching it:

                                                  > attach(EuStockMarkets)

                                                  Now you can just use the variable names without writing complete data set names and referencing with $. Thus, for the previous example you only have to type

                                                  `DAX[SMI>4000 & CAC>4000 & FTSE<5700]
                                                  [1] 5598.32`

                                                  If you later want to work with another data set, you can simply detach the old one with detach(EuStockMarkets).
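attach() can mask other objects with the same names, so many R users prefer with() or subset(), which give the same result without attaching anything. A sketch of both for the DAX query above:

```r
eu <- as.data.frame(EuStockMarkets)

# with() evaluates the expression inside the data frame
dax_with <- with(eu, DAX[SMI > 4000 & CAC > 4000 & FTSE < 5700])

# subset() returns the matching rows; select keeps only the DAX column
dax_subset <- subset(eu, SMI > 4000 & CAC > 4000 & FTSE < 5700, select = DAX)

dax_with
dax_subset
```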

                                                  #10284
                                                  Anti
                                                  Participant

                                                    Just realized that some expressions written as code are not shown completely (e.g. x[m,n]). I fixed it by leaving out the ` … ` code markers. But maybe there are other expressions that are not fixed and which I didn't see. If so, please post/ask here …

                                                    #10360
                                                    simplex
                                                    Moderator

                                                      Thanks again, @Anti!

                                                      I had saved the workspace from this lesson, so I had the matrix, table and data frame at hand. When accessing the table in the same way as the matrix and the data frame, I expected an error message. Instead, the same cell was accessed; obviously the table was implicitly pivoted.

                                                      The syntax used isn’t similar to anything I’ve used until now, but I think I’m getting used to it.

                                                      s.

