How can an algorithm be tested?

An algorithm works if following all of its instructions in the right order leads to the desired result. Example: the recipe for spaghetti carbonara works if following its instructions in the right order results in proper spaghetti carbonara. Hence, algorithms are a kind of theory. They tell us: if you do this, and then this, and then that, you will get this or that result.

According to the philosophy of science, there are basically two types of theories:

Empirical theories: theories about objects that we can perceive and observe with our senses, such as theories about how the human body works or how animals behave.
Analytical theories: theories about objects that we cannot perceive with our senses, but which are somehow (we do not know exactly how) accessible to us, such as theories about the objects of mathematics and logic (numbers, sets, triangles, ...).

Analytical theories can be tested, i.e. verified or falsified, by pure reasoning. Mathematicians and logicians do not observe any objects with their senses. They do not use devices, such as microscopes, to observe the objects their theories are about. They just think about their objects and write down the results of their thinking.

Empirical theories, on the other hand, cannot be tested by reasoning alone. With the objects of empirical theories, such as the behavior of squirrels, reasoning may help to a certain degree, but it can never be sufficient to prove or disprove anything. Empirical theories are tested with observations, which include experiments. The better a theory is supported by the observations made, the more it can be trusted.

Let us dig a little deeper into this. Consider the following question:

Do squirrels eat tuna?

What type of theory is needed to properly answer this question? Can we answer it by just thinking about squirrels? Obviously not. We have to do experiments. We have to observe squirrels.

Of course, just observing some squirrels in some way will not be enough to properly answer the question. A lot can go wrong with our observations. We might, for example, only observe three squirrels during four days in the woods of central Siberia. Three squirrels are too small a sample to be statistically meaningful. Four days is too short a period of time. And there is no tuna in the woods of central Siberia. Hence, observations of this kind will not be scientifically valid.

So, while observing the behavior of squirrels (which includes experimenting with squirrels by offering them tuna) is the way to go, observations alone will not do the job. We also need some understanding of how empirical hypotheses, such as the hypothesis that squirrels eat tuna, are tested in a scientifically valid way.

How does all of this apply to trading? Let us look again at the trading algorithm from the last section:

If the price of gold falls three days in a row, buy gold.

Is this an empirical problem or an analytical problem? Obviously, it is an empirical problem, just like the squirrel problem: the price of gold depends on the behavior of the other traders, i.e. the market participants. When a lot of market participants buy gold, the price rises. When a lot of market participants sell gold, the price falls.

Why could the above trading algorithm work?

Well, there might be a countermovement. When the price of gold falls three days in a row, some people, maybe a lot of people, might think: "Gold has become cheaper. This is an opportunity to buy." And when a lot of people buy something, its price rises. Obviously, the algorithm might work for other "securities" as well, for the very same reason. (In trading, that which is bought or sold, such as a stock, a currency, a commodity, and so on, is often called a "security".)

However, as this is an empirical question, reasoning alone will not be sufficient. We have to test our hypothesis, which happens to be a trading algorithm, by making observations. Obviously, we will not observe physical traders in a physical marketplace directly, as we would observe physical squirrels in a physical wood. (Some 500 years ago, though, this might have been the only possibility.) We will only observe some data traces of the behavior of real physical traders.

You can easily test the above trading algorithm yourself without knowing anything about trading: just search the internet for historical gold prices, such as the prices of gold in the last three months, and look for every series of three days in a row in which the price of gold fell. Suppose you buy gold on the fourth day in each of these events. And suppose you sell your gold on the fifth day. Write down all profits as positive numbers and all losses as negative numbers and sum up all these numbers. If the sum is a positive number, then you would have made a profit with the algorithm. If the sum is a negative number, then you would have lost money with the algorithm.
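The manual procedure above can also be written down as a short program. The following is only a minimal sketch in Python, under several assumptions that are not part of the text: there is one price per day, given as a plain list; a day counts as a falling day if its price is lower than the price of the day before; and buying and selling happen at exactly these daily prices. The function name backtest_buy_after_falls and the sample prices are made up for illustration and are not real gold prices.

```python
def backtest_buy_after_falls(prices, run_length=3):
    """Backtest the rule: after `run_length` falling days in a row,
    buy on the next day and sell on the day after that.

    `prices` is a list with one price per day (oldest first).
    Returns the summed profit and the list of single-trade profits.
    """
    profits = []
    # i is the index of the last day of a possible falling run.
    # We need `run_length` earlier days to compare against and
    # two later days (the buy day and the sell day).
    for i in range(run_length, len(prices) - 2):
        run_fell = all(
            prices[j] < prices[j - 1]
            for j in range(i - run_length + 1, i + 1)
        )
        if run_fell:
            buy_price = prices[i + 1]   # the "fourth day"
            sell_price = prices[i + 2]  # the "fifth day"
            profits.append(sell_price - buy_price)
    return sum(profits), profits


# Made-up example prices, NOT real gold prices.
example_prices = [100, 99, 98, 97, 98, 99, 98, 97, 96, 95, 97]
total, trades = backtest_buy_after_falls(example_prices)
print(total, trades)
```

Note that in this sketch a falling run of four days counts as two overlapping three-day runs and therefore triggers two purchases. Whether that is intended is one of the details the original rule leaves open.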

Several observations can be made with respect to the last paragraph: First, the size of your data set matters. If you only test the trading algorithm on three months' worth of data, it will not be a scientifically valid test. (This would be like just observing three squirrels.) Secondly, we silently passed over the fact that each trade comes with certain costs. To get accurate results we would have to account for these costs in our backtests. (Tests of algorithms on historical data are called "backtests".) Thirdly, it turns out that the original definition of our trading algorithm was rather vague: it didn't say anything about what to do with the bought gold. Some options:

keep it forever
sell it on the fifth day
sell it on the sixth day
sell it one month after the purchase
sell it when the gold price is 5% higher than at buy time
sell it when the gold price is 10% higher than at buy time
...

Obviously, there is an unlimited number of options here. And with each option you get a different algorithm that will yield different trading results. Another obvious way to find variants of the above algorithm is to try out other periods of time in which the price of gold is supposed to fall. Why not buy gold after a series of 5 days in which the price falls? Or after 3 months or 10 minutes? And what about selling? Why not sell gold after its price has risen three days in a row?
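To illustrate how quickly the variants multiply, here is a hedged Python sketch that turns two of the open choices into parameters: the length of the falling run and the number of days the gold is held before it is sold. It also subtracts a flat cost for each purchase and each sale. All names (backtest_variant, compare_variants, cost_per_trade) and the sample prices are assumptions made for this illustration; real trading costs are usually not a flat amount, and the percentage-based selling rules from the list above are not covered.

```python
from itertools import product


def backtest_variant(prices, run_length=3, hold_days=1, cost_per_trade=0.0):
    """Buy on the day after `run_length` falling days in a row and
    sell `hold_days` days later. A flat `cost_per_trade` is charged
    once for the purchase and once for the sale.

    Returns the summed profit and the number of round trips.
    """
    total = 0.0
    round_trips = 0
    # i is the index of the last day of a possible falling run.
    for i in range(run_length, len(prices) - 1 - hold_days):
        run_fell = all(
            prices[j] < prices[j - 1]
            for j in range(i - run_length + 1, i + 1)
        )
        if run_fell:
            buy_day = i + 1
            sell_day = buy_day + hold_days
            total += prices[sell_day] - prices[buy_day] - 2 * cost_per_trade
            round_trips += 1
    return total, round_trips


def compare_variants(prices, run_lengths, hold_days_options, cost_per_trade=0.0):
    """Run the backtest for every combination of the given parameters."""
    return {
        (run_length, hold_days): backtest_variant(
            prices, run_length, hold_days, cost_per_trade
        )
        for run_length, hold_days in product(run_lengths, hold_days_options)
    }


# Made-up example prices, NOT real gold prices.
example_prices = [100, 99, 98, 97, 98, 99, 98, 97, 96, 95, 97]
for variant, result in compare_variants(example_prices, [2, 3], [1, 2]).items():
    print(variant, result)
```

Even this small grid of two run lengths and two holding periods already produces four different algorithms, each with its own result on the same data.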

It is easy to see that it would be a lot of work to try out all these possibilities manually, especially on larger data sets. As we live in the era of computers, it would probably be a lot better to let them do this work for us. How could this be done? We will look into this question immediately. But, before we do that, let us ask ourselves if our way of backtesting our trading strategy can really be trusted.