I've been busy working on the open-source QSForex system over the past week. I've made some useful improvements and I thought I'd share them with you in this forex trading diary update.
In particular, I've made the following changes, which will be discussed at length in this entry:
- Modification to the Position object to fix an error with how position openings and closings are handled
- Added historical data capability via tick data files through DukasCopy downloads
- Built the first version of an event-driven backtester based on this daily tick data
For those of you who are unfamiliar with QSForex and are coming to this forex diary series for the first time, I strongly suggest having a read of the following diary entries to get up to speed with the software:
- Forex Trading Diary #1 - Automated Forex Trading with the OANDA API
- Forex Trading Diary #2 - Adding a Portfolio to the OANDA Automated Trading System
- Forex Trading Diary #3 - Open Sourcing the Forex Trading System
As well as the Github page for QSForex:
Position Handling Error Fix
The first change I want to discuss is how the Position object handles buy/sell orders.
Initially I designed the Position object to be quite lean, delegating the majority of the work of calculating position prices to the Portfolio object.
However, this led to needless complexity in the Portfolio class, which I eventually realised would confuse new users of the software.
This would likely become especially problematic as I'm sure you would wish to eventually develop your own custom portfolio handling capability without having to worry about "boilerplate" position handling.
In addition I realised I was actually making a mistake because I had mixed the buying and selling of orders with having a long or short position. This meant that upon the close of a position the calculation of P&L was incorrect.
I've now modified the Position object to accept bid and ask prices, rather than "add" and "remove" prices, which were originally determined upstream of the Position object via the Portfolio.
This means that the Position now tracks whether it is long or short upon being opened and uses the correct bid or ask price as the purchase or closing value.
I've also had to modify the unit tests to reflect the new interface. Although these modifications take some time to complete, they provide greater confidence in the results. This is especially true when we consider more sophisticated strategies.
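To make the bid/ask rule concrete, here is a minimal sketch (written for Python 3) of the pricing logic described above, separate from the full class. The function names `open_and_mark_prices` and `price_move` are hypothetical and the quotes are made-up values; only the rule itself (a long opens at the ask and is valued at the bid, a short does the reverse) comes from the modified Position object.

```python
from decimal import Decimal


def open_and_mark_prices(position_type, bid, ask):
    """Return (entry_price, current_price) for a newly opened position.

    A long buys at the ask and would be closed at the bid;
    a short sells at the bid and would be closed at the ask.
    """
    bid, ask = Decimal(str(bid)), Decimal(str(ask))
    if position_type == "long":
        return ask, bid
    if position_type == "short":
        return bid, ask
    raise ValueError("position_type must be 'long' or 'short'")


def price_move(position_type, bid, ask):
    """Signed price move in favour of the position (negative is a loss)."""
    entry, current = open_and_mark_prices(position_type, bid, ask)
    sign = Decimal("1") if position_type == "long" else Decimal("-1")
    return sign * (current - entry)


# Illustrative quote only: bid 1.51819, ask 1.51834
long_move = price_move("long", "1.51819", "1.51834")
short_move = price_move("short", "1.51819", "1.51834")
```

With this rule both a new long and a new short start with a small loss equal to the spread, which is the behaviour a test of the new interface would be expected to check.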
You can see the new position.py file in its entirety below:
```python
from decimal import Decimal, getcontext, ROUND_HALF_DOWN


class Position(object):
    def __init__(
        self, position_type, market,
        units, exposure, bid, ask
    ):
        self.position_type = position_type  # Long or short
        self.market = market
        self.units = units
        self.exposure = Decimal(str(exposure))

        # A long is opened at the ask and valued at the bid;
        # a short is opened at the bid and valued at the ask
        if self.position_type == "long":
            self.avg_price = Decimal(str(ask))
            self.cur_price = Decimal(str(bid))
        else:
            self.avg_price = Decimal(str(bid))
            self.cur_price = Decimal(str(ask))

        self.profit_base = self.calculate_profit_base(self.exposure)
        self.profit_perc = self.calculate_profit_perc(self.exposure)

    def calculate_pips(self):
        # getcontext is a function and must be called to reach the context
        getcontext().prec = 6
        mult = Decimal("1")
        if self.position_type == "long":
            mult = Decimal("1")
        elif self.position_type == "short":
            mult = Decimal("-1")
        return (mult * (self.cur_price - self.avg_price)).quantize(
            Decimal("0.00001"), ROUND_HALF_DOWN
        )

    def calculate_profit_base(self, exposure):
        pips = self.calculate_pips()
        return (pips * exposure / self.cur_price).quantize(
            Decimal("0.00001"), ROUND_HALF_DOWN
        )

    def calculate_profit_perc(self, exposure):
        return (self.profit_base / exposure * Decimal("100.00")).quantize(
            Decimal("0.00001"), ROUND_HALF_DOWN
        )

    def update_position_price(self, bid, ask, exposure):
        # Revalue at the price the position could currently be closed at
        if self.position_type == "long":
            self.cur_price = Decimal(str(bid))
        else:
            self.cur_price = Decimal(str(ask))
        self.profit_base = self.calculate_profit_base(exposure)
        self.profit_perc = self.calculate_profit_perc(exposure)
```
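The listing rounds every result to five decimal places with `quantize` and `ROUND_HALF_DOWN`. As a small illustration of what that call does (written for Python 3, with made-up input values and a hypothetical helper name `to_five_places`), the sketch below rounds a few prices. It uses only the standard `decimal` module.

```python
from decimal import Decimal, ROUND_HALF_DOWN

FIVE_PLACES = Decimal("0.00001")


def to_five_places(value):
    """Round a price to five decimal places, with exact halves rounded down."""
    # Converting via str avoids carrying binary floating-point noise
    return Decimal(str(value)).quantize(FIVE_PLACES, ROUND_HALF_DOWN)


exact_half = to_five_places("1.512345")   # exactly halfway between two values
above_half = to_five_places("1.512346")   # just above halfway
short_input = to_five_places("1.5")       # padded out to five places
```

Here `exact_half` becomes 1.51234 because the tie is rounded down, `above_half` becomes 1.51235, and `short_input` becomes 1.50000.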
As always you can find the latest version of the full code at the Github page.
Historical Tick Data Capability
The next major task in creating a useful full trading system is to have a high-frequency backtesting capability.
An essential prerequisite involves creating a data-store for currency pair tick data. Such data can become quite large. For instance, I downloaded a day's worth of tick data for a single currency pair from DukasCopy in CSV format and it came to 3.3 MB.
One can easily see that high-frequency backtesting of 20+ currency pairs, over multiple years, with significant parameter variations, can rapidly lead to gigabytes of trading data that must be ingested.
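As a rough back-of-the-envelope check on that claim, the sketch below scales the 3.3 MB per pair per day figure up to 20 pairs. The 260 trading days per year and the five-year span are my own assumptions for illustration, as is the helper name `estimate_storage_gb`; real file sizes will vary with market activity.

```python
def estimate_storage_gb(mb_per_pair_day, pairs, days_per_year, years):
    """Estimate raw CSV storage in gigabytes (1 GB taken as 1000 MB)."""
    total_mb = mb_per_pair_day * pairs * days_per_year * years
    return total_mb / 1000.0


MB_PER_PAIR_PER_DAY = 3.3      # figure quoted in the text
PAIRS = 20                     # "20+ currency pairs"
TRADING_DAYS_PER_YEAR = 260    # assumption: weekdays only
YEARS = 5                      # assumption: "multiple years"

estimate = estimate_storage_gb(
    MB_PER_PAIR_PER_DAY, PAIRS, TRADING_DAYS_PER_YEAR, YEARS
)
print("Approximate raw CSV size: %.1f GB" % estimate)
```

Under these assumptions the raw files alone come to roughly 86 GB, before any parameter sweeps re-read them.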
Such data eventually needs special handling, including the creation of an efficient fully-automated securities master database. We will discuss such a system in the future, but for now, daily CSV files will suffice for our needs.
In order to put the backtesting data and the live streaming data on the same footing, I have created an abstracted price handling class called PriceHandler.
PriceHandler is an example of an abstract base class, which requires any subclasses to override its abstract ("pure virtual") methods.
The only mandated method is stream_to_queue, which is called via the pricing thread when the system is activated (either live trading or backtest).
stream_to_queue takes price information, from a location that depends upon the particular class implementation, and then uses the .put() method of the queue to add a TickEvent object.
In this way all PriceHandler subclasses can interface with the rest of the trading system without the remaining components knowing (or caring!) how the pricing information is generated.
This gives us substantial flexibility in coupling flat-files, file-stores such as HDF5, relational databases such as PostgreSQL or even external resources such as websites, to the backtesting or live trading engine.
Here is the snippet for the PriceHandler class:
```python
from abc import ABCMeta, abstractmethod

# ..
# ..

class PriceHandler(object):
    """
    PriceHandler is an abstract base class providing an interface for
    all subsequent (inherited) data handlers (both live and historic).

    The goal of a (derived) PriceHandler object is to output a set of
    bid/ask/timestamp "ticks" for each currency pair and place them into
    an event queue.

    This will replicate how a live strategy would function as current
    tick data would be streamed via a brokerage. Thus a historic and live
    system will be treated identically by the rest of the QSForex
    backtesting suite.
    """

    __metaclass__ = ABCMeta

    @abstractmethod
    def stream_to_queue(self):
        """
        Streams a sequence of tick data events (timestamp, bid, ask)
        tuples to the events queue.
        """
        raise NotImplementedError("Should implement stream_to_queue()")
```
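To illustrate how a new data source could plug into this interface, here is a minimal self-contained sketch written for Python 3 (hence `abc.ABC` and the lowercase `queue` module). The `ListPriceHandler` class is hypothetical and not part of QSForex, it puts plain tuples on the queue rather than TickEvent objects, and the tick values are made up.

```python
import queue
from abc import ABC, abstractmethod


class PriceHandler(ABC):
    """Cut-down stand-in for the abstract base class in the article."""

    @abstractmethod
    def stream_to_queue(self):
        raise NotImplementedError("Should implement stream_to_queue()")


class ListPriceHandler(PriceHandler):
    """Hypothetical handler that streams ticks from an in-memory list."""

    def __init__(self, ticks, events_queue):
        self.ticks = ticks
        self.events_queue = events_queue
        self.cur_bid = None
        self.cur_ask = None

    def stream_to_queue(self):
        for timestamp, bid, ask in self.ticks:
            # Record the latest prices so other components can query them
            self.cur_bid, self.cur_ask = bid, ask
            self.events_queue.put((timestamp, bid, ask))


events = queue.Queue()
handler = ListPriceHandler(
    [("t0", "1.50334", "1.50349"), ("t1", "1.50336", "1.50350")],
    events,
)
handler.stream_to_queue()
```

The consumer of `events` only ever sees items arriving on the queue, so it does not need to know that they came from a list rather than a file or a brokerage stream. Attempting to instantiate `PriceHandler` directly raises a `TypeError`, which is how the abstract base class enforces the override.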
I've created a subclass called HistoricCSVPriceHandler, which possesses two methods. The first, _open_convert_csv_files, loads the CSV file for the currency pair into a pandas DataFrame.
The second method, stream_to_queue, iterates through this DataFrame and at each iteration adds a TickEvent object to the events queue.
In addition, the current bid and ask prices are stored on the handler, where they are later queried via the Portfolio.
Here is the listing of the HistoricCSVPriceHandler class:
```python
class HistoricCSVPriceHandler(PriceHandler):
    """
    HistoricCSVPriceHandler is designed to read CSV files of
    tick data for each requested currency pair and stream those
    to the provided events queue.
    """

    def __init__(self, pairs, events_queue, csv_dir):
        """
        Initialises the historic data handler by requesting
        the location of the CSV files and a list of symbols.

        It will be assumed that all files are of the form
        'pair.csv', where "pair" is the currency pair. For
        GBP/USD the filename is GBPUSD.csv.

        Parameters:
        pairs - The list of currency pairs to obtain.
        events_queue - The events queue to send the ticks to.
        csv_dir - Absolute directory path to the CSV files.
        """
        self.pairs = pairs
        self.events_queue = events_queue
        self.csv_dir = csv_dir
        self.cur_bid = None
        self.cur_ask = None

    def _open_convert_csv_files(self):
        """
        Opens the CSV file from the data directory, converting
        it into a pandas DataFrame and then a row iterator.
        Only the first pair in the list is handled for now.
        """
        pair_path = os.path.join(self.csv_dir, '%s.csv' % self.pairs[0])
        # header=0 replaces the file's own header row with "names"
        self.pair = pd.read_csv(
            pair_path, header=0, index_col=0, parse_dates=True,
            names=("Time", "Ask", "Bid", "AskVolume", "BidVolume")
        ).iterrows()

    def stream_to_queue(self):
        self._open_convert_csv_files()
        for index, row in self.pair:
            # Store the latest prices, rounded to five decimal places
            self.cur_bid = Decimal(str(row["Bid"])).quantize(
                Decimal("0.00001"), ROUND_HALF_DOWN
            )
            self.cur_ask = Decimal(str(row["Ask"])).quantize(
                Decimal("0.00001"), ROUND_HALF_DOWN
            )
            tev = TickEvent(self.pairs[0], index, row["Bid"], row["Ask"])
            self.events_queue.put(tev)
```
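For readers who want to experiment with the tick format without installing pandas, the following is a dependency-free sketch (Python 3) of the same read-and-convert loop using the standard `csv` module. This swaps in a different technique from the listing above. The column order follows the `names` tuple in the listing, while the timestamp format and the sample rows are made-up assumptions, and `iter_ticks` is a hypothetical helper.

```python
import csv
import io
from decimal import Decimal, ROUND_HALF_DOWN

FIVE_PLACES = Decimal("0.00001")

# Made-up sample rows; real files would be opened from disk instead
SAMPLE_CSV = """Time,Ask,Bid,AskVolume,BidVolume
2015-02-03 00:00:00.127,1.50349,1.50334,1.5,2.25
2015-02-03 00:00:00.334,1.50350,1.50336,1.0,1.5
"""


def iter_ticks(csv_file):
    """Yield (timestamp, bid, ask) with prices as five-place Decimals."""
    for row in csv.DictReader(csv_file):
        bid = Decimal(row["Bid"]).quantize(FIVE_PLACES, ROUND_HALF_DOWN)
        ask = Decimal(row["Ask"]).quantize(FIVE_PLACES, ROUND_HALF_DOWN)
        yield row["Time"], bid, ask


ticks = list(iter_ticks(io.StringIO(SAMPLE_CSV)))
```

Because `iter_ticks` is a generator, only one row is held in memory at a time, which matters once the daily files grow to several megabytes each.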
Now that we have a basic historic data capability, we are in a position to create a fully event-driven backtester.
Event-Driven Backtesting Capability
As I keep reiterating on QuantStart, I am extremely keen on using backtesting environments that are as close as possible to the live deployment.
This is due to the fact that sophisticated transaction cost handling, especially at high frequency, is often the real determinant as to whether a strategy will be profitable or not.
Such high-frequency transaction cost handling can only really be simulated with the use of a multi-threaded event-driven execution engine.
While such a system is significantly more complicated than a basic vectorised P&L "research" backtester, it will capture a lot more of the true behaviour and allow us to make far better decisions when choosing strategies.
In addition, it means that we can iterate more rapidly as time goes on, because we won't have to continually make the transition from "research level" strategy to "implementation grade" strategy as they are the same thing.
The only two components that change are the price streaming class and the execution class. Everything else will be identical between the backtesting and live trading systems.
In fact, this means that the new backtest.py code is almost identical to the trading.py code that handles live or practice trading with OANDA.
All we're really changing is the import of the HistoricCSVPriceHandler and the SimulatedExecution classes instead of the StreamingPriceHandler and the OANDAExecutionHandler. Everything else remains the same.
Here is the listing for backtest.py:
```python
import copy
import Queue  # Python 2 module name (renamed "queue" in Python 3)
import sys
import threading
import time
from decimal import Decimal, getcontext

from qsforex.execution.execution import SimulatedExecution
from qsforex.portfolio.portfolio import Portfolio
from qsforex import settings
from qsforex.strategy.strategy import TestStrategy
from qsforex.data.price import HistoricCSVPriceHandler


def trade(events, strategy, portfolio, execution, heartbeat):
    """
    Carries out an infinite while loop that polls the
    events queue and directs each event to the strategy,
    the portfolio or the execution handler. The loop will
    then pause for "heartbeat" seconds and continue.
    """
    while True:
        try:
            event = events.get(False)
        except Queue.Empty:
            pass
        else:
            if event is not None:
                if event.type == 'TICK':
                    strategy.calculate_signals(event)
                elif event.type == 'SIGNAL':
                    portfolio.execute_signal(event)
                elif event.type == 'ORDER':
                    execution.execute_order(event)
        time.sleep(heartbeat)


if __name__ == "__main__":
    # Set the Decimal context precision to 2 significant digits
    getcontext().prec = 2

    heartbeat = 0.0  # Seconds to pause between polls (none for a backtest)
    events = Queue.Queue()
    equity = settings.EQUITY

    # Load the historic CSV tick data files
    pairs = ["GBPUSD"]
    csv_dir = settings.CSV_DATA_DIR
    if csv_dir is None:
        print("No historic data directory provided - backtest terminating.")
        sys.exit()

    # Create the historic tick data streaming class
    prices = HistoricCSVPriceHandler(pairs, events, csv_dir)

    # Create the strategy/signal generator, passing the
    # instrument and the events queue
    strategy = TestStrategy(pairs, events)

    # Create the portfolio object to track trades
    portfolio = Portfolio(prices, events, equity=equity)

    # Create the simulated execution handler
    execution = SimulatedExecution()

    # Create two separate threads: One for the trading loop
    # and another for the market price streaming class
    trade_thread = threading.Thread(
        target=trade, args=(
            events, strategy, portfolio, execution, heartbeat
        )
    )
    price_thread = threading.Thread(target=prices.stream_to_queue, args=[])

    # Start both threads
    trade_thread.start()
    price_thread.start()
```
One of the drawbacks of using a multi-threaded execution system for backtesting is that it is not deterministic.
This means that over multiple runs of the same data we will see changes, albeit small ones, across the results.
This is because we cannot guarantee that the threads will execute instructions in the same order over multiple runs of the same simulation.
For instance, when placing items onto the queue, we might get nine TickEvent objects placed onto the queue in run #1, but may get eleven in run #2.
Since the Strategy object is polling the queue for TickEvent objects, it will see different bid/ask prices across the two runs and thus will open a position at different bid/ask prices. This will lead to (small) differences in the returns.
Is this a major problem? I don't really think so. Not only is this how the live system will function anyway, but it also lets us know how sensitive our strategy is to the speed of data receipt.
For instance, if we calculate the variance of the returns across all of our simulated runs with the same data then it will give us an idea of how susceptible the strategy is to data latency.
Ideally we want a strategy that has a small variance across each of our runs. However, if it has a large variance, it means we should be very concerned about deployment.
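As a sketch of that calculation, the snippet below summarises the total return from several repeated backtests of the same data using the standard `statistics` module. The return figures are hypothetical placeholders rather than QSForex results, and `summarise_runs` is an illustrative helper name.

```python
from statistics import mean, stdev, variance


def summarise_runs(returns):
    """Return (mean, sample variance, sample std dev) across backtest runs."""
    if len(returns) < 2:
        raise ValueError("need at least two runs to estimate a variance")
    return mean(returns), variance(returns), stdev(returns)


# Hypothetical total returns from five runs over identical tick data
run_returns = [0.0123, 0.0119, 0.0127, 0.0121, 0.0125]

avg, var, sd = summarise_runs(run_returns)
print("mean=%.5f variance=%.3e stdev=%.5f" % (avg, var, sd))
```

Comparing the standard deviation with the mean return gives a quick feel for whether the run-to-run spread is negligible or large enough to swamp the strategy's edge.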
We could even eliminate the problem of non-determinism entirely by simply using a single thread in our backtesting code (as with the QuantStart equities event-driven backtester). However, this has the drawback of reducing the realism with the live system. Such are the dilemmas of high-frequency trading simulation!
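To show what the single-threaded alternative might look like, here is a minimal sketch in Python 3. It is not the QSForex implementation: `run_single_threaded` and `toy_on_tick` are hypothetical names, and events are plain tuples rather than event objects. Each tick is placed on the queue and the queue is drained completely before the next tick is read, so the processing order is fixed by the data alone.

```python
import queue


def run_single_threaded(ticks, on_tick):
    """Process ticks one at a time, draining the queue after each one.

    on_tick(payload) returns a list of follow-up events to enqueue.
    Returns the ordered list of every event that was processed.
    """
    events = queue.Queue()
    processed = []
    for tick in ticks:
        events.put(("TICK", tick))
        while True:
            try:
                event_type, payload = events.get(False)
            except queue.Empty:
                break  # Nothing left to handle: move on to the next tick
            processed.append((event_type, payload))
            if event_type == "TICK":
                for follow_up in on_tick(payload):
                    events.put(follow_up)
    return processed


def toy_on_tick(tick_number):
    """Toy rule: emit a signal on every second tick."""
    return [("SIGNAL", tick_number)] if tick_number % 2 == 0 else []


first_run = run_single_threaded(range(6), toy_on_tick)
second_run = run_single_threaded(range(6), toy_on_tick)
```

Repeated runs over the same ticks yield identical event sequences here, at the cost of no longer exercising the timing effects that the threaded version exposes.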
Another issue that I keep bringing up is that the system is only capable of handling a base currency of GBP and a single currency pair, GBP/USD.
Now that the Position handling has been substantially modified, it will be a lot easier to extend it to handle multiple currency pairs. This is the next step.
At that point we will be able to try multi currency pair strategies and eventually introduce Matplotlib to graph the results.
Don't forget to check out the current version of QSForex at the Github page.