Mastering the Fine Art of Capital Markets Data Quality

If you work in the securities industry, you’ve probably said at one time or another that our industry is different – different from commercial banking, insurance, retail and a whole host of other industries. If you were talking about data quality, you’d be right. With data quality, the use and type of data determine what matters.

For this piece, we would like to take a more technical look at the elements of data quality covered by The Market Data Bill of Rights we recently co-authored with BondCliq. We’ll examine both the definition of data quality and how it is achieved in practice.

Defining Market Data Quality

Capital markets data quality is all about ensuring that market data meets the needs of industry professionals. Keeping this in mind, we define data quality as including the following elements.

  • Accuracy – Based on exactly what the execution venue published, which typically means a PCAP file with precise timestamps in the exchange’s native format. Capture must happen at both primary and secondary data centers to ensure lossless, synchronized data and nanosecond timestamp granularity.

  • Availability – Available in a timely manner at the location of our clients’ choosing.

  • Completeness – Both within a feed and across feeds.

      • Within a feed – Redundant capture across multiple sites, capturing A/B lines to ensure a clean and complete data stream.
      • Across feeds – Full range of Level 1, 2 and 3 data, with depth-of-book feeds across all major markets, including equities, equity options, derivatives, futures and commodities.

  • Latency – Captured within timeframes required to support low-latency use cases.

Achieving Data Quality

From onboarding a new feed to ongoing operations, ensuring data quality requires capturing multiple copies of the feed, putting the right monitoring and validation procedures in place, and running a robust, automated issue resolution process. It is not enough to know you have a problem; you need to fix it before a user ever accesses the data.

The figure below illustrates the elements we focus on when onboarding a new feed:


Capturing data – the more the merrier

Redundant packet captures are a key element of how we achieve lossless data. As part of our process, we capture both Line A and Line B of each feed, and we do the same for Lines C, D and E if they are offered. Our most backed-up feeds – the U.S. Equity SIP feeds – each have more than 14 unique captures associated with them, 7 for Line A and 7 for Line B. This redundancy allows us to arbitrate between different captures to create the most complete picture possible of what an exchange published.
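To make the idea of arbitration concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any production system. It assumes each capture has already been decoded into (sequence number, payload) pairs and that sequence numbers are contiguous integers. The `arbitrate` function and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: each captured packet is reduced to (exchange sequence number, raw payload).
Packet = Tuple[int, bytes]


def arbitrate(captures: Iterable[Iterable[Packet]]) -> Tuple[Dict[int, bytes], List[int]]:
    """Merge redundant captures and report sequence numbers missing from all of them."""
    merged: Dict[int, bytes] = {}
    for capture in captures:
        for seq, payload in capture:
            # Keep the first copy seen; later captures only fill holes.
            merged.setdefault(seq, payload)
    if not merged:
        return merged, []
    lo, hi = min(merged), max(merged)
    gaps = [seq for seq in range(lo, hi + 1) if seq not in merged]
    return merged, gaps


# Hypothetical example: Line A dropped packet 3, Line B dropped packet 2,
# and neither line saw packet 5.
line_a = [(1, b"a"), (2, b"b"), (4, b"d")]
line_b = [(1, b"a"), (3, b"c"), (4, b"d"), (6, b"f")]
merged, gaps = arbitrate([line_a, line_b])
print(sorted(merged))  # [1, 2, 3, 4, 6]
print(gaps)            # [5]
```

Each additional capture can only shrink the list of gaps, which is why more captures tend to produce a more complete stream.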

Check networks early and often

Throughout the trading day, we monitor our networks in real time for any packet drops at the network interface controller (NIC) level. If any issues are detected, we modify that night’s processing to pull from non-degraded captures instead. We run validations again in the early morning and can access other captures to fill any gaps.
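The selection step can be pictured with a short Python sketch. It assumes a per-capture drop counter has already been collected from each NIC. The `CaptureStatus` record, the capture names and the fallback rule are all hypothetical, chosen only to illustrate the idea of steering processing toward non-degraded captures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaptureStatus:
    name: str       # hypothetical capture identifier
    nic_drops: int  # packets dropped at the NIC during the trading day


def select_captures(statuses: List[CaptureStatus]) -> List[str]:
    """Prefer captures with zero NIC drops; otherwise rank all captures by fewest drops."""
    clean = [s.name for s in statuses if s.nic_drops == 0]
    if clean:
        return clean
    # Assumed fallback: nothing is clean, so use every capture, least-degraded first.
    return [s.name for s in sorted(statuses, key=lambda s: s.nic_drops)]


statuses = [
    CaptureStatus("site1-lineA", 0),
    CaptureStatus("site2-lineA", 12),
    CaptureStatus("site1-lineB", 0),
]
print(select_captures(statuses))  # ['site1-lineA', 'site1-lineB']
```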

Quality begets quality

While we capture exactly what exchanges publish, capital markets professionals often require normalized data for workflows such as cross-venue analysis. Focusing on achieving data quality with raw exchange data translates into quality normalized data as well. We store the data in raw PCAPs so any normalization issue can be fixed in the code and re-run as needed.

Separate the trees from the forest

Because we capture nanosecond timestamps, analysts are able to drill down to the most granular level and reconstruct the market with precision. When the sequence of orders and trades matters, nanosecond timestamps and exchange sequence numbers come to the rescue. We have successfully conducted analysis for clients who wanted options quotes with the underlying equity quotes in the same snapshot. Using nanosecond timestamps, we deterministically matched individual options quotes to underlying quotes down to the single-quote level.
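One common way to express this kind of match is an "as-of" lookup: for each options quote, find the most recent underlying quote at or before its timestamp. The Python sketch below shows that general technique, not the specific method used in the client analysis. It assumes quotes are (timestamp in nanoseconds, bid, ask) tuples and that the underlying quotes are sorted by time. The `match_quotes` function and all sample values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Assumption: a quote is (timestamp in nanoseconds, bid, ask).
Quote = Tuple[int, float, float]


def match_quotes(
    options: List[Quote], underlying: List[Quote]
) -> List[Tuple[Quote, Optional[Quote]]]:
    """Pair each options quote with the latest underlying quote at or before it.

    `underlying` must be sorted by timestamp. An options quote that precedes
    every underlying quote is paired with None.
    """
    timestamps = [q[0] for q in underlying]
    pairs = []
    for opt in options:
        i = bisect_right(timestamps, opt[0])
        pairs.append((opt, underlying[i - 1] if i else None))
    return pairs


underlying = [(1_000, 99.98, 100.00), (1_500, 99.99, 100.01), (2_200, 100.00, 100.02)]
options = [(900, 1.10, 1.15), (1_500, 1.12, 1.16), (2_199, 1.11, 1.17)]
for opt, und in match_quotes(options, underlying):
    print(opt[0], "->", und[0] if und else None)
# 900 -> None
# 1500 -> 1500
# 2199 -> 1500
```

This sketch treats an underlying quote with an identical timestamp as preceding the options quote. A stricter ordering would need an additional tiebreaker, which the sketch does not attempt.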

Summary: quality market data is key to a successful business

The expression “data is the lifeblood of business” has become a truism in nearly every industry, but in none more so than the capital markets. With all significant market participants now requiring highly sophisticated approaches to data ingestion and consumption, the race has shifted to be as much about quality as about speed. Ultimately, data quality is about meeting the needs of industry professionals.

Our approach to quality data

To learn more about our approach to data quality, check out MayStreet Data Quality: An In-Depth Examination, or reach out if you’d like to set up a time to talk directly with a member of our data capture team.

Naftali Cohen, Chief Revenue Officer