Background: Markets have long sought to generate alpha from policy developments. The 24/7 media cycle, turbo-charged by social media and algorithmic high-frequency trading, accelerates execution cycles and generates information overload, which complicates strategy formation. This blog series articulates rules for designing news-related trading strategies. They apply across all asset classes and financial instruments.
Structured Data 101
Investors need data the way people need oxygen. Capital markets have consistently been on the frontier of technological innovation precisely for the purpose of acquiring more and better information faster. Before we had Twitter, there was the ticker tape, the wire services, and then the Bloomberg terminal.
Capital markets may measure success through the financial gains acquired through alpha generation, but the business of investing is only incidentally about money.
Successful intermediation is about being able to manage, understand, and capitalize on information flows better than the competition.
This is why the financial industry has been, and will remain, at the innovation frontier regarding mechanisms to deliver information more efficiently. Investment funds are mobilized only after an investor has acquired sufficient information to justify the move.
Data provides concrete, objective information for the foundation of any investment thesis and portfolio strategy. Data points span a broad spectrum from firm-specific micro data (quarterly results, sales revenues, ROI, etc.) to sectoral data (industry sales trends, marketing statistics, market trading data) to macroeconomic data (GDP, interest rates, trade flows, stock market trading data including indices).
This array of data points has one thing in common. Within the technology universe, these data points are considered “structured” data.
At the highest level, “structured” data consists of “any data that resides in a fixed field within a record or file. This includes data contained in relational databases and spreadsheets.” It covers all the numbers traditionally used by market analysts to determine whether a specific investment is appropriate for a given investment objective (usually, delivering alpha or executing on a hedging strategy). It covers all the numbers used by economists and published by governments. But it also covers words that have been incorporated into a spreadsheet.
The point is that structured data only exists in the context of an organizing framework. It provides the fodder for data science, which seeks to identify patterns such as correlations and covariances within the data. Increasingly, this analysis can be automated using machine learning and artificial intelligence utilities. Whether a person or a machine analyzes the data, in order to generate insights, the data must appear in a structured format.
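To make this concrete, here is a minimal Python sketch. It is not drawn from any particular analytics product: the field names and figures are invented purely for illustration. Every record has the same fixed fields, and that organizing framework is what lets a program line up the values and compute a covariance and a correlation.

```python
from math import sqrt

# Hypothetical structured records: every row has the same fixed fields.
# The figures are made up for illustration only.
records = [
    {"quarter": "Q1", "rate_pct": 1.0, "index_level": 100.0},
    {"quarter": "Q2", "rate_pct": 2.0, "index_level": 98.0},
    {"quarter": "Q3", "rate_pct": 3.0, "index_level": 96.0},
    {"quarter": "Q4", "rate_pct": 4.0, "index_level": 94.0},
]


def column(rows, field):
    """Pull one fixed field out of every record."""
    return [row[field] for row in rows]


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation built from the sample covariance."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


rates = column(records, "rate_pct")
levels = column(records, "index_level")
cov = covariance(rates, levels)
corr = correlation(rates, levels)
print(f"covariance={cov:.3f} correlation={corr:.3f}")
```

In this toy data the index falls by the same amount each time the rate rises, so the correlation is strongly negative. Real market data will rarely be this tidy.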
Data, and the science of analyzing data, currently are all the rage given the proliferation of new data thrown off by smart devices and internet usage. Capital markets again appear on the frontier here, purchasing access to new kinds of data from new kinds of vendors.
As noted in this blogpost, the universe of alternative data used in finance may be vast but it is not necessarily revolutionary. Location data has long been used to assess creditworthiness (often with controversy and legal restrictions). Lifestyle habits have long been used to assess premium pricing for health, life, and auto insurance. Weather data has long been used to assess premium pricing for property insurance.
Alternative data from smart devices accelerate and deepen the level of information available regarding location, lifestyle, weather, and spending patterns. Sophisticated IT processing makes it possible to combine these data points in new ways and to analyze existing data more deeply, yielding better insights and better decisions about how to deploy investment funds effectively.
Two frontiers exist in the alternative data space:
(1) unstructured data and
(2) entirely new kinds of data.
“Unstructured” data consists of content that cannot easily be crammed into a spreadsheet cell, such as images, audio, email, and PDF files. In 2013, a Gartner blog defined the term as “human-generated and people-oriented content that does not fit neatly into database tables.” The blog then boldly (and probably prematurely) declared that processes to convert this content into structured data had made information overload premature.
It is certainly true that advances in translating unstructured data into a machine-readable format have been accelerating for the last decade. Our own patent for converting words into numbers illustrates the point nicely. But the conversion process has created entirely new kinds of data for which no historical precedent exists. See the questions and issues at the end of this blogpost for the kinds of analytical challenges this creates. These developments are incredibly exciting for those of us on the innovation frontier, but they also generate challenges for investment professionals experimenting with how best to use this data.
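As a rough illustration of what converting words into numbers can look like, the Python sketch below counts keyword mentions in free text and emits one fixed-field row per document. This is a deliberately simple keyword count, not the patented process mentioned above. The keyword list, document IDs, and sample sentences are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical topic keywords chosen for illustration only.
KEYWORDS = ("tariff", "brexit", "fintech")


def text_to_row(doc_id, text):
    """Turn free text into a fixed-field record of keyword counts."""
    # Lowercase the text and split it into alphabetic tokens.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    row = {"doc_id": doc_id}
    for keyword in KEYWORDS:
        row[keyword] = counts[keyword]  # Counter returns 0 when absent
    return row


# Invented sample documents standing in for unstructured news text.
documents = {
    "doc-1": "Brexit talks stall as tariff threats grow. Tariff schedule due Friday.",
    "doc-2": "Regulators publish FinTech guidance.",
}

rows = [text_to_row(doc_id, text) for doc_id, text in documents.items()]
for row in rows:
    print(row)
```

Each output row has the same fields regardless of the input text, which is what makes the result structured. The harder analytical questions, such as what a count means when no history exists for it, remain open.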
Managing the Risks of Alternative Data
Using alternative data to support investment decisions generates different kinds of risks. Incorporating habit tracker data for insurance premium setting is less risky at the conceptual level than using entirely new kinds of data which have never been used before in the investing process. Convincing your risk committee or your individual investors to rely on this information when committing capital to an investment thesis can present challenges.
Among other things, the alternative data may not have a sufficient track record. Even if this hurdle has been overcome, if a firm chooses to rely on alternative data, it will want to be able to continue relying on that data for an extended period of time (particularly for long-dated, multi-year positions). The data vendor may not have been in business long enough to generate confidence by an investment committee that they will be able to provide a steady stream of high quality data to support a multi-year relationship.
These are legitimate concerns. But the concerns also create barriers to innovation. Staying on the sidelines creates the possibility that an investor will forgo substantial alpha-generation opportunities that accrue to early adopters.
The data revolution is not going away any time soon. Those who understand how to maximize the utility of newly available information streams will reap significant rewards through smarter, better investment decisions.
Investment professionals interested in relying on alternative data therefore must implement a range of risk management policies to guide experimentation with new data streams.
We suggest those risk management policies consist of the following four elements:
- Training Wheels: When accepting a new data stream for which no historical patterns exist, operate your modeling processes in parallel (one with the new data, one without) for a period of time before committing real capital. The time period for parallel runs will differ based on the firm, its risk management and compliance culture, and of course the results. Some data streams may generate reliable results in six weeks; others could take six months.
- Nowcasting: Don’t get too hung up on historical data. Particularly in periods of significant structural shift, historical data has limited utility when anticipating new outcomes. Consider the questions at the end of this blogpost when assessing whether any given historical analysis might (or might not) be relevant to your specific investment thesis and/or data source. Rely on Rule 8 and align your testing time horizon to shorter-term outcomes. If the new data delivers credible results in the near-term over a period of 6–12 months, then investment in more formal backtesting may be warranted.
- Operational Risk/Vendor Management: A robust regulatory and compliance framework already governs how financial firms interact with third-party vendors. In the United States, regulators such as the Office of the Comptroller of the Currency have become creative in defining the parameters for mitigating vendor-related operational risks. Among other things, as this 2017 FAQ indicates, a sliding scale exists. The amount of due diligence required of a regulated firm is proportionate to its level of reliance on the vendor. So at the beginning, if you are using the training wheels suggested above, the level of reliance is not high. In addition, the OCC makes clear that a range of components can be used to assess the financial viability of a start-up vendor, including: access to funding, funding sources, net cash flow and expected growth.
- Seed Stage or Targeted Investment: Some investors may consider underwriting specific product development or formal backtesting programs to validate the information value of the proposed data stream. Funding development with the firm’s own capital, before the technology and data are used to allocate investment capital from third-party clients, limits client risk exposure and can generate subsequent gains once robust data streams have been developed. When considering this option, it is important to determine first whether any regulations prohibit or restrict this kind of investment by a regulated financial institution.
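The parallel runs described under Training Wheels can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed methodology: the function names, the choice of mean absolute error as the yardstick, the `min_improvement` threshold, and all of the numbers are hypothetical.

```python
def mean_absolute_error(forecasts, actuals):
    """Average absolute gap between forecasts and realized outcomes."""
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("forecasts and actuals must be non-empty and equal length")
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


def compare_parallel_runs(baseline, with_new_data, actuals, min_improvement=0.0):
    """Score the two model runs against the same realized outcomes.

    No capital is committed here: both runs are paper forecasts.
    """
    baseline_mae = mean_absolute_error(baseline, actuals)
    new_data_mae = mean_absolute_error(with_new_data, actuals)
    return {
        "baseline_mae": baseline_mae,
        "new_data_mae": new_data_mae,
        # True only if the new data reduced the error by more than the threshold.
        "new_data_helps": (baseline_mae - new_data_mae) > min_improvement,
    }


# Invented weekly figures for illustration only.
actuals = [1.0, -0.5, 0.25, 0.75]        # realized outcomes
baseline = [0.5, 0.0, 0.0, 0.5]          # model run without the new data
with_new_data = [0.75, -0.25, 0.25, 0.5]  # model run with the new data

result = compare_parallel_runs(baseline, with_new_data, actuals)
print(result)
```

How long to keep running the comparison, and how large an improvement justifies committing capital, remain judgment calls for the firm and its risk committee.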
Responsible use of alternative data, particularly the new frontier data for which no historical precedent exists, can deliver dramatic improvements for investment professionals. The key is to recognize that the new alternative data on offer will require an additional investment of time to understand the contours of the data set and its relationship with the investment process. Implementing appropriate risk management safeguards when experimenting with the new data will provide a more reliable foundation for deploying new data to generate alpha.
BCMstrategy, Inc. provides strategic investors seeking to maximize alpha from the news cycle with daily, automated, objective, and patented policy risk momentum data and a time series of policy activity in the following areas: Brexit, Global Trade, FinTech, and Banking. Data analysis is also available through specialized publications starting June 1,