How to get more value from data: actionable solutions to common problems.
04.01.21 | BY Simon Mahony
In the past few years we have worked with many fundamental investors to help them get value from new data, and over that period we've seen first-hand the problems funds routinely run into and the frustration those problems cause. Over time we've learnt that many of them stem from the same small group of mistakes, and that these mistakes can be fixed. In this short article our goal is to describe and codify two of the most common problems, to explain why they tend to occur, and to suggest some actionable remedies.
The incompatibility problem
One of the most common problems funds hit when bringing on new data or new data methods is difficulty getting analysts/PMs to engage. A common pattern is that a data vendor or internal data researcher pitches something they think should be interesting, but finds the analyst/PM is not engaging in the way they hoped. To the data person this can feel like outright irrationality and evidence that fundamental investors are not long for this world; to the analyst/PM it can seem like a waste of resources on hype-driven but irrelevant work. In practice something like this often happens with sentiment scores: the data team works hard to produce scores which capture positive/negative sentiment in a quantitative way, and those scores may perform very well in a predictive capacity, but when introduced to the fundamental investors they fall flat after a brief spell of curiosity and end up being ignored.
What has gone wrong in these situations is that the investor has been presented with data which is incompatible with their process. You can think of fundamental investors as machines for converting data into investment decisions, and like any kind of machine they need the right kind of input to generate the desired results – if you put diesel into your petrol-powered car you won't get very far. While no two fundamental investors will have identical processes, as a group they are similar enough for us to make some generalisations: the kind of inputs investors thrive on are things like company filings, earnings calls, broker research, industry analysis, and news. Those are datasets – and they are datasets, even if that is not how we usually think of them – which are compatible with the investment process.
What then makes one dataset compatible and another incompatible? Ultimately it comes down to understanding. For data to be compatible with an investor's process they need to understand where to place it in their analysis, how to weigh it against other inputs, when to trust it, and when to ignore it. Sentiment scores typically fail this test: it's hard to know how to weigh a negative sentiment score against your higher-than-consensus earnings forecast, or positive M&A expectations, or long-term bullishness on demographic-driven sales growth; and it's hard to know when a sentiment score pointing in one direction should trump a thesis pointing in the opposite direction. Compatibility is not a fixed concept but is relative to an investor: different investors may be able to work with different datasets, and the same investor may, over time, find that what is compatible for them changes as their skills and experience change.
The roots of this problem lie in an approach to data, and to extracting insights from it, which is common but – in the context of fundamental investing – mistaken. Many investors don't encounter this problem until they start working with new data, so there can be a tendency to think that the problem is the data, but really the problem is what is being done to the data. You can see this by thinking about sentiment scores again: they turn something investors are very comfortable working with (earnings calls, say) into something very alien. The reason new datasets are especially subject to this is that many of them are not consumable by investors in their original or raw form, meaning there is an extra step required. This extra step introduces data science and data scientists into the mix (not necessarily in your fund; they may be on the vendor side), and this creates fertile ground for problems to appear.
A key part of the problem is that most data scientists are trained to think about data science as an independent activity instead of one which fits into a bigger pipeline. As a result they define 'good' in terms of certain objective measures like model performance and tend to optimize for improving these measures. This makes perfect sense in an environment where data science is the entire process – like in a quant fund – but works very badly when data science occupies an intermediary role – like in a fundamental fund. In our experience funds do well when, instead of focusing on objective measures of correctness, they focus on compatibility, i.e. ensuring investors understand and are comfortable working with the data they receive. Data scientists in this environment need to understand that their role is not to do the best data science in a global sense, but the most useful data science.
We have two general rules you can apply to help mitigate this problem. The first is to ensure the output of any data analysis, model, or signal directly, logically, and obviously impacts or predicts some company KPI or accounting line item. It follows from this that data scientists should avoid focusing on share prices or entirely abstract things like sentiment. This rule doesn't prevent the data from being bad, but it does guarantee that it is recognisably about something investors understand and will be compatible with their process. Importantly, it also gives the investor and the data scientist a common language to communicate in, which allows for iteration and improvement as both parties become more familiar with the working practices and preferences of the other: 'this just isn't relevant to me' becomes 'I don't care about KPI X for company Y for these reasons…'.
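To make the first rule concrete, here is a minimal sketch of what 'tie the output to a KPI' might look like in practice: a hypothetical panel of card-spend data is fitted against reported revenue to produce a revenue nowcast, rather than an abstract score. All names and figures are invented for illustration.

```python
# Minimal sketch of rule one: map a dataset to a company KPI the investor
# already models (revenue), not to an opaque score. All figures are
# hypothetical placeholders.

def ols_fit(xs, ys):
    """One-variable least-squares fit: returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx

# Hypothetical quarterly panel: indexed card spend vs reported revenue ($m).
card_spend = [96, 101, 104, 110, 113]
revenue    = [480, 505, 515, 550, 560]

slope, intercept = ols_fit(card_spend, revenue)

# The deliverable is a statement about a line item the investor understands:
# "card spend of 118 implies next-quarter revenue of roughly $X m".
nowcast = slope * 118 + intercept
print(f"implied next-quarter revenue: ~${nowcast:.0f}m")
```

The point of the sketch is the shape of the output, not the model: even a crude fit like this gives the investor something they can weigh against their own revenue forecast, which a sentiment score does not.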
The second rule is to do as little data science as is necessary, or in other words to give investors the rawest data they can handle. For obvious human reasons this can be difficult for data scientists to follow, and to some it may even sound wilfully perverse. But there is a good reason: every extra step of processing, and each additional element of complexity, adds more distance in understanding between the investor and the data. Importantly, we find reducing complexity gives the best overall results even when more complexity and processing would be objectively better in terms of accuracy and performance. Organisationally this means bringing investors into the data process early and making sure they understand all of the important steps in the process, even when those are technical. Once again this is a dynamic state of affairs: different investors have different degrees of understanding, and any one investor can learn more over time and thereby increase the range of data that is compatible with them. It also means that investors who want to get more value from data must engage with their data teams and vendors, and should not shy away from digging into the nitty-gritty of collection and processing.
The actionability problem
Another common way in which data can fail to deliver for the investor is that the insights from the data may not be actionable, or may be uneconomical because the cost to produce them outweighs their value. As an example, you might be able to track social media engagement for a consumer brand and demonstrate that it has meaningfully better engagement than its main competitors – but that is unlikely to support an investment decision in the absence of other views on the company. The problem here is that while you have an insight, it is a weak one: on a standalone basis it is not enough to motivate an investment decision. Weak insights are not necessarily a problem – in fact most insights from data are weak in this narrow sense – but they are only useful if you have pre-existing investment theses to plug them into, or, failing that, if the cost to produce them is low enough to use them economically in screening and background research.
This problem occurs because (1) it is hard for most fundamental investors to trade small insights in isolation and (2) the process of mining insights from data is very costly. The first issue is less about being fundamental as such, and more a byproduct of most fundamental investors being quite concentrated: investors who run concentrated portfolios set a higher bar for an idea to meet than a more diversified manager with an otherwise identical strategy would. In our experience investors running portfolios with <100 names will often find that the insights they get from the data alone are insufficient to meet that bar (this is only generally true; in practice there are exceptions). This is one of the reasons quantitative investors can often structurally pay more for certain data: it is easier for them to trade on all of the insights in a given dataset. The second issue comes down to the contingent fact that in 2021 data, and the people who work with it, are very expensive, which in turn means the cost of insights from data is also very high. These are the two factors weighing on the ROI of your data strategy, and keeping them in check requires a structured process and good discipline.
We find one of the best solutions to this problem is to work 'from the idea to the data'. When you decide to bring new data into your investment process there are broadly two ways to do it: you can begin with the data and mine it for insights (going from the data to the idea), or you can begin with fully or partially formed investment ideas and look for data which would support or weaken them (going from the idea to the data). Most fundamental investors, most of the time, will find that they achieve significantly better ROI following the second approach. In practice this means reviewing your positions/ideas, asking what data would – in an ideal world – help you to confirm or disprove an investment thesis, and then going to look for that data. When stated like this, most fundamental investors find it intuitive; the difficulty comes in maintaining discipline when confronted with seemingly infinite datasets and institutional pressure to do more with data.
This amounts to running a kind of 'actionability' filter ahead of time, ensuring resources are only spent on datasets where successful insight mining will have an investment impact. It has two additional benefits: first, it gives you clear metrics for success from the outset, which helps cut through vendor marketing and technical blather; second, it allows you to estimate ahead of time (necessarily very roughly) the value of a given insight, and to budget accordingly. Initially you may want to run a strict review process over all of your holdings and ideas, but over time this is likely to become less burdensome as you only really need to repeat the exercise for new names and ideas. You will also likely find that some data sources demonstrate evergreen usefulness and are kept on even as holdings change, whereas others are tactical, picked up and discarded as needed. As part of the review process you will need to think about the stability and longevity of your interest in companies and sectors – particularly where data cannot be collected after the fact – and decide whether you are willing to invest today to answer investment questions in the future.
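The rough budgeting arithmetic behind such a filter can be sketched in a few lines. The numbers, dataset names, and the expected-value formula below are all hypothetical placeholders – the point is only that comparing a crude expected value against annual cost, before any data work starts, is enough to rule datasets in or out.

```python
# Rough sketch of an ahead-of-time 'actionability' filter. All figures,
# dataset names, and the EV formula itself are hypothetical assumptions.

def expected_value(p_insight, position_size, edge_bps):
    """Crude expected annual value of a dataset for one thesis:
    probability it yields a usable insight x position size x assumed edge."""
    return p_insight * position_size * edge_bps / 10_000

candidates = [
    # (name, annual cost $, p(usable insight), position $, assumed edge bps)
    ("web_traffic_panel", 150_000, 0.5, 40_000_000, 100),
    ("satellite_parking", 400_000, 0.3, 10_000_000, 40),
]

for name, cost, p, pos, edge in candidates:
    ev = expected_value(p, pos, edge)
    decision = "pursue" if ev > cost else "pass"
    print(f"{name}: EV ~${ev:,.0f} vs cost ${cost:,.0f} -> {decision}")
```

Even a back-of-the-envelope calculation like this makes the trade-off explicit: a cheap dataset attached to a large position clears the bar, while an expensive dataset attached to a small position does not, however interesting it sounds.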
FDB Systems works with investors to help them get value from data. We provide a full suite of data services so clients can focus on their core competencies and get maximum leverage from their internal resources. We serve clients with a range of different internal capabilities and budgets – at one end we work with numerous funds building data capabilities from scratch, at the other funds with large established data teams. Our clients have combined AUM of over $100bn and represent a wide range of investment strategies including activist short selling, long/short equity, concentrated long only, distressed debt and private equity.