Efficient Investing: Turning Complex Data Into Simple Strategies

Henri Caron
Feb 7, 2025

Following my previous post on how Google Gemini has helped me expand my investment analysis, I want to dive deeper into how the optimization process works and what the results actually mean. Investing, at its core, is about finding the right balance between risk and return (you can read more about this topic here). In Modern Portfolio Theory, risk is expressed as the volatility of an asset (how widely its returns swing from day to day, usually measured as the standard deviation of those returns), while return is the average gain or loss in value over time.
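To make the two measures concrete, here is a minimal sketch of how return and volatility could be computed from a list of daily closing prices. The prices, the function names, and the 252-trading-day convention are assumptions for illustration, not the exact code behind the graphs in this post (cryptocurrencies trade every day, so 365 may fit them better).

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: typical number of stock trading days per year


def daily_returns(prices):
    """Simple day-over-day returns from a list of closing prices."""
    return [today / yesterday - 1 for yesterday, today in zip(prices, prices[1:])]


def annualized_return_and_volatility(prices, periods_per_year=TRADING_DAYS):
    """Return (annualized mean return, annualized volatility) for one asset."""
    returns = daily_returns(prices)
    mean_daily = statistics.mean(returns)
    vol_daily = statistics.stdev(returns)  # sample standard deviation
    return mean_daily * periods_per_year, vol_daily * math.sqrt(periods_per_year)


# Hypothetical closing prices, for illustration only
prices = [100.0, 101.0, 99.5, 102.0, 103.5, 102.8]
ret, vol = annualized_return_and_volatility(prices)
print(f"annualized return: {ret:.1%}, annualized volatility: {vol:.1%}")
```

Each dot in the graphs below corresponds to one such (volatility, return) pair.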

Step 1: Visualizing & Cleaning the Data

Each dot on the graph represents one stock or cryptocurrency, plotted based on one year of historical data for its volatility (risk) and return. To kick off the process, I begin by visually analyzing this data to identify patterns and anomalies before diving into the task of cleaning it. This step is crucial because extreme data points, or outliers, can significantly distort the results and slow down subsequent calculations.

Assets are excluded based on the following criteria:

Extreme Outliers: A small number of assets show extreme volatility or returns. These could be due to data errors or unique events that make them irrelevant for portfolio analysis. For example, a cryptocurrency that surged 10,000% due to a one-time event isn’t useful for long-term investing.

This scatter plot shows that there are at least a couple of extreme outliers

High Risk for Medium Return: Zooming in on the remaining data, some assets still show disproportionately high risk for mediocre returns. These are also excluded to focus on more balanced opportunities.
Negative Average Returns: Assets with negative returns over the past year are excluded. They’re not ideal for a growth-focused portfolio, and keeping them would slow down the optimization calculations.

High-risk, medium-return assets and negative-return assets are also removed from the analysis

In the end, only the assets represented by blue dots in the graph are kept for further analysis. These represent stocks and cryptocurrencies with reasonable risk-return profiles.
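The three exclusion rules can be sketched as a simple filter. The thresholds and tickers below are hypothetical, since the exact cut-offs depend on the dataset, so treat this as an illustration of the logic rather than the actual values used.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    ticker: str
    annual_return: float      # e.g. 0.25 for 25%
    annual_volatility: float  # e.g. 0.40 for 40%


# Hypothetical thresholds, chosen only to illustrate the three rules
MAX_RETURN = 5.0             # above 500% return counts as an extreme outlier
MAX_VOLATILITY = 2.0         # above 200% volatility counts as an extreme outlier
MIN_RETURN_PER_RISK = 0.25   # minimum return earned per unit of risk


def keep_asset(asset):
    """Apply the three exclusion rules; True means the asset stays (a blue dot)."""
    if asset.annual_return > MAX_RETURN or asset.annual_volatility > MAX_VOLATILITY:
        return False  # extreme outlier
    if asset.annual_return <= 0:
        return False  # negative (or zero) average return
    if asset.annual_return < MIN_RETURN_PER_RISK * asset.annual_volatility:
        return False  # high risk for a mediocre return
    return True


assets = [
    Asset("AAA", 0.30, 0.25),    # reasonable profile
    Asset("BBB", 100.0, 8.00),   # extreme outlier
    Asset("CCC", 0.10, 0.90),    # high risk, mediocre return
    Asset("DDD", -0.15, 0.30),   # negative return
]
kept = [a.ticker for a in assets if keep_asset(a)]
print(kept)
```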

Step 2: Generating Random Portfolios

Once the data is cleaned, I generate a series of random portfolios. Each portfolio is created by randomly selecting 25 assets from the pool (excluding extreme outliers but still including all the assets with negative returns). In the graph, these portfolios are represented by green dots, showing a range of returns from -10% to +30% annually. It is already visible that investing in random portfolios drastically reduces volatility (= risk) while keeping returns in line with those of individual assets. This is the visual representation of diversification. While these results are decent, there’s still room for improvement.
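A rough sketch of this step is shown below. It uses simulated daily returns instead of real price history and weights the 25 picks equally; both are assumptions, as is every name in the code. Because the simulated assets move independently of each other, the drop in volatility is larger here than you would see with real, correlated assets.

```python
import random
import statistics

random.seed(42)  # reproducible illustration

N_ASSETS, N_DAYS, PORTFOLIO_SIZE = 200, 250, 25

# Simulated daily returns standing in for real price history (assumption):
# each asset gets its own average daily return and daily volatility.
daily = []
for _ in range(N_ASSETS):
    mu = random.uniform(-0.0005, 0.0015)
    sigma = random.uniform(0.01, 0.04)
    daily.append([random.gauss(mu, sigma) for _ in range(N_DAYS)])


def random_portfolio(size=PORTFOLIO_SIZE):
    """Pick `size` distinct assets at random and weight them equally."""
    picks = random.sample(range(N_ASSETS), size)
    # The portfolio return on each day is the average of the picked assets' returns.
    return [sum(daily[i][d] for i in picks) / size for d in range(N_DAYS)]


portfolio_vols = [statistics.stdev(random_portfolio()) for _ in range(100)]
asset_vols = [statistics.stdev(series) for series in daily]

print(f"average single-asset daily volatility:     {statistics.mean(asset_vols):.2%}")
print(f"average random-portfolio daily volatility: {statistics.mean(portfolio_vols):.2%}")
```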

Covariance Matrix: Tackling Complexity with Technology

An important part of this process involves calculating the covariance matrix, which measures how each of the 6,000 assets moves in relation to every other asset. Originally, this matrix contains 36 million data points (6,000 x 6,000). However, after removing assets with negative returns and extreme outliers, the dataset becomes significantly smaller, reducing the computational load. For example, a 33% reduction in the number of assets (from 6,000 to 4,000) leads to a reduction of about 56% in matrix size, resulting in around 16 million data points instead of 36 million. This makes the calculations far more manageable. Tools like Google Colab play a key role here, providing the processing power necessary to analyze these massive datasets.
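The arithmetic is easy to verify: the number of cells grows with the square of the number of assets, so trimming the asset list shrinks the matrix much faster than it shrinks the list. The function name below is just for illustration.

```python
def covariance_entries(n_assets):
    """Number of cells in an n x n covariance matrix."""
    return n_assets * n_assets


before = covariance_entries(6_000)  # 36,000,000
after = covariance_entries(4_000)   # 16,000,000
reduction = 1 - after / before      # 1 - 16/36, roughly 0.556

print(f"{before:,} -> {after:,} entries ({reduction:.1%} fewer)")

# The matrix is symmetric, so only n * (n + 1) / 2 of those values are distinct.
unique_after = 4_000 * (4_000 + 1) // 2
print(f"distinct values after cleaning: {unique_after:,}")
```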

Step 3: Understanding the Efficient Frontier

This is where optimization comes into play. Diversification is key, and covariance is what makes it reduce risk. Covariance measures how two assets move relative to each other. Because not all assets move in the same direction at the same time, a drop in one asset’s value can be offset by a rise in another’s, balancing the portfolio. By optimizing the allocation of over 6,000 stocks and cryptocurrencies (using one year of historical data), we calculate the efficient frontier.
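The simplest way to see the effect of covariance is with just two assets, using the standard two-asset portfolio variance formula. The numbers below are hypothetical (two assets with 30% volatility each, held 50/50), and the covariance is written as correlation times the two volatilities.

```python
import math


def two_asset_volatility(w1, vol1, vol2, correlation):
    """Volatility of a portfolio with weight w1 in asset 1 and (1 - w1) in asset 2."""
    w2 = 1 - w1
    covariance = correlation * vol1 * vol2
    variance = (w1 * vol1) ** 2 + (w2 * vol2) ** 2 + 2 * w1 * w2 * covariance
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding errors


# Hypothetical example: the lower the correlation, the lower the portfolio risk.
for corr in (1.0, 0.5, 0.0, -0.5):
    vol = two_asset_volatility(0.5, 0.30, 0.30, corr)
    print(f"correlation {corr:+.1f}: portfolio volatility {vol:.1%}")
```

Only when the two assets move in perfect lockstep (correlation +1.0) does the portfolio keep the full 30% volatility. The optimization applies the same idea across thousands of assets at once.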

The efficient frontier is a curve made up of dots, where each dot represents a portfolio optimized for the best risk-return tradeoff.

The efficient frontier: all optimized portfolios for the blue assets combine into the blue line

Here’s how it works:

If you could choose between two portfolios with the same expected return (e.g., 25%), one with 10% risk and the other with 25% risk, you’d obviously choose the lower-risk option. This is represented by moving as far left as possible on the efficient frontier.
Conversely, if you have a maximum risk tolerance (e.g., 15%), you’d aim for the portfolio with the highest return within that risk level. This is represented by moving as far up as possible on the curve.
Examples:
- The green star represents the portfolio with the highest return possible for a risk of 15%
- The blue star represents the portfolio with the lowest risk possible for a return of 50%
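Once the frontier has been calculated, picking the two stars comes down to a simple lookup. The sketch below uses made-up frontier points as (risk, expected return) pairs, not the values from the graph, and the function names are illustrative.

```python
# Hypothetical frontier points as (risk, expected return), for illustration only
frontier = [
    (0.08, 0.15), (0.10, 0.25), (0.15, 0.40),
    (0.20, 0.50), (0.30, 0.65), (0.45, 0.80),
]


def max_return_for_risk(points, max_risk):
    """Highest-return portfolio whose risk does not exceed max_risk (or None)."""
    eligible = [p for p in points if p[0] <= max_risk]
    return max(eligible, key=lambda p: p[1], default=None)


def min_risk_for_return(points, target_return):
    """Lowest-risk portfolio whose return is at least target_return (or None)."""
    eligible = [p for p in points if p[1] >= target_return]
    return min(eligible, key=lambda p: p[0], default=None)


green_star = max_return_for_risk(frontier, 0.15)  # "as far up as possible"
blue_star = min_risk_for_return(frontier, 0.50)   # "as far left as possible"
print("green star:", green_star)
print("blue star:", blue_star)
```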

Random Portfolios vs. Optimized Portfolios

The difference between random and optimized portfolios becomes evident when comparing their performance. Random portfolios (green dots) already show lower volatility and similar returns compared to individual assets, proving the power of diversification. However, optimized portfolios take this further by carefully selecting assets to maximize returns for a given risk level or minimize risk for a desired return. For example:

A single asset with 25% expected return might come with 30% risk. A random portfolio could reduce this risk to 20%, while an optimized portfolio could push it down to 10% for the same return.
Similarly, if your goal is to keep risk at 15%, a random portfolio might deliver 20% returns, whereas an optimized portfolio could achieve 30% or more.

Optimized portfolios clearly offer better performance compared to random portfolios

All the graphs above include a restriction where no single asset exceeds 4% of the total portfolio. This ensures proper diversification and prevents over-reliance on any one stock or cryptocurrency.
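A side effect of the 4% cap is that every portfolio must hold at least 25 assets, since 1 / 0.04 = 25. A small check like the one below could be used to validate a set of weights. It assumes a long-only portfolio (no negative weights), which is my assumption here, and the function name is illustrative.

```python
import math

MAX_WEIGHT = 0.04  # cap from the post: no single asset above 4% of the portfolio


def respects_cap(weights, max_weight=MAX_WEIGHT, tol=1e-9):
    """True if weights are non-negative, sum to 1, and none exceeds the cap."""
    within_bounds = all(-tol <= w <= max_weight + tol for w in weights)
    fully_invested = math.isclose(sum(weights), 1.0, abs_tol=1e-9)
    return within_bounds and fully_invested


# Smallest number of holdings that can satisfy the cap
min_holdings = math.ceil(1 / MAX_WEIGHT - 1e-9)
print("minimum number of holdings:", min_holdings)

print(respects_cap([0.04] * 25))              # equal weights across 25 assets
print(respects_cap([0.05] + [0.95 / 24] * 24))  # one asset above the cap
```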

Key Takeaways

Diversification Matters: If you knew which single asset would perform best in the future, you’d put all your money into it. But since no one can predict the future, diversification reduces risk while maintaining decent returns. Comparing the green dots (random portfolios) to the blue and yellow dots (individual assets), it’s clear that diversification lowers risk significantly for similar returns.
Optimization Works: In this analysis, optimized portfolios achieve roughly double the return of random portfolios for the same risk. The efficient frontier highlights the portfolios that offer the highest return for each level of risk.

Overlaying all the graphs shows the big picture

Limitations

It’s important to note that the model relies on historical data, which isn’t a guarantee of future performance. Investing in a portfolio with an expected average return of 100% therefore does not guarantee that you will actually obtain 100%. However, research shows that trends often persist: assets on a steady upward trend are likely to continue in that direction. The goal of combining the right assets is to achieve results that land somewhere between the efficient frontier and the randomly generated portfolios. While historical data offers valuable hints, it cannot predict the future with certainty.

This means the optimization process provides a structured approach to reduce risks and enhance returns, but it must be complemented by judgment, market awareness, and ongoing monitoring of trends.

Disclaimer

This post is for informational purposes only and does not constitute investment advice. Always do your own research and consider consulting a financial advisor before making investment decisions.

Investment Tales