Project II - Portfolio optimization
We consider the problem of choosing a long-term stock portfolio, given a set of stocks and their prices over some period, under a risk aversion parameter $\gamma > 0$.
Assume there are $m$ stocks to be considered. The portfolio is represented by a column vector $w \in \mathbb{R}^m$ such that $\sum_{i=1}^{m} w_i = 1$. If $w_i > 0$, you use a fraction $w_i$ of your total money to buy the $i$'th stock, while $w_i < 0$ represents shorting that stock. In both cases we assume the stock is bought/shorted for the entire period.
Let $p_{j,i}$ represent the price of the $i$'th stock at time step $j$. If there are $n + 1$ time steps, then $p \in \mathbb{R}^{(n+1) \times m}$ is a matrix.
We let $r \in \mathbb{R}^{n \times m}$ be the matrix where $r_{j,i}$ represents the fractional reward of stock $i$ at time step $j$, i.e., $r_{j,i} = (p_{j+1,i} - p_{j,i}) / p_{j,i}$ for $1 \le j \le n$.
By $r_j$ we denote the $j$'th row of $r$, viewed as a column vector $(r_{j,1}, \ldots, r_{j,m})$.
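The definition of the fractional reward can be sketched with NumPy as follows. This is only a minimal illustration: the function name `fractional_returns` and the price values are made up for the example, and the code uses 0-based row indices where the text uses 1-based time steps.

```python
import numpy as np


def fractional_returns(prices):
    """Return r with r[j, i] = (p[j+1, i] - p[j, i]) / p[j, i]."""
    p = np.asarray(prices, dtype=float)
    # Difference between consecutive rows, divided by the earlier row.
    return (p[1:] - p[:-1]) / p[:-1]


# Hypothetical prices: n + 1 = 4 time steps (rows), m = 2 stocks (columns).
p = np.array([[100.0, 50.0],
              [110.0, 45.0],
              [121.0, 45.0],
              [108.9, 54.0]])
r = fractional_returns(p)
print(r.shape)  # n = 3 rows of returns, one column per stock
print(r)
```

Slicing whole rows keeps the computation vectorized, so the same function works for any number of stocks and time steps.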
We make the (unrealistic) assumption that we can model $r$ by a random variable, distributed as a multivariate Gaussian, with estimated means
$$\mu \simeq \frac{1}{n} \sum_{j=1}^{n} r_j$$
and estimated covariance matrix
$$\Sigma \simeq \frac{1}{n} \sum_{j=1}^{n} (r_j - \mu)(r_j - \mu)^T$$
Note that $\mu_i$ and $\Sigma_{i,i}$ are the estimated mean and variance for stock $i$.
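The two estimators above might be computed as in the sketch below. The helper name `estimate_mean_and_covariance` and the return values are hypothetical. Note the 1/n normalization from the formulas: `numpy.cov` divides by n - 1 by default, so it only matches when called with `bias=True`.

```python
import numpy as np


def estimate_mean_and_covariance(r):
    """Estimate the mean vector and the covariance matrix with a 1/n factor."""
    r = np.asarray(r, dtype=float)
    n = r.shape[0]
    mu = r.mean(axis=0)            # average of the rows r_j
    centered = r - mu              # row j holds r_j - mu
    # centered.T @ centered sums the outer products (r_j - mu)(r_j - mu)^T.
    sigma = centered.T @ centered / n
    return mu, sigma


# Hypothetical returns: n = 3 time steps, m = 2 stocks.
r = np.array([[0.1, -0.1],
              [0.1, 0.0],
              [-0.1, 0.2]])
mu, sigma = estimate_mean_and_covariance(r)

# Cross-check against NumPy's own estimator with the same normalization.
print(np.allclose(sigma, np.cov(r, rowvar=False, bias=True)))
```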
The distribution of returns using some $w$ is then
$$R_w = \mathcal{N}(\mu_w, \sigma_w^2)$$
$$\mu_w = w^T \mu$$
$$\sigma_w^2 = w^T \Sigma w$$
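As a rough illustration of these formulas, the sketch below computes the mean and variance of the return for one portfolio and evaluates the corresponding Gaussian density on a grid. The function names and all numbers are assumptions made for the example. The density is written out by hand here; `scipy.stats.norm.pdf(x, loc, scale)` evaluates the same function, but expects the standard deviation (the square root of the variance) as `scale`.

```python
import numpy as np


def portfolio_mean_variance(w, mu, sigma):
    """Return (w^T mu, w^T Sigma w) for a weight vector w."""
    w = np.asarray(w, dtype=float)
    return w @ mu, w @ sigma @ w


def gaussian_pdf(x, mean, variance):
    """Density of a univariate Gaussian with the given mean and variance."""
    return np.exp(-(x - mean) ** 2 / (2.0 * variance)) / np.sqrt(2.0 * np.pi * variance)


# Hypothetical estimates for m = 2 stocks.
mu = np.array([0.02, 0.01])
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
w = np.array([0.5, 0.5])  # weights sum to 1

mu_w, var_w = portfolio_mean_variance(w, mu, sigma)
std_w = np.sqrt(var_w)

# Evaluate the density on a grid covering four standard deviations each way.
x = np.linspace(mu_w - 4 * std_w, mu_w + 4 * std_w, 201)
density = gaussian_pdf(x, mu_w, var_w)
print(mu_w, var_w)
```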
We now want to strike a balance between high return $\mu_w$ and low risk $\sigma_w^2$. This leads to the following optimization problem, where we want to find the value $w^*$ of $w$ that maximizes the following expression:
$$\begin{aligned}
\text{maximize} \quad & w^T \mu - \gamma\, w^T \Sigma w \\
\text{subject to} \quad & \sum_{i=1}^{m} w_i = 1,
\end{aligned}$$
where $\gamma$ controls the balance between risk and return. A high value of $\gamma$ indicates that we are only willing to take a low risk, and vice versa.
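One possible way to solve this problem, sketched here under the assumption that the covariance matrix is positive definite, is to use a Lagrange multiplier for the single equality constraint. Setting the gradient of the Lagrangian to zero gives the linear system 2γΣw + λ1 = μ together with 1ᵀw = 1, which can be handed to `numpy.linalg.solve`. The function name `optimal_portfolio` and the numbers are hypothetical, and a general-purpose constrained optimizer would be an equally valid choice.

```python
import numpy as np


def optimal_portfolio(mu, sigma, gamma):
    """Maximize w^T mu - gamma * w^T Sigma w subject to sum(w) = 1.

    Assumes Sigma is positive definite, so the stationary point of the
    Lagrangian is the unique maximizer.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    m = mu.shape[0]

    # Block system [[2*gamma*Sigma, 1], [1^T, 0]] @ [w, lam] = [mu, 1].
    kkt = np.zeros((m + 1, m + 1))
    kkt[:m, :m] = 2.0 * gamma * sigma
    kkt[:m, m] = 1.0
    kkt[m, :m] = 1.0
    rhs = np.append(mu, 1.0)

    solution = np.linalg.solve(kkt, rhs)
    return solution[:m]  # drop the multiplier, keep the weights


# Hypothetical estimates for m = 2 stocks.
mu = np.array([0.02, 0.01])
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])

for gamma in (1.0, 2.8):
    w_star = optimal_portfolio(mu, sigma, gamma)
    print(gamma, w_star, w_star.sum())
```

Because no sign constraint is placed on the weights, negative entries (short positions) can appear in the result, in line with the problem statement.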
In this project you should find $w^*$ for different values of $\gamma$, using real stock values of your choice. The project consists of the following four questions.
- We need a module for collecting stock values. For this we will use the Python module `yfinance`, which connects to Yahoo! Finance's API. See https://pypi.org/project/yfinance/ for a description of how to install and use the module. Using this module, write a function `get_prices([stocks, ...], start, end, interval)` that returns a tuple `(stocks, prices)`, where `prices[i, j]` represents the opening price of stock $j$ at time step $i$ and `stocks[j]` is the name of the $j$'th stock. Save the fetched data to a file, so that when the same data is requested again, it is loaded from the file instead of through the API (e.g., save a pandas dataframe as a csv file).
- Plot the loaded price data. Each stock should be labeled with its name (for example `MSFT` or `GOOGL`). You should use at least 5 stocks. The legend should be ordered by the last price fetched for each stock. Plot both the raw stock prices and the stock prices normalized so that they all start at price 1.0.
- Calculate $r$, $\mu$ and $\Sigma$ using the formulas above and the prices $p$ obtained in the first question. Plot the probability density function (pdf) of the return of each stock.
  Hint. The method `norm.pdf` from the module `scipy.stats` might be convenient. Note that it takes the mean and the standard deviation as arguments.
- Solve the optimization problem defined above for different values of $\gamma$, e.g., `gammas = numpy.linspace(1.0, 2.8, 10)`, and plot the pdf of each solution in a single plot with appropriate legends. Finally, create a scatter plot of how $w^*$ changes as $\gamma$ changes: for each value of $\gamma$, plot the fraction of each stock in the portfolio.
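The caching behaviour asked for in the first question could look roughly like the sketch below. Several details are assumptions and not part of the assignment: the extra `cache_dir` and `fetch` parameters, the file naming scheme, the helper `_download_open_prices`, and the choice to drop time steps with missing prices. The sketch also assumes that `yfinance.download`, called with a list of several tickers, returns a dataframe whose `"Open"` entry has one column per ticker; check this against the installed version of the module.

```python
from pathlib import Path

import pandas as pd


def _download_open_prices(stocks, start, end, interval):
    """Fetch opening prices through yfinance (imported only when needed)."""
    import yfinance as yf

    data = yf.download(stocks, start=start, end=end, interval=interval)
    return data["Open"]  # assumed: one column per ticker


def get_prices(stocks, start, end, interval="1d",
               cache_dir="price_cache", fetch=_download_open_prices):
    """Return (stocks, prices), where prices[i, j] is stock j at time step i."""
    stocks = list(stocks)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme: one csv file per distinct request.
    key = "_".join(stocks + [str(start), str(end), str(interval)])
    path = cache_dir / f"{key}.csv"

    if path.exists():
        # Same request as before: read the file instead of calling the API.
        frame = pd.read_csv(path, index_col=0, parse_dates=True)
    else:
        frame = fetch(stocks, start, end, interval)
        # Keep the requested column order; drop steps with missing prices.
        frame = frame[stocks].dropna()
        frame.to_csv(path)

    return list(frame.columns), frame.to_numpy()
```

A call such as `get_prices(["MSFT", "GOOGL"], "2023-01-01", "2024-01-01", "1d")` would then contact the API only the first time. Passing the download function as a parameter is a design choice that makes it possible to try out the caching logic without network access.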

