Thanks for the comments, Gregor. I think you are right to be concerned about this.
Currently, this is what I do:
First, I order the securities by days till maturity, and rank them to get a feature I call `rank`. I would prefer not to do this step, but all the algorithms I have tried so far require a target variable to produce a prediction.
Then I use a regression-type algorithm to predict rank from the other features I deem most relevant. I have not yet added bond ratings, as that would take some work in data collection. But I do have variables like yield, current price, annual income, taxable status, and investment gain or loss. There is some collinearity. For instance, current price is related to yield, mediated by market perception of interest rate risk. And as you point out, it might be a good idea to omit DaysToDue, which is really just the rank on a different scale, so very collinear; though in this portfolio most of the maturities are longer and so should not have much effect on the rank; only a few are close in.
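To make the collinearity point concrete, here is a minimal sketch of checking how closely DaysToDue tracks the surrogate rank. It is in plain Python rather than R, and the five maturities are made-up numbers, not data from the portfolio.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical days to maturity and the rank derived from them (1 = soonest).
days_to_due = [30, 400, 1800, 3600, 7200]
rank = [1, 2, 3, 4, 5]

r = pearson(days_to_due, rank)
print(f"correlation between DaysToDue and rank: {r:.3f}")
```

Because rank is a monotone transform of DaysToDue, the correlation is high by construction, which is the reason for leaving DaysToDue out of the predictors.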
I think you are right that I am predicting largely noise at present, and that accounts for the fluctuations.
I'd rather not predict rank; I'd like to find an algorithm that could predict a score that could be used for ranking without needing a target variable. But I don't know one. Perhaps instead I could use some version of clustering but that would require interpreting clusters.
With my random forest, each run you get a slightly different top 10! And different orders within the top 10.
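One way to put a number on how much the top 10 moves between runs is the Jaccard overlap of the two top-k sets. The sketch below is plain Python, and the security IDs and run results are hypothetical.

```python
def top_k_overlap(run_a, run_b, k=10):
    """Jaccard overlap of the top-k items of two rankings (each ordered best first)."""
    top_a, top_b = set(run_a[:k]), set(run_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0

# Hypothetical sell orders from two runs that differ only in the random seed.
run_1 = ["B07", "B02", "B11", "B05", "B09"]
run_2 = ["B02", "B07", "B05", "B14", "B11"]

print(top_k_overlap(run_1, run_2, k=3))  # 2 shared out of 4 distinct IDs
print(top_k_overlap(run_1, run_2, k=5))  # 4 shared out of 6 distinct IDs
```

An overlap of 1.0 means the same set of securities (order ignored); values well below 1.0 across seeds would support the view that the model is fitting mostly noise.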
I have got shapr to produce SHAP importances for each driver for the top 10 securities. They too bounce around. If I want the explanations, I have to be careful which algorithms I use as not all are supported.
If anyone knows an algorithm that can use features to produce a score without a training target, I'd like to try it.
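For illustration only, the simplest thing I can think of that scores without a target is a hand-weighted sum of rescaled features. This is not a learned model: the weights below are hypothetical judgment calls, as are the feature names and the three securities. It is a plain Python sketch, not R.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def sell_scores(securities, weights):
    """Weighted sum of min-max scaled features; a higher score means sell sooner.

    weights maps feature name -> signed weight. A negative weight means a
    larger feature value argues for holding the security longer.
    """
    scaled = {f: min_max([s[f] for s in securities]) for f in weights}
    return [
        sum(weights[f] * scaled[f][i] for f in weights)
        for i in range(len(securities))
    ]

# Hypothetical portfolio and hypothetical weights (assumptions, not advice).
securities = [
    {"id": "A", "days_to_due": 30, "yield_pct": 2.0},
    {"id": "B", "days_to_due": 3650, "yield_pct": 5.0},
    {"id": "C", "days_to_due": 1840, "yield_pct": 3.5},
]
weights = {"days_to_due": -0.5, "yield_pct": -0.5}

scores = sell_scores(securities, weights)
order = [s["id"] for _, s in sorted(zip(scores, securities), key=lambda p: -p[0])]
print(order)  # sell order, first to last
```

The score is fully determined by the weights, so it is stable from run to run, but choosing the weights is exactly the judgment that a target variable was standing in for.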
------------------------------
Bruce Hartman
Professor
University of St. Francis
Tucson, AZ United States
bruce@ahartman.net
website: https://sites.google.com/ahartman.net/drbrucehartman/Home
blog: http://supplychainandlogistics.org
------------------------------
Original Message:
Sent: 06-23-2025 11:53
From: Gregor Reich
Subject: Optimizing portfolio redemption
I fear that I still don't quite get what you want to achieve: if I understand you correctly, you regress time-to-maturity on some other features (like credit rating, the spread between price and face values, etc.), and then use the prediction from that model to obtain a new ranking. Is that correct?
If so, I have another question, and a comment:
- Why should the ranking based on the prediction produce better results than that based on the original metric (time-to-maturity)? My understanding is that you always have that quantity available for all your assets (so, technically, no need to predict it), and that the only difference between the actual value and its prediction is essentially noise, isn't it?
- I'm speculating now, but I could imagine that predicting time-to-maturity from other properties is delicate, in that few features have true explanatory power, at least not linearly: For example, a high price-to-value ratio is indicative of a long time to maturity, but not vice versa. Other features might indeed induce multicollinearity issues: For example, low credit ratings might correlate with a shorter overall term of bonds in general, and thus produce lower times to maturity on average; if, at the same time, you include the overall term as a feature directly, you might run into statistical (and numerical) problems. Now with regard to your observation that different algorithms produce vastly different rankings: I could imagine that this is a consequence of their very different handling of such challenging setups.
------------------------------
Gregor Reich
Senior Consultant
Tsumcor Research AG
Schwerzenbach, Switzerland
------------------------------
Original Message:
Sent: 06-18-2025 10:44
From: Bruce Hartman
Subject: Optimizing portfolio redemption
Gregor, thanks for the comment. And Seline, thanks too!
The securities have different coupon (income) rates, different market prices, different maturities, and different ratings (quality as judged by an external rating agency, somewhat reflected in the interest rate or current price). The bonds may also be tax-free nationally and/or at the state level, depending on whether they are issued by the same state in which you live.
In addition, as bonds approach a call date, their price converges to 100% of their face value.
Now suppose you had to raise X dollars in cash. You could offer some of the securities for sale at something close to the market price, enough to get you X in cash.
So if you rank the securities by which ones should be sold first, you could just start selling 1, 2, and so on till you reach the amount X.
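The selling rule just described can be sketched as a walk down the ranked list. This is plain Python with made-up positions, and it assumes each holding is sold whole at its market value, with no partial lots, taxes, or transaction costs.

```python
def securities_to_sell(ranked, cash_needed):
    """Walk the ranked list in order until the proceeds reach cash_needed.

    ranked: list of (security_id, market_value), already in sell order.
    Returns (ids_sold, total_raised). If the whole list is worth less than
    cash_needed, everything is sold and total_raised falls short.
    """
    sold, raised = [], 0.0
    for sec_id, value in ranked:
        if raised >= cash_needed:
            break
        sold.append(sec_id)
        raised += value
    return sold, raised

# Hypothetical positions, already ranked first-to-sell to last-to-sell.
ranked = [("CASH", 5000.0), ("B02", 10200.0), ("B07", 9800.0), ("B11", 15000.0)]
sold, raised = securities_to_sell(ranked, 20000.0)
print(sold, raised)
```

Selling whole positions usually overshoots X; how to handle the excess is a separate choice that this sketch leaves open.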
So my optimization so far uses the different features to produce a ranking, the order in which they should be sold.
It would be nice if there was an unsupervised algorithm that could produce such a ranking. But, for instance, a random forest regression can't do that. Instead as a surrogate I generate a preliminary ranking based on the number of days till maturity. Obviously, that is the order in which the securities would return cash to you absent any other predictors.
Then I can run a random forest regression (or any other regression routine) of rank on the other features to produce a predicted rank. Sorting predicted rank first to last produces an integer order or rank in which they could be sold.
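The two ranking steps (surrogate rank from days till maturity, then an integer order from the predicted rank) can be sketched as follows. This is plain Python rather than R; the predicted values are made-up stand-ins for a regression's output, and no model is fitted here.

```python
def rank_ascending(values):
    """1-based ranks: the smallest value gets rank 1.

    Ties are broken by original position, since sorted() is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

# Hypothetical days till maturity for four securities.
days_to_due = [1800, 30, 7200, 400]
surrogate_rank = rank_ascending(days_to_due)

# Stand-in for fractional predicted ranks from a regression (made-up numbers).
predicted_rank = [2.6, 1.4, 2.1, 2.2]
sell_order = rank_ascending(predicted_rank)

print(surrogate_rank)  # training target
print(sell_order)      # integer order after sorting the predictions
```

In this made-up case the third security moves from last in the surrogate rank to second in the predicted order, which is the kind of reshuffling the other features are meant to produce.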
Clearly you'd think a cash holding might come first, since it's available now without a sale.
Securities very close to maturity or prerefunded perhaps ought to come next, though there are others which are worth more than their face value, because they have interest rates higher than the current interest rates; certain corporate bonds would be an example.
Trouble is, when I run my random forests or regressions with different parameters, I get quite different rankings, and they aren't very intuitive.
Sirine Taleb pointed out that there might be a case for an explainable AI (XAI) solution. I'm currently looking at SHAP to create 'explanations' for the high-ranking items. (I've done some work with cooperative games in the past.) Apparently SHAP and LIME differ mostly in the kernels (weights) employed in the calculation.
I'm learning a lot about what's possible with R. But I'm not sure even the explanations will improve on a bad optimization.
I think there's a hole in my thinking because I'm using my surrogate rank as a target variable. But the algorithms I've tried need something to fit to.
Any suggestions are gratefully received.
------------------------------
Bruce Hartman
Professor
University of St. Francis
Tucson, AZ United States
bruce@ahartman.net
website: https://sites.google.com/ahartman.net/drbrucehartman/Home
blog: http://supplychainandlogistics.org
------------------------------
Original Message:
Sent: 06-18-2025 09:47
From: Gregor Reich
Subject: Optimizing portfolio redemption
Can you be more specific about what you mean by "rank the securities for selling"? In my understanding, your problem statement induces an optimization problem; but somehow, the methods you describe sound to me like you make a prediction first (and then, I'm speculating now, optimize the portfolio "heuristically" by selecting the asset with the worst predicted performance to be sold?).
------------------------------
Gregor Reich
Senior Consultant
Tsumcor Research AG
Schwerzenbach, Switzerland