Thanks for the comments, Gregor. I think you are right to be concerned about this.
First, I order the securities by days till maturity, and rank them to get a feature I call `rank`. I would prefer not to do this step, but all the algorithms I have tried so far require a target variable to produce a prediction.
Then I use a regression-type algorithm to predict rank from the other features I deem most relevant. I have not yet added bond ratings, as that would take some work in data collection. But I do have variables like yield, current price, annual income, taxable status, and investment gain or loss. There is some collinearity. For instance, current price is related to yield, mediated by market perception of interest rate risk. And as you point out, it might be a good idea to omit DaysToDue, which is really just the rank on a different scale and so very collinear with it; though in this portfolio most of the maturities are long, so it should not have much effect on the rank; only a few are close in.
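To make the DaysToDue point concrete, here is a minimal sketch in Python (not the R code actually used), with made-up maturities. It assumes the target rank is built purely by sorting days to maturity, as described above, and checks the Spearman correlation between that feature and the target.

```python
# Hypothetical days-to-maturity values; not real portfolio data.
days_to_due = [1200, 30, 3650, 400, 2500]

def ranks(values):
    """Return 1-based ranks (smallest value gets rank 1); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# The surrogate target: rank by days to maturity.
target_rank = ranks(days_to_due)

# Spearman correlation is the Pearson correlation of the ranks.
spearman = pearson(ranks(days_to_due), target_rank)
print(target_rank, spearman)
```

Because the target is a monotone transform of the feature, the rank correlation is perfect by construction, so any model given DaysToDue can reproduce the target without using the other features.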
I think you are right that I am predicting largely noise at present, and that accounts for the fluctuations.
I'd rather not predict rank; I'd like to find an algorithm that could predict a score that could be used for ranking without needing a target variable. But I don't know one. Perhaps I could instead use some version of clustering, but that would require interpreting the clusters.
With my random forest, each run gives a slightly different top 10, and a different order within the top 10.
I have got shapr to produce SHAP importances for each driver for the top 10 securities. They too bounce around. If I want the explanations, I have to be careful which algorithms I use as not all are supported.
If anyone knows an algorithm that can use features to produce a score without a training target, I'd like to try it.
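One family of methods that needs no training target is a weighted multi-criteria score: rescale each feature to [0, 1], multiply by an analyst-chosen weight, and sum. The sketch below is in Python rather than R. The holdings, field names, weights, and their signs are all made-up placeholders, not a recommendation; choosing them is exactly the judgment call the method makes explicit.

```python
# Hypothetical holdings; field names and numbers are illustrative only.
holdings = [
    {"name": "A", "days_to_due": 30, "yield_pct": 2.0, "gain_pct": 0.5},
    {"name": "B", "days_to_due": 1800, "yield_pct": 4.5, "gain_pct": -3.0},
    {"name": "C", "days_to_due": 900, "yield_pct": 3.0, "gain_pct": 6.0},
]

# Assumed weights: a positive weight means larger values argue for
# selling sooner, a negative weight means the opposite.
weights = {"days_to_due": -0.5, "yield_pct": -0.3, "gain_pct": 0.2}

def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def sell_scores(rows, weights):
    """Weighted sum of min-max scaled features, one score per row."""
    scores = [0.0] * len(rows)
    for feature, weight in weights.items():
        scaled = min_max([row[feature] for row in rows])
        for i, value in enumerate(scaled):
            scores[i] += weight * value
    return scores

scores = sell_scores(holdings, weights)

# Highest score is sold first.
ranked = sorted(zip(scores, holdings), key=lambda pair: -pair[0])
sell_order = [holding["name"] for _, holding in ranked]
print(sell_order)
```

The result is deterministic, so the top of the list does not move between runs, and each security's score decomposes directly into per-feature contributions.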
University of St. Francis
Original Message:
Sent: 06-23-2025 11:53
From: Gregor Reich
Subject: Optimizing portfolio redemption
I fear that I still don't quite get what you want to achieve: if I understand you correctly, you regress time-to-maturity on some other features (like credit rating, the spread between price and face values, etc.), and then use the prediction from that model to obtain a new ranking. Is that correct?
If so, I have another question, and a comment:
- Why should the ranking based on the prediction produce better results than that based on the original metric (time-to-maturity)? My understanding is that you always have that quantity available for all your assets (so, technically, no need to predict it), and that the only difference between the actual value and its prediction is essentially noise, isn't it?
- I'm speculating now, but I could imagine that predicting time-to-maturity from other properties is delicate, in that few features have true explanatory power, at least not linearly: For example, a high price-to-value ratio is indicative of a long time to maturity, but not vice versa. Other features might indeed induce multicollinearity issues: For example, low credit ratings might correlate with shorter overall term of bonds in general, and thus produce lower times to maturity on average; if, at the same time, you include the overall term as a feature directly, you might run into statistical (and numerical) problems. Now with regard to your observation that different algorithms produce vastly different rankings: I could imagine that this is a consequence of their very different handling of such challenging setups.
------------------------------
Gregor Reich
Senior Consultant
Tsumcor Research AG
Schwerzenbach, Switzerland
Original Message:
Sent: 06-18-2025 10:44
From: Bruce Hartman
Subject: Optimizing portfolio redemption
Gregor, thanks for the comment. And Seline, thanks too!
The securities have different coupon (income) rates, different market prices, different maturities, and different ratings (quality as judged by an external rating agency, somewhat reflected in the interest rate or current price). The bonds may also be tax-free nationally and/or at the state level, depending on whether they are issued by the same state in which you live.
In addition, as bonds approach a call date, their price converges to 100% of their face value.
Now suppose you had to raise X dollars in cash. You could offer some of the securities for sale at something close to the market price, enough to get you X in cash.
So if you rank the securities by which ones should be sold first, you could just start selling 1, 2, and so on till you reach the amount X.
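As a minimal sketch of that selling rule (in Python, with hypothetical names and market values, and assuming whole positions are sold rather than partial lots):

```python
def pick_sales(ranked, cash_needed):
    """Walk the ranked list, selling whole positions until the target is met."""
    chosen, raised = [], 0.0
    for name, market_value in ranked:
        if raised >= cash_needed:
            break
        chosen.append(name)
        raised += market_value
    return chosen, raised

# Hypothetical holdings, already sorted in sell order (rank 1 first).
ranked = [
    ("Cash", 5_000.0),
    ("Bond A", 20_000.0),
    ("Bond B", 15_000.0),
    ("Bond C", 30_000.0),
]

chosen, raised = pick_sales(ranked, cash_needed=35_000.0)
print(chosen, raised)
```

Selling whole positions can overshoot X; allowing a partial sale of the last position would be a natural refinement.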
So my optimization so far uses the different features to produce a ranking, the order in which they should be sold.
It would be nice if there were an unsupervised algorithm that could produce such a ranking. But, for instance, a random forest regression can't do that. Instead, as a surrogate, I generate a preliminary ranking based on the number of days till maturity. Obviously, that is the order in which the securities would return cash to you absent any other predictors.
Then I can run a random forest regression (or any other regression routine) of rank on the other features to produce a predicted rank. Sorting predicted rank first to last produces an integer order or rank in which they could be sold.
Clearly you'd think a cash holding might come first, since it's available now without a sale.
Securities very close to maturity or prerefunded perhaps ought to come next, though there are others which are worth more than their face value, because they have interest rates higher than the current interest rates; certain corporate bonds would be an example.
Trouble is, when I run my random forests or regressions with different parameters, I get quite different rankings, and they aren't very intuitive.
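One way to put a number on "quite different" is to compare two runs with Kendall's tau and a top-k overlap. This is a sketch in Python with invented rankings, not output from the actual models; it assumes both runs rank the same securities with no ties.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two rankings of the same items; assumes no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        product = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / pairs

def top_k_overlap(rank_a, rank_b, k):
    """Fraction of the top-k items that the two rankings share."""
    top_a = {i for i, r in enumerate(rank_a) if r <= k}
    top_b = {i for i, r in enumerate(rank_b) if r <= k}
    return len(top_a & top_b) / k

# Hypothetical ranks assigned to the same six securities by two runs.
run_1 = [1, 2, 3, 4, 5, 6]
run_2 = [2, 1, 3, 6, 5, 4]

tau = kendall_tau(run_1, run_2)
overlap = top_k_overlap(run_1, run_2, k=3)
print(tau, overlap)
```

A tau near 1 means the runs mostly agree on order; a high top-k overlap with a low tau would mean the same securities surface but in shuffled order.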
Sirine Taleb pointed out that there might be a case for an explainable AI (XAI) solution. I'm currently looking at SHAP to create 'explanations' for the high-ranking items. (I've done some work with cooperative games in the past.) Apparently SHAP and LIME differ mostly in the kernels (weights) employed in the calculation.
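For readers who know cooperative games, the quantity SHAP approximates is the Shapley value, with features as players. Below is a sketch in Python of the exact calculation on a toy game; the feature names and the payoff function are invented for illustration and are not taken from any fitted model or from the shapr package.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for ordering in orderings:
        coalition = frozenset()
        for player in ordering:
            with_player = coalition | {player}
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {p: total / len(orderings) for p, total in totals.items()}

def value(coalition):
    """Hypothetical payoff: two main effects plus one interaction."""
    v = 0.0
    if "yield" in coalition:
        v += 2.0
    if "price" in coalition:
        v += 1.0
    if "yield" in coalition and "price" in coalition:
        v += 1.0  # interaction term, shared equally by the two features
    return v

phi = shapley_values(["yield", "price", "taxable"], value)
print(phi)
```

The values sum to the payoff of the full coalition, and the feature that never changes the payoff gets zero. Exact enumeration grows factorially with the number of features, which is why practical tools rely on sampling or kernel-based approximations.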
I'm learning a lot about what's possible with R. But I'm not sure even the explanations will improve on a bad optimization.
I think there's a hole in my thinking because I'm using my surrogate rank as a target variable. But the algorithms I've tried need something to fit to.
Any suggestions are gratefully received.
------------------------------
Bruce Hartman
Professor
University of St. Francis
Tucson, AZ United States
bruce@ahartman.net
website: https://sites.google.com/ahartman.net/drbrucehartman/Home
blog: http://supplychainandlogistics.org
Original Message:
Sent: 06-18-2025 09:47
From: Gregor Reich
Subject: Optimizing portfolio redemption
Can you be more specific about what you mean by "rank the securities for selling"? In my understanding, your problem statement induces an optimization problem; but somehow, the methods you describe sound to me like you make a prediction first (and then, I'm speculating now, optimize the portfolio "heuristically" by selecting the asset with worst predicted performance to be sold?).
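To illustrate what an explicit optimization framing could look like, here is a small sketch in Python. The objective is an assumption chosen only for illustration: sell whole positions to raise at least the required cash while giving up as little annual income as possible, breaking ties by smaller proceeds. The securities and numbers are hypothetical.

```python
from itertools import combinations

def best_sale(securities, cash_needed):
    """Exhaustively search subsets; practical only for a handful of holdings."""
    best_key, best_names = None, None
    for size in range(1, len(securities) + 1):
        for subset in combinations(securities, size):
            proceeds = sum(s["value"] for s in subset)
            if proceeds < cash_needed:
                continue  # infeasible: does not raise enough cash
            lost_income = sum(s["income"] for s in subset)
            key = (lost_income, proceeds)
            if best_key is None or key < best_key:
                best_key, best_names = key, [s["name"] for s in subset]
    return best_names, best_key

# Hypothetical market values and annual income per position.
securities = [
    {"name": "A", "value": 20_000, "income": 900},
    {"name": "B", "value": 15_000, "income": 450},
    {"name": "C", "value": 30_000, "income": 1_500},
    {"name": "D", "value": 10_000, "income": 200},
]

names, (lost_income, proceeds) = best_sale(securities, cash_needed=35_000)
print(names, lost_income, proceeds)
```

For a realistic portfolio the same model would be handed to an integer programming solver instead of enumerated, and taxes or transaction costs could enter as further terms or constraints.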
Original Message:
Sent: 06-13-2025 15:24
From: Bruce Hartman
Subject: Optimizing portfolio redemption
I'm currently working on a finance problem that might be interesting to some in INFORMS. It deals with portfolio redemption.
Say you have a portfolio composed of varied amounts of fixed income securities (to make things easier). Some are taxable, some not; they have different call dates and callable dates, and different interest rates, and generate different amounts of income. Naturally their market prices are different each day, due to changing market conditions, so the amount you'd get from selling one would be different each day. For the sake of argument, assume all securities have been held long term, so income vs capital gains tax in the US is not a factor. Including this sort of constraint would be a potential future research goal.
As an example scenario, suppose the portfolio owner wants to raise some cash fast, say to make a down payment on a house. Which of the securities should be selected to offer for sale to raise the amount of needed cash? What features would help solve this decision problem?
I have tried linear regression, principal components regression, and various random forests to rank the securities for selling. Each method gives a very different set of securities.
Does anyone have ideas about a good technique? Or does anyone know of some research on this sort of problem? What are your thoughts?