If you're using neural networks, I can offer some suggestions. If not, I can't help much, so this answer assumes neural networks.
Though some may disagree, I believe the main question in deep learning text analysis is whether word order matters. The original text-parsing neural networks looked at "bags of words" (which words appear and how often), not words in a sequence. Lately word sequence has drawn more interest among data scientists, but it can be overemphasized. Try a bag-of-words approach before reaching for tools that read word sequences.
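To make the distinction concrete, here is a minimal sketch of a bag-of-words representation using only the Python standard library. The function name `bag_of_words` and the two sample sentences are my own illustrations, not taken from your data. It shows that two sentences with the same words in a different order produce the same bag, which is exactly the information a bag-of-words model gives up.

```python
from collections import Counter
import re

def bag_of_words(text):
    """Return word counts, ignoring order, case, and punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Hypothetical sentences: same words, different order and meaning.
a = bag_of_words("Bobby returned the book to the library.")
b = bag_of_words("The library returned the book to Bobby.")

# The two bags are identical because word order is discarded.
same_bag = (a == b)
print(same_bag)
```

If your task does not depend on who did what to whom, this loss of order costs you nothing, which is why it is worth trying first.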
In your example, it looks like overemphasis on sequences is actually messing up your object (concept) encoding (you don't mention any other outcome, prediction, or classification your model needs to produce). If I'm reading you correctly, you have three key words: Bobby, library, book. The direction of the book (checked in vs. checked out), if relevant, may require a separate library that translates action phrases into direction words. Other than that, you only need bags of words and the correlations among the three words you are interested in, though I'm sure you're seeking something more sophisticated than that.
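As a rough sketch of that idea, the snippet below counts how often each pair of the three key words appears in the same sentence. Sentence-level co-occurrence counts are my stand-in for "correlations" here, and the function name `keyword_cooccurrence` and the sample sentences are hypothetical. It does no stemming, so "books" would not match "book".

```python
from collections import Counter
from itertools import combinations
import re

KEYWORDS = {"bobby", "library", "book"}

def keyword_cooccurrence(sentences, keywords=KEYWORDS):
    """Count sentences in which each pair of keywords appears together."""
    pair_counts = Counter()
    for sentence in sentences:
        # Treat each sentence as a bag (here, a set) of lowercase words.
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        present = sorted(tokens & keywords)
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts

# Hypothetical sample text, for illustration only.
sentences = [
    "Bobby checked out a book from the library.",
    "Bobby returned the book.",
    "The library closes at five.",
]
counts = keyword_cooccurrence(sentences)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

On real data you would probably normalize these counts (for example, by how often each word appears on its own) before reading much into them.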
There are other machine learning methods that don't use neural networks. One advantage of neural networks is that they do not need to produce outcomes (e.g., classification) to be useful: unsupervised and self-supervised neural networks can uncover interesting patterns, such as the Bose-Einstein distribution in some corners of the event space.
Good luck!
------------------------------
Michael Morgan
Managing Director
Morgan Analytics Research Institute
Dallas TX
------------------------------