Thank you very much for this code and its upgrade.
I tested it on real cases and noticed only one issue:
##90 ##99 SOMETHING-NOT-ON-THE-LIST ##02 ##01 ##SOMETHING-NOT-ON-THE-LIST
gives
##01 ##02 ##90
instead of
##01 ##02 ##90 ##99
Apparently because it:
1] sees the marker "##"
2] reads "99"
3] continues reading up to the next marker [?]
4] compares the read value with the list
5] cuts out that long [merged] string of characters
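To make my guess concrete, here is a minimal Python sketch that reproduces the same wrong result. It is only an assumption about what happens, not the actual code: the names MARKER, ALLOWED and clean_by_chunks are made up, and the list holds just the four numbers from my example.

```python
MARKER = "##"
ALLOWED = {"01", "02", "90", "99"}  # hypothetical list, only the example numbers


def clean_by_chunks(text):
    """Guessed behaviour: read from one marker up to the next marker."""
    # Splitting on the marker merges "99" with the stray word that follows it.
    chunks = text.split(MARKER)[1:]
    # "99 SOMETHING-NOT-ON-THE-LIST" is not on the list, so ##99 is lost.
    kept = {chunk.strip() for chunk in chunks if chunk.strip() in ALLOWED}
    return " ".join(MARKER + code for code in sorted(kept))


line = "##90 ##99 SOMETHING-NOT-ON-THE-LIST ##02 ##01 ##SOMETHING-NOT-ON-THE-LIST"
print(clean_by_chunks(line))  # ##01 ##02 ##90
```

If the real code does something like this, that would explain why ##99 disappears together with the stray word.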
The hypothetical "SOMETHING-NOT-ON-THE-LIST" should not be there in the first place [and will most probably be removed in this cleaning process]. But, as this example shows, it would be better if the code looked for a marker and then read from the marker's beginning to the last character adjacent to it [somewhat in an "as a whole word" way]. It is better to lose only "SOMETHING-NOT-ON-THE-LIST" and "##SOMETHING-NOT-ON-THE-LIST" than to lose a "##listed_number" together with "SOMETHING-NOT-ON-THE-LIST".
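The "as a whole word" reading could look roughly like the Python sketch below. It is an illustration under my own assumptions, not a patch for the existing code: MARKER, ALLOWED and clean_by_words are hypothetical names, words are assumed to be separated by whitespace, and I assume duplicates should be dropped and the result sorted, as in the expected output above.

```python
MARKER = "##"
ALLOWED = {"01", "02", "90", "99"}  # hypothetical list, only the example numbers


def clean_by_words(text):
    """Keep each whole word that is the marker followed by a listed number."""
    kept = set()  # assumption: duplicates are not wanted
    for word in text.split():  # a "word" ends at the next whitespace
        if word.startswith(MARKER) and word[len(MARKER):] in ALLOWED:
            kept.add(word)
    return " ".join(sorted(kept))


line = "##90 ##99 SOMETHING-NOT-ON-THE-LIST ##02 ##01 ##SOMETHING-NOT-ON-THE-LIST"
print(clean_by_words(line))  # ##01 ##02 ##90 ##99
```

Here each word is judged separately, so a stray word after ##99 is dropped without taking ##99 with it.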
[And if you're wondering why there are words with numbers and not just numbers, it's because I have to be absolutely sure which words I want to code with numbers. Once I start using numbers, any future change would bring a lot of potential errors from the changed and re-changed codes, plus the difficulty of re-memorizing them. Right now I use only some numbers from the ranges 1-10 and 50-99, and after dealing with 20% of my files I need to do a cleaning / evaluation.]