Roman numbers in one single regular expression (regexp)

vkostas · November 19, 2014, 12:20am

Roman numbers from 1 to 3999.

Understanding Roman numerals. A numeral is built from 4 optional sections, listed here from highest to lowest priority. The higher the priority, the further left a section is placed, and this order is always respected. Sections 2-4 have identical logic (only the letters change).
1000-3000: M|MM|MMM
100-900: C|CC|CCC|CD|D|DC|DCC|DCCC|CM
10-90: X|XX|XXX|XL|L|LX|LXX|LXXX|XC
1-9: I|II|III|IV|V|VI|VII|VIII|IX
e.g. 3746 = 3000 + 700 + 40 + 6 = MMM DCC XL VI = MMMDCCXLVI
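
The section tables above can be sketched as a small Python conversion, just to make the "one optional section per decimal digit" idea concrete. The names `SECTIONS` and `to_roman` are made up for this illustration and are not part of Mp3tag.

```python
# One table per section, highest priority first; index 0 means "section absent".
SECTIONS = [
    (1000, ["", "M", "MM", "MMM"]),
    (100, ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]),
    (10, ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]),
    (1, ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]),
]


def to_roman(n):
    """Convert an integer from 1 to 3999 by joining one entry per section."""
    if not 1 <= n <= 3999:
        raise ValueError("only 1..3999 can be written with these sections")
    # n // unit % 10 picks the decimal digit that belongs to each section.
    return "".join(table[n // unit % 10] for unit, table in SECTIONS)


print(to_roman(3746))  # 3000 + 700 + 40 + 6
```

Because every section is looked up independently, the left-to-right order of the sections is fixed by construction, which is the property the regular expression relies on.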
Normalize regexp (1): collapse repeated letters with quantifiers
1000-3000: M{1,3}
100-900: C{1,3}|CD|D|DC{1,3}|CM
10-90: X{1,3}|XL|L|LX{1,3}|XC
1-9: I{1,3}|IV|V|VI{1,3}|IX

Normalize regexp (2): reorder the alternatives so the subtractive forms come last
1000-3000: M{1,3}
100-900: C{1,3}|D|DC{1,3}|CD|CM
10-90: X{1,3}|L|LX{1,3}|XL|XC
1-9: I{1,3}|V|VI{1,3}|IV|IX

Normalize regexp (3): merge the two subtractive forms into one character class
1000-3000: M{1,3}
100-900: C{1,3}|D|DC{1,3}|C[DM]
10-90: X{1,3}|L|LX{1,3}|X[LC]
1-9: I{1,3}|V|VI{1,3}|I[VX]

Normalize regexp (4): make the leading D, L or V optional
1000-3000: M{1,3}
100-900: C[DM]|D|D?C{1,3}
10-90: X[LC]|L|L?X{1,3}
1-9: I[VX]|V|V?I{1,3}

Normalize regexp (5): allow zero repeats, so each section becomes optional and the lone D, L or V is covered as well
1000-3000: M{0,3}
100-900: C[DM]|D?C{0,3}
10-90: X[LC]|L?X{0,3}
1-9: I[VX]|V?I{0,3}

Normalize regexp (5a): an equivalent spelling of (5)
1000-3000: M{0,3}
100-900: C[DM]|D?C?C?C?
10-90: X[LC]|L?X?X?X?
1-9: I[VX]|V?I?I?I?

Normalize regexp (6): join the four sections into one expression
(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})

Normalize regexp (7): add word anchors
\b(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})\b
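
Normalize regexp (8): require a Roman letter at the start, which excludes zero-length matches at every other word boundary (an empty match is still possible in front of a word such as "Dog" that starts with a Roman letter but is not a numeral; with $upper($0) as replacement this is harmless)
\b(?=[MDCLXVI])(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})\b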

Final search pattern
\b(?=[MDCLXVI])(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})\b
or
\b(?=[MDCLXVI])(M?M?M?)(C[DM]|D?C?C?C?)(X[LC]|L?X?X?X?)(I[VX]|V?I?I?I?)\b
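
The patterns above can be checked outside Mp3tag. The following is a minimal Python sketch, assuming that Python's `re` module treats `\b`, the lookahead and the quantifiers the same way as the engine Mp3tag uses (this has not been verified against Mp3tag itself). The helper names `to_roman`, `accepts` and `count_empty_matches` are made up for this check.

```python
import re

# Step (7) and the two spellings of the final pattern, copied from the post.
STEP7 = r"\b(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})\b"
FINAL = r"\b(?=[MDCLXVI])(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})\b"
FINAL_ALT = r"\b(?=[MDCLXVI])(M?M?M?)(C[DM]|D?C?C?C?)(X[LC]|L?X?X?X?)(I[VX]|V?I?I?I?)\b"

# Reference numerals are built from the four section tables (hypothetical helper).
SECTIONS = [
    (1000, ["", "M", "MM", "MMM"]),
    (100, ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]),
    (10, ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]),
    (1, ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]),
]


def to_roman(n):
    return "".join(table[n // unit % 10] for unit, table in SECTIONS)


def accepts(pattern, text):
    """True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


def count_empty_matches(pattern, text):
    """Number of zero-length matches the pattern produces while scanning text."""
    return sum(1 for m in re.finditer(pattern, text) if m.group(0) == "")


# Every numeral from 1 to 3999 should be accepted by both spellings.
numerals = [to_roman(n) for n in range(1, 4000)]
all_accepted = all(accepts(FINAL, r) and accepts(FINAL_ALT, r) for r in numerals)

# Malformed strings (and the empty string) should be rejected as whole strings.
malformed = ["", "IIII", "VX", "IC", "MMMM", "XM"]
none_accepted = not any(accepts(FINAL, s) or accepts(FINAL_ALT, s) for s in malformed)

print(all_accepted, none_accepted)
print(count_empty_matches(STEP7, "hello world"))  # step (7): empty matches at word boundaries
print(count_empty_matches(FINAL, "hello world"))  # the lookahead removes them here
print(count_empty_matches(FINAL, "Dog"))          # but not before a word starting with D
```

The last line shows a limit of the lookahead: it only checks the next letter, so a word that begins with one of M, D, C, L, X, V, I but is not a numeral can still yield an empty match. Replacing an empty match with its upper-cased self changes nothing, so this does not affect the upcasing use case.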

Usage
Action type: Replace with regular expression.
Field: (Title, etc)
Regular expression: \b(?=[MDCLXVI])(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})\b
Replace with: $upper($0)
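
What this action is meant to do can be sketched in Python. This is only an illustration, assuming the match is done case-insensitively (otherwise there would be nothing to upcase); `upcase_roman` is a made-up name and `re.sub` with a function stands in for `$upper($0)`.

```python
import re

# Final search pattern from the post, compiled case-insensitively (assumption).
ROMAN = re.compile(
    r"\b(?=[MDCLXVI])(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})\b",
    re.IGNORECASE,
)


def upcase_roman(text):
    """Upper-case every whole word that the pattern accepts as a Roman numeral."""
    return ROMAN.sub(lambda m: m.group(0).upper(), text)


print(upcase_roman("henry viii part ii"))
print(upcase_roman("dog"))  # not a numeral, left unchanged
print(upcase_roman("mix"))  # a real word, but also the numeral for 1009
```

The last call shows the price of a purely syntactic test: ordinary words that happen to be valid numerals, such as "mix" or a lone "i", are upcased too.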

The same regexp in the Format action type does nothing (is it a bug?):

$regexp(%tag%,'\b(?=[MDCLXVI])(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})\b',$upper($0))

RomanNumbers.txt (59.5 KB)

DetlevD · November 20, 2014, 6:48am

The replacement part of the $regexp function must not contain a function such as $upper.
You have to use regex syntax there.
Search the Mp3tag Forums contributions; this has been said before.

Not every new questioner has to reinvent the wheel.
A search in the Mp3tag Forums contributions provides all the answers.
But it is nice to see that you have simplified one of the given regular expressions even further.

I have created an Mp3tag mte export script that visualizes the results of four regular expression approaches, each of which upcases Roman numerals with a different degree of quality.
The output of this profiling test can help you decide which approach works best in a given case.

To run the "Export.TXT.20150211.Test.RomanNum.Upcase.3.mte" export script,
select exactly one file; which file it is does not matter.
20150211.Test.RomanNum.Upcase.zip (2.45 KB)
See also ...
Roman Numerals

DD.20141120.0847.CET, DD.20150211.1353.CET


Nature · December 18, 2014, 11:40pm

This seems to be a very good and thorough explanation, which is always good to find.
So, my thanks for it.

As for the reason why it doesn't work, I'm not going to test it, but here are a few things I learnt from using Mp3tag with RegEx:

Mp3tag doesn't deal well with a few RegEx features, like '\b', so avoid it at all costs. Use whitespace (\s) instead if possible.
If you're using data from an imported file, you always need to put '.*' before your pattern and the same after it, even if you're sure there is no more data. That's how it works in practice.
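
What swapping '\b' for '\s' changes can be seen in a small Python sketch. It only shows how the patterns behave in Python's `re` module; whether Mp3tag's engine behaves the same way, and whether it accepts the lookaround variant in the last call, are untested assumptions. `CORE` and `find_numerals` are made-up names.

```python
import re

# The final pattern from the first post without its two \b anchors.
CORE = r"(?=[MDCLXVI])(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})"


def find_numerals(left, right, text):
    """Wrap CORE in the given delimiters and return the non-empty matches."""
    pattern = re.compile(left + CORE + right)
    found = (m.group(0).strip() for m in pattern.finditer(text))
    return [s for s in found if s]


text = "Part II of III"
print(find_numerals(r"\b", r"\b", text))            # word boundaries
print(find_numerals(r"\s", r"\s", text))            # literal whitespace on both sides
print(find_numerals(r"(?<!\S)", r"(?!\S)", text))   # "no non-space next to it"
```

A plain `\s` on both sides has to consume a real whitespace character, so a numeral at the very start or end of the field is missed. The lookaround form avoids that, at the cost of no longer matching numerals that touch punctuation.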

I know this is a bit vague, I'm sorry, but I don't have the time to go into technical details right now.