Case Conversion Help (Building an action script)

Samael · May 2, 2019, 8:18pm

I'm working on a case conversion action script so that case is more accurately reflected and spacing is more consistent. For example:

The Road To The Isles/Glendaruel Highlanders/The Old Rusitc Bridge By The Mill
would be converted to:
The Road to the Isles / Glendaruel Highlanders / The Old Rusitc Bridge by the Mill
and
Chapter V - (Part I)Prophecy Fullfilled,(Part Ii)And The Dark Night Entered
would be converted to:
Chapter V - (Part I) Prophecy Fullfilled, (Part II) and the Dark Night Entered

There are a couple of outlying scenarios that have been eluding me, and my regex is not all that great.

Acronyms...
Any thoughts on how to do this? As an example...
I'd like to be able to convert things like
W.y.s.i.w.y.g. (Lp Version) to W.Y.S.I.W.Y.G. (Lp Version)
and
4A.m. to 4A.M.
and
T.s.o.l. / Nothing for You to T.S.O.L. / Nothing for You

There are little things like LP, EP, CD, DJ that can be done easily unless they're next to brackets (as shown above), e.g. CD) or (CD, and they're not really worth the effort for how few there are. I don't necessarily want to do a global replace on two-letter words.
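A minimal sketch of the acronym rule in Python, for illustration only (Mp3tag actions use their own syntax, and the function name here is made up). It assumes an "acronym" is two or more single letters each followed by a full stop, and that the run does not start in the middle of a word:

```python
import re

# Two or more "letter + full stop" pairs, not preceded by another letter.
# The lookbehind keeps "Creation..." from matching on its final "n.".
ACRONYM = re.compile(r"(?<![A-Za-z])(?:[A-Za-z]\.){2,}")

def upper_acronyms(title: str) -> str:
    """Upper-case dotted acronyms such as W.y.s.i.w.y.g. (hypothetical helper)."""
    return ACRONYM.sub(lambda m: m.group(0).upper(), title)

print(upper_acronyms("W.y.s.i.w.y.g. (Lp Version)"))
print(upper_acronyms("4A.m."))
print(upper_acronyms("T.s.o.l. / Nothing for You"))
```

Titles with an ellipsis, such as "...and Justice for All", are left alone because no single letter is followed by a full stop twice in a row. The same rule would also upper-case abbreviations like "e.g.", so it may need an exception list.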

ohrenkino · May 2, 2019, 8:24pm

Here is an action set that considers the English way of capitalization:

All special words like DJ and LP would have to be treated with an action of the type "Replace" and not "Case conversion".

Abbreviations with special characters as separators can easily be converted with an action of the type Case Conversion, type word-wise where you specify only the . as word separator.

You can assemble all the various ways to convert in one action group.

Samael · May 2, 2019, 9:29pm

Thanks, but it's not quite as feature-rich as I'm after, though I might be able to use the regex to simplify mine. It's also missing a bunch of words that should be lower case (not that mine isn't missing words as well). To keep things consistent across the library, I do a bunch of data cleansing along with the case conversion:

Convert _ (Underscore) and %20 to space
Convert ~ and : to -
Convert ` to '
Convert [ and { to (
Convert ] and } to )
Set the spacing around "-" and "/" so there is a single space on either side. It messes with dates and hyphenated words, but those are pretty few and far between.
Set the spacing around bracket pairs () so there is a single space on either side
Remove double spaces
Ensure the first letter is a capital (unless it is non-alphabetic)
Ensure the first letter of the last word is a capital (unless it is non-alphabetic)
I try to identify and fix the case of Roman numerals as well (For numbers I to XV)
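
The cleanup steps above (minus the Roman numerals) could be sketched roughly like this in Python. This is only an illustration of the order of operations, not Mp3tag syntax, and the function names are made up:

```python
import re

def clean_title(title: str) -> str:
    """Normalize separators, brackets and spacing (hypothetical helper)."""
    t = title.replace("%20", " ").replace("_", " ")
    t = re.sub(r"[~:]", "-", t)            # ~ and : become -
    t = t.replace("`", "'")                # backtick becomes apostrophe
    t = re.sub(r"[\[{]", "(", t)           # [ and { become (
    t = re.sub(r"[\]}]", ")", t)           # ] and } become )
    t = re.sub(r"\s*([-/])\s*", r" \1 ", t)  # one space around - and /
    t = re.sub(r"\s*\(\s*", " (", t)       # one space before (
    t = re.sub(r"\s*\)\s*", ") ", t)       # one space after )
    return re.sub(r"\s{2,}", " ", t).strip()  # collapse repeated spaces

def cap_first_and_last(title: str) -> str:
    """Capitalize the first letter of the first and last word, if alphabetic."""
    words = title.split(" ")
    for i in (0, len(words) - 1):
        w = words[i]
        if w and w[0].isalpha():
            words[i] = w[0].upper() + w[1:]
    return " ".join(words)

print(clean_title("Chapter V - (Part I)Prophecy Fullfilled,(Part Ii)And The Dark"))
print(cap_first_and_last("the road to the isles"))
```

As noted in the list, the "-" rule also splits dates and hyphenated words, so "Rock-n-Roll" would come out as "Rock - n - Roll".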

I'm not following what you mean for the abbreviations. Case Conversion only gives you the options to choose type of conversion and "Words that begin from/after", which won't necessarily work. I mean it will work, but it will catch other stuff too, like ...and Justice for All, Since the Creation... and Bring Your Daughter... to the Slaughter.

To do the actual conversions, I do some of the initial formatting fixes listed above, then I do a Case Conversion at the start to put the titles in a "predictable" starting state (The Case Conversion delivered with the software), then I use a combination of Format Value and Replace to get it in the form I want.

My action group is large, 117 actions at the moment, mostly because I suck at RegEx, so every word is a single action. 30 actions alone, just for the Roman numerals.
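
For comparison, here is a rough Python sketch of handling all the lower-case words in one pass instead of one action per word. The word list is only a guess at what such a list might contain, and the rule that a word directly after "/" or "-" stays capitalized is an assumption based on the examples at the top of the thread:

```python
# Hypothetical word list; extend as needed.
MINOR_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to"}
SEPARATORS = {"/", "-"}

def lowercase_minor_words(title: str) -> str:
    """Lower-case minor words except at the start/end or right after a separator."""
    words = title.split(" ")
    last = len(words) - 1
    out = []
    for i, word in enumerate(words):
        starts_segment = i == 0 or words[i - 1] in SEPARATORS
        if word.lower() in MINOR_WORDS and not starts_segment and i != last:
            word = word.lower()
        out.append(word)
    return " ".join(out)

print(lowercase_minor_words("The Road To The Isles / Glendaruel Highlanders"))
```

It expects the spacing to have been normalized already, so that separators stand alone as their own "word".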

LyricsLover · May 3, 2019, 7:11am

A quick search in this forum and you get:

1 x Action "Replace with regular expression" replaces your other 29 actions of this type.

Samael · May 3, 2019, 3:28pm

Thanks for the link. Like I said, RegEx isn't my forte beyond the very simple. I'll have to test it to make sure it doesn't do things like convert the word Six to SIX, which I assume it does not, but does convert something like Vii) to VII)... a fairly easy fix if it doesn't. If I could use SQL or PL/SQL, I'd be golden. Thinking about it now, there is something I could have done to reduce my 30 actions for the Roman numerals to 15, plus 4 for the formatting manipulations that make the Roman numerals work properly. I'm not even really using the built-in $regexp properly, for the most part, but now that I see how this Roman numeral one is built, I think I can also figure out my acronym issue.
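
The whole-word behaviour described here can be checked with a small Python sketch. This is not the expression from the linked post (which is not quoted in this thread), just one way to match the numerals I to XV as whole words; the names are made up:

```python
import re

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV"]

# \b on both sides means only whole words match, so "Six" is left alone.
ROMAN_RE = re.compile(
    r"\b(?:" + "|".join(sorted(ROMAN, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def upper_roman(title: str) -> str:
    """Upper-case whole-word Roman numerals from I to XV (hypothetical helper)."""
    return ROMAN_RE.sub(lambda m: m.group(0).upper(), title)

print(upper_roman("Six Vii) (Part Ii) Mix"))
```

Real words that happen to look like numerals, such as the names Vi or Xi, would be upper-cased too, so those would still need an exception.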

ohrenkino · May 3, 2019, 3:49pm

It should be possible to replace two consecutive full stops plus an upper-case letter with two consecutive full stops plus a lower-case letter.
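
A rough Python sketch of that replacement, again only as an illustration rather than Mp3tag syntax (the function name is made up):

```python
import re

def lower_after_ellipsis(title: str) -> str:
    """Lower-case an upper-case letter that directly follows two full stops."""
    return re.sub(
        r"(\.\.)([A-Z])",
        lambda m: m.group(1) + m.group(2).lower(),
        title,
    )

print(lower_after_ellipsis("...And Justice for All"))
```

Dotted acronyms such as W.Y.S.I.W.Y.G. never contain two full stops in a row, so they are not affected. A title with a space after the dots ("Daughter... To") is not matched by this exact pattern and would need the space added to the expression.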