Splitting a title with mixed English and non-English text

Suppose I have a song title whose first part is in English and whose second part is the title in the song's own language. For example, "I'm Watching a Loneliness Just Arisen 나는 새롭게 떠오른 외로움을 봐요", where the first half is the English name and the second half is the Korean name.

Is there a $regexp expression I can use to capture the English part and the Korean part separately? I want to move the Korean part to the %title% tag and the English part to the %titlesort% tag.

As in the example above, the language switch always happens at a whitespace character (\s) in the middle.

Appreciate the help in advance!

Will it always be two different scripts, not just languages? Can the English part contain diacritic marks like accents and umlauts?

Great question! For simplicity’s sake, yes, let’s consider the case where the first language uses Latin script and the second language uses something non-Latin. So yeah, this could cover Korean, Japanese, Hindi, etc.

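$trim($regexp(%title%,'[^\x00-\x7F]+',))
leaves the ASCII part. The $trim is needed because the spaces between the non-ASCII words are ASCII themselves and would otherwise be left behind at the end.
$regexp(%title%,'^[\x00-\x7F]+',)
gives the non-ASCII part. The ^ anchor removes only the leading ASCII run, so the spaces inside the non-ASCII part are kept.

Note that accented letters such as é fall outside the ASCII range, so these expressions assume the English part is plain ASCII.

So an action of the type "Guess values" could work:
Source pattern: $trim($regexp(%title%,'[^\x00-\x7F]+',))===$regexp(%title%,'^[\x00-\x7F]+',)
Target string: %titlesort%===%title%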
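
If you want to sanity-check the idea outside Mp3tag before running it on real files, here is a rough Python sketch of the same split. It is not Mp3tag code: `split_title` is a hypothetical helper name, and the choices to anchor the second pattern with `^` and to strip leftover spaces are assumptions of this sketch, made so that the spaces inside the non-ASCII part survive.

```python
import re


def split_title(title: str) -> tuple[str, str]:
    """Split 'ASCII part<space>non-ASCII part' into (sort_title, native_title).

    Hypothetical helper for illustration only; assumes the ASCII part
    comes first and contains no accented letters.
    """
    # Sort title: delete every run of non-ASCII characters, then trim the
    # spaces that were sitting between the deleted words.
    sort_title = re.sub(r"[^\x00-\x7F]+", "", title).strip()
    # Native title: delete only the leading run of ASCII characters, so the
    # spaces between the non-ASCII words are left alone.
    native_title = re.sub(r"^[\x00-\x7F]+", "", title)
    return sort_title, native_title


example = "I'm Watching a Loneliness Just Arisen 나는 새롭게 떠오른 외로움을 봐요"
sort_title, native_title = split_title(example)
print(sort_title)
print(native_title)
```

Without the `^` anchor, the second substitution would also delete the spaces between the Korean words, since a space is an ASCII character.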