How to get Say to output without trimming whitespace?

mr_bug · May 19, 2021, 2:31pm

I'm trying to write a JSON web sources script but I'm having an issue being able to output the current line/value as-is without any whitespace trimming happening.

A random example of a MusicBrainz release in JSON: https://musicbrainz.org/ws/2/release/b1ba816e-2da4-4bf6-9c2b-4a743cb19b65?fmt=json&inc=artists

The part giving me trouble is the artist-credit array, specifically artist-credit.0.joinphrase. Its value is " & ": an ampersand with a leading and trailing space.

If I use SayRest it'll trim both the leading and trailing whitespace, resulting in Artist1&Artist2.

If I use SayRegexp ".*", I'll get Artist1& Artist2, only trimming the leading whitespace.

How can I Say the value without any trimming happening? I.e. I want the result: Artist1 & Artist2.
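
For readers who want to see the target result outside Mp3tag, here is a minimal Python sketch of what "as-is" means for this data. The artist names and the trimmed-down response are hypothetical; only the `artist-credit`, `name` and `joinphrase` keys come from the MusicBrainz JSON discussed above. It is an illustration of the desired output, not a web sources script.

```python
import json

# Hypothetical, trimmed-down response; a real release carries many more keys.
response = json.loads("""
{
  "artist-credit": [
    {"name": "Artist1", "joinphrase": " & "},
    {"name": "Artist2", "joinphrase": ""}
  ]
}
""")


def credited_artist(release: dict) -> str:
    """Concatenate each credited name with its join phrase, unmodified."""
    return "".join(
        credit["name"] + credit.get("joinphrase", "")
        for credit in release["artist-credit"]
    )


def credited_artist_trimmed(release: dict) -> str:
    """Same, but trimming each join phrase on both sides (the unwanted result)."""
    return "".join(
        credit["name"] + credit.get("joinphrase", "").strip()
        for credit in release["artist-credit"]
    )


print(repr(credited_artist(response)))          # 'Artist1 & Artist2'
print(repr(credited_artist_trimmed(response)))  # 'Artist1&Artist2'
```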

My current approach is to try and at least get the formatting somewhat correct using:

json_foreach "artist-credit"
    ifnot "0"
        json_select "joinphrase"
        ifnot ""
            unspace
            if ","
                sayregexp ".*"
            else
                say " "
                sayregexp ".*"
            endif
        endif
    endif
json_foreach_end

But due to how releases are credited, this is bound to mess up on some attempts at tagging. Ideally, I wouldn't have to write any custom ifs and be able to output the joinphrase value as-is, whitespace and everything.

Thanks.

Mp3tag v3.06a

LyricsLover · May 19, 2021, 3:20pm

Sorry, I can't help you with a working websource syntax.

But I would like to add that (AFAIK) a join phrase on MusicBrainz can be "anything", from a space to a slash or " feat. " or whatever splitting characters are written on the release between the collaborating artists or groups.

mr_bug · May 19, 2021, 3:31pm

Thanks for the reply. I am in fact aware of this. The simplistic code above would actually work with the example you gave, and that was its point: to work most of the time. But yes, it'll mess up on any fancier, less standard joinphrases, which is what I'm trying to avoid by being able to output the joinphrases exactly as they are written.

mr_bug · May 24, 2021, 5:22pm

@Florian Sorry to tag you directly, but could you confirm whether this is a bug or not? Any solution? Thanks.

Florian · May 25, 2021, 11:50am

I'd not consider it a bug, more like a limitation of the feature.

Some background information: When the web-sources language was developed, most of the input was HTML with a varying amount of spaces used throughout the documents. Trimming spaces was a nice feature to not break the web sources on tiny changes of the source documents. Many of the existing sources still rely on that feature, so I won't change this and break backward compatibility.

Can you give some context on why you need this? Depending on your answer, I might be able to convince myself to add support for that.

mr_bug · May 25, 2021, 4:37pm

Thanks for the reply.

The context is pretty much the original post.

The joinphrases, as seen in the second post's screenshot, are what's giving me trouble. Due to MusicBrainz's rules, a joinphrase can be formatted in innumerable ways depending on the release's styling, so it's not simply a matter of checking for the joinphrase "," and prepending a space if "," isn't present (see first post). For example, a release may be intentionally formatted with a space preceding the comma, i.e. " , ", or, more likely, the joinphrase may be "+" (no spaces), in which case my if would incorrectly prepend a space; I can't account for all the permutations. It all depends on the release's formatting, and I'd like to write tags identically to what is returned by the API, eliminating any guesswork.

Maybe there's another way to accomplish my goal? The only other thought I had after looking at Mp3tag's included MusicBrainz XML script is that maybe the JSON can be RegexpReplaced to join the artist and joinphrase strings into a single value, and then write that value per-track into %artist% and %albumartist%. Since joinphrases sit between artists, I think they would be preserved intact (including any leading and trailing whitespace). However, apart from such a RegexpReplace possibly being complex to implement reliably given what the API may return, it could also leave unforeseen edge cases that produce wrong formatting, for artist names for example.

It could be that I'm overcomplicating things; I don't have much experience with these scripts. If you were to implement a way to read a value without processing it in any way, maybe it'd be a SayRegexpRaw command for example.

mr_bug · June 7, 2021, 3:52pm

Seems I overwrote my previous post accidentally. Here it is for context:

@Florian What do you think of the idea of a SayRegexpRaw command? It wouldn't break backwards compatibility, and would let one read the values exactly as they are, pre and post whitespace too.

I'll respect your wishes if you won't change anything. I'll have to try and figure out if applying RegexpReplace to the JSON in some way is the solution (if even possible), e.g. surrounding pre/post whitespace in joinphrase values with a unique symbol that can then be stripped away after tagging via an Mp3tag Action. If I could read the value as-is, I'd need no post-processing step.

It turns out my idea works:

At the top of the [ParserScriptAlbum]=... section I write:

json "on"
regexpreplace "\"joinphrase\":\"([^\"]+)\"" "\"joinphrase\":\"#<<$1>>#\""
json "on" "current" # Makes any prior replacements part of the input going forward.

The regexpreplace surrounds the value of the joinphrase keys in #<< and >>#. The value can then be read with a sayrest, most importantly, without Mp3tag performing any stripping of pre/post whitespace thanks to the values ending in our symbol strings, not whitespace.
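
To make the effect of that replacement easier to check, here is a rough Python sketch of the same idea. The JSON fragment and artist names are hypothetical, and Mp3tag's trimming is only modelled with Python's `str.strip()`, which is an assumption rather than a description of Mp3tag's internals. The pattern mirrors the one above, written with Python's `\1` backreference instead of `$1`.

```python
import json
import re

# Hypothetical compact JSON fragment (no space after the colon, as the
# pattern expects).
raw = (
    '{"artist-credit":['
    '{"name":"Artist1","joinphrase":" & "},'
    '{"name":"Artist2","joinphrase":""}]}'
)

# Same pattern as the regexpreplace line above; empty join phrases do not
# match because of the "+" quantifier, so they are left untouched.
PATTERN = r'"joinphrase":"([^"]+)"'
wrapped = re.sub(PATTERN, r'"joinphrase":"#<<\1>>#"', raw)

first_joinphrase = json.loads(wrapped)["artist-credit"][0]["joinphrase"]

# Assumption: trimming is modelled as stripping leading/trailing whitespace.
# The value now starts and ends with marker characters, so nothing is lost.
print(repr(first_joinphrase.strip()))  # '#<< & >>#'
```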

These symbol delimiters can then be removed using Mp3tag Actions after tagging the files, e.g. via a Replace action for the ARTIST field, original: "#<<", new: "" i.e. nothing (the double-quotes are only present for illustrative purposes), and a second Replace action doing the same for ">>#". What should be left is the original value, including any pre/post whitespace.
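
The cleanup step amounts to two plain string replacements. A small Python sketch of the equivalent logic follows; the tag value and the `strip_markers` helper are hypothetical, and inside Mp3tag this would be done with Replace actions rather than code.

```python
def strip_markers(value: str) -> str:
    """Remove the '#<<' and '>>#' delimiters, leaving inner whitespace intact."""
    return value.replace("#<<", "").replace(">>#", "")


# Hypothetical ARTIST value as it would be written by the script.
tagged = "Artist1#<< & >>#Artist2"

print(repr(strip_markers(tagged)))  # 'Artist1 & Artist2'
```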

At least it works, but as stated, ideally Mp3tag could just read the value without trying to be helpful and failing in this case.

mr_bug · July 20, 2021, 1:02pm

Using #<< and >># for joinphrases only works for the [ParserScriptAlbum]=... section as these symbols can be removed via an Mp3tag Action after writing the tags containing them to a file.

For [ParserScriptIndex]=... however, the joinphrases are only ever displayed, never written, so they only need to be visually accurate, not technically accurate (unlike for [ParserScriptAlbum]=...). To display artists with visually accurate joinphrases in the list of search results, a different symbol can be used: an invisible zero-width character, such as "‍" (between the quotes). The character can also be copied from here.

# Invisible zero-width joiner character present before and after `$1`.
regexpreplace "\"joinphrase\":\"([^\"]+)\"" "\"joinphrase\":\"‍$1‍\""

Mp3tag doesn't consider the ZWJ character a whitespace character and so will not strip it when using sayrest or sayregexp ".*", again, preserving pre and post whitespace on joinphrases.
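
As a quick sanity check outside Mp3tag, the zero-width joiner (U+200D) can be inspected in Python. This only shows how Unicode and Python classify the character; that Mp3tag treats it the same way is the observation reported above, not something this sketch proves.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# A join phrase padded with the invisible character on both sides.
padded = f"{ZWJ} & {ZWJ}"

print(unicodedata.name(ZWJ))      # ZERO WIDTH JOINER
print(unicodedata.category(ZWJ))  # Cf (a format character, not a space)
print(ZWJ.isspace())              # False
print(padded.strip() == padded)   # True: the inner spaces survive trimming
```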

Florian · December 13, 2021, 3:37pm

Some progress: