How to prevent duplicate tags in Websources Script?

LyricsLover · February 21, 2022, 9:10am

I am facing the following problem in a Websources Script:

When I loop through a JSON result file with this code snippet

OutputTo "SOLOIST"          # A Solo artist can play the same instrument on different days
json_select "artist"        # therefore he/she appears multiple times on a MusicBrainz recording
json_foreach "artist"
   json_select "name"
      SayRest
json_foreach_end

the content for the SOLOIST-per-recording in the Websource script looks like this:

|Isaac Stern\\Alexander Zakin\\Isaac Stern\\Alexander Zakin\\

As you can see, the artists are duplicated in this case. Once applied, this creates 4 SOLOIST tags:
SOLOIST Isaac Stern
SOLOIST Alexander Zakin
SOLOIST Isaac Stern
SOLOIST Alexander Zakin
This is an example. A Solo artist could appear many more times and in random sort order.

Do you know a Webscript way to prevent writing such identical information into tags?

The de-duplicated content should look like this for the above case, showing every Solo artist only once:
SOLOIST Isaac Stern
SOLOIST Alexander Zakin

With this regular expression (used in an Mp3tag action, after running the Web Sources script)
$regexp($meta_sep(SOLOIST,\\),'(\b\w+\b),? ?-? ?(?=.*\1)',)
we can eliminate the duplicate artists afterwards.
But I would rather detect the duplicates in the source script itself, so that they are never written to the tracks in the first place.

Any idea how to do that?

LyricsLover · July 22, 2025, 7:53am

I just found a case where this doesn't work as expected.
If two SOLOIST entries share the same name (such as "Amott" in this example)

the first entry will be reduced to:

Please let me know if there is a way to reduce duplicate entries in a Websources Script.
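
A rough illustration of why the word-level regular expression can misfire, using Python's re module as a stand-in for Mp3tag's $regexp (the two engines may differ in details). The two full names are hypothetical, chosen only because they share the surname "Amott":

```python
import re

# The pattern from the action above: a whole word, some optional
# punctuation, and a lookahead requiring the same text to occur later.
PATTERN = r'(\b\w+\b),? ?-? ?(?=.*\1)'

# Hypothetical input: two different soloists with the same surname,
# joined with the \\ separator the way $meta_sep(SOLOIST,\\) would join them.
joined = r'Michael Amott\\Christopher Amott'

# Every word that occurs again later in the string is removed,
# whether or not the whole entry is a duplicate.
reduced = re.sub(PATTERN, '', joined)
print(reduced)
```

In this sketch the first entry loses its surname although neither soloist is a duplicate, because the pattern compares single words rather than complete entries.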

Florian · July 22, 2025, 8:46am

It's unfortunately currently not possible from a Web Source.

You'd have to run the action "Remove duplicate fields" with "Only duplicated fields with same content" enabled to post-process the data.

LyricsLover · July 22, 2025, 8:54am

Thank you @Florian

The mentioned Action

for the above example produces the expected 5 (deduplicated) SOLOIST fields:

yorickausyps · August 4, 2025, 9:05am

Deduplicating e.g. names in a list can be done in a Web Sources script in the following way.

Before we store the new name in the output buffer “soloist”, which contains the list of names, we have to check whether it is already in that buffer.
The command findinline seems to do just that, but unfortunately it works the other way around: it finds a given string in the current input line. Because we want to search in an output buffer instead, we first have to exchange the contents of the two.
We store the new name in a temporary output buffer “temp_name” and then load the content of the output buffer “soloist” into the current input line with the command regexpreplace "^.*" "%soloist%".
Two more things have to be taken care of:
To actually test whether findinline has found a match, we mark the end of the line with a vertical bar |, and to prevent a substring from triggering a match, we enclose both strings in our delimiter \\.
Your example can now be extended like this:

set "SOLOIST" 
json_select "artist"
json_foreach "artist"
   json_select "name"
      set "temp_name"
      outputto "temp_name"
      SayRest
      regexpreplace "^.*" "\\\\%soloist%\\\\|"
      findinline "\\%temp_name%\\" 1 1
      movechar -1
      if "|"
         outputto "soloist"
         ifoutput "soloist"
            say "\\\\"
         endif
         sayoutput "temp_name"
      endif
json_foreach_end

It uses the new feature available since version 3.23 to reference contents of output buffers via %output% in all string parameters.
The "feature" of overloading the input line with regexpreplace "^.*" "string" is very valuable for writing Web Sources scripts that do "currently impossible" things.
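
The membership test behind the script can also be sketched outside Mp3tag. The following is a minimal Python model of the idea, not a translation of the script commands; the function name add_unique and the extra entry "Stern" are assumptions made for illustration:

```python
SEP = "\\\\"  # two literal backslashes, the multi-value delimiter

def add_unique(buffer: str, name: str) -> str:
    """Append name to the delimited buffer unless it is already an entry."""
    # Enclose both strings in the delimiter so that a name which is only
    # a substring of an existing entry does not count as a match.
    haystack = SEP + buffer + SEP
    needle = SEP + name + SEP
    if needle in haystack:
        return buffer                     # already present, leave unchanged
    return name if not buffer else buffer + SEP + name

soloists = ""
for name in ["Isaac Stern", "Alexander Zakin",
             "Isaac Stern", "Alexander Zakin", "Stern"]:
    soloists = add_unique(soloists, name)

print(soloists)
```

Because of the enclosing delimiters, "Stern" is kept as a separate entry even though it is a substring of "Isaac Stern", while the repeated full names are skipped.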

Hope this helps.

PS: The number of backslashes was not correct. Thanks to "Output referral and special characters - #2 by rboss".

rboss · August 11, 2025, 8:49pm

I've run into a similar situation in an MB script of mine, where I wanted to remove duplicate media entries for music collections. In cases where there were, let's say, 7x CD + 3x Blu-Ray + 1x vinyl, at some point I would end up with a string like:

"CD,CD,CD,CD,CD,CD,CD,Blu-Ray,Blu-Ray,Blu-Ray,Vinyl"

But for an index I was only interested in the medium type and not number, so only 1 of each medium was desired.
I also wanted an efficient solution: preferably without leaving the current output buffer, avoiding JSON commands (if possible), and using as few commands as possible.

Here's what I came up with:

RegexpReplace "^.*" ",${0},#"

Do
	FindInLine "," 1 1
	IfNot "#"
		OutputTo "str0"
		SayUntil ","
		Replace "%str0%," ""
		RegexpReplace "^.*" "${0}, %str0%"
		Set "str0"
	EndIf
While ","

Replace ",#, " ""

Breakdown:

The first regex adds a leading delimiter matching those in the string, and a trailing delimiter + 'break' character to escape the loop.
Then a Do... While to do the following (on each loop):

read one entry (up to its respective delimiter) into a temporary output;
use Replace to remove ALL instances of the entry from the string (including delimiters);
append 1 instance of the removed entry(ies) to the end of the string (past the break character);
reset the temporary output.

At the end of the loop, the string becomes something like
",#, CD, Blu-Ray, Vinyl"

where a final Replace ",#, " "" removes the leading unnecessary characters + whitespace.
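
The steps of the breakdown can be traced in a small Python simulation. This is only a model of the string manipulation, not Mp3tag script; the function name dedupe_media is an assumption, and the sketch expects non-empty input in which no entry is the tail of another (a plain substring replace of "CD," would also hit "SACD,"):

```python
def dedupe_media(media: str) -> str:
    """Simulate the loop: strip each entry, then re-append it once."""
    # Leading delimiter plus trailing delimiter and '#' break character.
    line = "," + media + ",#"
    while True:
        rest = line[1:]                   # text after the first delimiter
        if rest.startswith("#"):          # break character reached
            break
        entry = rest.split(",", 1)[0]     # read one entry up to its delimiter
        line = line.replace(entry + ",", "")  # remove ALL instances
        line = line + ", " + entry        # append one instance past the '#'
    # Drop the leftover delimiter, break character and whitespace.
    return line.replace(",#, ", "")

result = dedupe_media("CD,CD,CD,CD,CD,CD,CD,Blu-Ray,Blu-Ray,Blu-Ray,Vinyl")
print(result)
```

For the example string, the intermediate value just before the final replace is ",#, CD, Blu-Ray, Vinyl", matching the description above.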

I think there's room for improvement, but with some modifications a similar approach might work for your current problem.

Hope this helps.