Moving part of TITLE to GENRE

Zerow · June 27, 2015, 11:29am

Thank you very much for this code and its upgrade

I tested it on real cases, and I noticed only one issue

##90 ##99  SOMETHING-NOT-ON-THE-LIST   ##02   ##01  ##SOMETHING-NOT-ON-THE-LIST

gives

##01  ##02  ##90

instead of

##01  ##02  ##90  ##99

Apparently because it:
1] sees the marker "##"
2] reads "99"
3] continues to read to the next marker [?]
4] compares read value with the list
5] cuts out that long [merged] string of characters

The hypothetical "SOMETHING-NOT-ON-THE-LIST" should not be there in the first place [and will most probably be removed in this cleaning process]. But, as this example shows, it would be better if the code looked for a marker and then read from its beginning to the last character adjacent to it [somewhat in an "as a whole word" way]. Because it is better to lose only "SOMETHING-NOT-ON-THE-LIST" and "##SOMETHING-NOT-ON-THE-LIST" than a "##listed_number" + "SOMETHING-NOT-ON-THE-LIST"
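A rough Python illustration of how that merge may come about, assuming the action group quoted later in this thread is used [it replaces "##" with "|" and then deletes every pair of spaces] and assuming all items are separated by two spaces. The function name `to_pipe_list` is made up for this sketch; it is not Mp3tag syntax.

```python
def to_pipe_list(genre: str) -> str:
    """Mimic two Replace actions: '##' -> '|' and then two spaces -> ''."""
    return genre.replace("##", "|").replace("  ", "")


RAW = "##90  ##99  SOMETHING-NOT-ON-THE-LIST  ##02  ##01  ##SOMETHING-NOT-ON-THE-LIST"
MERGED = to_pipe_list(RAW)

# The unmarked word has no '|' in front of it, so once the spaces are gone
# it is glued to the item before it.
print(MERGED)
# |90|99SOMETHING-NOT-ON-THE-LIST|02|01|SOMETHING-NOT-ON-THE-LIST
```

In this form "99" no longer exists as an item of its own, which would explain why it is missing from the sorted result.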

[And if you're wondering why there are words alongside the numbers and not just numbers, it's because I have to be absolutely sure which words I want to code with numbers. Once I start using numbers, any future change would bring a lot of potential errors from changed and re-changed codes, plus the difficulty of re-memorizing them. Right now I use only some numbers from the ranges 1-10 and 50-99, and after dealing with 20% of my files I need to do some cleaning / evaluation]

Zerow · June 27, 2015, 11:45am

On another note: there is a visual issue with a list that is too long

DetlevD · June 27, 2015, 1:02pm

... you have to make sure, that the LIST_SORT text string offers the basic set of elements, ...
i. e. the list of all possible items, in the order of your desire.
The regular expressions in the algorithm can only work with items, remove items, return items, ...
which have been defined a priori.

You may try this modification to see whether it works better for you ...
$regexp($regexp($regexp(%LIST_SORT%,'\b('$regexp(%LIST_SORT%,'(?>\b('%LIST_IN%')\b)',)')\b',),'\|+','|'),'\|$',)

DD.20150627.1814.CEST

Zerow · June 28, 2015, 11:33am

I understand the basis on which this code is built

And also that my system requires the existence of a marker, so theoretically this should not be an issue at all

But it would be better if the code [in the aforementioned example] showed that "##99"; because I will most likely spot "SOMETHING-NOT-ON-THE-LIST" [as not beginning with "##"], while the removal of "##SOMETHING-NOT-ON-THE-LIST" is most likely not to occur at all [and if it occurs, then I will probably miss it]

And to achieve that, aside from making an impossible list containing all possible combinations / errors, I would additionally need to [before running the main code]:
cut every character [except spaces] that is not part of a string beginning with the sign "#"

This would simply serve as a workaround / fail-safe. Because

##SOMETHING-NOT-ON-THE-LIST

would stay [and be cut out in next action], and the

SOMETHING-NOT-ON-THE-LIST

would be cut out. [And as I explained above, the difference for me comes to the probability of occurring]

Can you please provide me with this additional code?
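A minimal Python sketch of the requested pre-cleaning step, only to pin down the intended behaviour: every whitespace-separated string that does not begin with "#" is dropped, everything else stays. The name `keep_marked` and the two-space separator are assumptions of this sketch, and it is not Mp3tag action syntax.

```python
def keep_marked(genre: str, marker: str = "#") -> str:
    """Keep only the whitespace-separated strings that begin with the marker."""
    kept = [token for token in genre.split() if token.startswith(marker)]
    # Re-join with two spaces, the separator used in this tagging system.
    return "  ".join(kept)


SAMPLE = "##90  ##99  SOMETHING-NOT-ON-THE-LIST  ##02  ##01  ##SOMETHING-NOT-ON-THE-LIST"
CLEANED = keep_marked(SAMPLE)
print(CLEANED)
# ##90  ##99  ##02  ##01  ##SOMETHING-NOT-ON-THE-LIST
```

The marked but unlisted "##SOMETHING-NOT-ON-THE-LIST" survives this step on purpose, so that it can be cut out by the later sort step.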

DetlevD · June 28, 2015, 3:18pm

Do you want a process that runs prior to the already known sort process?
Hmm, the following expression should deliver all items, ...
which are members of the input list, but not members of the basic list of sorted items.
$regexp(%LIST_IN%,'(?>\b('%LIST_SORT%')\b)',)

DD.20150628.1918.CEST

Example:

The input list to work on ...
LIST_IN <== 'BAD1|111|101|01|09|WATER|WATERFALL|BAD2|04|VILLAGE|06|UNDERWATER|03|10|11|BAD3'
Note: The bad values are 'BAD1','BAD2','BAD3' ... and should be removed.

The basic sorted list of allowed items ...
LIST_SORT <== '01|02|03|04|05|06|07|08|09|10|11|101|111|UNDERWATER|VILLAGE|WATER|WATERFALL'

Get all items from LIST_IN, which are not members of LIST_SORT ...
LIST_TMP1 <== $regexp($regexp($regexp(%LIST_IN%,'(?>\b('%LIST_SORT%')\b)',),'\|+','|'),'^\||\|$',)

LIST_TMP1 = BAD1|BAD2|BAD3

Remove all items from LIST_IN, which are members of LIST_TMP1 ...
LIST_TMP2 <== $regexp($regexp($regexp(%LIST_IN%,'(?>\b('%LIST_TMP1%')\b)',),'\|+','|'),'^\||\|$',)

LIST_TMP2 = 111|101|01|09|WATER|WATERFALL|04|VILLAGE|06|UNDERWATER|03|10|11

Get all items from LIST_SORT, which are not members of LIST_TMP2 ...
LIST_TMP3 <== $regexp($regexp($regexp(%LIST_SORT%,'(?>\b('%LIST_TMP2%')\b)',),'\|+','|'),'^\||\|$',)

LIST_TMP3 = 02|05|07|08

Remove all items from LIST_SORT, which are members of LIST_TMP3, to get the sorted list output ...
LIST_OUT <== $regexp($regexp($regexp(%LIST_SORT%,'(?>\b('%LIST_TMP3%')\b)',),'\|+','|'),'^\||\|$',)

LIST_OUT = 01|03|04|06|09|10|11|101|111|UNDERWATER|VILLAGE|WATER|WATERFALL

Remove tag-fields LIST_TMP1, LIST_TMP2, LIST_TMP3, LIST_SORT, LIST_IN ...
LIST_TMP1 <== $char(0)
LIST_TMP2 <== $char(0)
LIST_TMP3 <== $char(0)
LIST_SORT <== $char(0)
LIST_IN <== $char(0)
DD.20150701.1006.CEST, DD.20150701.1532.CEST

Test2015_20150701.zerow.sort.mta (1.14 KB)
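For readers who want to trace the four steps outside Mp3tag, here is a small Python sketch using plain lists instead of regular expressions. The helper names `split_items` and `without` are invented for this sketch; the input values are the ones from the example above.

```python
def split_items(text: str) -> list[str]:
    """Split a '|'-separated list and drop empty entries."""
    return [item for item in text.split("|") if item]


def without(source: list[str], unwanted: list[str]) -> list[str]:
    """Return source minus the unwanted items, keeping the order of source."""
    blocked = set(unwanted)
    return [item for item in source if item not in blocked]


LIST_IN = split_items(
    "BAD1|111|101|01|09|WATER|WATERFALL|BAD2|04|VILLAGE|06|UNDERWATER|03|10|11|BAD3"
)
LIST_SORT = split_items(
    "01|02|03|04|05|06|07|08|09|10|11|101|111|UNDERWATER|VILLAGE|WATER|WATERFALL"
)

LIST_TMP1 = without(LIST_IN, LIST_SORT)    # items not allowed at all
LIST_TMP2 = without(LIST_IN, LIST_TMP1)    # input with the bad items removed
LIST_TMP3 = without(LIST_SORT, LIST_TMP2)  # allowed items that are not in use
LIST_OUT = without(LIST_SORT, LIST_TMP3)   # used items, in LIST_SORT order

print("|".join(LIST_OUT))
# 01|03|04|06|09|10|11|101|111|UNDERWATER|VILLAGE|WATER|WATERFALL
```

Because whole items are compared here, there is no need for the `\b` word boundaries that the regular expressions rely on.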

Zerow · June 30, 2015, 5:00pm

Yes

It should be step 0 in this already established and tested action group

QUOTE (zerow @ Jun 24 2015, 23:51)

[..]

Here is my action group:

1] Guess values %GENRE%: ##%GENRE%

2] Replace "GENRE": "##" -> |

3] Replace "GENRE": "  " -> ""

4] Format value "GENRE_ORDER": '00|01|02|03|04|05|06|07|08|09|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|TRIBAL|UNDERWATER|VILLAGE|WASTELAND|WATER|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99|9X'

5] Format value "GENRE": $regexp($regexp($regexp(%GENRE_ORDER%,$regexp(%GENRE_ORDER%,'(?>('%GENRE%'))',),),'\|+','|'),'\|$',)

6] Replace "GENRE": "|" -> "  ##"

7] Format value "GENRE": $trim(%GENRE%)

8] Format value "GENRE": ##%GENRE%

It works if I run it on a GENRE tag that has values in the format     ##00  ##00  ##AAAAAAA  ##00  ##BBB [two hashtags + a two-digit number or a predefined capitalized word + two spaces]

[...]

If I use it on

##90  ##99  SOMETHING-NOT-ON-THE-LIST  ##02  ##01  ##SOMETHING-NOT-ON-THE-LIST

in place of step 5, it delivers

##90  ##99  SOMETHING-NOT-ON-THE-LIST  ##02  ##01  ##SOMETHING-NOT-ON-THE-LIST

If I use it between step 4 and 5, it delivers an empty field

I think it would be easier and safer for me, knowing the kind of mistakes I can possibly make in this system of mine, if I could just cut out at the very beginning all

SOMETHING-NOT-ON-THE-LIST

and leave all

#SOMETHING-NOT-ON-THE-LIST

which would also automatically leave all the
##SOMETHING-NOT-ON-THE-LIST

DetlevD · July 1, 2015, 6:08am

See example there ...
Moving part of TITLE to GENRE

DD.20150701.1008.CEST

Zerow · July 1, 2015, 8:41am

The provided example leaves me with an empty LIST_IN; first it singles out errors [unwanted stuff not on the LIST_SORT] and then it cuts them out [leaving blank space]. What about all the stuff [listed on LIST_SORT] that should be left in LIST_IN? How is that supposed to happen???

I'm telling you, it will be safer for my system if I could [before establishing LIST_SORT] cut out from LIST_IN all strings, made of any kind of characters, that do not begin with "#"

DetlevD · July 1, 2015, 11:34am

See new example there ...
Moving part of TITLE to GENRE

DD.20150701.1534.CEST

Zerow · July 2, 2015, 2:41pm

Either I am that stupid, or this just gives me every time

01|02|03|04|05|06|07|08|09|10|11|101|111|UNDERWATER|VILLAGE|WATER|WATERFALL

in the LIST_OUT, no matter what value was in the LIST_IN

But I've also come up with a workaround solution, without creating so many additional tag fields:

1]
Replace "GENRE": " " -> "##"

2]
Replace "GENRE": "####" -> |

3]
Replace "GENRE": "##" -> |

4]
Replace "GENRE": " " -> ""

5]
Replace "GENRE": "\+|" -> "|"


6]
Replace "GENRE": "|ACTION" -> "|11"
[...]
Replace "GENRE": "|TRIBAL" -> "|38"
Replace "GENRE": "|UNDERWATER" -> "|39"
Replace "GENRE": "|VILLAGE" -> "|41"
Replace "GENRE": "|WASTELAND" -> "|42"
Replace "GENRE": "|WATER" -> "|43"

7]
Replace "GENRE": "|" -> "|0"

8]
Guess values %GENRE%: |%GENRE%

9]
Format value "GENRE_ORDER": '000|001|002|003|004|005|006|007|008|009|010|011|012|013|014|015|016|017|018|019|020|021|022|023|024|025|026|027|028|029|030|031|032|033|034|035|036|037|038|039|040|041|042|043|044|045|046|047|048|049|050|051|052|053|054|055|056|057|058|059|060|061|062|063|064|065|066|067|068|069|070|071|072|073|074|075|076|077|078|079|080|081|082|083|084|085|086|087|088|089|090|091|092|093|094|095|096|097|098|099|09X'

10]
Format value "GENRE": $regexp($regexp($regexp(%GENRE_ORDER%,$regexp(%GENRE_ORDER%,'(?>\b('%GENRE%')\b)',),),'\|+','|'),'\|$',)

11]
Replace "GENRE": "011" -> "ACTION"
[...]
Replace "GENRE": "038" -> "TRIBAL"
Replace "GENRE": "039" -> "UNDERWATER"
Replace "GENRE": "041" -> "VILLAGE"
Replace "GENRE": "042" -> "WASTELAND"
Replace "GENRE": "043" -> "WATER"

12]
Replace "GENRE": "|" -> "  ##"

13]
Format value "GENRE": $trim(%GENRE%)

14]
Format value "GENRE": ##%GENRE%

15]
Replace "GENRE": "##0" -> "##"

16]
Replace "GENRE": "##" -> "" (only as a whole word)

17]
Replace "GENRE": "####" -> "##"

18]
Format value "GENRE_ORDER": $char(0)

At first it seemed that all I had to do with my previous version [while already having the data in the aforementioned format "##00 ##00 ##ABCABCABC ##00", with the addition of possible errors like "##SOMETHING-NOT-ON-THE-LIST ABCABCABC"] was to just add at the beginning a simple

Replace "GENRE": "  " -> "##"
Replace "GENRE": "####" -> "|"

and eventually change code words to code numbers, but then of course in further testing the WATER came at me with its full power

But now this whole process works even better than I intended; because:
A] If I have a semi-correct "WATER" [without the marker], during the whole process it will automatically become fully correct "##WATER" [with the marker]
B] There is no more UNDERWATER / WATER / WATERFALL issue [by the way; thank you for directing my attention to WATERFALL variation of this glitch]
C] It will be easier to apply changes to the temporary digit codes, and much more error-proof [that's why I used a separate entry for every word-to-number / number-to-word replacement, instead of a $replace command with a long list]

Plus, aside from that one crucial line, all of those actions are readable for me and not just some bunch of %(|'$,WTF?). And that will come in handy in the future if I decide to make some big change and have to adjust. And of course the minus is having not only a GENRE_ORDER / LIST_SORT but also kind of manually repeating it two times

[And when I was designing this system I wanted to use only numbers. But as I've processed 20% of my files, I can see now that without these word descriptions I would not be so efficient in categorizing music for RPG purposes. For example I use "ACTION" as the code for ACTION / CHASE / FIGHT, which further explains what kind of music should be in this group]
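The idea behind this workaround [words become temporary number codes, the codes get sorted, the codes become words again] can be sketched in Python as below. This is only an illustration under assumptions: the mapping contains just the word/number pairs shown in this post [the elided ones are left out], the names `WORD_TO_CODE`, `GENRE_ORDER` and `sort_genre` belong to the sketch, and the zero-padding of the original actions is skipped because whole items are compared here.

```python
# Only the pairs visible in the post; the elided entries are not guessed.
WORD_TO_CODE = {
    "ACTION": "11", "TRIBAL": "38", "UNDERWATER": "39",
    "VILLAGE": "41", "WASTELAND": "42", "WATER": "43",
}
CODE_TO_WORD = {code: word for word, code in WORD_TO_CODE.items()}

# Assumed sort order: 00..99 followed by the special code 9X.
GENRE_ORDER = [f"{n:02d}" for n in range(100)] + ["9X"]


def sort_genre(genre: str) -> str:
    """Normalize, sort and re-mark the items of a GENRE value."""
    codes = set()
    for token in genre.split():
        item = token.lstrip("#")            # marked or unmarked, both accepted
        item = WORD_TO_CODE.get(item, item)  # word -> temporary number code
        if item in GENRE_ORDER:              # anything unknown is dropped
            codes.add(item)
    ordered = [code for code in GENRE_ORDER if code in codes]
    return "  ".join("##" + CODE_TO_WORD.get(code, code) for code in ordered)


RESULT = sort_genre("##90  ##99  WATER  ##02  ##01  ##UNDERWATER  SOMETHING-NOT-ON-THE-LIST")
print(RESULT)
# ##01  ##02  ##UNDERWATER  ##WATER  ##90  ##99
```

As in point A] above, the unmarked "WATER" comes out as "##WATER", and UNDERWATER / WATER cannot cut into each other because they are turned into separate codes first.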

DetlevD · July 2, 2015, 4:51pm

The proposed sample code ...
Moving part of TITLE to GENRE
... should always return the same result, because the sample code has no room for unexpected errors.
Your experience suggests that there is some error on your side when you run the sample code.

DD.20150702.2105.CEST

Zerow · July 3, 2015, 7:40am

I'm starting to see now how I can use the provided code from your sample

But am I right in thinking that this still does not deal with the issue of UNDERWATER / WATER / WATERFALL? Or should I read it again and again, till it becomes clear to me

In my previous tests, depending on whether WATERFALL [not used by me] was listed on my GENRE_ORDER / LIST_SORT, I could be left with results like

##FALL
##UNDER

[And I've dealt with that with a long workaround]
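The UNDERWATER / WATER / WATERFALL effect can be reproduced with a short Python check, using Python's `re` module as a stand-in for the regular expressions in the action [an assumption of this sketch; the variable names are invented]. Without word boundaries the pattern WATER also matches inside the longer words; with `\b` on both sides it only matches the whole item.

```python
import re

ITEMS = "UNDERWATER|WATER|WATERFALL"

# No boundaries: WATER is removed from inside the longer words as well.
WITHOUT_BOUNDARIES = re.sub("WATER", "", ITEMS)

# \b on both sides: only the stand-alone item WATER is removed.
WITH_BOUNDARIES = re.sub(r"\bWATER\b", "", ITEMS)

print(WITHOUT_BOUNDARIES)  # UNDER||FALL
print(WITH_BOUNDARIES)     # UNDERWATER||WATERFALL
```

The leftover "UNDER" and "FALL" in the first result are the same fragments as the ##UNDER and ##FALL described above.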

DetlevD · July 3, 2015, 8:41am

You may apply the given sample mta actionsgroup against a dummy test file, ...
and use extended tag view to see what has happened.

You may remove the last step from the given sample actionsgroup, ...
which is the removing of tag-fields, ...
and run the test again, ...
and use extended tag view to see what has happened for each step.

The actionsgroup reflects a problem from set theory ...
LIST_OUT = LIST_SORT \ (LIST_SORT \ (LIST_IN \ (LIST_IN \ LIST_SORT)))

DD.20150703.1241.CEST
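A small Python check of the set formula, offered as a sketch with invented names and a shortened test input: the nested differences come out equal to the plain intersection of LIST_SORT and LIST_IN, which is why the final list contains exactly the allowed items that were present in the input.

```python
def nested_difference(list_in: set[str], list_sort: set[str]) -> set[str]:
    """LIST_SORT \\ (LIST_SORT \\ (LIST_IN \\ (LIST_IN \\ LIST_SORT)))"""
    return list_sort - (list_sort - (list_in - (list_in - list_sort)))


IN_SET = {"BAD1", "111", "01", "WATER", "WATERFALL", "BAD2", "UNDERWATER"}
SORT_SET = {"01", "02", "03", "101", "111", "UNDERWATER", "VILLAGE", "WATER", "WATERFALL"}

OUT_SET = nested_difference(IN_SET, SORT_SET)

# The four differences collapse to a single intersection.
print(OUT_SET == (SORT_SET & IN_SET))  # True
```

Python sets have no order, so this only checks membership; the sorting itself comes from walking through LIST_SORT in its own order.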

Zerow · July 9, 2015, 4:36pm

OK, I'll think about it some more and eventually do some further tests

Thank you for all the info