How to measure script efficiency?

rboss · September 28, 2024, 8:26pm

Hello everyone;

Recently I came across a situation that made me re-evaluate how I measure script efficiency.
Up to now, as a rule of thumb, I have used the debug output file's size as a quick way to see how much text a script generates, and used that value as a benchmark of the script's overall efficiency.

If, by tweaking some lines or trimming some steps, I can get a smaller debug file for the same search input and the same results, I consider it a leaner, and therefore better, script.

While testing a particular script, I had written an IfVar condition that would run several more steps if enabled; however, the debug output file was actually smaller (by a measurable amount) with the option enabled than with it disabled, despite one run having many tens of steps more than the other.

This got me thinking about what makes a script efficient and how to perform a valid benchmark.
The following is unrelated to the above example; but take this example script:

json_select "country"
OutputTo "variable1"
SayNChars 1
OutputTo "variable2"
SayNChars 1
RegexpReplace "^.*" "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"
OutputTo "country_CAPS"
FindInLine "%variable1%"
SayNChars 1
GotoChar 1
FindInLine "%variable2%"
SayNChars 1
Set "variable1" ""
Set "variable2" ""

This example can be used to convert ISO 3166-1 alpha-2 2-letter country codes from lower to uppercase.
This next example does exactly the same, but with a difference:

json_select "country"
OutputTo "variable1"
SayNChars 1
OutputTo "variable2"
SayNChars 1
RegexpReplace "^.*" "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZaAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"
OutputTo "country_CAPS"
SayRegexp "%variable1%"
SayRegexp "%variable2%"
Set "variable1" ""
Set "variable2" ""

In the first example, the conversion is achieved in 14 steps, and uses regex once.
In the 2nd example, the conversion is achieved in 13 steps, but uses regex 3 times, and on a longer string.

What I'm trying to understand is this:
Is the one step saved by using multiple instances of regex somehow nullified by the extra resource load of using regex (and on a longer string, which will "fatten" a debug output file)?

Of course, in one or two instances this is moot; but over several dozen (or hundreds of) repetitions, a script with fewer steps may not be faster (more efficient?) than one with more steps but "simpler" commands.

Has anyone come across a similar situation before?
What's your opinion? How to best evaluate a script?

Thank you in advance.

yorickausyps · September 29, 2024, 4:39pm

Hello,
I think the size of the debug file can only give a very rough first impression of the performance of a script. It would be much better to know how costly the commands are, in relation to each other. If we then count the number of executions of 'expensive' commands in the debug file, we could try to use cheaper commands, if these numbers are high.
I recently tried to perform a sort of frequency analysis with Notepad++ by counting a few costly commands, but found it too tedious. Prompted by your post, I discovered today that there is an analysis plugin for npp that probably allows a simple frequency analysis of the commands used in the debug output file. I hope that such a frequency table can help me identify the 'hot spots' in my script.
IMHO a really simple and obvious example of efficient programming is to replace

say " ("
sayoutput "Temp_Value"
say ")"

by

say " (%Temp_Value%)"

My conclusion so far is to use the new syntax (available since October 2023) that refers to output buffer contents as %output%, wherever possible.

You could run such a test with e.g. 1000 loops with

do
   ....< your example code>
while "true" 1000

and measure the execution time with a wall clock or with Task Manager.
I assume the first example, with two FindInLine commands, is faster than the other, with two SayRegexp commands, because regular expression processing should take more time than a simple scan for a string.

rboss · September 29, 2024, 11:44pm

I have a similar train of thought. This is merely speculating, of course, but I'm thinking of a ranking system; of "cheaper" (as you mentioned) commands as having a value -or weight- of 1, and "expensive" commands as having a value of 1.2, or 1.5, 1.6, etc., which would vary according to the execution load of said command.

Then, to get a measurable/comparable benchmark, the formula

step 1 * 1 + step 2 * 1.2 + step 3 * 1.6 + .... + last step * load factor of last step

would be sound.
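As a rough illustration of the formula above, here is a minimal Python sketch. The load factors in `WEIGHTS` are placeholders invented for the example (no such table exists for Mp3tag commands), and `weighted_cost` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical load factors: placeholders only, not measured values.
WEIGHTS = {"findinline": 1.2, "sayregexp": 1.6, "regexpreplace": 1.6}
DEFAULT_WEIGHT = 1.0  # assumed weight of any "cheap" command


def weighted_cost(commands, weights=WEIGHTS, default=DEFAULT_WEIGHT):
    """Sum the load factor of every executed command (names are case-insensitive)."""
    counts = Counter(name.lower() for name in commands)
    return sum(n * weights.get(name, default) for name, n in counts.items())


# Three executed steps with assumed weights 1, 1.2 and 1.6, as in the formula.
steps = ["OutputTo", "FindInLine", "SayRegexp"]
print(round(weighted_cost(steps), 2))
```

With such a table, two sequences that produce the same result could be compared by their weighted totals rather than by their step counts alone.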

Hypothetically, if an "expensive" command had a load factor >2, then using 2 cheaper commands would be more efficient than that particular "expensive" command. But without a table of commands and their respective theoretical load factors, it's hard to make that validation.

Now that you mentioned it, I realize that the regex I used in the example was incorrect:

SayRegexp "%variable1%"
SayRegexp "%variable2%"

should be replaced with

SayRegexp "(?<=%variable1%)."
SayRegexp "(?<=%variable2%)."

in order for it to work as intended.
I did manage to get my point across, but apologies to any (other) readers are in order.

Timing a script over many loops is a good idea on paper, but one that I think would be hard to apply in a real-world scenario. Depending on what processes a PC is running at any given time, even the same script run at different times of the day would most likely produce different timings. I've had occasions where the same script, with the same settings, same search query, and same results, produced slightly (a few bytes) different sized output files.

OS interference would need to be ruled out as a variable before running such a test; otherwise it would take the time-consuming task of running a script many times, then recording and comparing each execution time in order to find a trend.

And then repeat the whole process with every change made to a script, which would be more trouble than that change (and respective debugging) itself.
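For what it's worth, the "run many times and look for a trend" step can be reduced to a few lines once the timings have been written down. The sketch below is only an illustration: the timing values are made-up placeholders, `summarize` and `clearly_faster` are hypothetical helper names, and "median difference larger than the run-to-run spread" is just one simple rule of thumb, not a rigorous statistical test.

```python
from statistics import median, stdev


def summarize(timings):
    """Return the run count, median and standard deviation of timings (seconds)."""
    spread = stdev(timings) if len(timings) > 1 else 0.0
    return {"runs": len(timings), "median": median(timings), "stdev": spread}


def clearly_faster(a, b):
    """True if variant a's median beats b's by more than the larger spread."""
    sa, sb = summarize(a), summarize(b)
    noise = max(sa["stdev"], sb["stdev"])
    return sb["median"] - sa["median"] > noise


# Placeholder timings in seconds, typed in by hand after each run (not real data).
variant_findinline = [12.1, 11.8, 12.4, 12.0, 11.9]
variant_sayregexp = [12.3, 12.0, 12.2, 12.5, 11.7]

print(summarize(variant_findinline))
print(summarize(variant_sayregexp))
print(clearly_faster(variant_findinline, variant_sayregexp))
```

Using the median rather than the mean makes a single slow run (for example, one caused by another process waking up) matter less.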

Out of curiosity, what is the name of the plugin?
I don't use npp in scripting (vanilla Windows Notepad works just fine, and it's available on machines other than my own, so even while at work I can write down ideas), but I can (maybe) try to use it to run an analysis of my own scripts and test if it's a viable tool to add to my scripting workflow.

LyricsLover · September 30, 2024, 6:42am

Just as a side note:
Notepad++ is also available as Portable Version.
One of its (many) advantages is the Syntax Highlighting.

rboss · October 2, 2024, 2:23pm

@LyricsLover I had browsed your post on Syntax Highlighting for np++, set it up, opened one of my WSS, and thought "this is interesting, but I'll need to get acquainted with all the 'bells and whistles' of np++". This was in July.

I still want to look into it in depth, but I just can't find the time......

But something I'm already familiar with is MS Excel, and this quote gave me the idea of trying to perform a similar analysis in a spreadsheet. With a few COUNTIF formulas and some formatting I got this

The Mp3Tag commands were lazily copied from the list of Parser commands found in the respective tag source development page.

It was interesting to find that, when copying the debug output file's content onto Excel I was getting this

The "A" column has ParserIndex output, and the "M" column has ParserAlbum output.
On the latter the line breaks are not being respected; but this has nothing to do with Mp3Tag.

Apparently, pasting HTML strings containing tags like "<!DOCTYPE html>" or "<script>" into Excel not only fails to paste those terms, but also disrupts the text formatting.

But the idea is sound, and if I may, I'd like to suggest for the Mp3Tag debug output file to include info regarding script execution, like

[image]

yorickausyps · October 2, 2024, 4:30pm

I also decided to use a spreadsheet for a frequency analysis of executed commands and script lines, using the debug file from Mp3tag. With npp I opened the debug file, selected all lines containing "Command", and copied them to a new file. Then I removed all text up to the colon in every line, so that only the command itself remained. Then I copied and pasted all these lines into a spreadsheet (Calc from LibreOffice). After finding instructions on how to perform a frequency analysis with this command table, I finally came to this result with my test case:

If there were weights for each of the commands the graph could be improved to better show the computing time used by them. I suspect that the regexp-type commands are the most expensive ones.
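The select, strip and count steps described above can also be done in one pass with a short script. This is a minimal Python sketch under assumptions: it expects each executed command to appear on a line that starts with the word "Command" followed by a colon and the command name (as the manual procedure above implies), and the file encoding is a guess that may need changing. `count_commands` and `load_debug_file` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed layout: "Command ... : <name>" at the start of a line.
COMMAND_LINE = re.compile(r"^\s*Command\b[^:]*:\s*(\S+)", re.IGNORECASE)


def count_commands(lines):
    """Count how often each command name appears in the given debug lines."""
    counts = Counter()
    for line in lines:
        match = COMMAND_LINE.match(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


def load_debug_file(path, encoding="utf-8"):
    """Read a debug file and count its commands (encoding is an assumption)."""
    with open(path, encoding=encoding, errors="replace") as handle:
        return count_commands(handle)


# Tiny inline sample in the assumed layout, only to show the output shape.
sample = [
    "Script-Line : 10",
    "Command     : findline",
    "Script-Line : 11",
    "Command     : sayregexp",
    "Script-Line : 12",
    "Command     : findline",
]
counts = count_commands(sample)
for name, n in counts.most_common():
    print(f"{name:15} {n}")
print("total commands executed:", sum(counts.values()))
```

The resulting table can be pasted into a spreadsheet for charting, or multiplied by per-command weights if such weights ever become available.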

I also used the same procedure as above for the script-line info in the debug file, getting a table of line numbers and how often they had been executed. But because these line numbers are not always correct, the picture I got for the frequency of lines was not so clean:

This graph simply shows that the most used part of my script is the range between lines 700 and 800, a fact that was clear in advance, because this part deals with the track list of the album. So I did not learn much that was new from that analysis.
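A coarse version of this line-number picture can be produced by grouping the executed line numbers into ranges. The Python sketch below assumes the debug file marks each executed line with a label like "Script-Line : 705"; that label, the bucket size of 100 and the helper name `line_histogram` are assumptions made for the example.

```python
import re
from collections import Counter

# Assumed label for the executed script line in the debug file.
SCRIPT_LINE = re.compile(r"^\s*Script-Line\b[^:]*:\s*(\d+)", re.IGNORECASE)


def line_histogram(lines, bucket=100):
    """Count executed script lines per range of `bucket` line numbers."""
    hist = Counter()
    for line in lines:
        match = SCRIPT_LINE.match(line)
        if match:
            start = (int(match.group(1)) // bucket) * bucket
            hist[start] += 1
    return hist


# Tiny inline sample in the assumed layout, only to show the output shape.
sample = ["Script-Line : 12", "Script-Line : 705", "Script-Line : 799", "Command : say"]
for start, n in sorted(line_histogram(sample).items()):
    print(f"lines {start}-{start + 99}: {n}")
```

Grouping into ranges also softens the effect of line numbers that are off by a few lines, since most of them still land in the same bucket.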

rboss · October 9, 2024, 1:58pm

Also having a script line number analysis is a very good idea.

I added the same feature to my Excel spreadsheet, and for the same album as in your screenshot, I got the following:

Numbering bugs notwithstanding, a visual representation of where certain lines/commands are most frequent gives a much better idea of where to look when a script is too slow and it's hard to figure out why, and it can be used as a benchmark between different scripts.

And while I do still think it is a good idea, the more I think about it, the more a command "weight" table seems impractical. Normal script execution is already affected by system load, other processes, etc., so even if the developer were somehow able to create such a table, it would only be valid for their system. Besides, cover art file size has significantly more impact on execution time than any of the commands.

I would still like to see a Total Commands Executed counter in the debug file; something I believe would not be difficult to implement.

As for the Notepad-Excel formatting issue mentioned in a previous post, it has nothing to do with Mp3Tag; and only happens when copying HTML code between the MS Notepad application and Excel (in my case). As a quick workaround, I now open the debug.txt file in a browser window, and from there I can copy/paste to Excel without issues.