Problem with stripping text


#1

I am writing a program in VB.NET (personal project) and need your help with a regex (or another solution). I have been trying to find a solution for days, but I can't solve it.

A line can look like this (and there may be more variants):

But I only want the first part, so:

The only common standard in the format is that the text I want starts after a " (Chr(34)), but after that there are (unfortunately) no more common identifiers. The only other thing is that the text I want is in ALL CAPS [A-Z] :wink:

I can't seem to find a solution to this problem. The text file formatting is just too messed up, and doing it manually would take me days and days (60,000 lines).


#2

In the examples, IMHO the blank is the common separator.
So
$regexp('ABCDEFGHIJK ~ AB Ab Cdefgh"',(.*?) .*,$1)
returns
ABCDEFGHIJK
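
For anyone who wants to try the same idea outside the tagger, here is a minimal sketch in Python (not the VB.NET from the question, and not the `$regexp` function itself). The function name `first_token` is made up for illustration. It assumes the blank really is the separator, so it keeps only the text before the first blank:

```python
import re


def first_token(line: str) -> str:
    """Keep only the text before the first blank (hypothetical helper)."""
    # The lazy group (.*?) stops at the first blank; " .*" swallows the rest,
    # and the replacement keeps group 1 only.
    # If the line contains no blank, nothing matches and it is returned as-is.
    return re.sub(r"^(.*?) .*$", r"\1", line)


print(first_token('ABCDEFGHIJK ~ AB Ab Cdefgh"'))
print(first_token("ABCD EFGH I Abcd efgh"))
```

The second call shows the limit of this approach: as soon as the wanted text itself contains a blank, only its first word survives.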


#3

Thanks for the reply.

But one thing:

That would fail if the text has a space in between. For example:

"ABCD EFGH I Abcd efgh

Here I would like to get ABCD EFGH as the result,

but the regex would only return ABCD.

And in "The Regex Coach" the (.*?) doesn't give any matches :frowning:


Edit: I came up with this (please don't laugh as I am ABSOLUTELY not a REGEX guy :wink: )

[A-Z](\s)?(-)?(/)?[A-Z]


But that already fails if the text has a Ü (umlaut) in it :wink: and it also fails on text like the following, where it grabs too much:

ABCD EFGH I (and I only want ABCD EFGH)
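
One way to sidestep both problems (umlauts and the stray single capital) might be to skip the character class and check the words one by one. The following is only a sketch in Python under loud assumptions: the names `leading_caps` and `min_len` are invented here, and the rule "a wanted word has at least two letters, all upper case" is a guess based on the two examples in this thread, not a known property of the file.

```python
def leading_caps(line: str, min_len: int = 2) -> str:
    """Return the leading run of ALL-CAPS words (hypothetical helper).

    Assumption: a word belongs to the wanted text only if it has at least
    `min_len` letters and every letter is upper case. str.isupper() is
    Unicode-aware, so umlauts such as Ü count as capitals.
    """
    kept = []
    # Drop the leading quote character (Chr(34)) before splitting on blanks.
    for word in line.lstrip('"').split():
        letters = [c for c in word if c.isalpha()]
        # Stop at the first word that is too short or not entirely upper case.
        if len(letters) < min_len or not all(c.isupper() for c in letters):
            break
        kept.append(word)
    return " ".join(kept)


print(leading_caps('"ABCD EFGH I Abcd efgh'))
print(leading_caps("ABCD EFÜGH I Abcd efgh"))
```

Hyphens and slashes inside a word are ignored by the letter check, so something like AB-CD/EF would be kept as one word. If single-letter words can be part of the wanted text, the `min_len` assumption breaks and would need a different rule.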




#4

That is true, but in your example none of the result examples has a string in it with a blank ... so I can only be as good as the example is.

Perhaps you have to go through the list twice:
first cut all the bits in lower case, then deal with ones in mixed case.

This regexp will keep anything that starts with capitals:
$regexp('ABCD EFÜGH I Abcd efgh',(?-i)(\u.*) .*,$1)

and this one
$regexp('ABCD EFÜGH I Abcd efgh',(?-i)(\u.*) \u\l.*,$1)

chucks off all strings that start with a capital and continue with a lower-case letter.
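
The second expression can be approximated in Python roughly like this. It is a sketch only: Python's `re` module has no `\u` and `\l` classes, so the explicit character ranges below (including the German umlauts) are an assumption standing in for them, and the names `UPPER`, `LOWER` and `drop_mixed_case_tail` are made up for illustration.

```python
import re

# Assumed stand-ins for the \u (upper) and \l (lower) classes.
UPPER = "A-ZÄÖÜ"
LOWER = "a-zäöüß"

# Group 1: text starting with a capital. It is followed by a blank and a
# word that starts with a capital and continues in lower case.
MIXED_TAIL = re.compile(rf"^([{UPPER}].*) [{UPPER}][{LOWER}].*$")


def drop_mixed_case_tail(line: str) -> str:
    """Cut the line at the last word of the form 'Abcd' (hypothetical helper)."""
    # Lines without such a word do not match and are returned unchanged.
    return MIXED_TAIL.sub(r"\1", line)


print(drop_mixed_case_tail("ABCD EFÜGH I Abcd efgh"))
```

For the sample line this keeps ABCD EFÜGH I, so the single capital before the mixed-case part still has to be dealt with in a further pass.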


#5

Are you sure :wink: ???

Example line Nr 6

And I want as result

Thanks again, will try those Regexes. Great help !!

Good one about stripping first the lower cases and then the mixed cases !!

Edit: I most certainly could do it manually, but this list sometimes changes once or twice a week and then doesn't change for a long while, so I'd rather do it "semi-auto".


#6

Not any more ...