Converting PDF to TXT

Filling in the UNSYNCEDLYRICS tag field is a tedious task that involves collecting lyrics from various sources. In addition to scripts for web sources, general Google searches, Bandcamp, etc., I often copy and paste (C&P) text from artist or fan websites. Sometimes the lyrics are provided as PDF files.

Unfortunately, I have not yet found a way to copy and paste text from a PDF (provided it is not locked, which is very rarely the case) into the UNSYNCEDLYRICS field without losing the paragraph breaks between verses, choruses, etc., so I always have to re-insert them manually afterwards.

To save myself this extra work, I am looking for a PDF-to-TXT converter that turns PDF paragraph breaks into double CR-LF, which would give me the desired result via an intermediate step. I have already tried several online converters, but every one of them simply ignored the paragraph breaks.
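
To make the target format concrete, here is a minimal Python sketch of what "paragraphs as double CR-LF" would look like in practice. The function name `normalize_paragraph_breaks` is only an illustration and not part of any tool mentioned here. It assumes the converter has kept at least one blank line between verses; it cannot restore breaks that were already lost during extraction.

```python
import re


def normalize_paragraph_breaks(text: str) -> str:
    """Return text with CR-LF line endings and exactly one blank line
    (i.e. a double CR-LF) between paragraphs."""
    # Unify CR-LF and bare CR line endings to "\n" first.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    # Treat form feeds (page breaks emitted by some converters) as line breaks.
    unified = unified.replace("\f", "\n")
    # Strip trailing whitespace so lines that only look blank are really empty.
    lines = [line.rstrip() for line in unified.split("\n")]
    # Drop leading and trailing empty lines of the whole text.
    unified = "\n".join(lines).strip("\n")
    # Collapse any run of two or more blank lines into a single blank line.
    unified = re.sub(r"\n{3,}", "\n\n", unified)
    # Convert to the CR-LF line endings wanted for the tag field.
    return unified.replace("\n", "\r\n")


if __name__ == "__main__":
    sample = "Verse 1\nline two\n\n\n\nChorus\r\nla la\n"
    print(repr(normalize_paragraph_breaks(sample)))
```

The result could then be pasted into the UNSYNCEDLYRICS field with the verse and chorus breaks intact.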

Does anyone know a solution?

You could try a "Generic text only printer" and print the document to a file. Rename the resulting .prn file to .txt and open it in a text editor.

I've used pdftotext and pdf2text in the past.
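
For anyone who wants to try pdftotext from a script, here is a rough Python sketch. It assumes the Poppler or Xpdf build of `pdftotext` is installed and on the PATH; the helper names `build_pdftotext_command` and `convert` are made up for this example. The options `-layout`, `-nopgbrk` and `-eol dos` are standard pdftotext options, but whether the blank lines between verses survive depends on how the PDF was produced, so treat this as something to test rather than a guaranteed fix.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path):
    """Build the pdftotext argument list (hypothetical helper)."""
    return [
        "pdftotext",
        "-layout",      # try to keep the physical layout of the page
        "-nopgbrk",     # do not insert form feeds between pages
        "-eol", "dos",  # write CR-LF line endings
        str(pdf_path),
        str(txt_path),
    ]


def convert(pdf_path, txt_path):
    """Run pdftotext and return the path of the text file it wrote."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on the PATH")
    # check=True raises CalledProcessError if pdftotext reports a failure.
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return Path(txt_path)
```

Calling `convert("lyrics.pdf", "lyrics.txt")` (file names are placeholders) would write the text file next to the script.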

Take a look at ilovepdf.com. It has quite a lot of features and functions, many of which are not easily found anywhere else.

Which one of them extracts the text from a PDF into a plain TXT file, as the OP asked for?
And just for others: the tools are not as free as you might think at first sight.

So sorry!

Considering the range of complex tasks that it can accomplish, I thought that something as simple as TXT export would be included.

I used that site for 5-6 years to help me produce a 40-page newsletter every month. Very happy that I'm no longer in that "volunteer" position.

Thanks for the suggestions.
I've tried these and various other online and offline solutions. All of them failed to carry the PDF paragraph formatting over into the text file as CR-LFs. However, some of them also convert to other formats such as Word and PowerPoint, and the paragraph formatting survives in those, so you can use such a file as an intermediate step and copy and paste from it into MP3Tag.

In the end I found what I was looking for. PDF24, which I was already using, can convert PDFs to text format (something I had not been aware of) and preserves the paragraph formatting in the process.

Have you tried PDF Shaper*? The free version also includes the "PDF to TXT" function you are looking for and worked very well in my short test. Just make sure you DECLINE the additional bloatware during setup :face_with_monocle: :wink:

The free version comes with a GUI. If you also want the same functionality on the command line, you have to buy at least the Premium version. According to the help file, it should convert files like this:
PDFShaper.exe pdftotxt "ForTextExport.pdf" "MyExport.txt" x
If you need OCR for text recognition, you need to buy the Professional version.

Let us know whether this works for your paragraph formatting requirement.

* I'm not affiliated in any way with this software or its developers.