HTM files & Planetpress! Can it be done!

Posted by: adequickcopy

HTM files & Planetpress! Can it be done! - 01/10/13 01:34 PM

I have a job whose data file is in HTM format. After some searching, it seems that PlanetPress is unable to use HTM files. I hope I am wrong about this, and that someone can tell me how to use the HTM file or where to find the steps to do so.

If I am right, is there a way to convert the HTM file into a usable data file?

Posted by: Raphael Lalonde Lefebvre

Re: HTM files & Planetpress! Can it be done! - 01/10/13 01:45 PM


Alas, I'm afraid you're right: PlanetPress doesn't natively support HTML-format data. The only way you could use HTM files as data files in PlanetPress would be to write your own HTML interpreter entirely in PressTalk. This would mean hours of work and a thorough knowledge of the suite, of PressTalk and of HTML.

As for converting the HTM file to another format, it all depends on which format you want to convert it into. However, because we have no built-in HTML emulation mode, you would most likely have to use a script to convert it to a different format, and then use your own data extraction methods. The closest supported format to HTML is XML, though we also support others, such as CSV and plain text. Use whichever is better or easier for you.
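As an illustration of such a conversion script, here is a minimal sketch, assuming the records in the HTM file sit in an HTML `<table>` with one record per `<tr>` row and one field per `<td>`/`<th>` cell. It uses only Python's standard library (`html.parser` and `csv`); the class and function names are hypothetical, and nested tables or cells spanning several columns are not handled.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the cell currently open

    def _end_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace and line breaks inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:  # skip rows that never started or have no cells
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()   # HTML allows </tr> to be omitted
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # HTML allows </td> to be omitted
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside table cells
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()  # flush a trailing row that had no closing tags


def html_table_to_csv(html_text):
    """Return the rows of the HTML table as CSV text."""
    parser = TableToRows()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


sample = """<table>
<tr><th>Name</th><th>City</th></tr>
<tr><td>Jane Doe</td><td>Montreal, QC</td></tr>
</table>"""
print(html_table_to_csv(sample))
# Name,City
# Jane Doe,"Montreal, QC"
```

The resulting CSV text could then be saved and used as the data file instead of the HTM file. The `csv` module takes care of quoting fields that contain commas, such as the city above.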

Raphaël Lalonde Lefebvre

Posted by: ppuserd

Re: HTM files & Planetpress! Can it be done! - 01/11/13 03:42 AM

Try using the MS Word to PDF connector. Word understands HTML (quite) well, so you will probably find that the resulting PDF is better suited to your needs.

Posted by: -nth-

Re: HTM files & Planetpress! Can it be done! - 03/05/13 02:40 PM

Have you been able to determine how complex the HTML is? If it's not very complex (name, address, etc.), running it through a "pre-process" workflow to transform it into a usable CSV or text file may be doable. Also check out XHTML: converting your HTML to XHTML would allow PlanetPress to "see" the files as XML and use them accordingly.
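For the HTML-to-XHTML route, a rough sketch of the idea in Python (standard library only; the names are hypothetical) is to re-emit the document as well-formed XML: void elements such as `<br>` become self-closing, elements left open are closed, attribute values are quoted and text is escaped. This is not a full HTML parser. It drops comments and the DOCTYPE, does not reproduce HTML's implicit-closing rules (a second `<p>` nests inside the first instead of closing it), and assumes the file has a single root element such as `<html>`. A dedicated tool such as HTML Tidy is likely to be more robust for messy real-world files.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class HtmlToXml(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # &nbsp; etc. become characters
        self.out = []    # pieces of the XML output
        self.stack = []  # currently open (non-void) elements

    def _attrs(self, attrs):
        # Valueless HTML attributes (e.g. nowrap) get their name as value.
        return "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")
        else:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag (e.g. </br>): drop it
        # Close any elements left open inside this one (e.g. a missing </p>).
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # escapes &, < and >

    def close(self):
        super().close()
        while self.stack:  # close anything still open at end of file
            self.out.append(f"</{self.stack.pop()}>")


def html_to_xml(html_text):
    parser = HtmlToXml()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(html_to_xml(
    "<html><body><p>Name: Jane &amp; John<br>City: Montreal</body></html>"))
# <html><body><p>Name: Jane &amp; John<br/>City: Montreal</p></body></html>
```

Because entities are decoded into characters, the output should be saved as UTF-8 so that characters such as accented letters or non-breaking spaces survive.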

