#41969 - 01/17/13 02:51 PM Design/workflow not reading PDF's
Document Options Offline
OL Newbie

Registered: 04/20/10
Posts: 9
Loc: Crawley
Hi all,

I have a problem with either Design or Workflow.

I am trying to get either Design or Workflow to read a string of text from a PDF that PlanetPress created itself.

The "string to find" box just shows "???????????????????????" as it can't read it. Any ideas?



Thanks

#41973 - 01/17/13 03:19 PM Re: Design/workflow not reading PDF's [Re: Document Options]
Raphael Lalonde Lefebvre Offline
OL Expert

Registered: 10/14/05
Posts: 4956
Loc: Objectif Lune Montreal
Hi,

Here are two ideas:

1. Are you trying to select text in a non-Western language, such as Chinese, Japanese or Arabic? If so, you might not be able to read it unless your machine's regional settings are set to the language that you are trying to read.

2. Is what you are selecting really text, or is it part of a raster image? For a data selection to work, you need to actually have selectable text in the PDF. If it's part of an image, it may look like text, but it's not real selectable text and PlanetPress won't be able to read it. Open the PDF with Acrobat, and see if you can select the text. If so, then it should be real text and it should be readable.

I would start with that. Let us know the results!

Regards,
Raphaël Lalonde-Lefebvre

#41977 - 01/17/13 03:37 PM Re: Design/workflow not reading PDF's [Re: Raphael Lalonde Lefebvre]
Document Options Offline
OL Newbie

Registered: 04/20/10
Posts: 9
Loc: Crawley
Thanks for the reply, Raphael. It's selectable text (checked in Acrobat Pro) and the language is English.

The problem shows up when the file passes through the Create PDF action in a workflow.

The workflow starts with thousands of PDFs, which we then combine and add barcodes to for our mailing machine. If I take one of the original PDF files and use it as sample data, it works fine. As soon as it gets combined with a PTK file and converted to PDF, the text can no longer be read.

Thanks

#41994 - 01/21/13 02:47 PM Re: Design/workflow not reading PDF's [Re: Document Options]
Philippe F. Offline
OL Expert

Registered: 09/06/00
Posts: 1984
Loc: Objectif Lune, Montreal, Qc
This problem is most likely due to the use of Identity fonts in your original PDF files. Identity fonts define each character in the font subset in the order it is encountered in the file. Therefore character 1 in the font table might be a "T" and character 2 might be an "H", etc.

Those Identity fonts always come with a conversion table that's stored in the PDF as well, so the PDF reader knows which character in the table should be represented with which letter/digit. So for instance, instead of the standard ASCII representation (where A=65, B=66, C=67, etc.), that table would contain something like T=1, H=2, E=3, etc.

That's all fine and dandy until you reprocess that PDF file with Create PDF: the task first converts the incoming PDF to its PostScript equivalent (to be more precise, to a number of EPS pages) which are then merged with your PlanetPress template. But the problem is: PostScript has no mechanism for storing the conversion tables for Identity fonts... so they are lost in the process. And once the task converts the entire stream back to PDF, most of the characters in the font are represented with an index lower than 32, which Acrobat displays as "?".
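
To make the effect concrete, here is a rough sketch in Python. It is only a simplified model, not how PlanetPress or Acrobat actually store fonts: the function names build_subset and extract_text are made up for illustration, and real PDF fonts are more involved. It assigns codes in order of first appearance, then shows what text extraction yields with and without the conversion table.

```python
# Simplified, illustrative model of an Identity-style font subset.
# build_subset and extract_text are hypothetical names, not a real API.

def build_subset(text):
    """Return (codes, table): codes index the glyphs, table maps code -> character."""
    table = {}
    index_of = {}
    codes = []
    for ch in text:
        if ch not in index_of:
            # The first new character gets code 1, the next one code 2, and so on.
            index_of[ch] = len(index_of) + 1
            table[index_of[ch]] = ch
        codes.append(index_of[ch])
    return codes, table


def extract_text(codes, table=None):
    """Recover text from codes; without the table, fall back to the raw code values."""
    if table is not None:
        return "".join(table[c] for c in codes)
    # No conversion table: codes below 32 are control characters, shown here as "?".
    return "".join(chr(c) if c >= 32 else "?" for c in codes)


codes, table = build_subset("THE THEME")
print(codes)                       # [1, 2, 3, 4, 1, 2, 3, 5, 3]
print(extract_text(codes, table))  # THE THEME
print(extract_text(codes))         # ?????????
```

With the table, the codes map back to the original text. Without it, every code falls below 32 and comes out as "?", which matches the symptom in the "string to find" box.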

In such instances, you have two ways to work around the issue:
1 - Modify how the original PDF files are created. Make sure the application that generates the PDFs doesn't make use of Identity fonts. If you have no control over how that application generates the PDFs, then you'll have to use the second alternative.
2 - Do not map the incoming PDF files to the background of your PlanetPress template. That way, the PDF won't be converted into a series of EPS files (which causes the loss of the conversion tables).
Obviously, if you use Create PDF now, you'll only get a PDF with barcodes on it, and nothing else. But that PDF will contain the exact same number of pages as your input PDF has. So now, you have two PDF files: the original one, and a new one that you are going to "stamp" over the first one.
To do that, you'd use a script similar to the following:
Code:
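' Get a PDF editing object from the Workflow scripting API
Set MyOriginalPDF = Watch.GetPDFEditObject
' Open the current job file (the original PDF that contains the Identity fonts)
MyOriginalPDF.Open Watch.GetJobFileName, False
' Stamp the pages of the barcoded PDF on top of the original pages
MyOriginalPDF.MergeWith "My_Barcoded_Document.pdf"
' Save the result
MyOriginalPDF.Save False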

This script stamps the pages in "My_Barcoded_Document.pdf" (which is presumably the PDF generated by the Create PDF task, and which you stored somewhere) on top of the original PDF (which is the one that contains the Identity fonts).

That should do the trick. And you'll also find that it probably runs faster than your original process did because the entire process of converting the original PDF to EPS and then back to PDF has been eliminated.




Edited by Philippe F. (01/22/13 02:22 PM)
Edit Reason: Changed "incremental" to "Identity"
_________________________
Technical Product Manager
I don't want to achieve immortality through my work; I want to achieve immortality through not dying - Woody Allen

#42023 - 01/22/13 01:43 PM Re: Design/workflow not reading PDF's [Re: Document Options]
Yannick Fortin Offline
OL Expert

Registered: 08/25/00
Posts: 354
Loc: Objectif Lune Montréal
Just a quick note, so as to clear up any confusion: when Phil says "incremental", he actually means "Identity". Incremental in the context of a font is a related but different concept.
_________________________
Yannick Fortin, Team OL

#42025 - 01/22/13 02:23 PM Re: Design/workflow not reading PDF's [Re: Document Options]
Philippe F. Offline
OL Expert

Registered: 09/06/00
Posts: 1984
Loc: Objectif Lune, Montreal, Qc
Thanks Yannick. I have updated my original post to reflect your comments.

#42048 - 01/23/13 03:07 PM Re: Design/workflow not reading PDF's [Re: Philippe F.]
Document Options Offline
OL Newbie

Registered: 04/20/10
Posts: 9
Loc: Crawley
Thanks very much for taking the time to explain a solution. We don't have any control/influence over how the pdf files are created, so I will have a go with option 2 and let you know how I get on.

Thanks again

#42059 - 01/24/13 05:45 PM Re: Design/workflow not reading PDF's [Re: Philippe F.]
Document Options Offline
OL Newbie

Registered: 04/20/10
Posts: 9
Loc: Crawley
Option 2 worked just fine, thanks for your help. Is there a best practice for combining lots of PDFs together? Combining them as they come out of the workflow is taking ages to create a print file.

We have split the combining over different processes to speed things up but it still seems slow.

Thanks

#42066 - 01/25/13 08:35 AM Re: Design/workflow not reading PDF's [Re: Document Options]
Philippe F. Offline
OL Expert

Registered: 09/06/00
Posts: 1984
Loc: Objectif Lune, Montreal, Qc
Combining PDFs gets progressively slower as you add pages. Remember that a PDF file is a series of dictionaries and indexes (at its most basic, a dictionary of Pages with an Index of where to find each page). So every time you add pages to the destination dictionary, the process not only adds the pages but also adjusts the Index accordingly. In other words, adding two pages to a PDF that already contains 1000 pages means the process would (possibly, not always) have to adjust all 1000 existing page indexes after having added the pages. Add another two pages, and the previous 1002 indexes may have to be adjusted, and so on.
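
As a back-of-the-envelope illustration of that growth, the Python sketch below counts index adjustments under a deliberately pessimistic assumption: every append touches every page index already in the destination file. The function name index_adjustments and the two-pages-per-file figure are made up for the example, and the real PDF library will not always hit this worst case.

```python
# Worst-case cost model (an assumption for illustration, not a measurement):
# appending a file touches every page index already in the destination.

def index_adjustments(page_counts):
    """Count the existing page indexes touched while appending each file in turn."""
    total = 0
    existing = 0
    for pages in page_counts:
        total += existing   # every existing index may need adjusting
        existing += pages   # the destination grows by this file's pages
    return total


# Two-page input files, appended one by one to a single destination.
for file_count in (10, 100, 1000):
    print(file_count, index_adjustments([2] * file_count))
# 10 90
# 100 9900
# 1000 999000
```

Ten times as many files costs roughly a hundred times as many adjustments in this model, which is why a single growing destination file slows down so noticeably.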

Your approach of distributing the combining is the proper one. The most efficient way of combining large numbers of PDF files is to work in chunks. For instance, instead of adding 1000 PDF files one by one to a single destination file, use 5 processes to add 200 files each to 5 temporary files, then use a single process to concatenate all 5 temporary files into your final PDF file.
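
The chunking idea can be sketched in Python as follows. This is only an outline under stated assumptions: split_into_chunks and combine_in_chunks are hypothetical helpers, the merge argument stands in for whatever actually concatenates PDFs in your setup, and each "PDF" is modelled as a plain list of page labels. The sketch runs the chunk merges one after the other, whereas in the workflow they would be handled by separate processes.

```python
# Illustrative outline of chunked combining; all names here are hypothetical.

def split_into_chunks(items, chunk_count):
    """Split items into chunk_count contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), chunk_count)
    chunks = []
    start = 0
    for i in range(chunk_count):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def combine_in_chunks(files, chunk_count, merge):
    """Merge each chunk into a temporary result, then merge the temporaries."""
    temporaries = [merge(chunk) for chunk in split_into_chunks(files, chunk_count)]
    return merge(temporaries)


def merge_pages(parts):
    """Stand-in for a real PDF merge: concatenate lists of page labels in order."""
    return [page for part in parts for page in part]


# 1000 two-page "PDFs", combined through 5 temporary files of 200 inputs each.
files = [[f"doc{i}-p1", f"doc{i}-p2"] for i in range(1000)]
final = combine_in_chunks(files, 5, merge_pages)
print(len(final))           # 2000
print(final[0], final[-1])  # doc0-p1 doc999-p2
```

Keeping the chunks contiguous preserves the original page order in the final file, which matters when the output feeds a mailing machine.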

Note that multiplying the number of intermediate processes will not necessarily make things faster (in fact, it may slow down the entire job). That's because every such process requires an instance of the PDF library from the RIP pool, and there is a maximum number of instances available at any one time (the setting depends on how many CPU cores you have on your system). So if you only have 2 cores, running 10 processes will have all 10 fighting over the available instances of the RIP, which may slow down the entire job.

In other words, you'll have to experiment a bit to fine-tune the workflow.
