#57601 - 07/29/20 03:49 PM Extract Pages
PFloyd Offline
OL Newbie

Registered: 07/29/20
Posts: 3
Good afternoon,
I need to extract records from a PDF.
Each record is typically 2 pages: 1 is the data page, 1 is the address page.

So if a record has 2 data pages, I would need to extract 4 pages in total. Each data page is followed either by a blank page that's already in the file or by the address page.

"Page N of N" appears on each data page. I think we can key on that to find the first page of each record. Anything that says page 1 of 1 would remain; anything else would be extracted until we see 1 of 1 again, which would mean we're back at the first page of the next record?
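The rule above can be sketched in plain Python. This is only an illustration of the selection logic, not PlanetPress code: it assumes the text of each page has already been read into a list of strings, that the marker literally reads "Page N of N", and that a page without a marker (the blank or address page) belongs with the data page before it. The name `pages_to_extract` is made up for the example.

```python
import re

# Assumed marker format; adjust the pattern to whatever is really printed on the page.
PAGE_MARKER = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def pages_to_extract(page_texts):
    """Return the 0-based indexes of pages that belong to multi-page records."""
    extract = []
    in_multi_page_record = False
    for index, text in enumerate(page_texts):
        match = PAGE_MARKER.search(text)
        if match:
            total_pages = int(match.group(2))
            # "1 of 1" records stay in the file; anything longer is extracted.
            in_multi_page_record = total_pages > 1
        # Pages with no marker inherit the decision made for the preceding data page.
        if in_multi_page_record:
            extract.append(index)
    return extract


sample = [
    "Statement Page 1 of 1", "Address A",
    "Statement Page 1 of 2", "",
    "Statement Page 2 of 2", "Address B",
    "Statement Page 1 of 1", "Address C",
]
print(pages_to_extract(sample))  # [2, 3, 4, 5]
```

In the sample, the two-page record occupies four physical pages (two data pages, one blank, one address), which matches the "extract 4 pages total" case described above.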

There could also be foreign records that need to be extracted. We can key on the word "foreign" and extract those.

We may need to account for multi-page foreign records as well?

How would that look? I don't think I need a PlanetPress design document, because we're strictly working with PDFs and not adding any data to them?
I'm going to play around with it, but I thought maybe someone with more experience could read this and give their 2 cents?

#57610 - 08/03/20 11:31 AM Re: Extract Pages [Re: PFloyd]
Jean-Cédric Offline
OL Expert

Registered: 10/03/16
Posts: 650
Loc: Québec, Canada
When you say extract...you mean you need to extract data from a PDF file and do something with it or you need to extract the actual PDF page(s) and do something with it?
_________________________
♪♫♪♫
99 frigging bugs in my code
99 frigging bugs
Take one down
Code around
127 frigging bugs in my code
♪♫♪♫

#57614 - 08/03/20 05:59 PM Re: Extract Pages [Re: PFloyd]
jim3108 Online
OL Expert

Registered: 04/19/10
Posts: 311
Loc: London, UK
If I'm understanding you correctly, we would need to "remove" rather than "extract" pages from a PDF based on conditions.

Based on what you have said and presuming I understand what you mean, you could do this with a script making use of the AlambicEdit API.

You'd have to loop over your pages, check for the 'remove' condition and then use .Delete to remove that page accordingly.
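As a rough illustration of that loop, here is a minimal Python sketch. It does not use the AlambicEdit API: a plain list stands in for the PDF's pages, `del` stands in for the page delete call, and `remove_pages` and `should_remove` are hypothetical names. The point it shows is the order of iteration: walking from the last page to the first means a deletion never shifts the index of a page that has not been checked yet.

```python
def remove_pages(pages, should_remove):
    """Delete every page for which should_remove(page) is true, in place.

    Returns the removed pages in their original order so they can be
    written to a separate document.
    """
    removed = []
    # Iterate backwards so deleting index i does not renumber pages below i.
    for index in range(len(pages) - 1, -1, -1):
        if should_remove(pages[index]):
            removed.append(pages[index])
            del pages[index]
    removed.reverse()  # restore original page order
    return removed


pages = ["keep 1", "remove 2", "keep 3", "remove 4"]
removed = remove_pages(pages, lambda text: "remove" in text)
print(pages)    # ['keep 1', 'keep 3']
print(removed)  # ['remove 2', 'remove 4']
```

A forward loop would also work, but then the index must not advance after a deletion, which is easy to get wrong.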

Regards,

James.

#57633 - 08/26/20 05:54 PM Re: Extract Pages [Re: Jean-Cédric]
PFloyd Offline
OL Newbie

Registered: 07/29/20
Posts: 3
Originally Posted By: Jean-Cédric
When you say extract...you mean you need to extract data from a PDF file and do something with it or you need to extract the actual PDF page(s) and do something with it?


I would have to remove the pages from the input PDF and create a new PDF from them. I would have to find a keyword on the page that identifies it as foreign, then remove that page along with the next one.
I'm sorry I can't describe what I need any better.
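To make that rule concrete, here is a small Python sketch of the decision only. It assumes the page text is already available as a list of strings and that the keyword is the word "foreign" matched without regard to case; `split_foreign` is a made-up name, and nothing here is a PlanetPress or AlambicEdit call. It returns two lists of page indexes: the pages that stay in the input PDF and the pages that go to the new PDF.

```python
def split_foreign(page_texts, keyword="foreign"):
    """Split 0-based page indexes into (kept, extracted).

    A page is extracted if it contains the keyword, or if the page
    immediately before it contained the keyword.
    """
    kept, extracted = [], []
    take_next = False
    for index, text in enumerate(page_texts):
        is_foreign = keyword.lower() in text.lower()
        if is_foreign or take_next:
            extracted.append(index)
        else:
            kept.append(index)
        # Only a page that carries the keyword pulls the following page with it.
        take_next = is_foreign
    return kept, extracted


sample_pages = [
    "Data domestic", "Address 1",
    "Data FOREIGN", "Address 2",
    "Data", "Address 3",
]
print(split_foreign(sample_pages))  # ([0, 1, 4, 5], [2, 3])
```

If a foreign record runs over several data pages and each of them carries the keyword, every one of those pages is flagged and the page after the last one is taken with them.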

#57634 - 08/26/20 05:59 PM Re: Extract Pages [Re: jim3108]
PFloyd Offline
OL Newbie

Registered: 07/29/20
Posts: 3
Originally Posted By: jim3108
If I'm understanding you correctly, we would need to "remove" rather than "extract" pages from a PDF based on conditions.

Based on what you have said and presuming I understand what you mean, you could do this with a script making use of the AlambicEdit API.

You'd have to loop over your pages, check for the 'remove' condition and then use .Delete to remove that page accordingly.

Regards,

James.


Thank you for your help. I'm not much of a coder, but I'll see what I can work up. If you have an example of the code, it would help.
