#48881 - 07/31/14 07:01 PM
Image extraction from a file?
|
OL Newbie
Registered: 09/22/11
Posts: 16
|
I am working with a company that does statements for a bank. The data I get for the statements is in XML format, and it comes with a separate file of images. The images are in TIFF format, with 44 characters of header information preceding each image in this file. In an ideal world, PlanetPress would look at the XML of the statement data, which includes the header information, extract the image from the file, and place it on the page. Since these are check images and I want several on a page, I need the extracted image to print, step right, print, step left and down, print, and so on until a page is filled, and then go to the next page. Anyway, I guess the most important part is: can PP take this image blob and extract an image out of it? If so, where should I be looking to process it? Thanks!
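For reference, the "print, step right, print, step left and down" pattern is plain grid arithmetic. The following is only a minimal Python sketch of that arithmetic, not PlanetPress code; the column and row counts, cell sizes, and margins are made-up values for illustration.

```python
from typing import Iterator, Tuple


def check_positions(count: int, columns: int = 2, rows: int = 5,
                    cell_w: float = 4.0, cell_h: float = 2.0,
                    left: float = 0.25, top: float = 0.5
                    ) -> Iterator[Tuple[int, float, float]]:
    """Yield (page, x, y) per image, filling left to right, then down.

    All dimensions are hypothetical (inches); adjust to the real layout.
    """
    per_page = columns * rows
    for i in range(count):
        page, slot = divmod(i, per_page)   # which page, which slot on it
        row, col = divmod(slot, columns)   # step right first, then down
        yield page + 1, left + col * cell_w, top + row * cell_h


positions = list(check_positions(12))
for page, x, y in positions:
    print(page, x, y)
```

With these assumed values, ten images fit on a page and the eleventh starts page 2 at the top-left slot.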
|
#48882 - 08/01/14 11:02 AM
Re: Image extraction from a file?
[Re: Chris S.]
|
OL Expert
Registered: 07/23/08
Posts: 152
Loc: Flint, Michigan
|
So do you have tifs or do you actually have something that is no longer a valid tif because somebody stuck 44 characters of header information inside the file?
Do you have a couple dummy records that you could post somewhere to look at?
Repeating the same image multiple times in varying positions is straightforward. It is just a matter of how to get the header and image out of the XML.
|
#48888 - 08/01/14 05:58 PM
Re: Image extraction from a file?
[Re: Chris S.]
|
OL Newbie
Registered: 09/22/11
Posts: 16
|
Not sure how well this will copy/paste, but I will trim it for clarity's sake.
00000000000000102008F 00000259301004001599II*0»»ò`
`¬ %(]Ñfi
{{insert gibberish here}}
C$Y∂4C1óQ$±∏€Ä00000000000000102010F 00000259301004001602II*0»»∞(
(<B%(
The first bit there is the start of the file, so 44 characters from there is the header, then the gibberish of a tif file, then the next header, gibberish, and so on. So as best as I can see, I need to take a bit of data from my main xml file to match the last bit of the numbering, split it from the beginning of the 44 characters, and continue to the next set of 44. The tif data is all good, just mashed together, and if possible I'd like to not have to split these apart. The mailings we will be doing will be about 4000 pieces each, and assuming an average of 4 checks each (just pulling a number out), I could easily end up with 16K images, for the small mailings, and who knows how many for the quarterly mailings (if they have images at all). If needed, I can link the sample files I am working from to provide additional clarity. Thanks for taking a gander!
|
#48891 - 08/04/14 10:07 AM
Re: Image extraction from a file?
[Re: Chris S.]
|
OL Guru
Registered: 07/03/12
Posts: 106
|
Can't you just write the contents of your node to a jobinfo, use it in create file, and go from there? Once processed, the TIFFs could be uploaded to the virtual drive or saved to the local hard disk to be used within design.
Usually, though, what you would expect to see is base64, for example. Is that TIFF data held within a CDATA XML tag, or are they simply dumping binary data into a standard XML node?
|
#48892 - 08/04/14 10:35 AM
Re: Image extraction from a file?
[Re: Chris S.]
|
OL Newbie
Registered: 09/22/11
Posts: 16
|
All of the image data is in a separate file (currdate.img), *not* in the XML itself. The only information in the XML is the 'filename' as such that would be used to locate a given image in the image data file.
Also, just to be clear, I do get these images as a separate file, but 1 file with all of them tied together, not many individual files. I am trying to avoid having to write a script to break the images apart, thus my questions.
Edited by Chris S. (08/04/14 10:44 AM)
|
#48893 - 08/04/14 10:43 AM
Re: Image extraction from a file?
[Re: Chris S.]
|
OL Guru
Registered: 07/03/12
Posts: 106
|
Can you run the file through a generic data splitter and use split data file on a word, with 'the word' being your header?
It seems the key here would be to split up your .img file, and PlanetPress certainly has the tools for that. You might end up having to insert a form feed using regular expressions with the search and replace plugin and split on that, but I'm sure it can be done at the end of the day.
Perhaps if you were to post the .img file to support then you could get help with that?
|
#48894 - 08/04/14 12:27 PM
Re: Image extraction from a file?
[Re: Chris S.]
|
OL Newbie
Registered: 09/22/11
Posts: 16
|
Here is a link to my sample XML and image files. https://www.dropbox.com/sh/8qyrmuopbkq8wie/AABmLMr_R8DS6UpkHPyFUn2Ka
I had not thought about using the splitter, and that might be an option. The tricky part as far as I can see is that the 44 characters don't have a beginning delimiter, so I'm not sure if that would work or not.
Edited by Chris S. (08/04/14 12:29 PM)
|
#48895 - 08/04/14 12:50 PM
Re: Image extraction from a file?
[Re: ppuserd]
|
OL Expert
Registered: 10/14/05
Posts: 4956
Loc: Objectif Lune Montreal
|
ppuserd wrote:
"can't you just write the contents of your node to a jobinfo, use it in create file and go from there. once processed the tiffs could be uploaded to the virtual drive or saved to the local hard disk to be used within design.
usually though, what you would expect to see is base64 for example.. is that tiff data held within a CDATA xml tag-> or are they simply dumping binary data into a standard xml node??"
The thing is that you'd also have to remove the header that is present in the data stream. And it's not a 100% safe method: if, while the image stream is being copied, some of the "garbage" characters cannot be copied properly, the result will be an invalid image file. MAYBE it will work... but it's not something that we can guarantee.
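To illustrate the risk described here in general terms: if binary image data passes through a text decoding step at any point, bytes that are not valid in that encoding can be altered. This is a minimal Python sketch, not a description of how PlanetPress handles data internally; the sample bytes are made up and merely resemble the start of a TIFF.

```python
# Hypothetical bytes standing in for a slice of the .img file.
raw = b"II*\x00\xbb\xf2\x60\xac\x25\x28\x5d\xd1"

# Treating the data as UTF-8 text: undecodable bytes are replaced,
# so encoding it back does not reproduce the original bytes.
as_text = raw.decode("utf-8", errors="replace")
round_tripped = as_text.encode("utf-8")

# Treating the data as bytes throughout: the copy is exact.
binary_copy = bytes(raw)

print("text round trip intact:", round_tripped == raw)
print("binary copy intact:", binary_copy == raw)
```

Any external script that splits the file should therefore open it in binary mode and slice bytes, never strings.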
|
#48900 - 08/05/14 06:40 AM
Re: Image extraction from a file?
[Re: Chris S.]
|
OL Guru
Registered: 07/03/12
Posts: 106
|
I don't think this is going to work unless you can find a way to edit the file and create a single image from it manually first.
How do you know they are valid TIFF files? Were you able to succeed in doing the above? I failed miserably.
More generally, I would push to get the images in a recognised format such as .tif or .jpg.
|
#48904 - 08/05/14 11:25 AM
Re: Image extraction from a file?
[Re: ppuserd]
|
OL Newbie
Registered: 09/22/11
Posts: 16
|
Your answer is about what I expected, but I thought I would ask the much more knowledgeable PPS minds here. They are valid TIFFs: I used Notepad+ to trim everything past the first image, then clipped the first 44 characters, and ended up with a valid image.
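The manual check above can be approximated in a script by skipping the 44-byte header and looking for the TIFF byte-order signature. This is a rough Python sketch; the function name is made up, and a matching signature only suggests a TIFF starts there, it does not prove the whole image is valid.

```python
HEADER_LEN = 44  # header size per record, as described in this thread

# Standard TIFF signatures: little-endian ("II") and big-endian ("MM").
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def looks_like_tiff_record(blob: bytes, offset: int = 0,
                           header_len: int = HEADER_LEN) -> bool:
    """Return True if a TIFF signature follows the header at `offset`."""
    start = offset + header_len
    return blob[start:start + 4] in TIFF_MAGIC


# Synthetic stand-in for the first record of the .img file.
sample = b"0" * HEADER_LEN + b"II*\x00" + b"...image bytes..."
print(looks_like_tiff_record(sample))
```

For the real file, `blob` would be the result of reading it in binary mode, e.g. `open(path, "rb").read()`.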
As far as getting the company providing the data to change the format goes, they have been less than helpful in providing any information about what other companies do to make this type of statement, so I'm doubtful they will change just for me. It looks like, barring additional information from them, I will whip up a script that splits these apart and names them usefully to work from the XML.
Thanks for your help!
|
#48906 - 08/05/14 11:33 AM
Re: Image extraction from a file?
[Re: Chris S.]
|
OL Guru
Registered: 07/03/12
Posts: 106
|
Can you explain in more depth? So we take off the first 44 chars, but how do you know where the image ends?
I can see that the XML includes an image number, and I can find that within the file as well.
If you tell me how you do it manually, then I'm pretty sure a script can do the same.
|
#48911 - 08/05/14 05:45 PM
Re: Image extraction from a file?
[Re: Chris S.]
|
OL Newbie
Registered: 09/22/11
Posts: 16
|
I had missed this myself previously, but the XML actually contains the start point and character counts of the image within the blob.
ImgFrntOff    Numeric    Starting position of front image.
ImgFrntLen    Numeric    Length of front image.
ImgBackOff    Numeric    Starting position of back image.
ImgBackLen    Numeric    Length of back image.
• ImgFrntOff: Front OFFSET starts at the beginning of the image header.
• ImgFrntLen: Front length is the actual length of the image starting at the beginning of the front image. Does not include the header length.
• ImgBackOff: Back OFFSET starts at the next position following the end of the front image.
• ImgBackLen: Back length is the actual length of the back image.
So it actually looks a little easier, scripting-wise, than I had previously thought, although it would still be nice if it could all be done within PPS.
--Raphael-- I think you are missing what has been said.... The .img file is a group of tiffs that have been placed all together within 1 file and having header information specific to each tiff. The XML does *not* contain the actual tiff. The XML DOES refer to the file, and the location of the image within the .img file.
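Given those four fields, pulling one check out of the blob comes down to byte slicing. Below is a minimal Python sketch of an external script, under several assumptions the field descriptions do not settle: offsets are 0-based, lengths are byte counts, and the back image carries its own 44-byte header (switchable via `back_has_header`). The class and function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

HEADER_LEN = 44  # per-image header size, as described in this thread


@dataclass
class CheckImageRef:
    """The four numeric fields taken from the XML for one check."""
    front_off: int
    front_len: int
    back_off: int
    back_len: int


def extract(blob: bytes, offset: int, length: int,
            header_len: int = HEADER_LEN) -> bytes:
    """Skip the header at `offset`, then return `length` image bytes.

    Assumes 0-based offsets; subtract 1 first if they turn out 1-based.
    """
    start = offset + header_len
    data = blob[start:start + length]
    if len(data) != length:
        raise ValueError("image runs past the end of the blob")
    return data


def extract_check(blob: bytes, ref: CheckImageRef,
                  back_has_header: bool = True) -> Tuple[bytes, bytes]:
    """Return (front, back) image bytes for one check."""
    front = extract(blob, ref.front_off, ref.front_len)
    back = extract(blob, ref.back_off, ref.back_len,
                   HEADER_LEN if back_has_header else 0)
    return front, back


# Synthetic blob: header + front image + header + back image.
header = b"H" * HEADER_LEN
front_img = b"II*\x00front"
back_img = b"II*\x00back!"
blob = header + front_img + header + back_img
ref = CheckImageRef(0, len(front_img),
                    HEADER_LEN + len(front_img), len(back_img))

front_out, back_out = extract_check(blob, ref)
print(front_out, back_out)
```

Checking that each extracted slice begins with a TIFF signature is a quick way to confirm whether the offset and header assumptions hold for the real data.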
|
#48914 - 08/06/14 09:30 AM
Re: Image extraction from a file?
[Re: Chris S.]
|
OL Guru
Registered: 07/03/12
Posts: 106
|
I was going to ask you what the other nodes were for, but with that info it is indeed 'easy' to solve. I don't see PPS ever supporting such a file format, though, or even why it should. The extension .img historically has to do with floppy disks and now CD images, and I guess this format was just made up for a particular custom/proprietary application. It would have been far better if the creators had opted to create multi-page TIFFs and simply referenced a particular page number within the XML instead of all those instructions. What were they thinking?? Anyway, at least you now know exactly what to do, and that makes me happy.
|