#55871 - 04/10/18 05:25 AM General guidelines on optimizing throughput
douglasb Offline
OL Toddler

Registered: 08/24/04
Posts: 42
Loc: Cheshire, UK
I have a customer who has about 40 processes in Production (7.6.2.9999). There is one process in particular where throughput is important to them (warehouse delivery notes). Different people in the warehouse(s) can kick these jobs off and expect to see something happening at their printer pretty much instantaneously. I've been through the log files for the process and I can see that once PlanetPress picks up the spool file it takes around 4 seconds to create the print file. This is quite consistent and is an acceptable time.

Where there seems to be a delay is in picking up the spool file from a Hot Folder. The input file can sit in the Hot Folder for up to 10 seconds before it is picked up. The delay is not consistent: some files are picked up almost instantly, but more than 60% take noticeably longer.

The files are small ASCII files (up to about 5KB in size) and are picked up from a network shared folder. The process is self-replicating as there may be more than one of these documents required at any one time. Polling time is set to zero seconds. The server has 8 cores and I have never seen them all in use at one time.

I've read through the Optimization Techniques section of https://ollearn.objectiflune.com/course/view.php?id=48. I'd appreciate it if you could provide any other guidelines that might help get the input files picked up consistently faster.

#55872 - 04/10/18 06:56 AM Re: General guidelines on optimizing throughput [Re: douglasb]
Philippe F. Offline
OL Expert

Registered: 09/06/00
Posts: 1928
Loc: Objectif Lune, Montreal, Qc
In addition to the process being self-replicating, I would increase its Max percentage of threading setting. By default, it is set to 20% but you could increase it to 50%.
Given that you have 8 cores, I would also increase the Maximum number of self-replicated processes (Preferences>Plug-in>General) from its default value of 50 to something like 100.

Assuming you kept the default values until now, your Delivery Notes process could, at most, handle 10 jobs at once (20% of 50 processes). With these new settings, it would jump to 50 (50% of 100).
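To put rough numbers on that (using the 4 seconds per job you measured, and assuming the cores don't get saturated): 10 concurrent instances at 4 seconds each works out to roughly 2.5 jobs per second of sustained throughput, while 50 instances would raise that ceiling to around 12.5 jobs per second. At that point the Hot Folder pickup, not the processing, should be the limiting factor.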

You should monitor your logs for a few days after making the changes. You may have to tweak those values up or down, depending on the results you get.
_________________________
Technical Product Manager
I don't want to achieve immortality through my work; I want to achieve immortality through not dying - Woody Allen

#55874 - 04/10/18 10:39 AM Re: General guidelines on optimizing throughput [Re: douglasb]
Jean-Cédric Offline
OL Expert

Registered: 10/03/16
Posts: 471
Loc: Québec, Canada
It could also be the result of delays on the network. If the changes Philippe suggested do not seem to improve the speed, you might want to have your IT team take a look at your network speed.

#55885 - 04/11/18 04:03 AM Re: General guidelines on optimizing throughput [Re: douglasb]
douglasb Offline
OL Toddler

Registered: 08/24/04
Posts: 42
Loc: Cheshire, UK
Thank you for these answers. I'll try the tweaks suggested and see if we can improve the performance.

@Philippe - I had understood that with 8 cores there could only be 8 processes running at any one time. From your comments about increasing the threading/self-replication settings it seems that my understanding was wrong and that it is possible to run more processes than there are cores?

#55887 - 04/11/18 10:18 AM Re: General guidelines on optimizing throughput [Re: douglasb]
Philippe F. Offline
OL Expert

Registered: 09/06/00
Posts: 1928
Loc: Objectif Lune, Montreal, Qc
Yes, each Core is able to run several threads (which is why even a single Core machine can run Windows). And each Workflow process (and each replicated Process) uses one thread. So 50 replicated processes take up 50 threads. Theoretically, you could even run all of them on a single Core machine, but pretty quickly you'd notice a degradation in performance because that Core would also be used for everything else on your system. With 8 Cores, however, you should have plenty of CPU power to increase the total number of threads used by Workflow.

What may have confused you is the fact that if any of your processes use the PDF processing library (aka PlanetPress Alambic), there is a limit to how many concurrent instances of that library can be efficiently run on a system. Usually, we recommend not launching more instances than you have Cores because that module is very CPU-intensive, meaning that a single thread may actually use up all of a Core's resources. You can control that parameter through the Workflow Preferences>Plug-In>Messenger.

So in the end, you have to strike the proper balance between the number of threads you want to use and how intensive each thread is likely to be. That's why you'll need to experiment with the numbers I proposed previously to determine which ones are just right for you.
_________________________
Technical Product Manager
I don't want to achieve immortality through my work; I want to achieve immortality through not dying - Woody Allen

#55980 - 05/17/18 03:46 AM Re: General guidelines on optimizing throughput [Re: douglasb]
douglasb Offline
OL Toddler

Registered: 08/24/04
Posts: 42
Loc: Cheshire, UK
While making the process self-replicating improved performance a side effect was that the order that the jobs printed in was not necessarily the same order as they were submitted. I can understand this being down to different file sizes, etc., however is there any way to force self-replicated processes to output in the order in which jobs are submitted?

#55992 - 05/18/18 02:02 AM Re: General guidelines on optimizing throughput [Re: douglasb]
Philippe F. Offline
OL Expert

Registered: 09/06/00
Posts: 1928
Loc: Objectif Lune, Montreal, Qc
Short answer: no.

Longer explanation: with a multi-threaded process, there is no sure way to know which job will be output first unless you wait for all the jobs to be processed before you start outputting them to the printer. For instance, two users may be submitting a job at a one-second interval, but the first job submitted is a large file that requires 2 seconds more than the other one. In that case, the second job will finish first even though it came in 1 second later than the first one.

Even with a single process, you can't be sure: the first file being submitted might be just large enough that a smaller file coming in a few milliseconds later will become available in the Hot Folder before the first one is finished writing, so the second file will be picked up first.

There's no perfect way of getting around that problem, but one approach that may work for you would be to first ensure that whatever process drops the job files in the Hot Folder names them consistently and sequentially (Job2018-05-18-14.02.45.017, for instance). You could then output all processed jobs to another Hot Folder, using their original filename. A second, single-threaded process would then pick them up from that second Hot Folder in alphabetical order and send them to the printer in the proper order.
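As a rough sketch of that first step (the Job Info slot and name format below are just examples, not a prescription), a Run Script task could build the sortable name along these lines:

' VBScript: build a zero-padded, sortable timestamp and store it in a
' Job Info variable for a later Send to Folder task to use as the filename.
Dim t, stamp
t = Now
stamp = Year(t) & "-" & Right("0" & Month(t), 2) & "-" & Right("0" & Day(t), 2) & _
        "-" & Right("0" & Hour(t), 2) & "." & Right("0" & Minute(t), 2) & _
        "." & Right("0" & Second(t), 2)
' Job Info 9 is an arbitrary free slot; seconds-only resolution can
' collide, so in practice you'd append a unique counter as well.
Watch.SetJobInfo 9, "Job" & stamp

The zero-padding matters: without it, a name ending in "-9..." would sort after one ending in "-10..." alphabetically.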

Although you'd be losing the multi-threaded output - which shouldn't make much of a difference since at that stage the printer becomes the bottleneck - you'd still retain the multi-threaded processing, which is where you want to gain performance to start with.

Perhaps other users have found a better way to handle this, so let's see if the community can contribute!
_________________________
Technical Product Manager
I don't want to achieve immortality through my work; I want to achieve immortality through not dying - Woody Allen

#55993 - 05/18/18 03:58 AM Re: General guidelines on optimizing throughput [Re: douglasb]
douglasb Offline
OL Toddler

Registered: 08/24/04
Posts: 42
Loc: Cheshire, UK
Thanks for the comprehensive answer, Philippe. The situation you describe of multiple users submitting jobs of differing sizes is exactly what I am encountering.

I am going to try a different approach to improve throughput. At the moment the delivery note process is a single process that serves multiple printers. One of the first steps in the process extracts the printer name from the data and the final steps select that printer queue.

Rather than having a single process handling multiple printers I can change this to a process per printer and have a "receive" process to carry out the recognition and pass the data to the "printer" process. Each printer process will be single threaded but each will run in parallel so I should get the throughput and retain the correct job order.
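For the hand-off I'm picturing something along these lines in the "receive" process (the Job Info slots and folder path are placeholders for whatever I end up using, and they assume earlier tasks have already extracted the printer name and built a sequential filename):

' VBScript: route the current job file to a per-printer hot folder.
' Job Info 8 = extracted printer name, Job Info 9 = sequential filename
' (both set by earlier tasks in this process).
Dim fso, target
Set fso = CreateObject("Scripting.FileSystemObject")
target = "\\server\workflow\printers\" & Watch.GetJobInfo(8) & "\"
If Not fso.FolderExists(target) Then fso.CreateFolder target
' Keep the sequential name so the single-threaded printer process
' picks the files up in submission order.
fso.CopyFile Watch.GetJobFileName, target & Watch.GetJobInfo(9)

Each "printer" process would then just capture its own folder and print.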

Any other ideas from the community will be welcome though!

#55994 - 05/18/18 09:57 AM Re: General guidelines on optimizing throughput [Re: douglasb]
Philippe F. Offline
OL Expert

Registered: 09/06/00
Posts: 1928
Loc: Objectif Lune, Montreal, Qc
Your "Receive" process should also be set to self-replicate, which will also speed up the distribution of files to all "Printer" processes.
_________________________
Technical Product Manager
I don't want to achieve immortality through my work; I want to achieve immortality through not dying - Woody Allen

#55996 - 05/22/18 04:59 AM Re: General guidelines on optimizing throughput [Re: douglasb]
stuartg Offline
OL Expert

Registered: 03/06/03
Posts: 713
Loc: Swindon, England
Doug
If you are running with logging set to "all events with details", you should try switching to logging errors only. I realise that this loses a lot of information that you would naturally want to see whilst trying to optimise things, but I've seen a 100% increase in throughput after turning detailed logging off. It's worth a try.
If you want to, selected info can still be written to the log using a script and Watch.Log; these entries are treated as error messages and appear even in the minimal log.
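Something like this in a script task, for example (the level values are from memory, so double-check them against the scripting reference):

' VBScript: Watch.Log writes to the Workflow log; a level of 1 should
' mark the entry as an error, so it still shows up with minimal logging.
Watch.Log "Picked up delivery note job " & Watch.GetJobFileName, 1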
It sounds as if the main delay is in getting the jobs onto the PlanetPress server. Can you change the delivery method to, e.g., FTP or LP?
