KFX Manga Helper Scripts

Last time I wrote about creating KFX manga. This time I am writing about the tools I made to make creating KFX manga easier.

The scripts can be obtained from
https://gist.github.com/innocenat/2b5ddaf52e1b5dc1a19eab33d1f249fb

This article is old

This article was written in 2022, and I no longer use these tools, so use them at your own risk.

AZW3 to KFX helpers

The first tool converts a de-DRMed Amazon AZW3 file to KFX. The process is as follows:

1. Convert the AZW3 to ZIP in calibre.
2. Extract the ZIP to get the image files.
3. Try to merge double-page spreads.
4. Process the images using Kindle Comic Converter (KCC).
5. Use Kindle Create to create a KPF file.
6. Convert the KPF to a KFX file.

Steps 5-6 are detailed in my previous post. Step 1 is trivial. My scripts automate steps 2-4.

We will look at do-comic.sh. It is a shell script, but it should be trivial to port to a Windows batch file.

Note that we use the https://github.com/darodi/kcc fork of KCC, which includes profiles for newer devices and mozjpeg support.

The usage is:

$ do-comic.sh inputfile.zip

inputfile.zip is the ZIP created by calibre from the AZW3. Let's walk through what the script does.

KCC="/path/to/kcc/kcc-c2e.py"
SPREAD="/path/to/merge-spread.py"

These just define the paths to KCC’s kcc-c2e.py script and the included merge-spread.py.

NAME=${1%.*}

This saves the file name without the .zip extension to the variable $NAME.

unzip -jd "_tmp" "$1" "*.jpg" "*.jpeg" "*.png"

This unzips only the image files from the .zip. The -j option flattens the directory structure, and -d specifies the output directory. We only match .jpg, .jpeg, and .png, though I think all files from Amazon will be .jpg.

We need to flatten the structure because the cover image and the internal images are in different directories. The directory names inside the ZIP are also randomised, so flattening the structure makes our life easier.
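For anyone porting this step away from shell, here is a minimal Python sketch of the same idea: keep only image members and drop their directory part. The function name extract_images_flat is made up for illustration and is not part of the scripts in the gist. Unlike the unzip patterns, this sketch matches extensions case-insensitively, and if two members share a file name the later one silently overwrites the earlier one.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Extensions we treat as images (assumption: same three as the shell patterns).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def extract_images_flat(zip_path, out_dir):
    """Extract only image members into out_dir, dropping directory structure."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Member names inside a ZIP always use forward slashes.
            name = PurePosixPath(info.filename).name
            if PurePosixPath(name).suffix.lower() not in IMAGE_SUFFIXES:
                continue
            (out / name).write_bytes(zf.read(info))
            written.append(name)
    return sorted(written)
```

Calling extract_images_flat("inputfile.zip", "_tmp") would leave the cover and the page images side by side in _tmp, whatever the randomised directory names were.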

COVER=$(ls _tmp | grep cover)
mv "_tmp/$COVER" "_tmp_cover.${COVER##*.}"

This bit moves the cover image, usually named cover.jpg, to a temporary name because we don’t want to process it with KCC (so that we get a nice colour cover in calibre).
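The two shell lines above can be read as "find the file with cover in its name, then keep its extension". A rough Python equivalent, with hypothetical helper names find_cover and temp_cover_name, might look like this. One difference is deliberate: if several names contain "cover", the shell version would put all of them into $COVER, while this sketch just takes the first one.

```python
def find_cover(filenames):
    """Return the first name containing 'cover', or None (like `ls | grep cover`)."""
    for name in sorted(filenames):
        if "cover" in name:
            return name
    return None


def temp_cover_name(cover):
    """Build the holding name, keeping the text after the last dot like ${COVER##*.}."""
    extension = cover.rsplit(".", 1)[-1]
    return "_tmp_cover." + extension
```

For the usual cover.jpg this gives _tmp_cover.jpg, which is the name the script later moves back into the output directory.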

python $SPREAD rtl "_tmp" "_tmp_merged"

The merge-spread.py script is executed to try to combine double-page spreads. We will talk about this later. The rtl option specifies that we are reading from right to left.
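Reading direction matters because it decides which of two consecutive pages ends up on the left when they are joined. The sketch below is only my reading of what an rtl option implies, not code taken from merge-spread.py; the function name spread_layout and the "ltr" value are assumptions.

```python
def spread_layout(first_page, second_page, direction="rtl"):
    """Return (left, right) for two consecutive pages given in reading order."""
    if direction == "rtl":
        # Right-to-left: the page read first sits on the right.
        return second_page, first_page
    if direction == "ltr":
        return first_page, second_page
    raise ValueError("direction must be 'rtl' or 'ltr'")
```

So for pages 0010.jpg and 0011.jpg read right to left, 0011.jpg would be the left half of the merged image and 0010.jpg the right half.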

python $KCC -p ${2:-KPW5} -m -u -r 2 -c 1 --mozjpeg -f CBZ -o "_tmp.cbz" "_tmp_merged"

We then invoke KCC to process our images. The device profile can be passed as the second parameter to this script, and defaults to Kindle Paperwhite 5 (the device I am currently using). We also enable manga mode (-m), force upscaling (-u), rotate and split double-page spreads (-r 2), crop margins only (-c 1), and use mozjpeg. The output is a CBZ, which is just a ZIP file.
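If you drive KCC from Python instead of shell, the same invocation can be built as an argument list. This is a sketch only: build_kcc_command is a made-up helper, and the flags are copied from the command above rather than checked against any particular KCC release.

```python
import sys


def build_kcc_command(kcc_script, input_dir, output_cbz, profile="KPW5"):
    """Build the argument list for the KCC call used in do-comic.sh."""
    return [
        sys.executable, kcc_script,
        "-p", profile,     # device profile, KPW5 unless overridden
        "-m",              # manga mode
        "-u",              # force upscaling
        "-r", "2",         # rotate and split double-page spreads
        "-c", "1",         # crop margins only
        "--mozjpeg",
        "-f", "CBZ",
        "-o", output_cbz,
        input_dir,
    ]
```

The list can be handed to subprocess.run(cmd, check=True). Passing a list rather than a single string avoids quoting problems with directory names that contain spaces.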

I don’t think using mozjpeg makes any difference, since Kindle Create will re-compress the files in JXR format anyway, but it doesn’t make things noticeably slower so I leave it there.

We don’t crop page numbers for two reasons. First, cropping page numbers is really, really slow. Second, it has no limit, so it will crop out everything white. So for a blank page that has only the manga title at the bottom-left? Boom, that logo is now the entire page and is very pixelated. Not nice.

unzip "_tmp.cbz" -d "[KCC] $NAME"
mv "_tmp_cover.${COVER##*.}" "[KCC] $NAME/0000.${COVER##*.}"

We then extract the CBZ to the target directory, and move the cover back in with the rest of the image files.

rm "_tmp.cbz"
rm -r "_tmp"
rm -r "_tmp_merged"

This cleans up all our temporary files. And that’s it.

On a side note: I actually use my own fork of KCC. It has two new features that I mainly use:

It uses high-quality Lanczos scaling for everything.
For left-right splitting of page spreads, the original version just splits in the centre. My version tries to fill the target device screen by cropping the left and right sides of the image by whatever is required to fill the screen. As a result, some part of the spread is included in both the left and right halves, but it’s much nicer to look at in my opinion.
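To make the overlapping split concrete, here is one way the crop boxes could be computed. This is my own reconstruction of the idea, not code from the fork: the function name split_boxes is hypothetical, the boxes use the (left, top, right, bottom) convention, and I assume each half keeps the full height and is widened until it matches the screen's aspect ratio.

```python
def split_boxes(spread_w, spread_h, screen_w, screen_h):
    """Return (left_box, right_box) crop rectangles for a double-page spread."""
    # Width a half needs so that it has the same aspect ratio as the screen.
    fill_w = round(spread_h * screen_w / screen_h)
    # Never take less than a plain centre split, never more than the whole image.
    half_w = -(-spread_w // 2)  # ceiling division
    crop_w = max(half_w, min(fill_w, spread_w))
    left_box = (0, 0, crop_w, spread_h)
    right_box = (spread_w - crop_w, 0, spread_w, spread_h)
    return left_box, right_box
```

For a 2000x1400 spread and a 1236x1648 portrait screen, each half comes out 1050 pixels wide, so the middle 100 pixels appear in both halves. When the screen is narrow enough that a centre split already fills it, the sketch falls back to the plain split with no overlap.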

merge-spread.py

Comics bought from most stores provide a double-page spread as one file, but on Kindle the spread is provided as two separate files, with some metadata telling the software that this is a double spread.

That doesn’t work if we are creating our own KFX. As far as I know, there is currently no documented way to do so in Kindle Create. So we rely on the rotate-and-split feature of KCC to handle double spreads.

That leads to the problem of how to tell which pairs of images are double spreads. So I decided to just slap some Python together and use heuristics to detect them.

The main heuristic is the contrast between pixels on the edges of the images. The idea is that if two images are actually a single image, then the contrast across the edge where they meet should be very low.

The contrast is easy to calculate: it’s just the colour difference between each corresponding pixel pair.
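As a rough illustration, the per-pair difference can be folded into a single number with a root mean square. The sketch assumes each edge has already been reduced to a column of grayscale values from 0 to 255 (for example with an image library); edge_contrast is a made-up name, and the real script may measure colour difference differently.

```python
import math


def edge_contrast(edge_a, edge_b):
    """RMS difference between paired edge pixels, as a fraction of the 0-255 range."""
    if len(edge_a) != len(edge_b) or not edge_a:
        raise ValueError("edges must be non-empty and the same length")
    total = sum((a - b) ** 2 for a, b in zip(edge_a, edge_b))
    return math.sqrt(total / len(edge_a)) / 255.0
```

Two identical edges give 0.0, and a pure black edge next to a pure white edge gives 1.0.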

But this runs into one main problem: most of the time the edge is totally white. This gives us zero contrast, even though the pages aren’t a double spread.

After experimenting with many methods to handle this, I settled on treating anything above 95% white as background, and we don’t count those pixels.

We also don’t count anything more than 95% black, to handle black backgrounds.

(Note: we calculate two different contrasts, one assuming a white background and another assuming a black background. The final contrast is whichever one is calculated from more pixels.)

(Note 2: ignoring both black and white at once doesn’t work, simply because most of the time manga is just black and white, so we would be ignoring everything.)
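The background masking can be sketched as a filter applied before the contrast is computed. Several things here are assumptions on my part: edges are grayscale columns with values from 0 to 255, a pair is skipped only when both of its pixels look like background, and masked_contrast is a hypothetical name. The sketch handles one background assumption per call and does not try to reproduce how the script picks between the two results.

```python
import math


def masked_contrast(edge_a, edge_b, background="white", cutoff=0.95):
    """Return (rms_contrast, pixels_used), skipping pairs that look like background."""
    if background == "white":
        def is_background(value):
            return value > cutoff * 255
    else:
        def is_background(value):
            return value < (1 - cutoff) * 255
    # Assumption: drop a pair only when both pixels are background.
    used = [(a, b) for a, b in zip(edge_a, edge_b)
            if not (is_background(a) and is_background(b))]
    if not used:
        return None, 0
    rms = math.sqrt(sum((a - b) ** 2 for a, b in used) / len(used)) / 255.0
    return rms, len(used)
```

An all-white pair of edges returns (None, 0) instead of a misleading zero contrast, which is the case the plain calculation gets wrong.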

The final heuristic is that the number of pixels used in the contrast calculation must be more than 15% of the image height, and the contrast must be less than 25% RMS.

If these conditions are met, the two images are combined into a single image.
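Put together, the final check is just two thresholds. In this sketch the contrast is assumed to be an RMS value scaled to the range 0 to 1 and pixels_used is the number of edge pixel pairs that survived background masking. The name is_double_spread and its parameter names are mine, not the script's.

```python
def is_double_spread(contrast, pixels_used, image_height,
                     min_coverage=0.15, max_contrast=0.25):
    """Decide whether two adjacent pages should be merged into one spread."""
    if contrast is None or image_height <= 0:
        # Nothing but background on the edge: no evidence of a spread.
        return False
    coverage = pixels_used / image_height
    return coverage > min_coverage and contrast < max_contrast
```

For a 1600-pixel-tall page, at least 241 non-background edge pixels are needed before a low contrast counts as evidence of a spread.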

It’s not perfect, but it’s pretty darn good.