MaskTools

Abstract

author: kurosu and Manao
version: 1.5.6
download: http://www.geocities.com/manao47/Filters/
category: Misc Plugins
requirements: YV12 Colorspace

Table of contents

I) About MaskTools

1) Simple version

After processing, you may want to keep only part of the output. Say you have a clip named smooth that is the result of smoothing (blur() for instance) a clip named source.
Most of the noise from source has disappeared in smooth, but so have details. You might therefore want to keep only the filtered pixels and discard those where there is a big difference in color or brightness. That's what MSmooth by D. Graft does, for instance. Now imagine writing an image where the pixels of smooth that you want to keep are white, and the others, taken from source, are black. You get what is called a mask. MaskTools deals with the creation, enhancement and manipulation of such masks for each plane of the YV12 colorspace.

2) Description

This Avisynth 2.5 YV12-only plugin offers several functions manipulating clips as masks:

In addition, all functions take 3 parameters: Y, U and V (except the FitPlane functions, where obviously the name tells what is processed). Depending on their value, different operations are applied to each plane:

One last point is the ability of some functions to process only a part of the frame:

This was intended for modularity and atomic operations (or as close to that as possible), not really speed. It became both bloated and slow. I let you decide whether this statement is entirely true, or a bit less so... The examples in III) will most probably run much faster with the original filters.

II) Function descriptions

Binarize

Binarize (clip, int "threshold", bool "upper")

The Binarize filter performs basic thresholding of a picture. If upper=true, a pixel whose value is strictly greater than threshold is set to zero, otherwise to 255. Conversely, if upper=false, a pixel whose value is strictly greater than threshold is set to 255, otherwise to zero.

Defaults are threshold = 20 and upper = true.
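The per-pixel rule above can be sketched in a few lines of Python (an illustrative model, not the plugin's actual code; the real filter of course applies this over whole planes):

```python
def binarize(pixel, threshold=20, upper=True):
    """Binarize one pixel value as described above.

    upper=True:  values strictly above the threshold go to 0, the rest to 255.
    upper=False: the outputs are swapped.
    """
    if upper:
        return 0 if pixel > threshold else 255
    return 255 if pixel > threshold else 0
```

With the defaults (threshold=20, upper=True), a pixel of 21 becomes 0 and a pixel of 20 becomes 255.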

CombMask

CombMask (clip, int "thY1", int "thY2")

This filter produces a mask showing the areas that are combed. The thresholds work as in the other filters: once the combing value has been computed, if it is under thY1 the pixel is set to 0, if it is over thY2 it is set to 255, and in between it is set to the combing value divided by 256.

The combing value is (upper_pixel - pixel)*(lower_pixel - pixel). It is not normalized to the range 0..255; if it were, values would never exceed 1 or 2. That means you can use thresholds higher than 255, even though they should not be needed.

Defaults are thY1 = 10 and thY2 = 10 (thus producing a binary mask).
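The combing value and its thresholding can be modeled per pixel as follows (a minimal sketch of the rule stated above, not the plugin's implementation):

```python
def comb_value(upper_pixel, pixel, lower_pixel):
    # combing value as defined above; deliberately not normalized to 0..255
    return (upper_pixel - pixel) * (lower_pixel - pixel)


def comb_mask(upper_pixel, pixel, lower_pixel, thY1=10, thY2=10):
    # under thY1 -> 0, over thY2 -> 255, in between -> value / 256
    c = comb_value(upper_pixel, pixel, lower_pixel)
    if c < thY1:
        return 0
    if c > thY2:
        return 255
    return min(255, c // 256)
```

A strongly combed pixel (dark line between two bright ones) gives a large product, e.g. comb_value(100, 50, 100) = 2500, well over the default thY2.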

DEdgeMask

DEdgeMask (clip, int "thY1", int "thY2", int "thC1", int "thC2", string "matrix", float "divisor", bool "setdivisor", bool "vmode")

This filter creates an edge mask of the picture. The edge-finding algorithm uses a convolution kernel, and the result of the convolution is then thresholded with thY1 and thY2 (luma) and thC1 and thC2 (chroma). The thresholding works as follows (r is the result of the convolution):

In order to create a binary mask, simply set th1 = th2.

The choice of the convolution kernel is made with matrix. The matrix must be a 3 by 3 matrix whose coefficients are integers, separated by single spaces. Hence, the strings "-1 -1 -1 -1 8 -1 -1 -1 -1" and "0 -1 0 -1 0 1 0 1 0" give respectively the "laplace" and "sobel" kernels of the EdgeMask filter.

As the coefficients must be integers, divisor is used to refine the result of the convolution, which is simply divided by divisor. If divisor isn't defined, it defaults to the sum of the positive coefficients of the matrix, giving a classic normalization. It can be either a float or an integer, the latter being faster.
setdivisor is present only for backward compatibility. Do not use it.
Finally, vmode outputs a mask centered on 128 instead of zero. Defaults are: thY1 = 0, thY2 = 20, thC1 = 0, thC2 = 20, matrix = "-1 -1 -1 -1 8 -1 -1 -1 -1" and vmode = false.
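As a rough model of the kernel string and the default divisor, here is a Python sketch that parses a 3x3 matrix and convolves the interior of a plane (borders and the two-threshold step are omitted for brevity; the plane is assumed to be a list of rows):

```python
def parse_matrix(matrix="-1 -1 -1 -1 8 -1 -1 -1 -1"):
    # a 3x3 kernel given as 9 space-separated integers, row by row
    return [int(c) for c in matrix.split()]


def dedge_convolve(plane, matrix, divisor=None):
    """Convolve the interior pixels of a plane with a 3x3 kernel.

    divisor defaults to the sum of the positive coefficients,
    matching the default normalization described above.
    """
    kernel = parse_matrix(matrix)
    if divisor is None:
        divisor = sum(c for c in kernel if c > 0) or 1
    h, w = len(plane), len(plane[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(kernel[3 * j + i] * plane[y + j - 1][x + i - 1]
                      for j in range(3) for i in range(3))
            out[y][x] = min(255, abs(acc) // divisor)
    return out
```

With the laplace kernel, a flat area gives 0 while an isolated bright pixel yields a strong response.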

EdgeMask

EdgeMask (clip, int "thY1", int "thY2", int "thC1", int "thC2", string "type")

This filter creates an edge mask of the picture. The edge-finding algorithm uses a convolution kernel, and the result of the convolution is then thresholded with thY1 and thY2 (luma) and thC1 and thC2 (chroma). The thresholding works as follows (r is the result of the convolution):

In order to create a binary mask, simply set th1 = th2.

The choice of the convolution kernel is made with type:

Finally, there are two other possible values for type ("cartoon" and "line"), whose behaviors are not documented here.

Defaults are: thY1 = 0, thY2 = 20, thC1 = 0, thC2 = 20 and type = "sobel".

FitY2U / FitY2V / FitY2UV / FitU2Y / FitV2Y / FitU2V / FitV2U

FitPlane (clip, string "resizer")

FitPlane has the following incarnations:
- luma to chroma: FitY2U, FitY2V, FitY2UV
- chroma to luma: FitU2Y, FitV2Y
- chroma to chroma: FitU2V, FitV2U

By this means, you can propagate a mask created on one plane to another plane.
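As an illustration of what FitY2UV has to do, here is a toy point-resize that shrinks a luma-sized mask to YV12 chroma dimensions (half width, half height). The real filter lets you pick an AviSynth resizer; plain point sampling is just the simplest stand-in:

```python
def fit_y_to_uv(y_mask):
    """Shrink a luma-sized mask to YV12 chroma size (half width and
    half height) by point-sampling the top-left pixel of each 2x2
    block. Illustrative only; the actual filter uses a resizer."""
    return [row[::2] for row in y_mask[::2]]
```

A 4x4 luma mask becomes a 2x2 chroma mask, so a mask drawn on Y can drive processing of U and V.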

Inpand / Expand / Deflate / Inflate

Inpand (clip)
Expand (clip)
Deflate (clip)
Inflate (clip)

These filters enlarge / shrink a mask. Expand replaces the value of a pixel with the highest surrounding value. Inpand, on the contrary, replaces it with the lowest surrounding value. Inflate computes the mean of the surrounding pixels and replaces the pixel's value with it only if the mean is greater than the original value. Deflate does the same only if the mean is lower than the original value.

The picture returned by Expand / Inflate will always be at least as bright as the original picture; conversely, the one returned by Inpand / Deflate will never be brighter.

The enlarging / shrinking produced by Deflate / Inflate is softer than that of Expand / Inpand.
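The difference between the hard and soft variants can be seen in a small sketch (Expand and Inflate shown; Inpand and Deflate are the mirror images with min and a lowered mean). This models the behavior described above on a plane stored as a list of rows; it assumes the 3x3 window includes the pixel itself for Expand, which is what keeps the result at least as bright as the input:

```python
def _window(plane, y, x, include_center):
    # values in the 3x3 window around (y, x), clipped at the borders
    h, w = len(plane), len(plane[0])
    return [plane[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))
            if include_center or (j, i) != (y, x)]


def expand(plane):
    # each pixel takes the highest value of its 3x3 window (dilation)
    return [[max(_window(plane, y, x, True))
             for x in range(len(plane[0]))] for y in range(len(plane))]


def inflate(plane):
    # raise a pixel to the mean of its neighbours, but never lower it
    return [[max(plane[y][x],
                 sum(_window(plane, y, x, False))
                 // len(_window(plane, y, x, False)))
             for x in range(len(plane[0]))] for y in range(len(plane))]
```

On a single white dot, expand() spreads it to the full 3x3 block, while inflate() only lifts the neighbours part of the way, which is why it reads as "softer".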

HysteresyMask

HysteresyMask (mask_clip1, mask_clip2)

This filter creates a mask from two masks. Theoretically, the first mask should lie inside the second one, but the filter still works if that isn't true (though the results will be less interesting). The principle is to enlarge the parts that belong to both masks, staying inside the second mask.

This algorithm is interesting because it allows, for example, obtaining an edge mask with all the interesting edges but without the noise. You build two edge masks, one with many edges and noise, the other with few edges and almost no noise. Then you apply this filter, and you should obtain the edges without the noise, because the noise wasn't present in the second mask.

Invert

Invert (clip, int offX, int offY, int w, int h)

This filter replaces each pixel's value by 255 - value.

Binarize(upper=false) can be seen as (but isn't processed as)

Invert().Binarize(upper=true)

Logic

Logic (mask_clip1, mask_clip2, string "mode")

This filter produces a new mask which is the result of a binary operation between two masks. The operation is chosen with the parameter mode.

If a logical operator is used on a non-binary mask, the results are unpredictable.

Default : mode = "and".
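A per-pixel sketch of the idea, with "and", "or" and "xor" assumed as representative mode values (the document doesn't enumerate the supported modes, so treat these names as illustrative). On strictly binary masks (0 / 255 only) the bitwise operators behave as set operations; on anything else, as warned above, the bit patterns interact unpredictably:

```python
def logic(mask1, mask2, mode="and"):
    """Combine two masks pixel by pixel with a bitwise operator."""
    ops = {"and": lambda a, b: a & b,
           "or":  lambda a, b: a | b,
           "xor": lambda a, b: a ^ b}
    f = ops[mode]
    return [[f(a, b) for a, b in zip(r1, r2)]
            for r1, r2 in zip(mask1, mask2)]
```

For instance, "and" keeps only the pixels set in both masks, which is handy for intersecting an edge mask with a motion mask.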

RGBLUT / YV12LUT / YV12LUTxy

YV12LUT (clip, string "yexpr", string "uexpr", string "vexpr")
RGBLUT (clip, string "Rexpr", string "Gexpr", string "Bexpr", string "AMPFile")
YV12LUTxy (clipx, clipy, string "yexpr", string "uexpr", string "vexpr")

These filters apply a function to each pixel of the picture. To allow fast computation, all possible values of the function are precomputed and stored in a Look-Up Table (hence the name). That makes the filters fairly fast. RGBLUT works exactly like YV12LUT, except that it has an additional argument, AMPFile, which lets you load a Photoshop color profile.

In order to be able to apply almost any function, it is given as a string representing an expression in reverse polish notation. The principle of this notation is to write the operands / parameters of an operator / function first, and then the operator / function itself. Hence, "3 + 7" becomes "3 7 +", and "sin(3)" becomes "3 sin". Going further, "3 * 7 + 5" becomes "3 7 * 5 +", and "(3 + 7) * 5" becomes "3 7 + 5 *". Now you see the main asset of this notation: no need for parentheses.

Computations are carried out on real numbers. Positive numbers also represent a true statement, whereas negative numbers represent a false statement. In the string, the symbol "x" is the value of the pixel before the function is applied. For YV12LUTxy you also have the symbol "y", which represents the value of the collocated pixel in the second clip. Symbols must be separated by a single space.

The following operators and functions are implemented:

Some examples :

* Binarization of the picture with a threshold at 128: "x 128 < 0 255 ?". It translates as "(x < 128) ? 0 : 255".
* Levels(il, gamma, ih, ol, oh) (have a look at the Levels filter): "x il - ih il - / 1 gamma / ^ oh ol - *". It translates as "(((x - il) / (ih - il)) ^ (1 / gamma)) * (oh - ol)".

Defaults are: yexpr = uexpr = vexpr = "x" (hence, the filter does nothing).
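To make the notation concrete, here is a tiny reverse-polish evaluator covering a subset of the operators (the real filter supports more; the "?" and comparison semantics follow the positive-is-true convention stated above):

```python
import math
import operator


def rpn_eval(expr, **symbols):
    """Evaluate a reverse-polish expression over real numbers.

    Comparison operators return 1.0 for true and -1.0 for false,
    matching the positive/negative truth convention. Pixel symbols
    such as x (and y for YV12LUTxy) are passed as keyword arguments.
    """
    binops = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv,
              "^": operator.pow,
              "<": lambda a, b: 1.0 if a < b else -1.0,
              ">": lambda a, b: 1.0 if a > b else -1.0}
    stack = []
    for tok in expr.split():
        if tok in binops:
            b, a = stack.pop(), stack.pop()
            stack.append(binops[tok](a, b))
        elif tok == "?":
            # cond true false ? -> true-branch if cond is positive
            false_v, true_v, cond = stack.pop(), stack.pop(), stack.pop()
            stack.append(true_v if cond > 0 else false_v)
        elif tok == "sin":
            stack.append(math.sin(stack.pop()))
        elif tok in symbols:
            stack.append(float(symbols[tok]))
        else:
            stack.append(float(tok))
    return stack[0]
```

Running the binarization example from above, "x 128 < 0 255 ?" gives 0 for x=50 and 255 for x=200, and "3 7 + 5 *" evaluates to 50.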

MaskedMerge

MaskedMerge (base_clip, overlay_clip, mask_clip)

This filter applies the clip overlay_clip on the clip base_clip, considering the clip mask_clip. More precisely, with bc, oc and mc the values of three pixels taken respectively on base_clip, overlay_clip and mask_clip, the result will be :

v = ((256 - mc) * bc + mc * oc + 128) / 256

The 128 is there to reduce the error due to the rounding of the integer division.

So, if the mask is 255 the pixel is taken from overlay_clip, if the mask is 0 it is taken from base_clip, and in between the two clips are blended.
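The formula is a one-liner in integer arithmetic (note that with the division by 256 rather than 255, a mask of 255 gives a value very close to, but not exactly, the overlay pixel):

```python
def masked_merge(bc, oc, mc):
    # v = ((256 - mc) * bc + mc * oc + 128) / 256, integer arithmetic;
    # the +128 rounds the division instead of truncating it
    return ((256 - mc) * bc + mc * oc + 128) >> 8
```

With a mask of 0 the base pixel passes through unchanged, and at mc = 128 the result is an even blend of the two inputs.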

MotionMask

MotionMask (clip, int "thY1", int "thY2", int "thC1", int "thC2", int "thSD")

This filter creates a mask of the motion of the picture. As with the other filters which create masks, once the motion is computed, it is thresholded by two thresholds. This filter will also check for scene changes, and won't output a mask if one is detected.

Scene change detection is made by computing the sum of absolute differences between the picture and the previous one. This sum is averaged and then compared to thSD. If it is greater than thSD, a scene change is detected.

Motion is computed the same way as in NoMoSmooth: for each pixel, we compute the sum of absolute differences between the pixel and its surroundings in the current picture and the same neighborhood in the previous picture. The resulting value is then divided by 9 in order to normalize it to the range 0..255.

This algorithm only gives an approximation of the motion. It will work well on the edges of an object, but not on its inside.

Defaults are : thY1= 20, thY2 = 20, thC1 = 10, thC2 = 10 and thSD = 10.
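The per-pixel motion measure reduces to a 3x3 SAD between consecutive frames, as sketched below (interior pixels only; thresholding with thY1/thY2 then works as in the other filters):

```python
def motion_value(cur, prev, y, x):
    """Motion at an interior pixel: sum of absolute differences over
    the 3x3 neighbourhood between the current and previous frame,
    divided by 9 to bring the result back into 0..255."""
    sad = sum(abs(cur[j][i] - prev[j][i])
              for j in range(y - 1, y + 2)
              for i in range(x - 1, x + 2))
    return sad // 9
```

A uniform brightness jump of 90 across the whole neighbourhood yields a motion value of 90, since all nine differences are equal.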

YV12Convolution

YV12Convolution (clip, string "horizontal", string "vertical", int "total", bool "automatic", bool "saturate")

This filter computes the convolution product of the picture with the kernel defined by multiplying horizontal by vertical. These two strings represent vectors. They must contain an odd number of integers or real numbers, separated by single spaces. total is a normalization factor by which the result of the product is divided. If automatic is set to true, total is the sum of the coefficients of the matrix, which means the overall brightness of the picture is untouched. saturate chooses the behavior of the filter when the result is a negative number.

If total is not defined, it is set to the sum of the coefficients of the convolution kernel, which gives a good normalization for blurring / sharpening kernels.

If any coefficient of horizontal or vertical is a real number, all computations are made with floats, so the filter will be slower.

Defaults are : horizontal = "1 1 1", vertical = "1 1 1" and automatic = false, saturate = true.
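The way the 2D kernel is built from the two vectors can be sketched like this (a naive interior-only model, with the kernel as the outer product of vertical and horizontal and total defaulting to the coefficient sum as described above):

```python
def separable_convolve(plane, horizontal, vertical, total=None):
    """Convolve a plane with the outer product of two 1D kernels.

    total defaults to the sum of the 2D kernel's coefficients, so a
    box kernel like [1 1 1] x [1 1 1] leaves flat areas untouched.
    Borders are left as-is for brevity; results clamp to 0..255.
    """
    kernel = [[v * h for h in horizontal] for v in vertical]
    if total is None:
        total = sum(sum(row) for row in kernel) or 1
    rh, rw = len(vertical) // 2, len(horizontal) // 2
    h, w = len(plane), len(plane[0])
    out = [row[:] for row in plane]
    for y in range(rh, h - rh):
        for x in range(rw, w - rw):
            acc = sum(kernel[j][i] * plane[y + j - rh][x + i - rw]
                      for j in range(len(vertical))
                      for i in range(len(horizontal)))
            out[y][x] = max(0, min(255, acc // total))
    return out
```

On a flat plane the default "1 1 1" vectors change nothing, which is the "overall brightness isn't touched" property mentioned above.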

YV12Subtract

YV12Subtract (clip1, clip2, int tol, bool "widerange")

This filter computes the difference between the two clips. There are several ways of computing this difference, depending on the values of widerange and of tol.

Defaults are : tol = -1 and widerange = false.

III) Some practical uses (not tested extensively)

These won't produce exactly the same results as the original filters they try to mimic, in addition to being far slower. Despite the numerous additional functions, there are no really new ideas.

Notes: 
- I'm too lazy to update the syntax, especially regarding how mode=2 works and how EdgeMask was updated (it no longer needs a Binarize, for instance)
- Some filters I describe as 'to create' already exist (imagereader, levels for clamping, ...).

1) MSharpen

# Build EdgeMask of clip1, Binarize it and store the result into clip3
# Apply any sharpening filter to clip1 and store it into clip2
...
return MaskedMerge(clip1, clip2, clip3)

The edges of clip2 whose edge value exceeds the threshold given to Binarize will be sharpened and replace their original values in clip1. You could also write a filter with a particular look-up table (ideally bell-shaped), replace Binarize with it, and get a weighted sharpening depending on the edge value: that's the HiQ part of SmartSmoothHiQ.

clip2 = clip1.<EdgeEnhancer>(<parameters>)
#U and V planes don't need filtering, Y needs it
#EdgeMask(<...>, "roberts", Y=3, U=-128, V=-128) for greyscale map
clip3 = clip1.EdgeMask(15, 60, "roberts", Y=3, U=1, V=1)
return MaskedMerge(clip1, clip2, clip3)

2) MSoften

Replace EdgeEnhancer by a spatial softener (cascaded blurs? spatialsoftenMMX?) and use upper=true to select near-flat pixels.

3) Rainbow reduction (as described in this thread)

Be warned, this isn't a miracle solution either.

clip2 = clip1 soften at maximum (using deen("m2d") or edeen for instance)
#Get luma edgemap and increase edges by inflating
# -> wider areas to be processed
clip3 = clip1.EdgeMask(6, "roberts", Y=3, U=1, V=1).Inflate(Y=3, U=1, V=1)
#Now, use the luma edgemask as a chroma mask
clip3 = YtoUV(clip3, clip3).ReduceBy2().Binarize(15, upper=false, Y=1, U=3, V=3)
#We have to process pixels' chroma near edges, but keep intact Y plane
return MaskedMerge(clip1, clip2, clip3, Y=1, U=3, V=3)

4) Supersampled fxtoon

Not tested

. Use tweak to darken the picture or make a plugin that scales down Y values -> clip2
. Build an edge mask, supersample this mask, Binarize it with a high threshold (clamping sounds better), Inflate it -> clip3
. Apply the darker pixels of clip2 depending on the values of clip3

5) Warpsharp for dark luma

Not tested

. Apply warpsharp -> clip2 (replacement pixels)
. Create a clamping filter or a low-luma bypass filter -> clip3 (mask)

6) pseudo-deinterlacer (chroma will still be problematic)

Not tested

clip2 = clip1.SeparateFields().SelectEven().<Method>Resize(<parameters>)
clip3 = clip1.<CombingDetector>(<parameters>)
return MaskedMerge(clip1, clip2, clip3, Y=3, U=3, V=3)

(chroma even more problematic)

7) Non-rectangular overlays

In fact, this is handled more nicely by layer and mask...

#Simple hack because ImageReader needs an integer fps...
#Most sources are natively in YUY2/YV12
clip = AviSource("test.avi").ConvertToYV12().AssumeFPS(fps)
#Load the picture to be overlayed
image = ImageReader("mask.bmp", 0, clip.framecount()-1, 24, use_DevIl=false)
#Simple way: assume black is transparent 
#Any other colour would be quite more complicated*
masktemp = image.ConvertToYV12().Binarize(17, upper=false, Y=3)
#We set the luma mask to fit the chroma planes
mask = masktemp.FitY2UV()
#Now that we have the mask that tells us what we want to keep...
#Replace by image the parts of clip masked by mask!
MaskedMerge(clip, image, mask, Y=3, U=3, V=3)
#*solution: mask = OverlayMask(image, image.BlankClip("$xxxxxx"), 1, 1)

8) Replace backgrounds

This example would clearly look better in RGB. To avoid the typical problems due to noise or compression, you'd do better to use blurred versions of the clip and picture.

source = AviSource("overlay.avi").AssumeFPS(24)
#blur the source
clip = source.Blur(1.58).Blur(1.58).Blur(1.58)
#load the background to replace, captured from the blurred sequence
bgnd = ImageReader("bgnd.ebmp", 0, clip.framecount()-1, 24, use_DevIl=false)
#load new background
new = ImageReader("new.ebmp", 0, clip.framecount()-1, 24, use_DevIl=false)
#integrated filter to output the mask = (clip~overlay?)
mask = OverlayMask(clip, bgnd.ConvertToYV12(), 10, 10)
MaskedMerge(source, new.ConvertToYV12(), mask, Y=3, U=3, V=3)

9) K-mfToon

I need to include more info (original urls/posts), but for now I think mfToon's original author, mf (mf@onthanet.net), will not react too violently to it while it's still not addressed.
The output of the function inside K-mfToon.avs should be identical to the output of the original mftoon.avs (also included), at twice the speed.
The requirements are:
- For mfToon:
. load the plugins called "MaskTools", "warpsharp", "awarpsharp"

IV) TODO

Nothing; it all depends on feedback.

V) Disclaimer

This plugin is released under the GPL license. You must agree to the terms of 'Copying.txt' before using the plugin or its source code.

You are also advised to use it in a philanthropic state of mind, i.e. not "I'll keep this secret to myself".

Last but not least, only a very small part of all possible uses of each filter was tested (maybe 5% - still a couple of hours spent debugging ;-). Therefore, feedback is _very_ welcome (the opposite - a lack of feedback - is also true...)

VI) Revisions

1.4.16

1.4.15.3

1.4.15.2

1.4.15.1

1.4.15

1.4.14.2

1.4.14.1

1.4.14

1.4.13

1.4.12

1.4.11

1.4.10

1.4.9

1.4.8

1.4.7

1.4.6

1.4.5

1.4.4

1.4.3

1.4.2

1.4.1

1.4.0

1.3.0 (private version)

1.2.0 (private version)

1.1.0 (private version)

1.0.2 (last version - public project dropped):

1.0.1: Initial release

VII) Developer's walkthrough

Skip to V) if you're not interested in developing the tools available.

The project is a VC++ 6 basic project. Each filter has its own folder which stores the header used by the interface, the source for the function members, the source for processing functions and its header. Let's look at EdgeMask:
- EdgeMask.h is included by the interface to know what the filter 'looks like' (but interface.cpp still holds the definition of the calling conventions and exported functions)
- EM_func.h describes the different processing functions (they should all have the same prototype/parameters):
. Line_MMX and Line_C
. Roberts_MMX and Roberts_C
. Sobel_MMX and Sobel_C
- EM_func.cpp, as all <filter's initials>_func.cpp, stores the implementation of the processing functions, and sometimes their MMX equivalents.
- EdgeMask.cpp implements the class; the constructor selects the appropriate processing function (MMX? C? Roberts? Line? Sobel?) and uses it to fill the generic protected function pointer used in GetFrame

Interface.cpp stores the export function and all of the calling functions (AVSValue ... Create_<filter>).

ChannelMode.cpp defines the Channel operating modes. There could be added the equivalent of a debugprintf.

This quick walkthrough probably won't help most developers, any more than the examples of V) do for users, but that's the best I've come up with so far. It will of course improve over time depending on the success of the idea, whose main drawback, speed, will probably keep it scarcely used, if ever. <g>