Removing Watermark from PDF iTextSharp


As the OP already mentioned, if you have complete control over the process originally creating the watermark, you can do as @ChrisHaas explained in his answer to the question the OP referred to.

If on the other hand the tool you create the watermark with does so in its own way, you will need a method customized for those watermarks.

This method usually will require that you edit some content stream. @ChrisHaas’ solution, by the way, does so, too.

To make this easier, one should start by creating generic content stream editing functionality and then use it to edit out those watermarks.

Thus, here is first a sample generic content stream editor class, followed by a solution based on it that edits out the OP’s sample watermark.

A generic content stream editor class

This PdfContentStreamEditor class parses the original content stream instruction by instruction keeping track of a part of the graphics state; the instructions are forwarded to its Write method which by default writes them back just as they come in, effectively creating an identical or at least equivalent copy of the original stream.

To actually edit the stream, simply override this Write method and only forward instructions you want in the result stream to the base Write method.
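To illustrate the override pattern without any iText dependency, here is a minimal toy sketch. The classes ToyStreamEditor and TextDroppingEditor are hypothetical stand-ins, not part of iText; an instruction is modelled as a list of string tokens whose last element is the operator, which mirrors how the editor receives its operands.

```java
import java.util.List;

// Toy stand-in for the editor: every instruction is a token list whose last
// element is the operator (an assumption made to mirror the real editor).
class ToyStreamEditor {
    protected final StringBuilder out = new StringBuilder();

    // Default behavior: copy the instruction unchanged, tokens separated by
    // spaces and terminated by a newline.
    protected void write(String operator, List<String> operands) {
        out.append(String.join(" ", operands)).append('\n');
    }

    // Feeds each instruction to write and returns the resulting stream text.
    public String edit(List<List<String>> instructions) {
        out.setLength(0);
        for (List<String> instruction : instructions) {
            write(instruction.get(instruction.size() - 1), instruction);
        }
        return out.toString();
    }
}

// Hypothetical subclass: forwards everything except text-showing "Tj" instructions.
class TextDroppingEditor extends ToyStreamEditor {
    @Override
    protected void write(String operator, List<String> operands) {
        if (!"Tj".equals(operator)) {
            super.write(operator, operands);
        }
    }
}
```

The base class reproduces its input, while the subclass decides per instruction whether to forward it. The real editor works the same way, only with PdfLiteral and PdfObject instances instead of strings.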

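using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public class PdfContentStreamEditor : PdfContentStreamProcessor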
{
    /**
     * This method edits the immediate contents of a page, i.e. its content stream.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void EditPage(PdfStamper pdfStamper, int pageNum)
    {
        PdfReader pdfReader = pdfStamper.Reader;
        PdfDictionary page = pdfReader.GetPageN(pageNum);
        byte[] pageContentInput = ContentByteUtils.GetContentBytesForPage(pdfReader, pageNum);
        page.Remove(PdfName.CONTENTS);
        EditContent(pageContentInput, page.GetAsDict(PdfName.RESOURCES), pdfStamper.GetUnderContent(pageNum));
    }

    /**
     * This method processes the content bytes and outputs to the given canvas.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void EditContent(byte[] contentBytes, PdfDictionary resources, PdfContentByte canvas)
    {
        this.canvas = canvas;
        ProcessContent(contentBytes, resources);
        this.canvas = null;
    }

    /**
     * This method writes content stream operations to the target canvas. The default
     * implementation writes them as they come, so it essentially generates identical
     * copies of the original instructions the {@link ContentOperatorWrapper} instances
     * forward to it.
     *
     * Override this method to achieve some fancy editing effect.
     */
    protected virtual void Write(PdfContentStreamProcessor processor, PdfLiteral operatorLit, List<PdfObject> operands)
    {
        int index = 0;

        foreach (PdfObject pdfObject in operands)
        {
            pdfObject.ToPdf(canvas.PdfWriter, canvas.InternalBuffer);
            canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
        }
    }

    //
    // constructor giving the parent a dummy listener to talk to 
    //
    public PdfContentStreamEditor() : base(new DummyRenderListener())
    {
    }

    //
    // Overrides of PdfContentStreamProcessor methods
    //
    public override IContentOperator RegisterContentOperator(String operatorString, IContentOperator newOperator)
    {
        ContentOperatorWrapper wrapper = new ContentOperatorWrapper();
        wrapper.setOriginalOperator(newOperator);
        IContentOperator formerOperator = base.RegisterContentOperator(operatorString, wrapper);
        return formerOperator is ContentOperatorWrapper ? ((ContentOperatorWrapper)formerOperator).getOriginalOperator() : formerOperator;
    }

    public override void ProcessContent(byte[] contentBytes, PdfDictionary resources)
    {
        this.resources = resources; 
        base.ProcessContent(contentBytes, resources);
        this.resources = null;
    }

    //
    // members holding the output canvas and the resources
    //
    protected PdfContentByte canvas = null;
    protected PdfDictionary resources = null;

    //
    // A content operator class to wrap all content operators to forward the invocation to the editor
    //
    class ContentOperatorWrapper : IContentOperator
    {
        public IContentOperator getOriginalOperator()
        {
            return originalOperator;
        }

        public void setOriginalOperator(IContentOperator originalOperator)
        {
            this.originalOperator = originalOperator;
        }

        public void Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List<PdfObject> operands)
        {
            // Skip the original "Do" handler so that the processor does not descend into XObjects
            if (originalOperator != null && !"Do".Equals(oper.ToString()))
            {
                originalOperator.Invoke(processor, oper, operands);
            }
            ((PdfContentStreamEditor)processor).Write(processor, oper, operands);
        }

        private IContentOperator originalOperator = null;
    }

    //
    // A dummy render listener to give to the underlying content stream processor to feed events to
    //
    class DummyRenderListener : IRenderListener
    {
        public void BeginTextBlock() { }

        public void RenderText(TextRenderInfo renderInfo) { }

        public void EndTextBlock() { }

        public void RenderImage(ImageRenderInfo renderInfo) { }
    }
}

Some background:

This class extends the PdfContentStreamProcessor from the iTextSharp parser namespace. That class was originally designed merely to parse content streams and return information for text, image, or graphics extraction. Here it is used to keep track of part of the graphics state, more precisely those graphics state parameters relevant for text extraction.

If for specific editing tasks one also needs pre-processed information on e.g. the text drawn by the current instruction, one can use a custom IRenderListener implementation to retrieve that information instead of the DummyRenderListener used here which simply ignores it.

This class architecture is inspired by the PdfCleanUpProcessor from the iTextSharp.xtra extra library.

An editor to hide the OP’s watermark

As the OP has already found out, his watermarks can be recognized as the only document parts using transparency defined in an ExtGState object as a ca value. To hide the watermark we therefore have to

1. recognize graphics state changes with respect to that value and
2. not draw anything when the recognized current ca value is less than 1.

Actually the watermark is built using vector graphics operations. Thus, we can restrict our editing to those operations. We can even restrict it to change the final drawing instruction (“stroke” / “fill” / “fill-and-stroke” plus certain variations) to not do the part (filling or stroking) which generates transparent content.

public class TransparentGraphicsRemover : PdfContentStreamEditor
{
    protected override void Write(PdfContentStreamProcessor processor, PdfLiteral oper, List<PdfObject> operands)
    {
        String operatorString = oper.ToString();
        if ("gs".Equals(operatorString))
        {
            updateTransparencyFrom((PdfName) operands[0]);
        }

        PdfLiteral[] mapping;
        if (operatorMapping.TryGetValue(operatorString, out mapping))
        {
            // Downgrade the drawing operator if transparency is involved
            // For details cf. the comment before the operatorMapping declaration

            int index = 0;
            if (strokingAlpha < 1)
                index |= 1;
            if (nonStrokingAlpha < 1)
                index |= 2;

            oper = mapping[index];
            operands[operands.Count - 1] = oper;
        }

        base.Write(processor, oper, operands);
    }

    // The current transparency values; beware: save and restore state operations are ignored!
    float strokingAlpha = 1;
    float nonStrokingAlpha = 1;

    void updateTransparencyFrom(PdfName gsName)
    {
        PdfDictionary extGState = getGraphicsStateDictionary(gsName);
        if (extGState != null)
        {
            PdfNumber number = extGState.GetAsNumber(PdfName.ca);
            if (number != null)
                nonStrokingAlpha = number.FloatValue;
            number = extGState.GetAsNumber(PdfName.CA);
            if (number != null)
                strokingAlpha = number.FloatValue;
        }
    }

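    PdfDictionary getGraphicsStateDictionary(PdfName gsName)
    {
        // The resources or their ExtGState entry may be missing
        PdfDictionary extGStates = resources == null ? null : resources.GetAsDict(PdfName.EXTGSTATE);
        return extGStates == null ? null : extGStates.GetAsDict(gsName);
    }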

    //
    // Map from an operator name to an array of operations it becomes depending
    // on the current graphics state:
    //
    // * [0] the operation in case of no transparency
    // * [1] the operation in case of stroking transparency
    // * [2] the operation in case of non-stroking transparency
    // * [3] the operation in case of stroking and non-stroking transparency
    //
    Dictionary<String, PdfLiteral[]> operatorMapping = new Dictionary<String, PdfLiteral[]>();

    public TransparentGraphicsRemover()
    {
        PdfLiteral _S = new PdfLiteral("S");
        PdfLiteral _s = new PdfLiteral("s");
        PdfLiteral _f = new PdfLiteral("f");
        PdfLiteral _fStar = new PdfLiteral("f*");
        PdfLiteral _B = new PdfLiteral("B");
        PdfLiteral _BStar = new PdfLiteral("B*");
        PdfLiteral _b = new PdfLiteral("b");
        PdfLiteral _bStar = new PdfLiteral("b*");
        PdfLiteral _n = new PdfLiteral("n");

        operatorMapping["S"] = new PdfLiteral[]{ _S, _n, _S, _n };
        operatorMapping["s"] = new PdfLiteral[]{ _s, _n, _s, _n };
        operatorMapping["f"] = new PdfLiteral[]{ _f, _f, _n, _n };
        operatorMapping["F"] = new PdfLiteral[]{ _f, _f, _n, _n };
        operatorMapping["f*"] = new PdfLiteral[]{ _fStar, _fStar, _n, _n };
        operatorMapping["B"] = new PdfLiteral[]{ _B, _f, _S, _n };
        operatorMapping["B*"] = new PdfLiteral[]{ _BStar, _fStar, _S, _n };
        operatorMapping["b"] = new PdfLiteral[] { _b, _f, _s, _n };
        operatorMapping["b*"] = new PdfLiteral[]{ _bStar, _fStar, _s, _n };
    }
}
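The index computation used with the operator mapping can be tried in isolation. The following is a minimal sketch with plain strings instead of PdfLiteral instances. The class name OperatorDowngrade and the method downgrade are made up for illustration, and only a few of the operators are included.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, iText-free illustration of the downgrade table.
final class OperatorDowngrade {
    // Replacement per painting operator for:
    // [0] no transparency, [1] stroking transparency,
    // [2] non-stroking transparency, [3] both.
    private static final Map<String, String[]> MAPPING = new HashMap<>();
    static {
        MAPPING.put("S", new String[] { "S", "n", "S", "n" });
        MAPPING.put("f", new String[] { "f", "f", "n", "n" });
        MAPPING.put("B", new String[] { "B", "f", "S", "n" });
        MAPPING.put("b", new String[] { "b", "f", "s", "n" });
    }

    // Returns the operator to write instead of the given one; operators
    // without a table entry are returned unchanged.
    static String downgrade(String operator, float strokingAlpha, float nonStrokingAlpha) {
        String[] mapping = MAPPING.get(operator);
        if (mapping == null) {
            return operator;
        }
        int index = 0;
        if (strokingAlpha < 1) {
            index |= 1; // bit 0: stroking is transparent
        }
        if (nonStrokingAlpha < 1) {
            index |= 2; // bit 1: filling is transparent
        }
        return mapping[index];
    }
}
```

For example, a fill-and-stroke B with only the fill transparent yields index 2 and is therefore written as a plain stroke S, while with both alpha values below 1 it becomes the no-op path terminator n.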

Beware: This sample editor is very simple:

- It only considers transparency created by the ExtGState parameters ca and CA; in particular, it ignores masks.
- It does not look for operations saving or restoring the graphics state.

These limitations can easily be lifted but require more code than appropriate for a Stack Overflow answer.
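As a rough idea of how the second limitation might be lifted, the alpha values can be kept on a stack that mirrors the q (save) and Q (restore) graphics state operators. This is only a sketch under that assumption: AlphaStateTracker is a hypothetical helper, not an iText class, and it would still have to be wired into the Write override.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: tracks the CA/ca alpha values across q/Q instructions.
final class AlphaStateTracker {
    private float strokingAlpha = 1f;
    private float nonStrokingAlpha = 1f;
    // Each saved entry is { strokingAlpha, nonStrokingAlpha }.
    private final Deque<float[]> saved = new ArrayDeque<>();

    // Call for every operator seen in the content stream.
    void onOperator(String operator) {
        if ("q".equals(operator)) {
            saved.push(new float[] { strokingAlpha, nonStrokingAlpha });
        } else if ("Q".equals(operator) && !saved.isEmpty()) {
            // An unbalanced Q (empty stack) is ignored.
            float[] state = saved.pop();
            strokingAlpha = state[0];
            nonStrokingAlpha = state[1];
        }
    }

    // Call when a gs instruction selects an ExtGState; a null argument
    // means the respective entry (CA or ca) is absent there.
    void setAlpha(Float stroking, Float nonStroking) {
        if (stroking != null) {
            strokingAlpha = stroking;
        }
        if (nonStroking != null) {
            nonStrokingAlpha = nonStroking;
        }
    }

    boolean isTransparent() {
        return strokingAlpha < 1 || nonStrokingAlpha < 1;
    }
}
```

With such a tracker, a watermark drawn between q and Q no longer leaves the editor in a transparent state for the content that follows.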

Applying this editor to the OP’s sample file like this

string source = @"test3.pdf";
string dest = @"test3-noTransparency.pdf";

using (PdfReader pdfReader = new PdfReader(source))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create, FileAccess.Write)))
{
    PdfContentStreamEditor editor = new TransparentGraphicsRemover();

    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        editor.EditPage(pdfStamper, i);
    }
}

results in a PDF file without the watermark.

I don’t have the tools the OP used to export the contents to Word (NitroPDF and Foxit), so I could not run a final test. Adobe Acrobat (version 9.5), at least, does not include the watermark when exporting to Word.

If the OP’s tools still leave traces of the watermark in the exported Word files, one can easily improve this class to actually drop path creation and drawing operations altogether while transparency is active.
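Such an improvement could look roughly like the following toy sketch, which works on instruction strings instead of iText objects. Everything in it is an assumption made for illustration: the class TransparentPathDropper is hypothetical, transparency is reduced to a single flag, and a gs instruction is treated as transparent exactly if its state name is in the given set, which simplifies the actual ExtGState semantics. Paths that also set a clip are kept but terminated with n so the clipping survives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, iText-free sketch: drops whole paths painted while transparent.
final class TransparentPathDropper {
    private static final Set<String> PATH_OPS =
            new HashSet<>(Arrays.asList("m", "l", "c", "v", "y", "h", "re"));
    private static final Set<String> CLIP_OPS =
            new HashSet<>(Arrays.asList("W", "W*"));
    private static final Set<String> PAINT_OPS =
            new HashSet<>(Arrays.asList("S", "s", "f", "F", "f*", "B", "B*", "b", "b*"));

    static List<String> filter(List<String> instructions, Set<String> transparentStates) {
        List<String> result = new ArrayList<>();
        List<String> pendingPath = new ArrayList<>(); // path buffered until it is painted
        boolean transparent = false;
        boolean clipSeen = false;
        for (String instruction : instructions) {
            String[] tokens = instruction.trim().split("\\s+");
            String operator = tokens[tokens.length - 1];
            if (PATH_OPS.contains(operator)) {
                pendingPath.add(instruction);
            } else if (CLIP_OPS.contains(operator)) {
                pendingPath.add(instruction);
                clipSeen = true;
            } else if (PAINT_OPS.contains(operator) || "n".equals(operator)) {
                boolean paints = !"n".equals(operator);
                if (!(transparent && paints)) {
                    result.addAll(pendingPath);   // opaque: keep path and painting
                    result.add(instruction);
                } else if (clipSeen) {
                    result.addAll(pendingPath);   // keep the clip, paint nothing
                    result.add("n");
                }                                 // otherwise drop the path entirely
                pendingPath.clear();
                clipSeen = false;
            } else {
                if ("gs".equals(operator)) {
                    // Simplification: state name decides transparency outright
                    transparent = transparentStates.contains(tokens[0]);
                }
                result.add(instruction);
            }
        }
        result.addAll(pendingPath); // unterminated trailing path is passed through
        return result;
    }
}
```

In the real editor the same buffering would happen in the Write override, holding back path construction instructions until the painting operator shows whether they are needed.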

The same in Java

I started implementing this for iText in Java and only later realized the OP had iTextSharp in .NET in mind. Here are the equivalent Java classes:

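import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.itextpdf.text.pdf.*;
import com.itextpdf.text.pdf.parser.*;

public class PdfContentStreamEditor extends PdfContentStreamProcessor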
{
    /**
     * This method edits the immediate contents of a page, i.e. its content stream.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void editPage(PdfStamper pdfStamper, int pageNum) throws IOException
    {
        PdfReader pdfReader = pdfStamper.getReader();
        PdfDictionary page = pdfReader.getPageN(pageNum);
        byte[] pageContentInput = ContentByteUtils.getContentBytesForPage(pdfReader, pageNum);
        page.remove(PdfName.CONTENTS);
        editContent(pageContentInput, page.getAsDict(PdfName.RESOURCES), pdfStamper.getUnderContent(pageNum));
    }

    /**
     * This method processes the content bytes and outputs to the given canvas.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void editContent(byte[] contentBytes, PdfDictionary resources, PdfContentByte canvas)
    {
        this.canvas = canvas;
        processContent(contentBytes, resources);
        this.canvas = null;
    }

    /**
     * <p>
     * This method writes content stream operations to the target canvas. The default
     * implementation writes them as they come, so it essentially generates identical
     * copies of the original instructions the {@link ContentOperatorWrapper} instances
     * forward to it.
     * </p>
     * <p>
     * Override this method to achieve some fancy editing effect.
     * </p> 
     */
    protected void write(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException
    {
        int index = 0;

        for (PdfObject object : operands)
        {
            object.toPdf(canvas.getPdfWriter(), canvas.getInternalBuffer());
            canvas.getInternalBuffer().append(operands.size() > ++index ? (byte) ' ' : (byte) '\n');
        }
    }

    //
    // constructor giving the parent a dummy listener to talk to 
    //
    public PdfContentStreamEditor()
    {
        super(new DummyRenderListener());
    }

    //
    // Overrides of PdfContentStreamProcessor methods
    //
    @Override
    public ContentOperator registerContentOperator(String operatorString, ContentOperator operator)
    {
        ContentOperatorWrapper wrapper = new ContentOperatorWrapper();
        wrapper.setOriginalOperator(operator);
        ContentOperator formerOperator = super.registerContentOperator(operatorString, wrapper);
        return formerOperator instanceof ContentOperatorWrapper ? ((ContentOperatorWrapper)formerOperator).getOriginalOperator() : formerOperator;
    }

    @Override
    public void processContent(byte[] contentBytes, PdfDictionary resources)
    {
        this.resources = resources; 
        super.processContent(contentBytes, resources);
        this.resources = null;
    }

    //
    // members holding the output canvas and the resources
    //
    protected PdfContentByte canvas = null;
    protected PdfDictionary resources = null;

    //
    // A content operator class to wrap all content operators to forward the invocation to the editor
    //
    class ContentOperatorWrapper implements ContentOperator
    {
        public ContentOperator getOriginalOperator()
        {
            return originalOperator;
        }

        public void setOriginalOperator(ContentOperator originalOperator)
        {
            this.originalOperator = originalOperator;
        }

        @Override
        public void invoke(PdfContentStreamProcessor processor, PdfLiteral operator, ArrayList<PdfObject> operands) throws Exception
        {
            if (originalOperator != null && !"Do".equals(operator.toString()))
            {
                originalOperator.invoke(processor, operator, operands);
            }
            write(processor, operator, operands);
        }

        private ContentOperator originalOperator = null;
    }

    //
    // A dummy render listener to give to the underlying content stream processor to feed events to
    //
    static class DummyRenderListener implements RenderListener
    {
        @Override
        public void beginTextBlock() { }

        @Override
        public void renderText(TextRenderInfo renderInfo) { }

        @Override
        public void endTextBlock() { }

        @Override
        public void renderImage(ImageRenderInfo renderInfo) { }
    }
}

(PdfContentStreamEditor.java)

public class TransparentGraphicsRemover extends PdfContentStreamEditor
{
    @Override
    protected void write(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException
    {
        String operatorString = operator.toString();
        if ("gs".equals(operatorString))
        {
            updateTransparencyFrom((PdfName) operands.get(0));
        }

        PdfLiteral[] mapping = operatorMapping.get(operatorString);

        if (mapping != null)
        {
            int index = 0;
            if (strokingAlpha < 1)
                index |= 1;
            if (nonStrokingAlpha < 1)
                index |= 2;

            operator = mapping[index];
            operands.set(operands.size() - 1, operator);
        }

        super.write(processor, operator, operands);
    }

    // The current transparency values; beware: save and restore state operations are ignored!
    float strokingAlpha = 1;
    float nonStrokingAlpha = 1;

    void updateTransparencyFrom(PdfName gsName)
    {
        PdfDictionary extGState = getGraphicsStateDictionary(gsName);
        if (extGState != null)
        {
            PdfNumber number = extGState.getAsNumber(PdfName.ca);
            if (number != null)
                nonStrokingAlpha = number.floatValue();
            number = extGState.getAsNumber(PdfName.CA);
            if (number != null)
                strokingAlpha = number.floatValue();
        }
    }

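    PdfDictionary getGraphicsStateDictionary(PdfName gsName)
    {
        // The resources or their ExtGState entry may be missing
        PdfDictionary extGStates = resources == null ? null : resources.getAsDict(PdfName.EXTGSTATE);
        return extGStates == null ? null : extGStates.getAsDict(gsName);
    }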

    //
    // Map from an operator name to an array of operations it becomes depending
    // on the current graphics state:
    //
    // * [0] the operation in case of no transparency
    // * [1] the operation in case of stroking transparency
    // * [2] the operation in case of non-stroking transparency
    // * [3] the operation in case of stroking and non-stroking transparency
    //
    static Map<String, PdfLiteral[]> operatorMapping = new HashMap<String, PdfLiteral[]>();
    static
    {
        PdfLiteral _S = new PdfLiteral("S");
        PdfLiteral _s = new PdfLiteral("s");
        PdfLiteral _f = new PdfLiteral("f");
        PdfLiteral _fStar = new PdfLiteral("f*");
        PdfLiteral _B = new PdfLiteral("B");
        PdfLiteral _BStar = new PdfLiteral("B*");
        PdfLiteral _b = new PdfLiteral("b");
        PdfLiteral _bStar = new PdfLiteral("b*");
        PdfLiteral _n = new PdfLiteral("n");

        operatorMapping.put("S", new PdfLiteral[]{ _S, _n, _S, _n });
        operatorMapping.put("s", new PdfLiteral[]{ _s, _n, _s, _n });
        operatorMapping.put("f", new PdfLiteral[]{ _f, _f, _n, _n });
        operatorMapping.put("F", new PdfLiteral[]{ _f, _f, _n, _n });
        operatorMapping.put("f*", new PdfLiteral[]{ _fStar, _fStar, _n, _n });
        operatorMapping.put("B", new PdfLiteral[]{ _B, _f, _S, _n });
        operatorMapping.put("B*", new PdfLiteral[]{ _BStar, _fStar, _S, _n });
        operatorMapping.put("b", new PdfLiteral[]{ _b, _f, _s, _n });
        operatorMapping.put("b*", new PdfLiteral[]{ _bStar, _fStar, _s, _n });
    }
}

(TransparentGraphicsRemover.java)

@Test
public void testRemoveTransparentGraphicsTest3() throws IOException, DocumentException
{
    try (   InputStream resource = getClass().getResourceAsStream("test3.pdf");
            OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "test3-noTransparency.pdf")))
    {
        PdfReader pdfReader = new PdfReader(resource);
        PdfStamper pdfStamper = new PdfStamper(pdfReader, result);
        PdfContentStreamEditor editor = new TransparentGraphicsRemover();

        for (int i = 1; i <= pdfReader.getNumberOfPages(); i++)
        {
            editor.editPage(pdfStamper, i);
        }

        pdfStamper.close();
    }
}

(excerpt from EditPageContent.java)
