python-pptx – How to replace keyword across multiple runs?

As described in python-pptx’s documentation at https://python-pptx.readthedocs.io/en/latest/api/text.html:

  1. a text frame is made up of paragraphs,
  2. a paragraph is made up of runs and specifies a font configuration that is used as the default for its runs, and
  3. a run specifies part of the paragraph’s text with a certain font configuration, possibly different from the paragraph’s default font configuration.

All three have a field called text:

  1. The text frame’s text contains the text of all its paragraphs, concatenated with a line feed between paragraphs.
  2. The paragraph’s text contains the texts of all its runs, concatenated into one long string, with a vertical tab character (\v) wherever there was a so-called soft break in any of the runs’ text (a soft break is like a line feed but does not terminate the paragraph).
  3. The run’s text contains text that is to be rendered with a certain font configuration (font family, font size, italic/bold/underlined, color, etc.). It is the lowest level of font configuration for any text.

Now if you specify a line of text in a text-frame in a PowerPoint presentation, this text-frame will very likely only have one paragraph and that paragraph will have just one run.

Let’s say that line says: Hello there! How are you? What is your name? and is all normal (neither italic nor bold) and in size 10.

Now if you go ahead in PowerPoint and make the questions How are you? What is your name? stand out by making them italic, you will end up with 2 runs in our paragraph:

  1. Hello there! with the default font configuration from the paragraph
  2. How are you? What is your name? with the font configuration specifying the additional italic attribute.

Now imagine we want How are you? to stand out even more by making it bold and italic. We end up with 3 runs:

  1. Hello there! with the default font configuration from the paragraph.
  2. How are you? with the font configuration specifying the BOLD and ITALIC attributes.
  3. What is your name? with the font configuration specifying the ITALIC attribute.

One step further, we make the are in How are you? bigger and get 5 runs:

  1. Hello there! with the default font configuration from the paragraph.
  2. How with the font configuration specifying the BOLD and ITALIC attributes.
  3. are with the font configuration specifying the BOLD and ITALIC attributes and font size 16.
  4. you? with the font configuration specifying the BOLD and ITALIC attributes.
  5. What is your name? with the font configuration specifying the ITALIC attribute.

So if you try to replace the How are you? with I'm fine! with the code from your question, you won’t succeed, because the text How are you? is actually distributed across 3 runs.
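To see why a per-run search fails, it helps to model the paragraph as a plain list of run texts. The following is only an illustrative sketch in pure Python (no python-pptx involved); the function name runs_covering_match is made up for this example.

```python
def runs_covering_match(run_texts, match):
    """Return (run_index, start, end) for each run slice the first match touches."""
    paragraph_text = "".join(run_texts)
    match_start = paragraph_text.find(match)
    if match_start < 0:
        return []
    match_end = match_start + len(match)
    slices = []
    offset = 0
    for index, text in enumerate(run_texts):
        run_start, run_end = offset, offset + len(text)
        offset = run_end
        # Overlap of this run with the match, in paragraph coordinates.
        lo, hi = max(match_start, run_start), min(match_end, run_end)
        if lo < hi:
            slices.append((index, lo - run_start, hi - run_start))
    return slices


run_texts = ["Hello there! ", "How ", "are", " you?", " What is your name?"]

# No single run contains the whole match ...
found_in_one_run = any("How are you?" in text for text in run_texts)
print(found_in_one_run)   # False

# ... because it is spread over runs 1, 2 and 3 (zero-based).
hits = runs_covering_match(run_texts, "How are you?")
print(hits)               # [(1, 0, 4), (2, 0, 3), (3, 0, 5)]
```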

You can go one level higher and look at the paragraph’s text, which still says Hello there! How are you? What is your name? since it is the concatenation of all its runs’ texts.

But if you do the replacement on the paragraph’s text and assign the result back, python-pptx erases all runs and creates one new run with the text Hello there! I'm fine! What is your name?, deleting all the formatting that we put on What is your name? in the process.

Therefore, changing text in a paragraph without affecting the formatting of the other text in the paragraph is pretty involved. And even if the text you are looking for has the same formatting throughout, that is no guarantee that it sits within one run: if you make the are in our example above smaller again, the 5 runs will very likely remain, with runs 2 to 4 now simply having the same font configuration.

Here is the code to produce a test presentation with a text box containing the exact paragraph runs as given in my example above:

from pptx import Presentation
from pptx.util import Pt
from pptx.dml.color import RGBColor

# create presentation with 1 slide ------
prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[5])
textbox_shape = slide.shapes.add_textbox(Pt(200),Pt(200),Pt(30),Pt(240))
text_frame = textbox_shape.text_frame
p = text_frame.paragraphs[0]
font = p.font
font.name="Arial"
font.size = Pt(10)
font.bold = False
font.italic = False
font.color.rgb = RGBColor(0,0,0)

run = p.add_run()
run.text="Hello there! "

run = p.add_run()
run.text="How "
font = run.font
font.italic = True
font.bold = True

run = p.add_run()
run.text="are"
font = run.font
font.italic = True
font.bold = True
font.size = Pt(16)

run = p.add_run()
run.text=" you?"
font = run.font
font.italic = True
font.bold = True

run = p.add_run()
run.text=" What is your name?"
run.font.italic = True

prs.save('text-01.pptx')

And this is what it looks like if you open it in PowerPoint:

[Image: the created presentation slide]

Now if you install the Python package from my GitHub repository at https://github.com/fschaeck/python-pptx-text-replacer by running the command

python -m pip install python-pptx-text-replacer

and after successful installation run the command

python-pptx-text-replacer -m "How are you?" -r "I'm fine!" -i text-01.pptx -o text-02.pptx

the resulting presentation text-02.pptx will look like this:

[Image: the changed presentation]

As you can see, it mapped the replacement string exactly onto the existing font configurations. Thus, if your match and its replacement have the same length, the replacement string will retain the exact format of the match.
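The idea of mapping a replacement onto existing runs can be sketched on a plain list of run texts. This is not the code of python-pptx-text-replacer, just one possible strategy under simple assumptions: only the first occurrence of a non-empty match is replaced, each affected run receives as many replacement characters as it loses, and the last affected run takes whatever is left over. The function name replace_across_runs is hypothetical.

```python
def replace_across_runs(run_texts, match, replacement):
    """Replace the first occurrence of match while keeping the run boundaries."""
    start = "".join(run_texts).find(match)
    if start < 0:
        return list(run_texts)
    end = start + len(match)
    result = []
    offset = 0
    remaining = replacement
    for text in run_texts:
        run_start, run_end = offset, offset + len(text)
        offset = run_end
        # Overlap of this run with the match, in paragraph coordinates.
        lo, hi = max(start, run_start), min(end, run_end)
        if lo >= hi:
            result.append(text)          # run not touched by the match
            continue
        if hi == end:
            # Last affected run: it takes everything that is left.
            chunk, remaining = remaining, ""
        else:
            # Otherwise: take as many characters as this run gives up.
            chunk, remaining = remaining[:hi - lo], remaining[hi - lo:]
        result.append(text[:lo - run_start] + chunk + text[hi - run_start:])
    return result


run_texts = ["Hello there! ", "How ", "are", " you?", " What is your name?"]
new_texts = replace_across_runs(run_texts, "How are you?", "I'm fine!")
print(new_texts)
# ['Hello there! ', "I'm ", 'fin', 'e!', ' What is your name?']
```

With python-pptx, the resulting strings could then be written back run by run (run.text = ...), which leaves each run’s font configuration untouched.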

An important side note: if the text frame has auto-size or fit-frame switched on, even all that work won’t save you from messing up the formatting if the text needs more or less space after the replacement!

If you have issues with this code, please try the possibly improved version from GitHub first. If your problem remains, use the GitHub issue tracker to report it. The discussion of this question and answer is already getting out of hand. 😉
