Python string literal concatenation

Read the reference manual, it’s in there.
Specifically:

Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, “hello” ‘world’ is equivalent to “helloworld”. This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings,

^{(emphasis mine)}

This is why:

string = str("Some chars "
         "Some more chars")

is exactly the same as: str("Some chars Some more chars").

This action is performed wherever a string literal might appear, list initiliazations, function calls (as is the case with str above) et-cetera.

The only caveat is when a string literal is not contained between one of the grouping delimiters (), {} or [] but, instead, spreads between two separate physical lines. In that case we can alternatively use the backslash character to join these lines and get the same result:

string = "Some chars " \
         "Some more chars"

Of course, concatenation of strings on the same physical line does not require the backslash. (string = "Hello " "World" is just fine)

Is Python joining these two separate strings or is the editor/compiler treating them as a single string?

Python is, now when exactly does Python do this is where things get interesting.

From what I could gather (take this with a pinch of salt, I’m not a parsing expert), this happens when Python transforms the parse tree (LL(1) Parser) for a given expression to it’s corresponding AST (Abstract Syntax Tree).

You can get a view of the parsed tree via the parser module:

import parser

expr = """
       str("Hello "
           "World")
"""
pexpr = parser.expr(expr)
parser.st2list(pexpr)

This dumps a pretty big and confusing list that represents concrete syntax tree parsed from the expression in expr:

-- rest snipped for brevity --

          [322,
             [323,
                [3, '"hello"'],
                [3, '"world"']]]]]]]]]]]]]]]]]],

-- rest snipped for brevity --

The numbers correspond to either symbols or tokens in the parse tree and the mappings from symbol to grammar rule and token to constant are in Lib/symbol.py and Lib/token.py respectively.

As you can see in the snipped version I added, you have two different entries corresponding to the two different str literals in the expression parsed.

Next, we can view the output of the AST tree produced by the previous expression via the ast module provided in the Standard Library:

p = ast.parse(expr)
ast.dump(p)

# this prints out the following:
"Module(body = [Expr(value = Call(func = Name(id = 'str', ctx = Load()), args = [Str(s="hello world")], keywords = []))])"

The output is more user friendly in this case; you can see that the args for the function call is the single concatenated string Hello World.

In addition, I also stumbled upon a cool module that generates a visualization of the tree for ast nodes. Using it, the output of the expression expr is visualized like this:

^{Image cropped to show only the relevant part for the expression.}

As you can see, in the terminal leaf node we have a single str object, the joined string for "Hello " and "World", i.e "Hello World".

If you are feeling brave enough, dig into the source, the source code for transforming expressions into a parse tree is located at Parser/pgen.c while the code transforming the parse tree into an Abstract Syntax Tree is in Python/ast.c.

This information is for Python 3.5 and I’m pretty sure that unless you’re using some really old version (< 2.5) the functionality and locations should be similar.

Additionally, if you are interested in the whole compilation step python follows, a good gentle intro is provided by one of the core contributors, Brett Cannon, in the video From Source to Code: How CPython’s Compiler Works.

More Related Contents:

Leave a Comment Cancel reply