HTML is too complex to reliably parse with a regular expression.
If you’re looking to do this client-side, you can create a document fragment and/or disconnected DOM node (neither of which is displayed anywhere) and initialize it with your HTML string, then walk through the resulting DOM tree and process the text nodes. (Or use a library to help you do that, although it’s actually quite simple.)
Here’s a DOM walking example. This example is slightly simpler than your problem because it only updates the text; it doesn’t add new elements to the structure (wrapping parts of the text in span elements involves updating the structure), but it should get you going. Notes on what you’ll need to change are at the end.
// Single-quoted strings so the double quotes in the attributes don't end the string early
var html =
    '<p>This is a test.</p>' +
    '<form><input type="text" value="test value"></form>' +
    '<p class="testing test">Testing here too</p>';
var frag = document.createDocumentFragment();
var body = document.createElement('body');
var node, next;
// Turn the HTML string into a DOM tree
body.innerHTML = html;
// Walk the dom looking for the given text in text nodes
walk(body);
// Insert the result into the current document via a fragment
node = body.firstChild;
while (node) {
    next = node.nextSibling;
    frag.appendChild(node);
    node = next;
}
document.body.appendChild(frag);
// Our walker function
function walk(node) {
    var child, next;
    switch (node.nodeType) {
        case 1: // Element
        case 9: // Document
        case 11: // Document fragment
            child = node.firstChild;
            while (child) {
                next = child.nextSibling;
                walk(child);
                child = next;
            }
            break;
        case 3: // Text node
            handleText(node);
            break;
    }
}
function handleText(textNode) {
    textNode.nodeValue = textNode.nodeValue.replace(/test/gi, "TEST");
}
The changes you’ll need to make will be in handleText. Specifically, rather than updating nodeValue, you’ll need to:
- Find the index of the beginning of each word within the nodeValue string.
- Use Text#splitText to split the text node into up to three text nodes (the part before your matching text, the part that is your matching text, and the part following your matching text).
- Use document.createElement to create the new span (this is literally just span = document.createElement('span')).
- Use Node#insertBefore to insert the new span in front of the third text node (the one containing the text following your matched text). It’s okay if you didn’t need to create a third node because your matched text was at the end of the text node; just pass in the matched node’s nextSibling as the refChild, which will be null if nothing follows it.
- Use Node#appendChild to move the second text node (the one with the matching text) into the span. (No need to remove it from its parent first; appendChild does that for you.)
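The steps above could be sketched roughly as follows. This is only one way to do it, not a definitive implementation: findMatches is a hypothetical helper name introduced here for illustration, the /test/gi pattern is carried over from the example, and the choice to work through the matches from last to first (so the earlier indexes stay valid in the original text node) is an assumption of this sketch.

```javascript
// Hypothetical helper: returns the index and length of every match of
// `pattern` in `text`. Works on a copy of the regex so the caller's
// lastIndex is left alone, and forces the "g" flag so the loop advances.
function findMatches(text, pattern) {
    var flags = pattern.flags.indexOf('g') === -1 ? pattern.flags + 'g' : pattern.flags;
    var re = new RegExp(pattern.source, flags);
    var matches = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        if (m[0].length === 0) {
            re.lastIndex++; // skip empty matches so exec can't loop forever
            continue;
        }
        matches.push({ index: m.index, length: m[0].length });
    }
    return matches;
}

// Sketch of a replacement for handleText: wraps each match in a span.
function handleText(textNode) {
    var matches = findMatches(textNode.nodeValue, /test/gi);
    var i, match, matchNode, span;
    // Last match first: splitting only removes text *after* the earlier
    // matches, so their indexes into textNode remain correct.
    for (i = matches.length - 1; i >= 0; i--) {
        match = matches[i];
        // Split off the match; textNode keeps the text before it.
        // (No split needed if the match starts at the very beginning.)
        matchNode = match.index > 0 ? textNode.splitText(match.index) : textNode;
        // Split off the text after the match, if there is any
        if (match.length < matchNode.nodeValue.length) {
            matchNode.splitText(match.length);
        }
        span = document.createElement('span');
        // nextSibling is null when nothing follows, which appends at the end
        matchNode.parentNode.insertBefore(span, matchNode.nextSibling);
        // appendChild moves the node; no need to remove it first
        span.appendChild(matchNode);
    }
}
```

Dropped in place of the original handleText, this should work with the walker unchanged, because walk grabs child.nextSibling before calling handleText and so never visits the newly inserted spans or text nodes.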