Google Apps Script – Label Email Based On Email Body [Optimize Code]

One way to improve performance in nested-loop situations, especially duplicate identification, is to keep a record of content you have already traversed rather than repeatedly comparing items against each other. That turns many pairwise comparisons (roughly O(n²)) into one lookup per message (roughly O(n)). For example, you could hash each message body (with a suitable hash function) and store the hashes as object property names. JavaScript places no limit on the length of a property name beyond the engine’s maximum string length, so you may be able to skip hashing it yourself (to obtain a fixed-length key) and let Google Apps Script’s engine do the hashing internally. Naturally, it’s wise to test how large a message can be before relying on that assumption in production.
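As a minimal sketch of this idea outside Gmail, the plain JavaScript below reports which entries in an array of strings repeat an earlier entry, using one object lookup per entry instead of an inner loop. The names `findRepeatIndices` and `sha256Key`, and the `keyFn` parameter, are hypothetical; `keyFn` only shows where a hash would plug in. `sha256Key` runs only inside Apps Script, where `Utilities.computeDigest` and `Utilities.base64Encode` are available.

```javascript
// Hypothetical helper: returns the indices of strings that repeat an earlier string.
// keyFn maps a string to its lookup key (identity by default).
function findRepeatIndices(bodies, keyFn) {
  keyFn = keyFn || function (s) { return s; };
  // Object.create(null) has no prototype, so keys like "toString" start out unseen.
  var seen = Object.create(null);
  var repeats = [];
  bodies.forEach(function (body, i) {
    var key = keyFn(body);
    if (key in seen) {
      repeats.push(i); // Seen before: one lookup, no inner loop.
    } else {
      seen[key] = i; // First occurrence: remember where it was.
    }
  });
  return repeats;
}

// Hypothetical Apps Script-only key function: a fixed-length SHA-256 digest,
// Base64-encoded so it can serve as a property name. Not runnable outside Apps Script.
function sha256Key(body) {
  var bytes = Utilities.computeDigest(Utilities.DigestAlgorithm.SHA_256, body);
  return Utilities.base64Encode(bytes);
}

console.log(findRepeatIndices(["hi", "bye", "hi", "toString", "bye"])); // → [2, 4]
```

In Apps Script you could call `findRepeatIndices(bodies, sha256Key)` to key on digests instead of raw bodies; a SHA-256 collision between two different bodies is astronomically unlikely, though not impossible.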

function updateEmailLabels() {
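  // Use an Object to associate a message's plaintext body with the
  // associated thread/message IDs (or other data as desired).
  // Object.create(null) has no prototype, so a body such as "toString"
  // cannot be mistaken for an already-seen key.
  var seenBodies = Object.create(null), // When a message is read, its plaintext body is stored.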
      DUPLICATE = _getLabel("SO_Duplicates"),
      ORIGINAL = _getLabel("SO_Original");

  // getThreads() returns newest first. Start with the oldest by reversing it.
  ORIGINAL.getThreads().reverse().forEach(function (thread) {
    thread.getMessages().forEach(function (message, messageIndex) {
      // Use this message's body for fast lookups.
      // Assumption: Apps Script has no reachable limit on Object property length.
      var body = message.getPlainBody();

      // Check this body against all previously seen bodies with a single lookup:
      if (!seenBodies[body]) {
        seenBodies[body] = {
          count: 1,
          msgIndices: [ messageIndex ],
          threads: [ thread ],
          threadIds: [ thread.getId() ]
        };
      } else {
        // This exact message body has been observed previously.
        // Update information about where the body has been seen (or perform
        // more intricate checks, e.g. compare threadIds and message indices,
        // before treating this thread and message as a duplicate).
        seenBodies[body].count += 1;
        seenBodies[body].msgIndices.push(messageIndex);
        seenBodies[body].threads.push(thread);
        seenBodies[body].threadIds.push(thread.getId());
      }
    }); // End for-each-message. 
  }); // End for-each-thread.

  // All messages in all threads have now been read and checked against each other.
  // Determine the unique threads to be modified.
  var threadsToChange = {};
  for (var body in seenBodies) {
    if (seenBodies[body].count === 1)
      continue;
    var data = seenBodies[body];
    for (var threadIndex = 1; threadIndex < data.threads.length; ++threadIndex)
      threadsToChange[data.threadIds[threadIndex]] = data.threads[threadIndex];
  }
  // Update their labels and archive status.
  for (var id in threadsToChange) {
    var thread = threadsToChange[id];
    DUPLICATE.addToThread(thread);
    ORIGINAL.removeFromThread(thread);
    GmailApp.moveThreadToArchive(thread);
  }
}

function _getLabel(labelText) {
  var label = GmailApp.getUserLabelByName(labelText);
  return label ? label : GmailApp.createLabel(labelText);
}
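If many threads are flagged, the per-thread calls in the final loop each cost a separate call to the Gmail service. Apps Script also offers batch methods, `GmailLabel.addToThreads`, `GmailLabel.removeFromThreads` and `GmailApp.moveThreadsToArchive`, which are documented as accepting up to 100 threads per call. The sketch below groups the flagged threads into batches of at most 100; `chunk` and `updateInBatches` are hypothetical names, and only `chunk` runs outside Apps Script.

```javascript
// Hypothetical helper: split an array into consecutive groups of at most `size` items.
function chunk(items, size) {
  var groups = [];
  for (var i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Hypothetical replacement for the final loop in updateEmailLabels (Apps Script only).
// Assumes threadsToChange, DUPLICATE and ORIGINAL as built in that function.
function updateInBatches(threadsToChange, DUPLICATE, ORIGINAL) {
  var threads = Object.keys(threadsToChange).map(function (id) {
    return threadsToChange[id];
  });
  chunk(threads, 100).forEach(function (batch) {
    DUPLICATE.addToThreads(batch);       // Label up to 100 threads in one call.
    ORIGINAL.removeFromThreads(batch);   // Unlabel the same batch.
    GmailApp.moveThreadsToArchive(batch); // Archive the same batch.
  });
}
```

Inside `updateEmailLabels` this would be called as `updateInBatches(threadsToChange, DUPLICATE, ORIGINAL)` in place of the `for (var id in threadsToChange)` loop.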

You’ll definitely want to tweak the duplicate-detection bits, since I don’t exactly have qualifying emails just lying around 😉. As written, the script will classify a thread as a duplicate if at least two of its messages have the same body, even when that thread is the only one containing that body.
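One possible tweak for that case is to ignore repeat sightings that belong to the thread where a body first appeared, so only other threads get flagged. The sketch below shows that rule on plain data; the name `findDuplicateThreadIds` and the input shape (an array of `{ id, bodies }` objects, oldest first) are assumptions standing in for `thread.getId()` and `message.getPlainBody()`.

```javascript
// Hypothetical: threads is an array of { id: string, bodies: string[] }, oldest first.
// Returns the ids of threads containing a body first seen in a different, older thread.
function findDuplicateThreadIds(threads) {
  var firstThreadFor = Object.create(null); // body -> id of the thread where it first appeared
  var duplicates = Object.create(null);     // thread id -> true
  threads.forEach(function (thread) {
    thread.bodies.forEach(function (body) {
      if (!(body in firstThreadFor)) {
        firstThreadFor[body] = thread.id;   // First sighting anywhere.
      } else if (firstThreadFor[body] !== thread.id) {
        duplicates[thread.id] = true;       // Repeat from another thread.
      }
      // A repeat inside the thread that first contained the body is ignored.
    });
  });
  return Object.keys(duplicates);
}

findDuplicateThreadIds([
  { id: "A", bodies: ["x", "x"] }, // Repeat within one thread: not flagged.
  { id: "B", bodies: ["y"] },
  { id: "C", bodies: ["z", "y"] }  // "y" first appeared in B.
]); // → ["C"]
```

In `updateEmailLabels`, the equivalent change would be to record only the first thread’s id per body and compare `thread.getId()` against it before treating a message as a duplicate.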
