Regular Expression For Escaping SOQL/SOSL Inputs

User Narrative

We aim to develop a table that incorporates search, sorting, and pagination features. In the upcoming posts, we’ll explore two approaches to achieve this, focusing on either server-side or client-side functionality. We’ll begin by addressing the fundamental aspect of searching, specifically writing our queries.

If you prefer to jump directly to the regular expression part, please scroll down to Step Two.

Context

Salesforce Object Search Language (SOSL) and Salesforce Object Query Language (SOQL) are two powerful query languages that allow us to search for records within Salesforce. We will be using SOQL for our project, but going over how to utilize both.

How do you know which to use? The documentation linked above lays it out:

When to Utilize SOSL

SOSL is appropriate in situations where the object or field containing the data is unknown, and you wish to:

  • Retrieve data related to a specific term that is known to exist within a field. SOSL excels at tokenizing multiple terms within a field and constructing a search index from this, resulting in faster searches and more relevant results.
  • Efficiently retrieve multiple objects and fields, even when they may not be directly related to each other.
  • Obtain data for a specific division within an organization using the divisions feature.
  • Access data written in Chinese, Japanese, Korean, or Thai. The morphological tokenization for CJKT terms enhances the accuracy of search results.

When to Utilize SOQL

Use SOQL when you are aware of the objects where the data is stored, and you want to:

  • Retrieve data from a single object or multiple related objects.
  • Count the number of records meeting specific criteria.
  • Sort results within the query.
  • Retrieve data from fields like number, date, or checkbox.

Let’s delve into some specifics:

Each programming language has its idiosyncrasies. For instance, in SOSL, the minimum search length is two non-wildcard characters, and it returns a maximum of 2000 records. SOQL, on the other hand, is preferable for sorting purposes. Moreover, both languages have distinct sets of reserved characters that necessitate escaping.

Escaping characters is crucial in coding as certain characters are treated as part of the code. To ensure they are interpreted as plain characters, they must be escaped.

In SOSL, these characters include: & | ! () {} [] ^ ” ~ * ? : \ ‘+ –

For SOQL, the characters needing escape are limited to: “‘_%

This distinction matters because clients may need to search for records containing special characters in their names, such as Evelyn “Another Salesforce Blog” Grizzle or Fifty%Fifty accounts. Moreover, naming conventions with characters like underscores, as in 06_30_2021_Account, require accurate searching that includes these characters. Additionally, handling special cases like accounts named Very Important Account is essential. To achieve this, escaping the necessary characters allows for precise searches.

This tutorial assumes familiarity with Lightning Web Components, Apex, SOQL/SOSL, regular expressions, and utilizes Visual Studio Code.

Resolution

Step One – Construct the Query in Apex.


Let’s simplify by querying for an account and its associated contacts. While our project will ultimately rely on a static SOQL query, let’s first explore a dynamic query approach.

DYNAMIC QUERY – SOQL

public static List<Account> getAccounts_dynamicSOQL(String searchString) {
    /** STRING SANITIZATION TO HAPPEN HERE! */
    String queryString = 'SELECT Id, Name, (SELECT Id, Name FROM Contacts) FROM Account WHERE Name LIKE :sanitizedSearchString';

    List<Account> listToReturn = database.query(queryString);
    return listToReturn;
}

STATIC QUERY – SOQL

public static List<Account> getAccounts_staticSOQL(String searchString) {
    /** STRING SANITIZATION TO HAPPEN HERE! */
    List<Account> listToReturn = [SELECT Id, Name, (SELECT Id, Name FROM Contacts) FROM Account WHERE Name LIKE :sanitizedSearchString];
    return listToReturn;
}

Suppose we wish to search for any Account or Contact with a particular name; in such cases, SOSL could be utilized. Although achievable in SOQL through multiple queries, for the sake of simplicity in this exercise, we will solely rely on SOSL.

DYNAMIC QUERY – SOSL

public static List<List<SObject>> getAccountsAndContacts_dynamicSOSL(String searchString) {
    /** STRING SANITIZATION TO HAPPEN HERE! */
    String queryString = 'FIND :sanitizedSearchString IN Name Fields RETURNING Account, Contact';
    
    List<List<SObject>> listToReturn = search.query(queryString);
    return listToReturn;
}

FIXED QUERY – SOSL

public static List<List<List<SObject>> getAccountsAndContacts_staticSOSL(String searchString) {
    /** STRING SANITIZATION TO HAPPEN HERE! */
    List<List<SObject>> listToReturn = [FIND :sanitizedSearchString IN Name Fields RETURNING Account, Contact];
    return listToReturn;
}

These queries look fine, but they’re susceptible to injection as written because we aren’t escaping single quotes. We can easily do this with the method String.escapeSingleQuotes(stringToEscape), but we also want to escape the reserved special characters, so we will handle this with a regular expression.

Step Two – Develop the Regular Expression.

Remember what I said about SOSL and SOQL having different reserved characters? This is where that is going to make a difference.

Foolishly, I thought that they had the same reserved characters, and struggled to come up with the regex as a result. My wasted weekend is your gain, as I will be sharing and explaining the search patterns below.We will be creating a reusable sanitization method, sanitizeString, that we will run our searchText input through for each of our queries. We will do this utilizing the Java Patterns Library and the Salesforce documentation.

SOSL Regular Expression

Once more, the special characters requiring escape in SOSL are: &|!(){}[]^”~*?:’+-

These align quite closely with the \p{Punct} class in Java, which encompasses all punctuation characters, such as .!#$%&'()*+,-./:;<=>?@[]^_`{|}~. Although a few characters need to be excluded from the \p{Punct} list, it serves as a solid starting point. We have the option to either construct a regex from scratch or utilize character subtraction from \p{Punct} to obtain our desired values. For simplicity, let’s use character subtraction.

public static String sanitizeStringSOSL(String searchText) {
    // \\p{Punct} in the pattern class matches all punctuation, including single quotes
    // https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
    String REGEXP = '[\\p{Punct}&&[^.#$%,/;<=>@_`]]';
    // \\$0 is the entire matched string, and \\ escapes it
    String REPLACE = '\\\\$0';

    return '%' + searchText.replaceAll(REGEXP, REPLACE) + '%';
}

Regular Expression for SOQL

Initially, we can utilize a regex generator tool like Regex101 to create a fundamental regex string. However, it’s crucial to remember that we must escape our characters for Salesforce to interpret them correctly. As a result, our string will appear slightly different, but this approach provides a helpful insight into what we need.

We first escape each character once to obtain its literal value in the regex generator. Then, we further escape each escape character to ensure readability in Salesforce. Going from right to left, the characters %_”‘:

: The backslash character requires three escapes to appear correctly in Java because it serves as the escape character itself, necessitating an escape to escape the escape. %: % isn’t a valid string literal in Salesforce, hence it needs to be escaped. _: Similarly, _ isn’t a valid string literal in Salesforce and thus requires escaping. “: To include a single double quote, we escape the backslash three times, where the initial escape acts to escape the double quote. ‘: We escape the single quote once to prevent any issues with the string literal, and then further escape that backslash with an additional escape, which in turn needs to be escaped three times. Quite straightforward, isn’t it?

public static String sanitizeStringSOQL(String searchText) {
    // regex to escape SOQL characters per documentation
    // https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_quotedstringescapes.htm
    String REGEXP = '\\\\\'|\\\\"|\\_|\\%|\\\\';
    // \\$0 is the entire matched string, and \\ escapes it
    String REPLACE = '\\\\$0';

    return '%' + searchText.replaceAll(REGEXP, REPLACE) '%';
}

With our regular expressions in hand, we are now ready to commence a server-side search! Stay tuned for more updates.

Thank you for reading. Feel free to share any comments or questions you may have!