How to remove special characters from data in an expression transformation
I have a Column serial_number(STRING (90)): We get several special characters like ‘#’,”-_ in a serial number. I need to remove that characters and pass the rest of the serial number.
E.g: NSN: LIBS-12292423:060115111437
we have three options for replacing special characters
- Using the REPLACE function
- Using the REGEXP_REPLACE function
- Using the TRANSLATE function
It allows you to replace a single character in a string and is probably the simplest of the three methods. The drawback is that it only allows you to replace one character. If you want to replace multiple, you can use nested functions, which can get messy.
REPLACE(your_column, CHR(13), ' ') disadvantages
REPLACE(REPLACE(your_column, CHR(13), ' '), CHR(10), ' ')
If you want to replace a lot of special characters, using many nested REPLACE functions can get messy and could have performance impacts.
it is similar to REPLACE, but it allows for multiple characters to be replaced in a single function.
TRANSLATE(your_column, CHR(10) || CHR(13, ' ') This function will replace the first character of the second parameter (CHR(10)) with the first character of the third parameter (a space). It will then replace the second character of the second parameter (CHR(13)) with the second character of the third parameter (another space).
disadvantages If you want to replace a long list of characters, you’ll need to specify a replacement character each time.
TRANSLATE(your_column, '!@#$%^&*()', ' ') There are 10 characters in the second parameter, so there needs to be 10 characters in the third parameter. Using REGEXP_REPLACE
NOTE It’s more powerful than the REPLACE and TRANSLATE functions, but you need to understand regular expressions to be able to use it.
uses regular expressions to replace characters. It’s flexible and allows for multiple characters, but there’s a bit of a learning curve with regular expressions.
REGEXP_REPLACE(your_column, '[^0-9A-Za-z]', '') This function takes three parameters.
First, it needs the value that contains your string, in this case, it’s your_column again.
Then, it has a regular expression in the second parameter. In this example, it means all characters that don’t match numbers or letters.
The third parameter is the character to replace any matching characters with.