Internet of Things (Part 1)
April 15, 2020Gravitee Environment Variable Configuration for Docker
May 6, 2020Recently we needed to provide a hash of both an individual row and a set of rows of data in a REST API return for data integrity. My first thought was to do this programmatically in the application returning the data. As an alternative, I thought I would explore if anything was available on the database side since basically we were returning a row of data and a query result of data within the API. This is when I discovered that MySQL had an MD5 function for just such a purpose.
https://www.w3resource.com/mysql/encryption-and-compression-functions/md5().php
The MYSQL MD5 function takes any string and returns an MD5 128-bit checksum for that string.
MD5(str);
MD5 Hash of an individual row in MySQL
To hash each row in a query return, you can concatenate the data into a string and return it as a hash using a combination of the MD5 function and CONCAT function.
https://www.w3resource.com/mysql/string-functions/mysql-concat-function.php
For example:
select
ID,firstName,middleName,lastName,
md5(concat(ID,firstName,middleName,lastName) as hash
from someTable
There is one gotcha in this, and that is the fact that in MySQL, if any of the columns being concatenated together contain a NULL then the result of the concatenation would be NULL. We can solve this problem using another built in MySQL function called coalesce.
https://www.w3resource.com/mysql/comparision-functions-and-operators/coalesce-function.php
With coalesce we can always return a default string so the results that may contain a NULL value can be concatenated. Using the coalesce technique, our new query to hash each row would be
select
ID,firstName,middleName,lastName,
md5(concat(ID,coalesce(firstName,'a'),coalesce(middleName,'a'),coalesce(lastName,'a')) as hash
from someTable
MD5 Hash of a set of rows in MySQL
Now that we can hash each row in a query result, how do we go about getting a hash of the entire result set from any query in MySQL? This took another function built-in MySQL to accomplish. The GROUP_CONCAT function.
Using GROUP_CONCAT, we can set up a query that has the entire result set we want to hash in one string with a bit of creative querying. What we will do is set up a static value in our select and group on that static value.
So to hash a query result set or multiple rows we can use a query something like this
select
md5(group_concat(concat(ID,coalesce(firstName,'a'),coalesce(middleName,'a'),coalesce(lastName,'a'))) as hash,
1 as grouper
from someTable
where dateModified BETWEEN '2020-04-01 00:00:00' AND '2020-04-30 23:59:59'
GROUP BY grouper;
As you can see, we defined a constant value of 1 as a column in our result called “grouper” then using both GROUP and GROUP_CONCAT we can create a single hash of the data that we can use to ensure data integrity in any other system we need.
Hopefully, this quick overview can help you if you need to do any hashing in MySQL. There are any number of use cases for this and I am glad that there are some built-in functions right in the database to enable data processing like this to happen.
3 Comments
If the concatenated fields are great than 1024 characters, the technique to get the checksum of the record doesn’t work.
I desperately need a workaround but have been unable to located one
The MySQL 8.0 manual says the following about GROUP_CONCAT:
The result is truncated to the maximum length that is given by the group_concat_max_len system variable, which has a default value of 1024. The value can be set higher, although the effective maximum length of the return value is constrained by the value of max_allowed_packet. The syntax to change the value of group_concat_max_len at runtime is as follows, where val is an unsigned integer:
See https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_group-concat for details.
Hello, Can you please explain, why do we need 1 as grouper and group by in this sql? When we do md5 on group_concat, we get only single row. So what are we grouping here and why?