Author Archive

The ANTI JOIN – all values from table1 that are not in table2

One of the less intuitive concepts I come across regularly in SQL is that of the ANTI JOIN. That is, requesting data from a table where some value is not in another table.

The wrong way of doing it is this:

SELECT
*
FROM table1 t1
WHERE t1.id NOT IN (SELECT id FROM table2)

This can be very slow and inefficient, because the RDBMS may end up comparing each row against the entire subquery result. The following is much, much faster: it matches each row up once, then keeps only the rows that have no match in the second table:

SELECT
*
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
WHERE t2.id IS NULL

This is what is known as an ANTI JOIN, and I really wish it were part of standard SQL. Be careful, though: if table2.id contains any NULL values, the NOT IN version above will return no rows at all, because comparisons against NULL are never true (NOT EXISTS sidesteps that problem). Here’s my proposal, and the following should throw a fatal error if table2.id is not a NOT NULL column:

SELECT
*
FROM table1 t1
ANTI JOIN table2 t2 ON t1.id = t2.id

Too bad I don’t make the rules.
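To see the NULL pitfall concretely, here is a small sketch using Python’s built-in sqlite3 module (the table names match the examples above, and the data is made up; SQLite’s NOT IN semantics are the same as MySQL’s here):

```python
import sqlite3

# In-memory database with a NULL lurking in table2.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (NULL);
""")

# NOT IN: a single NULL in the subquery makes every comparison UNKNOWN,
# so no rows come back at all.
not_in = conn.execute(
    "SELECT id FROM table1 WHERE id NOT IN (SELECT id FROM table2)"
).fetchall()
print(not_in)  # []

# LEFT JOIN ... IS NULL: the anti-join pattern returns the expected rows.
anti = conn.execute(
    "SELECT t1.id FROM table1 t1 "
    "LEFT JOIN table2 t2 ON t1.id = t2.id "
    "WHERE t2.id IS NULL"
).fetchall()
print(anti)  # [(2,), (3,)]
```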

Get the average value from the nearest date in MS SQL

Needed to get the average exchange rate for the invoices nearest to a given date, and I found a quick and easy way to accomplish this in SQL Server:

-- @date and @currency_code stand in for the values the original
-- PHP code interpolated into the query string
SELECT TOP 1
   AVG(i.exchrate) / 100 AS exchrate
   , MIN(DATEDIFF(DAY, @date, i.invoicedate)) AS accuracy
FROM InvoiceTable i
WHERE
   i.currencycode = @currency_code
   AND DATEDIFF(DAY, @date, i.invoicedate) > 0
GROUP BY i.invoicedate
ORDER BY MIN(DATEDIFF(DAY, @date, i.invoicedate))

The GROUP BY on invoicedate is what lets TOP 1 pick out the nearest date; without it, the aggregates would run over every matching invoice.
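The same idea can be sketched in runnable form against SQLite, using julianday() in place of SQL Server’s DATEDIFF (the schema and values are invented for illustration):

```python
import sqlite3

# Toy InvoiceTable: rates are stored as 100x the real rate, as above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InvoiceTable (currencycode TEXT, invoicedate TEXT, exchrate REAL);
    INSERT INTO InvoiceTable VALUES
        ('EUR', '2012-01-10', 130.0),
        ('EUR', '2012-01-10', 132.0),
        ('EUR', '2012-02-01', 140.0),
        ('USD', '2012-01-05', 100.0);
""")

# Average rate for the nearest invoice date after the given date:
# group per invoice date, order by distance, take the closest group.
row = conn.execute("""
    SELECT AVG(exchrate) / 100 AS exchrate,
           MIN(julianday(invoicedate) - julianday(?)) AS accuracy
    FROM InvoiceTable
    WHERE currencycode = ?
      AND julianday(invoicedate) - julianday(?) > 0
    GROUP BY invoicedate
    ORDER BY accuracy
    LIMIT 1
""", ('2012-01-01', 'EUR', '2012-01-01')).fetchone()

print(row)  # (1.31, 9.0) — the two 2012-01-10 invoices, 9 days out
```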

GROUP_CONCAT and truncation / field serialization in MySQL

I wish there were a true serialize() grouping function in MySQL, as I frequently deal with data that must be viewed as groups of tables, or groups of groups of tables, and sometimes deeper still. One hacky way I occasionally use to accomplish this without having to resort to recursion (and only when there is really no other alternative, because this sucks) is

GROUP_CONCAT(items SEPARATOR '|||')

Note the SEPARATOR keyword: GROUP_CONCAT(items, '|||') would instead append '|||' to each row and join the results with commas. Most datasets I work with are sane enough to safely assume I won’t find three consecutive pipe characters in a given field. So, I simply explode that field out into an array.

$data = array();
while ($row = $result->fetch()) {
   $row['items'] = explode('|||', $row['items']);
   // or, to eliminate null columns / empty string values:
   //$row['items'] = preg_split('/\|\|\|/', $row['items'], -1, PREG_SPLIT_NO_EMPTY);
   $data[] = $row;
}
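For a runnable sketch of the same group-then-split pattern, here it is in Python against SQLite (note that SQLite spells the separator as a second argument, group_concat(x, sep), where MySQL requires GROUP_CONCAT(x SEPARATOR sep); the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (order_id INTEGER, item TEXT);
    INSERT INTO order_items VALUES (1, 'apple'), (1, 'pear'), (2, 'plum');
""")

data = []
for order_id, items in conn.execute(
    "SELECT order_id, group_concat(item, '|||') "
    "FROM order_items GROUP BY order_id ORDER BY order_id"
):
    # Explode the concatenated field back into a list, dropping empties
    # (the order of items within a group is not guaranteed).
    data.append({"order_id": order_id,
                 "items": [s for s in items.split('|||') if s]})

print(data)
```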

One of the problems you will run into as your datasets start to get a little larger, though, is that GROUP_CONCAT() truncates each group’s result at 1024 characters by default on most systems. Not to worry; the fix is very simple: increase the value of group_concat_max_len.

DB::mysql()->exec("SET SESSION group_concat_max_len = " . PHP_INT_MAX);

Maybe someday there will be a more elegant solution to what should be an unusual problem, as it seems I come up with a nasty solution like this on a weekly basis these days.

Next on the list should be a MODE() aggregate function. For real: tell me that’s harder to implement than STDDEV_POP().

Transferring MySQL databases between servers via SSH

Usually, when transferring a MySQL database from one server to another, people pipe the output of mysqldump into a file, archive it if it is large, upload it to the remote host, unarchive it, and then pipe it into the MySQL instance from that shell.

There is a much easier way. It doesn’t compress the data first, but unless you’re dealing with larger tables, the ease of this method makes a slower transfer very acceptable.

mysqldump -u user -ppassword databasename | ssh username@example.com "mysql -u user -ppassword databasename"
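If transfer size does matter, compression can still happen inline rather than via an intermediate file; this is a sketch with gzip added to the pipe (user, password, databasename and example.com are placeholders, as above):

```shell
# Dump, compress, stream over SSH, then decompress and load remotely.
mysqldump -u user -ppassword databasename \
  | gzip \
  | ssh username@example.com "gunzip | mysql -u user -ppassword databasename"
```

Alternatively, ssh -C enables SSH’s own transport-level compression, which achieves much the same thing without touching the pipeline on the remote side.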