The Dangers of Auto-Correct

As a Latin teacher, I type up a quiz or a worksheet from time to time.  It doesn’t matter which program I use: Pages, Microsoft Word, and even OpenOffice all try to correct my spelling and strip out my accent marks.  It drives me nuts.

Thanks to a friend on Twitter, I now know that auto-correct is corrupting some potentially important databases, too.

Herewith the abstract:

Abstract

Background

When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names.

Results

A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered.

Conclusions

Users of Excel for analyses involving gene names should be aware of this problem, which can cause genes, including medically important ones, to be lost from view and which has contaminated even carefully curated public databases. We provide work-arounds and scripts for circumventing the problem.

This is, needless to say, a problem.  As auto-correct systems become more and more standardized, it becomes harder and harder to do data entry on odd data sets or to develop unusual coding systems for new sources of information.  If the computer decides to be ‘helpful’ and ‘fix’ a data set every time someone opens a data table, the damage gets serious quickly, because, as the abstract notes, the original entries cannot be recovered afterward.

This is one of the potentially dangerous side effects of letting computers do work that should be done by humans, and it shows the risks of getting lazy about information.  I don’t have a solution, but the story itself is a warning to the wise.
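
For readers who want something concrete, here is a minimal sketch in Python of the kind of check one could run on a list of identifiers before a spreadsheet ever sees them.  It is not the set of scripts the paper's authors provide, and it is not Excel's actual conversion logic; the names `risky_identifier` and `MONTH_PREFIXES` and both patterns are assumptions made up for illustration.  It only flags names that look like a month followed by a day number, or like digits with an "E" in the middle, the two shapes the abstract describes as being turned into dates and floating-point numbers.

```python
import re

# Assumed, non-exhaustive list of month-like prefixes a spreadsheet
# might read as the start of a date.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN", "JUL",
    "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month-like prefix followed by one or two digits, e.g. "SEPT2".
DATE_LIKE = re.compile(
    r"^(?:%s)-?\d{1,2}$" % "|".join(MONTH_PREFIXES), re.IGNORECASE
)

# Digits, the letter E, then more digits, e.g. "2310009E13".
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def risky_identifier(name):
    """Return "date" or "float" if the name looks convertible, else None."""
    name = name.strip()
    if DATE_LIKE.match(name):
        return "date"
    if FLOAT_LIKE.match(name):
        return "float"
    return None


if __name__ == "__main__":
    # Print only the identifiers that would deserve a second look.
    for ident in ["SEPT2", "DEC1", "2310009E13", "TP53"]:
        kind = risky_identifier(ident)
        if kind is not None:
            print(ident, "looks like a", kind)
```

A check like this fixes nothing by itself.  It only tells a human which entries to look at twice, or to import as plain text, before the software gets a chance to be ‘helpful’.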
