Did Excel Cause the Government to Lose Covid Testing Data?

At the end of last month, there were a lot of headlines and articles about the fact that nearly 16,000 Covid-19 cases had gone unreported. To be precise, between the 25th September and the 2nd October, 15,841 cases had been left out of the government’s daily figures for England. Whilst all of the cases had been immediately informed that their test result was positive, there were significant delays in speaking to them to complete any contact tracing that might be required.

Speaking in the House of Commons on 5 October the Health secretary Matt Hancock said the issue had been caused by Public Health England (PHE) using a “legacy system” and he described the problem as a “glitch”. It then emerged that the issue was also something to do with the use of Microsoft Excel. So, what exactly was the problem?

The error was caused because of the process used by the PHE to consolidate data. Commercial firms paid to analyse Covid tests, submitted results as text-based lists called CSV (Comma Separated Value) files. These caused no problems. It was the file format PHE’s developers were using that led to cases being missed.

An old Excel file format known as XLS, which dates back to 1987 was being used. Microsoft replaced it with an updated version, XLSX, in 2007. An XLS file can only hold around 65,000 rows of data (65,536 precisely). The current version of Excel is capable of holding over a million rows of data.

Although there were, at that point, nowhere near 65,000 positive cases a day being identified, each test result created several rows of data, so any one XLS worksheet could only hold 1,400 cases. Once the worksheet had been filled, further cases were simply left off. Although many articles talk about exceeding Maximum file size, this is misleading, as maximum file size refers to something else entirely.

The big question is why an older Excel file format was used in the first place. If the most up to date version of Excel had been used, it would have handled 16 times the number of cases, which would have, at the very least, stopped this problem from occurring until the number of tests being processed was much higher.

The older format could have been used because XLS file format allows macros, whereas the newer XLSM does not, but an Excel file is very easily saved in the XLSM format, which does allow macros.

It was the fact that the PHE had, as Matt Hancock said used a “legacy system” that may be the real reason that the older format was chosen. On YouTube, Stand-up Maths posted on this subject and quoted an anonymous source in the NHS:

“Even up until very recently…this year (2020), they were using an old database within the NHS which has been around for over a decade and if they wanted to export a selection of those data points, because of a compatibility issue they could only export it as a XLS format”.

It seems that this could be the likely explanation. The NHS has been involved in a number of events in recent years, when using outdated software has caused it problems.

In the end, however, some have argued that it wasn’t the use of an outdated version of Excel that was the problem, it was the use of Excel at all, saying that Excel isn’t designed to handle such large amounts of data. That just isn’t true though. The row limit in the newest version of Excel is over 1 million, as noted earlier, but even this can be exceeded by using tools that come built into the program.

Rather than blame the tool used, the Association of Professional Healthcare Analysts, writing to the Health Secretary said:” This is not an IT glitch… This is a systematic failure that… can only be improved by long term investment in infrastructure and training.”.

It is understandable that the system to collate the data on Covid testing was put together quickly, but given the billions of pounds that have been spent on the Test and Trace system, this mistake should not have happened. And we certainly shouldn’t be blaming Excel.

0 Responses to “Did Excel Cause the Government to Lose Covid Testing Data?”

  1. No Comments

Leave a Reply