
Much Ado About Bad Data

By Frederick Hemans | August 17, 2017


Located in Hobart, Tasmania, Australia, the Commission for the Conservation of Antarctic Marine Living Resources (CCAMLR) is one regulatory arm of the 1959 Antarctic Treaty System. Simply put, you can think of CCAMLR as a mini-United Nations, except that it revolves around management of the Southern Ocean ecosystem. Its 25 members are stakeholders with some level of involvement in the region, and, just like in the U.N., factions can form among nations primarily concerned with fishing, scientific research, conservation, or asserting historic claims to portions of the Antarctic.

I was lucky enough to spend a good part of this winter (Southern Hemisphere, remember) working for the CCAMLR Secretariat, the administrative arm of the Commission, along with the men and women responsible for management of the environmental programs and data gathering. Much of my activity at CCAMLR revolved around somewhat benign subject matter: cataloging national and subnational HS codes, summarizing various Commission activities, and working on budgetary proposals for satellite monitoring of fishing vessels. However, one specific project I worked on in my brief time at the Commission sheds light on a very interesting and complex issue: what to do with data gone bad?

Illegal fishing near the Antarctic revolves around Antarctic and Patagonian Toothfish, known to consumers as ‘Chilean Sea Bass’ (despite the fact that it is not a bass, and most of it isn't caught particularly close to Chile). Its high global price per kilo has given it another name: ‘White Gold’.

Huge demand in North America and Europe has sparked a surge in fishing activity, and competition has pushed more vessels into more distant waters in the Southern Ocean. As in most industries with the potential for huge profits, there is also great potential for illegal activity. Some of it is wholly illegal from start to finish: vessels procure funding from organized crime to make illegal trips into national and international waters, following no environmental guidelines or procedures for conservation or pollution control. Other vessels work for legitimate companies with permits to operate, but deliberately misreport their numbers and locations in order to maximize the profitability of their fishing expeditions into the frigid waters surrounding the Antarctic continent.

It is this misreported data that concerns us most. CCAMLR performs certain checks on reported data, such as standard-deviation checks on reported body size, total tonnage, and the number of fish caught per unit of fishing effort. If this data is shown, with a high level of confidence (our classic 95% confidence interval, to be exact), to be likely incorrect or falsified, what does that do to our analysis of historical trends, and how should we use this data to estimate legal, sustainable catch in the future?
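To make this kind of check concrete, here is a minimal Python sketch of a standard-deviation screen on catch per unit of effort. It is only an illustration, not CCAMLR's actual procedure: the function name `flag_outliers`, the 1.96 cutoff (the usual two-sided 95% value under a normal assumption), and the sample figures are all hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(values, z_threshold=1.96):
    """Return the indices of values lying more than z_threshold sample
    standard deviations from the mean (hypothetical screening rule)."""
    if len(values) < 2:
        return []  # a standard deviation needs at least two reports
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # identical reports: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]

# Made-up catch-per-unit-effort figures, one per vessel report.
cpue = [0.42, 0.39, 0.45, 0.41, 0.40, 0.43, 0.38, 0.44, 0.41, 1.90]
suspect = flag_outliers(cpue)  # indices of reports worth a closer look
```

In this made-up sample only the last report is picked out. A check this simple has an obvious weakness: an extreme report inflates the very standard deviation it is measured against, so a real screen would likely need something more robust.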

The policy to date has been a total data quarantine: simply set the data from these vessels aside as unsuitable. This option has the benefit of keeping potentially problematic data out of widespread use. However, quarantining this data also excludes the thousands of tons of fish those vessels caught from official estimates, as if the catch had never occurred. We know fish were caught, and we know approximately where, but what standards should we apply to the data we use in our work?

The key issue is that, without a quarantine, bad data is in fact contagious. Its inclusion in population models can bias all future estimates of factors such as the population size and distribution of Toothfish, which are already difficult to estimate given the harsh climate and often difficult sea conditions.

But if we accept that this reported data is fraught with irregularities and misreporting, how do we incorporate it into analysis? Essentially, borrowing from one famous GPS professor, how much baby do we throw out with the bathwater? There must be some middle ground between total quarantine and blind acceptance.

In my last project at the Commission this winter, I wrote an analysis and report on a possible implementation of a data flagging scheme in use by many other oceanographic institutions. Such a scheme could help CCAMLR annotate specific issues across all of its available data. Scientists and regulators would have more information about potentially problematic data, and could perform their own checks and analysis based on best available practices. Overall, it remains to be seen how the Commission will proceed in dealing with the quarantined data. With the glacial pace at which multilateral and international organizations can sometimes operate, it may be a while before any action is taken at all. Ultimately, I’m genuinely thankful for the opportunity to work on some complex and impactful issues these last two months.
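For readers who want to picture the flagging idea described above, here is a rough Python sketch. Everything in it is an assumption made for illustration: the flag names and values, the `CatchRecord` fields, the vessel IDs, and the tonnages are hypothetical and do not reproduce any particular institution's scheme or any real CCAMLR data.

```python
from dataclasses import dataclass
from enum import IntEnum

class QualityFlag(IntEnum):
    # Hypothetical flag values, chosen only for this example.
    GOOD = 1
    NOT_EVALUATED = 2
    QUESTIONABLE = 3
    BAD = 4

@dataclass
class CatchRecord:
    vessel_id: str
    tonnes: float
    flag: QualityFlag = QualityFlag.NOT_EVALUATED
    note: str = ""  # free-text reason for the flag

def total_catch(records, accepted=frozenset({QualityFlag.GOOD})):
    """Sum reported tonnage over records whose flag the analyst accepts."""
    return sum(r.tonnes for r in records if r.flag in accepted)

# Made-up records: nothing is deleted, each report just carries its flag.
records = [
    CatchRecord("V-001", 120.0, QualityFlag.GOOD),
    CatchRecord("V-002", 95.5, QualityFlag.QUESTIONABLE, "reported position inconsistent"),
    CatchRecord("V-003", 310.0, QualityFlag.BAD, "catch per effort far outside expected range"),
]

strict = total_catch(records)                                       # trusted reports only
inclusive = total_catch(records, accepted=frozenset(QualityFlag))   # every report
```

The point of the design is that the quarantine decision moves from the data store to the analyst: the same records can feed a strict estimate that uses only trusted reports and an inclusive one that counts every reported ton, with the reason for each flag kept alongside the data.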


Frederick Hemans is a second-year MIA student at the UC San Diego School of Global Policy and Strategy, and is the Director of Content at JIPS. His area of study is international environmental policy and global common resources.

