What Does Your Data Weigh?

What do you think? Does data have weight?

How much does your data weigh?

Or should we never ask questions about weight?

He ain’t heavy, he’s my data

Data can carry a lot of weight. It can show how well your company is doing, measure your own performance at work, communicate to the masses (much like this platform), reveal patterns such as trends, and so much more. But that’s not the title of this blog. The question is not how much weight data can carry, but how much data weighs.

Data is both a blessing and a curse to some companies, and even to some individuals. In some cases, data must be kept for a certain period of time, after which it is permissible to delete or purge it. This is typically what I would call a retention policy. When data is created it usually has a very high value, but over time that value drops until the data has outlived its usefulness to the company. During that time we do many things for that data: we protect it with backup, we may send it offsite for DR purposes, we may replicate it, we may take snapshots, we may copy it to another tier of storage for analysis, and in some cases we may back it up again. We keep many copies of this data because it is extremely valuable, but once it reaches its maturation point and is no longer required by law or by internal governance policies, we delete it.


No, we don’t. At least, most of the companies I talk to don’t. The reasons I hear most often are, “Well, we don’t know if we can or not,” or “Legal hasn’t told us we could,” or “My boss said just to keep everything for now.” Data is an asset; I don’t think anyone would argue against that statement. However, when data is kept beyond its point of usefulness, it may become a liability, and that is when you can see how much data weighs.

Storytime: A tale of two companies

First, let’s start with a company that didn’t get data weight
Several years ago, while I was running my consulting business in Chicago, a customer in the financial industry contracted me to do a full assessment of its backup infrastructure. This initial project was to take approximately 5-10 days, after which I presented my findings. Among other things, I found that this company would make duplicate copies of backup tapes and ship them offsite to a facility for storage. This is a good practice, and I would say a best practice for all to adhere to. However, one thing I found odd was that these tapes would never come back. Never. They had old 4mm tapes out there, some first-generation DLT tapes, some LTO, etc.

The first red flag I raised was that they needed to clean this up, if for no other reason than to save money on storing these tapes. My primary reason, though, was that the “data” living on these tapes could become a liability should the company be sued, investigated, etc. The response I got from that initial assessment was, “The CEO said we aren’t worried about it, we have enough money if we need to get someone to read those old tapes. Keep them there.” The relationship grew from a 10-day assessment into a 3-month consulting contract and ended up lasting nearly three years. Every six months I would conduct the same assessment and raise that same red flag, and I’d be given the very same response.

By 2003, I had sold my interest in my consulting business to my partner and moved to Colorado to work for another company. Around 2004 I returned with my new company to see my old friends and former customer. The meeting was tense, somber, and noticeably uncomfortable for everyone. I finally asked the question, “What’s wrong? You all seem like you’ve just come from a funeral.” That is when they told me that the CEO was under investigation for improper trading practices. I was shocked. This was a good company, with good people and a great culture.

The bottom line: data was being requested, such as emails, memos, etc. The data they had kept was showing its weight. Since I was not part of the investigative team, I don’t know what was found in the data or what evidence was used to charge the people at the top of this company. I do know the toll it took on the people who were the stewards of this data, and who remembered my repeatedly raising the red flag about having a proper, well-regulated retention policy. According to them, the chances were high that investigators found something in all of those tapes, because they went back as far as they could.

And now for the second company

The second company was also in the financial industry, but it had an entirely different perspective on retention. In fact, its policy was almost militant. This company had a one-use policy for backup tapes (LTO). One use. After the backup on a tape had expired, the tape was brought to the basement of the building, thrown into a wood chipper, and shipped off to a waste company for recycling. After learning this, I approached the Director of IT and said I thought it was pretty extreme, and expensive, to essentially throw away money like that, and I suggested they look at a virtual tape library (disk-based backup). He was all ears, so we talked about the benefits and why it would be faster, cheaper, and better for the company. Then he asked, “Can you 100% guarantee me the data will be completely eradicated after it is expired?”

Huh? What? Seriously?

Well, I explained, the backup application expires it, and then it will eventually be overwritten by other backup jobs. That wasn’t good enough for him. He said that the data still exists on that tape, whether it is real or virtual, and the right person can extract it if they know what they are doing. So I countered with an option to overwrite the header of the tape after each expiration (I had actually written a Unix script to do just that for another client years earlier, so I knew it was possible). To which he replied, nope, the data is still there. Finally, I asked him why they were so worried about it, and he told me a hypothetical story. What if a famous TV personality/magazine editor was exchanging emails with one of our traders about a well-timed stock trade that some may consider questionable, and we inadvertently kept that email exchange around for longer than legally required? And what if, during that time, the DOJ requested information from us on the digital assets pertaining to this person, and because we had kept that data longer than necessary, this person was, hypothetically, convicted?

GOT IT. Explain no more.

He said that even as we were meeting, the DOJ was in the next room executing one of its search warrants for information on some case. Needless to say, they continued with their one-use LTO retention policy. This team understood all too well the weight of data. Some would argue the policy was a bit too extreme, but in this company’s case it was exactly what was needed to protect itself and its clients.

What is your retention policy? Do you have one? Should you? How do you go about defining a retention policy? Who should author it?
I have worked with many customers to help define a strategy for creating a retention policy. I’ll tell you it isn’t easy. It requires effort and, most importantly, it requires executive buy-in.
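
To make the idea concrete, here is a minimal sketch in Python of the mechanical part of a retention policy: given a list of copies and a set of rules, which copies are past their retention period? Everything in it is hypothetical and for illustration only. The names RetentionRule, BackupCopy, and purge_candidates, the legal_hold flag, and the 365-day figure are my assumptions, not a real product’s API and not a legally required period. The hard part, deciding what the rules should be, still belongs to legal, governance, and the executives.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical rule: keep one class of data for a fixed number of days."""
    data_class: str
    retain_days: int


@dataclass(frozen=True)
class BackupCopy:
    """Hypothetical record for one copy of data (tape, snapshot, replica)."""
    label: str
    data_class: str
    created: date
    legal_hold: bool = False  # assumption: held copies must never be purged


def purge_candidates(copies, rules, today):
    """Return labels of copies whose retention period has passed.

    Copies under legal hold, or with no matching rule, are never returned,
    so anything the policy does not cover is left for a human to decide.
    """
    by_class = {rule.data_class: rule for rule in rules}
    expired = []
    for copy in copies:
        rule = by_class.get(copy.data_class)
        if rule is None or copy.legal_hold:
            continue
        if copy.created + timedelta(days=rule.retain_days) < today:
            expired.append(copy.label)
    return expired


if __name__ == "__main__":
    # Illustrative numbers only; real periods come from law and governance.
    rules = [RetentionRule("email", retain_days=365)]
    copies = [
        BackupCopy("tape-001", "email", date(2020, 1, 1)),
        BackupCopy("tape-002", "email", date(2024, 6, 1)),
        BackupCopy("tape-003", "email", date(2020, 1, 1), legal_hold=True),
    ]
    print(purge_candidates(copies, rules, today=date(2025, 1, 1)))
    # prints ['tape-001']
```

Note that the sketch only lists candidates. It deliberately deletes nothing, because the decision to purge should be signed off by the people who own the policy.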

Don’t end up as an example of what not to do. Be an example of what to do, and lose some of that data weight.

David A. Chapa, Founder, The CTE Group

