Lies, Damn Lies, and Statistics

The Holy Grail of the BI world is “the single version of the truth.” Indeed, even I have posed the question to employers: “Would you rather have one person who always has the right answer, or have everyone have the exact same answer?” So much time is spent unraveling the dark magic a single individual uses to produce numbers that nobody else can ever hope to reproduce… and then that person leaves the company. So yes, the long-term answer is everyone having the same answer, because if it is wrong it gets fixed, and now everyone has the right answer. Correct?

Well, not so fast. I recall a Controller who once told me, “I can make our quarterly earnings look as good or as bad as we need them.” Accountants, researchers, salesmen, really almost everyone who deals with numbers has perfected the art of making them say what they want. Truth be told, I don’t think the Grail exists, not that Grail anyway. Maybe I’m not pure enough in soul to ever find it, or maybe the Grail is not truly in the form people expect. Maybe a single version of the truth is not the Grail at all. Perhaps the Grail is transparency: lineage. No more secret ingredients.

Cognos BI, and no doubt other platforms, give us a lineage service for revealing the ingredients. I feel it is a largely undersold capability. At a recent demonstration, I showed a new data mart delivering, to the penny, the exact numbers, names, and so on as an older, trusted system, and people were amazed. Frankly, I was surprised by the “to the penny” part myself! When asked how I did it, since nobody else who had tried had even come close, I clicked the lineage service and showed that my numbers came from the new mart, but the relationship identifiers came from the older system and the labeling from an Access database. Yes, the dark arts were in full force! So the truth is that where the new mart fell short, magic picked up the slack, and lineage (like that guy on TV who reveals magicians’ secrets) showed everyone how it was done. So did I cheat?
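Cognos provides lineage through its own service, and nothing below uses or imitates a Cognos API. This is only a toy Python sketch of the idea, assuming three hypothetical in-memory sources (`NEW_MART_AMOUNTS`, `LEGACY_RELATIONSHIP_IDS`, `ACCESS_LABELS`) standing in for the new mart, the older system, and the Access database. Each report field carries the name of the source it came from, so asking for a row's lineage lists its ingredients.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the three sources; in practice these would be
# queries against the new mart, the legacy system, and the Access database.
NEW_MART_AMOUNTS = {"A100": 1250.75, "A200": 98.10}
LEGACY_RELATIONSHIP_IDS = {"A100": "R-17", "A200": "R-42"}
ACCESS_LABELS = {"R-17": "Smith Household", "R-42": "Jones Trust"}


@dataclass
class Field:
    value: object
    source: str  # where this value came from: the "ingredient" lineage reveals


def build_report_row(account: str) -> dict:
    """Assemble one report row, tagging every field with its source."""
    rel_id = LEGACY_RELATIONSHIP_IDS[account]
    return {
        "account": Field(account, "new_mart"),
        "amount": Field(NEW_MART_AMOUNTS[account], "new_mart"),
        "relationship_id": Field(rel_id, "legacy_system"),
        "label": Field(ACCESS_LABELS[rel_id], "access_db"),
    }


def lineage(row: dict) -> dict:
    """Return a column -> source map: a toy version of a lineage view."""
    return {name: f.source for name, f in row.items()}


if __name__ == "__main__":
    row = build_report_row("A100")
    for name, source in lineage(row).items():
        print(f"{name:16} <- {source}")
```

A real lineage service derives this from model and query metadata rather than from hand-tagged values, but what the audience sees is the same idea: each number can be traced back to where it came from.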

In my opinion, BI is not about “the single version of the truth.” Truth can always be twisted. It’s about presenting factual data in relevant perspectives so that the same data can have meaning to different audiences. Lineage shows us how these perspectives are generated, so when disputes arise (yes, that’s a “when,” not an “if”), it can become clear that perhaps everyone is correct. It’s all a matter of how you view things: your perspective.

And once you understand this, maybe companies can stop wasting those millions on the single, authoritative data warehouse…but that is a topic for another time!


Tools of the Trade

Your data may well be everywhere, and you may be thinking that you don’t have the means to acquire real infrastructure. How can you keep up with big companies when they have all those servers and resources? I’m reminded of a scenario from a previous employer, a major university that wanted to compete with Harvard. But how could they? They did not have the money or resources to go toe to toe.

The answer is you can’t, at least not by playing their game. So change the game! The open source movement is a huge boon to the way of thinking I am espousing. Sure, you might not be able to afford Oracle 11g Enterprise, but you can get the Apache HTTP Server (web server) and MySQL (database). I’ve watched these products mature over the years and am very impressed. Don’t let the price tag (free) fool you; they’re solid! As we begin to delve into some DIY, not every project will require these, but it’s good to know they’re there. And if a DIY project does use one or both of them, I’ll be sure to provide step-by-step instructions for setting them up.

Information vs Data

Data is everywhere today. Indeed, at a recent conference the message was clear: you must assume that all data is captured somewhere. Nothing is dropped. “Oh, but we delete all of our prospect information.” Maybe so, but in today’s data environment you have to assume that even though you are no longer aware of a specific event, someone somewhere is. Never assume that it is lost forever. Perhaps the person tweeted, “OMG, @ABCWidgets just totally rejected me! #theirmistake”

At its current growth rate, data doubles about every two years, a trend that is not going to subside but rather accelerate. Internally, corporations are struggling to keep up with their old operating models for data retention and storage. New operating models are being implemented, huge hardware purchases are under way, and data appliances and vaults are popping up all over. And yet most companies, if they’re honest, will still put the effective use of their data stores on the list of things they don’t do well. Failures are as abundant as the data itself. (Warehousing Failures)
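To put “doubles about every two years” in concrete terms: a quantity that doubles every T years grows by a factor of 2^(t/T) after t years, so with T = 2 a single decade means 2^5 = 32 times the data, an implied annual growth of roughly 41%. The snippet below only does that arithmetic. It assumes a constant doubling period (the text expects growth to accelerate, so treat the result as a floor), and the 100 TB starting figure is an arbitrary example, not a measurement.

```python
def projected_volume(current: float, years: float, doubling_period: float = 2.0) -> float:
    """Volume after `years`, assuming it doubles every `doubling_period` years."""
    return current * 2 ** (years / doubling_period)


def implied_annual_growth(doubling_period: float = 2.0) -> float:
    """Constant yearly growth rate that yields one doubling per period."""
    return 2 ** (1 / doubling_period) - 1


if __name__ == "__main__":
    # 100 TB is an arbitrary starting point chosen for illustration.
    print(projected_volume(100, 10))          # 3200.0 (TB after a decade)
    print(round(implied_annual_growth(), 3))  # 0.414
```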

In my experience, one key problem is the understanding of data vs. information. Data is just that: the bits and pieces collected through transaction systems, spreadsheets, maybe your QuickBooks, etc., that typically answer a “what” question, and maybe “when” and “who.” As useful as that can be, and it is, it seldom gives you the full picture: “the why.” Beyond that, it does not implicitly give you an understanding of how to behave in the future to either encourage the repeat of a desirable event or discourage an undesirable one. So I define “information” as the restructuring and enriching of this data to create something that has meaning and can become actionable. Consider the spreadsheet that is precisely what the recipient has asked for, and received, for the last 10 years. That’s a whole lot of “what” without much meaning. The recipient may draw logical interpretations from that page, but likely few if any others could make the same interpretations. That list is simply data. The BI system, which interprets the data and makes it actionable, is in that individual’s brain.
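As a small illustration of restructuring data into information, the sketch below takes raw transaction rows (who, when, what) and turns them into a per-customer summary with a suggested next step. The transactions, the “typical gap between purchases” measure, and the 1.5× overdue threshold are all hypothetical choices made for this example, not a method from this post.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Raw data: hypothetical transactions answering "who", "when", and "what".
TRANSACTIONS = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 2, 4), 35.0),
    ("alice", date(2024, 3, 6), 42.0),
    ("bob", date(2024, 1, 10), 120.0),
    ("bob", date(2024, 1, 24), 95.0),
    ("bob", date(2024, 2, 7), 110.0),
]


def summarize(transactions, as_of: date, overdue_factor: float = 1.5):
    """Turn raw rows into per-customer information with a suggested action."""
    by_customer = defaultdict(list)
    for customer, day, amount in transactions:
        by_customer[customer].append((day, amount))

    result = {}
    for customer, rows in by_customer.items():
        rows.sort()  # chronological order
        days = [d for d, _ in rows]
        gaps = [(b - a).days for a, b in zip(days, days[1:])]
        typical_gap = mean(gaps) if gaps else None
        since_last = (as_of - days[-1]).days
        # Hypothetical rule: flag a customer whose silence runs well past their usual rhythm.
        overdue = typical_gap is not None and since_last > overdue_factor * typical_gap
        result[customer] = {
            "total_spent": sum(a for _, a in rows),
            "typical_gap_days": typical_gap,
            "days_since_last": since_last,
            "action": "reach out" if overdue else "no action",
        }
    return result


if __name__ == "__main__":
    for customer, info in summarize(TRANSACTIONS, date(2024, 3, 20)).items():
        print(customer, info)
```

With an as-of date of 2024-03-20, bob (42 days since his last purchase against a typical 14-day gap) is flagged “reach out,” while alice (14 days against about 30.5) is not. The same rows that merely listed “what” now suggest what to do next.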

Your goal in BI is to make information, not just data, widely available in your organization. Information has form and is actionable; in effect, it gives you a sense of what you should do or ask next. When I analyze consumer data, I find names are a piece of data I can ultimately do without. Don’t get me wrong, they are very useful in the initial stages of data enrichment (which I’ll discuss in the near future!), but once I understand what a person represents (their demographic, their habits, etc.), I park the personally identifiable aspect of that data off in a corner. That keeps privacy issues off the list of concerns as well, which is always a good thing!
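One way to park names off in a corner is to split each record into an analytic row carrying only a surrogate key plus descriptive attributes, and a separate store holding the identifying fields. The sketch assumes the identifying columns are `name` and `email` (a hypothetical choice) and uses a simple sequential key. Strictly speaking this is pseudonymization rather than anonymization: whoever can read the separate store can re-identify people, so access to it still needs controlling.

```python
from itertools import count

# Hypothetical choice of which columns count as personally identifiable.
PII_FIELDS = {"name", "email"}


def split_pii(records):
    """Split records into an analytic table (no PII) and a separate PII store.

    Each record gets a sequential surrogate key; only the PII store maps
    that key back to a person.
    """
    next_key = count(1)
    analytic, pii_store = [], {}
    for rec in records:
        key = next(next_key)
        pii_store[key] = {k: v for k, v in rec.items() if k in PII_FIELDS}
        row = {k: v for k, v in rec.items() if k not in PII_FIELDS}
        row["person_key"] = key
        analytic.append(row)
    return analytic, pii_store


if __name__ == "__main__":
    people = [
        {"name": "Ann", "email": "ann@example.com", "age_band": "35-44", "segment": "saver"},
        {"name": "Ben", "email": "ben@example.com", "age_band": "25-34", "segment": "spender"},
    ]
    analytic, pii_store = split_pii(people)
    print(analytic)  # demographics and habits only, keyed by person_key
```

Analysis then runs entirely against the analytic rows; the PII store is only consulted when someone genuinely needs to act on an individual.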

The first DIY project will demonstrate data vs information as well as enrichment of your personal data with public sources to create actionable information. Stay tuned!

Some of this and some of that

This blog is intended for different audiences at different times. Many topics will simply explain the type of work that can be done and, at a high level, how you’d do it. Others may cover specific coding techniques, intended for the information producers among us. The last, and to me the most exciting, category will be the DIY projects. Each of these articles will include descriptions and screenshots of what the project contains and how it might be used, along with companion project files so you can do it yourself! We live in an age where web technology is ubiquitous, so a typical DIY project would be an HTML page template, some script files, and perhaps a sample dataset or two. Taken as a whole, these topics should not only paint a picture of my philosophy around BI but also give you usable examples of it, with the goal of making at least some cool tools available to all.

Just as great athletes elevate their game when playing the best, I truly believe we’re all better off when we all operate better and smarter.


I recall my first computer program: a “lunar lander” game I wrote for the TI-99. The code was TI BASIC and was saved on a cassette tape. I loved that project because it engaged every part of my brain. I had no idea exactly where to start, but I just started. As soon as I saw something on the screen, I knew what needed to be changed, added, or enhanced. After some misdirection in my life, known as high school and college (!), I returned to IT. It was now 1990, and personal computers were beginning to make their move. I did some programming for work and took some data modeling courses, as well as master’s-level work in decision support systems. My final project was a functional specification for an expert system: a retail banking scoring system.

Almost 20 years later, I’m working in Business Intelligence in financial services. I swear it was an accident! In fact, before 2010 I had not worked in financial services since 1995. In the interim I’ve seen and done quite a few things, some of which used all of my brain and some of which used little. The things I’m most proud of are not the biggest accomplishments, but rather those that ended up meaning the most to the people for whom they were intended. The tools, technology, and people connected to my industry are usually priced at a premium, making first-rate BI inaccessible to the many people and companies who would actually use and appreciate it.

This blog is for the people and the companies who want to learn a little bit about what one person and his associates have done and want to share.