
Case study: "Unless we strive for perfect data we're never going to get it"
By Jo Best
Published: 4 December 2008 10:29 GMT
With more than 300 separate databases, each holding up to 100 variables, and an offshore team preparing datasets that will be transformed into complex analytical models, pharmaceuticals giant AstraZeneca relies on high-quality data.
Wayne Obetz, senior manager of quantitative commercial insight at AstraZeneca, told silicon.com about the challenges of keeping data up to scratch.
"I could retire a rich man if I had a dollar for every time someone told me: 'well, we're never going to have perfect data, just do the best with what you've got'. My comeback to that is, unless we strive for perfect data we're never going to get it," he said.
Among the complications facing AstraZeneca's data watchers are interlocking datasets with different start and end points: one weekly dataset might run from Sunday to Saturday, another from Monday to Sunday. But wholesale changes are not always an option, because the data is used in a number of legacy systems.
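Reconciling feeds like these usually means re-bucketing the underlying daily figures under one agreed week convention. The Python sketch below is illustrative only: the helper names and the sample data are assumptions, not anything AstraZeneca has described. It shows the same seven days forming a single Monday-to-Sunday week but straddling two Sunday-to-Saturday weeks.

```python
from collections import defaultdict
from datetime import date, timedelta

# date.weekday() numbering: Monday = 0 ... Sunday = 6.
MONDAY, SUNDAY = 0, 6


def week_start(d: date, first_weekday: int) -> date:
    """Return the first day of the week containing d under a given convention."""
    offset = (d.weekday() - first_weekday) % 7
    return d - timedelta(days=offset)


def weekly_totals(daily: dict[date, float], first_weekday: int) -> dict[date, float]:
    """Roll daily values up into weeks, keyed by each week's first day."""
    totals: dict[date, float] = defaultdict(float)
    for d, value in daily.items():
        totals[week_start(d, first_weekday)] += value
    return dict(totals)


# Hypothetical data: seven days, Monday 24 Nov to Sunday 30 Nov 2008, one unit each.
daily = {date(2008, 11, 24) + timedelta(days=i): 1.0 for i in range(7)}

print(weekly_totals(daily, MONDAY))  # one Monday-to-Sunday week
print(weekly_totals(daily, SUNDAY))  # split across two Sunday-to-Saturday weeks
```

Converting every feed to one convention before comparing them avoids the silent off-by-one-day mismatch that otherwise appears at each week boundary.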
"[What] we have is that we have a bunch of legacy programs that expect data to be in a certain place on a file, so if you're going to start adding in variables, you throw off all that old code base. We have to be very careful in updates in one place that cause unintended consequences somewhere else," Obetz said.
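The fragility Obetz describes is typical of fixed-width files, where each program reads a field from fixed character positions. The sketch below uses a hypothetical layout: the field names, offsets and sample values are invented for illustration. It shows that inserting a new field mid-record shifts every later offset, while appending it at the end leaves older readers working.

```python
# Hypothetical legacy layout: field name -> (start, end) character offsets.
LEGACY_LAYOUT = {
    "product_id": (0, 6),
    "region": (6, 9),
    "units": (9, 17),
}


def parse_record(line: str, layout: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Slice a fixed-width record into fields using the given offsets."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


# New 4-character field inserted after product_id: later offsets no longer line up.
inserted = "P00001" + "NEW1" + "NE " + "     120"
# Same new field appended at the end: the legacy offsets still hold.
appended = "P00001" + "NE " + "     120" + "NEW1"

print(parse_record(inserted, LEGACY_LAYOUT))  # region and units are now garbage
print(parse_record(appended, LEGACY_LAYOUT))  # legacy fields parse as before
```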
Failing to spot a problem with a dataset would throw off the company's models and leave AstraZeneca staff working from reports that are out of sync.
"Keeping everything in sync - because there are so many moving parts - making sure they're all in the same spot at the same time is a particular concern of mine," Obetz said.
To spot rogue data, AstraZeneca gets a helping hand from SAS.
"For us, SAS kind of serves a dual purpose - it allows us to very rapidly turn around ad hoc requests which, if we get those requests often enough, we turn that over to [the] reporting team and they can develop reports that replicate what we're doing.
"The other thing is it kind of serves as a check number coming out of other systems - we're so intimately familiar with the data on our end if we get conflicting numbers, we know to raise an issue with reporting folks and tell them, 'we're not getting agreement on the numbers we think we should be getting - you need to go back and take a look at what's happening on your side'," Obetz said.
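A "check number" comparison of this kind can be expressed as a simple reconciliation: compare the totals the analysts expect with what a reporting system produces, and flag anything missing or outside a tolerance. This is a minimal sketch, assuming per-period totals from each side; the period keys and the 0.5% relative tolerance are hypothetical choices, not AstraZeneca's.

```python
def reconcile(
    expected: dict[str, float],
    reported: dict[str, float],
    tolerance: float = 0.005,  # hypothetical: 0.5% relative difference
) -> list[str]:
    """Return the keys that disagree between two sources of totals.

    A key is flagged if it is missing from either side, or if the reported
    value differs from the expected one by more than `tolerance` (relative).
    """
    issues = []
    for key in sorted(expected.keys() | reported.keys()):
        if key not in expected or key not in reported:
            issues.append(key)
            continue
        exp, rep = expected[key], reported[key]
        if exp == 0:
            mismatch = rep != 0
        else:
            mismatch = abs(rep - exp) / abs(exp) > tolerance
        if mismatch:
            issues.append(key)
    return issues


analyst_totals = {"week_48": 1000.0, "week_49": 1200.0}
report_totals = {"week_48": 1000.0, "week_49": 1150.0, "week_50": 900.0}
print(reconcile(analyst_totals, report_totals))  # ['week_49', 'week_50']
```

The flagged keys are the starting point for the conversation Obetz describes with the reporting team, not an automatic correction.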
According to Obetz, the Base SAS product allows the company to handle a large volume of data.
"[SAS] allows us to report more flexibly than any of the other tools we have… and the added benefit is, when we're analysing things in SAS, the folks writing in SAS code are also aware of some of [the] data issues that the general report users aren't aware of so we know to bypass certain types of records or avoid sharing data of a certain type because we know, say, there's been a problem with the data feed one week," he said.
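Bypassing records from a feed known to have had problems can be as simple as a filter on the feed week. In this sketch the record shape, the `feed_week` field and the set of bad weeks are all hypothetical; bypassed records are kept separately so they can still be reviewed rather than silently dropped.

```python
from datetime import date

# Hypothetical: weeks in which a data feed is known to have been faulty.
KNOWN_BAD_FEED_WEEKS = {date(2008, 11, 17)}


def split_by_feed_quality(
    records: list[dict], bad_weeks: set[date]
) -> tuple[list[dict], list[dict]]:
    """Separate records from known-bad feed weeks so they can be bypassed
    in analysis but still reviewed later."""
    kept = [r for r in records if r["feed_week"] not in bad_weeks]
    bypassed = [r for r in records if r["feed_week"] in bad_weeks]
    return kept, bypassed


records = [
    {"id": 1, "feed_week": date(2008, 11, 10)},
    {"id": 2, "feed_week": date(2008, 11, 17)},
    {"id": 3, "feed_week": date(2008, 11, 24)},
]
kept, bypassed = split_by_feed_quality(records, KNOWN_BAD_FEED_WEEKS)
print([r["id"] for r in kept])      # [1, 3]
print([r["id"] for r in bypassed])  # [2]
```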
At AstraZeneca, even if staff checked 100 variables a day, it would take more than a year to check every variable coming in. In the quest for perfect data, human vigilance is still the order of the day, and Obetz recommends taking an inventory of the data being collected to keep the volume of updates as small as possible.
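The figure holds on a rough count if "a year" is read as a working year. This sketch uses the article's numbers (300 databases, 100 variables each, 100 checks a day); the 250 working days per year is an assumption, not something the article states.

```python
DATABASES = 300               # "more than 300", so this is a lower bound
VARIABLES_PER_DATABASE = 100  # "a possible 100" per database
CHECKS_PER_DAY = 100
WORKING_DAYS_PER_YEAR = 250   # assumption: not stated in the article

total_variables = DATABASES * VARIABLES_PER_DATABASE  # 30,000 variables
days_needed = total_variables // CHECKS_PER_DAY       # 300 working days
years_needed = days_needed / WORKING_DAYS_PER_YEAR    # about 1.2 working years
print(total_variables, days_needed, years_needed)
```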
"The best way to do it is to go back and look at the data, look at why you're collecting it and what you're doing with it. My training has told me don't collect it unless you have a purpose in mind for collecting it. We have a lot of data we collect [on a] monthly basis, even a weekly basis that we may use once a year but all through the year, you're collecting it, updating it, but if it's not going to be used, that update cycle ought to be changed so you're not spending time and effort needlessly on updates that aren't going to be used," he advised.
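The inventory Obetz recommends can start as a list of datasets with how often each is refreshed and how often it is actually used. In this sketch the dataset names, the counts and the 4:1 refresh-to-use threshold are hypothetical; anything never used, or refreshed far more often than it is consumed, is flagged as a candidate for a slower update cycle.

```python
from dataclasses import dataclass


@dataclass
class InventoryEntry:
    name: str
    updates_per_year: int  # e.g. 52 for a weekly feed, 12 for a monthly one
    uses_per_year: int     # how often the data is actually consumed


def over_refreshed(entries: list[InventoryEntry], ratio: int = 4) -> list[str]:
    """Flag datasets that are never used, or are refreshed at least `ratio`
    times more often than they are used (the threshold is an assumption)."""
    return [
        e.name
        for e in entries
        if e.uses_per_year == 0 or e.updates_per_year >= ratio * e.uses_per_year
    ]


inventory = [
    InventoryEntry("weekly_feed_a", updates_per_year=52, uses_per_year=1),
    InventoryEntry("monthly_feed_b", updates_per_year=12, uses_per_year=12),
    InventoryEntry("monthly_feed_c", updates_per_year=12, uses_per_year=0),
]
print(over_refreshed(inventory))  # ['weekly_feed_a', 'monthly_feed_c']
```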
Copyright © 2008 CBS Interactive Limited. All rights reserved.