
Case study: "Unless we strive for perfect data we're never going to get it"
By Jo Best
Published: 4 December 2008 10:29 GMT
With more than 300 separate databases, each with up to 100 variables, and an offshore team preparing datasets that will be transformed into complex analytical models, pharmaceuticals giant AstraZeneca is reliant on high-quality data.
Wayne Obetz, senior manager of quantitative commercial insight at AstraZeneca, told silicon.com about the challenges in keeping data up to scratch.
"I could retire a rich man if I had a dollar for every time someone told me: 'well, we're never going to have perfect data, just do the best with what you've got'. My comeback to that is, unless we strive for perfect data we're never going to get it," he said.
Among the complications facing AstraZeneca's data watchers are interlocking datasets with different start and end points - one weekly dataset could run from Sunday to Saturday, another Monday to Sunday. But wholesale changes are not always an option because the data is used in a number of legacy systems.
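To illustrate the week-boundary mismatch described above, here is a minimal Python sketch. It is not AstraZeneca's code: the function name `week_start` and the sample dates are assumptions made for illustration. It shows how the same calendar day is assigned to different weeks depending on whether a dataset runs Sunday to Saturday or Monday to Sunday.

```python
from datetime import date, timedelta

# Weekday numbers follow date.weekday(): Monday=0 ... Sunday=6.
MONDAY, SUNDAY = 0, 6

def week_start(d: date, first_weekday: int) -> date:
    """Return the first day of the week containing d (hypothetical helper)."""
    offset = (d.weekday() - first_weekday) % 7
    return d - timedelta(days=offset)

# 30 November 2008 was a Sunday.
d = date(2008, 11, 30)
sunday_based = week_start(d, SUNDAY)   # week runs Sunday to Saturday
monday_based = week_start(d, MONDAY)   # week runs Monday to Sunday

print(sunday_based, monday_based)
```

Under the Sunday-based convention that day opens a new week, while under the Monday-based convention it closes the previous one, so weekly totals from the two datasets cannot be compared without first re-bucketing the underlying daily figures.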
"[What] we have is that we have a bunch of legacy programs that expect data to be in a certain place on a file, so if you're going to start adding in variables, you throw off all that old code base. We have to be very careful in updates in one place that cause unintended consequences somewhere else," Obetz said.
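The fragility Obetz describes can be sketched in a few lines of Python. Everything here is hypothetical: the article does not describe the actual file layouts, so the column offsets, field names and sample record are invented for illustration. The sketch assumes a legacy reader that pulls fields out of fixed column positions.

```python
# Hypothetical fixed-width layout: (field name, start column, end column).
LEGACY_LAYOUT = [("product", 0, 8), ("region", 8, 12), ("units", 12, 18)]

def parse(record: str, layout=LEGACY_LAYOUT) -> dict:
    """Read each field from its fixed position, as legacy code would."""
    return {name: record[start:end].strip() for name, start, end in layout}

old_record = "DRUG_A  " + "EAST" + "   120"   # 8 + 4 + 6 characters
new_field = "2008W48"                          # an invented new variable

# Inserting the new variable mid-record shifts every later column.
inserted = old_record[:8] + new_field + old_record[8:]
# Appending it at the end leaves the old positions untouched.
appended = old_record + new_field

print(parse(old_record))
print(parse(inserted))   # region and units are now read from the wrong columns
print(parse(appended))   # same result as the old record
```

In this sketch, appending the new variable keeps the old readers working, while inserting it in the middle silently corrupts every field that follows, which is the kind of unintended consequence the quote warns about.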
Failing to spot a problem with a dataset will throw the company's models off and leave AstraZeneca staff working from reports that aren't in sync.
"Keeping everything in sync - because there are so many moving parts - making sure they're all in the same spot at the same time is a particular concern of mine," Obetz said.
To spot rogue data, AstraZeneca gets a helping hand from SAS.
"For us, SAS kind of serves a dual purpose - it allows us to very rapidly turn around ad hoc requests which, if we get those requests often enough, we turn that over to [the] reporting team and they can develop reports that replicate what we're doing.
"The other thing is it kind of serves as a check number coming out of other systems - we're so intimately familiar with the data on our end if we get conflicting numbers, we know to raise an issue with reporting folks and tell them, 'we're not getting agreement on the numbers we think we should be getting - you need to go back and take a look at what's happening on your side'," Obetz said.
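The "check number" idea can be sketched as a simple reconciliation between two sets of figures. This is an assumption-laden illustration in Python rather than SAS, and the function name `find_mismatches`, the tolerance and the sample totals are all invented; the article does not say how the comparison is actually done.

```python
import math

def find_mismatches(ours: dict, theirs: dict, rel_tol: float = 0.001) -> list:
    """Return (key, our value, their value) for every figure that is
    missing on one side or differs by more than the relative tolerance."""
    mismatches = []
    for key in sorted(set(ours) | set(theirs)):
        a, b = ours.get(key), theirs.get(key)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches.append((key, a, b))
    return mismatches

# Invented weekly totals from an analyst's run and a reporting system.
analyst_totals = {"2008W47": 1000.0, "2008W48": 1200.0}
report_totals = {"2008W47": 1000.0, "2008W48": 1150.0, "2008W49": 900.0}

for key, ours, theirs in find_mismatches(analyst_totals, report_totals):
    print(f"{key}: analyst={ours} report={theirs}")
```

Any key that comes back from such a check is a prompt to raise the issue with the team that owns the other system, rather than proof of which side is wrong.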
According to Obetz, the Base SAS product allows the company to handle a large volume of data.
"[SAS] allows us to report more flexibly than any of the other tools we have… and the added benefit is, when we're analysing things in SAS, the folks writing in SAS code are also aware of some of [the] data issues that the general report users aren't aware of so we know to bypass certain types of records or avoid sharing data of a certain type because we know, say, there's been a problem with the data feed one week," he said.
At AstraZeneca, even if the company checked 100 variables per day, it would take more than a year to check all the variables coming in. In the quest for perfect data, human vigilance is still the order of the day, and Obetz recommends taking an inventory of the data to keep updates as small as possible.
"The best way to do it is to go back and look at the data, look at why you're collecting it and what you're doing with it. My training has told me don't collect it unless you have a purpose in mind for collecting it. We have a lot of data we collect [on a] monthly basis, even a weekly basis that we may use once a year but all through the year, you're collecting it, updating it, but if it's not going to be used, that update cycle ought to be changed so you're not spending time and effort needlessly on updates that aren't going to be used," he advised.
Copyright © 2008 CBS Interactive Limited. All rights reserved.