The Power of Negative Thinking: Freeing the Dark Data of Science
The new issue of Wired has an essay I wrote on a topic I’ve been mulling for a few months: liberating what I call “dark data” in science, the unpublished, inconclusive, or inconvenient results that too many researchers would rather stick in a drawer than put into the light so that others may learn from their work (and perhaps build on it).
Here’s the gist:
In this data-intensive age, the apparent dead ends could be more important than the breakthroughs. After all, some of today’s most compelling research efforts aren’t one-off studies that eke out statistically significant results, they’re meta-studies — studies of studies — that crunch data from dozens of sources, producing results that are much more likely to be true. What’s more, your dead end may be another scientist’s missing link, the elusive chunk of data they needed. Freeing up dark data could represent one of the biggest boons to research in decades, fueling advances in genetics, neuroscience, and biotech.
The essay mentions one cool project by Google they’re calling (internally) the Palimpsest project, where they offer to store and distribute massive data sets - like in the petabytes - from scientists.
I’ve been mulling over this “free negative results” theme for a couple years, but this summer I had two separate conversations with Pat Brown and Michael Eisen, two of the three founders of PLoS. Both of them grokked the idea immediately - turns out it’s the impulse that led to the creation of PLoS (“open access” being just one step along the way). But since the more ambitious, less practical notion of freeing all dark data hasn’t really been stated as an explicit call to arms, I figured I may as well try to give it a shot.
Published by: tgoetz on September 25th, 2007 | Filed under Trends, databases
Can DNA Break the Asbestos Backlog?
One of the most egregious injuries to public health of the 20th century - one that could’ve largely been avoided - was the widespread use of asbestos in industry, manufacturing, and construction. Though asbestos is a great insulator and flame-retardant, its microscopic fibers wreak havoc when inhaled - causing all sorts of lung complications, the worst being “mesothelioma”, the cancer caused by asbestos exposure. Even after the first cases of “asbestos lung” became well known (the first court case was in 1929), asbestos was still kept in widespread use.
Though most products were removed from the market by the 1980s, a stunning swath of the population had had some exposure, a legacy that keeps on ticking: some 10,000 people still die annually from the effects of asbestos exposure. It has likewise spread into the legal system. In the past three decades, $70 billion has been spent on asbestos litigation (Jim Surowiecki had a nifty New Yorker column on the asbestos industry last year, from which some of these facts are drawn). Asbestos claims clog our court systems - there have been more than 6,000 defendants and 600,000 claimants; in 2001 Supreme Court justice David Souter described asbestos litigation as “an elephantine mass” (the onslaught has resulted, not surprisingly, in all sorts of lobby groups and organizations aiming to eliminate the backlog. Another curious side effect: Google charges top dollar for ads to run alongside searches for “asbestos litigation” and similar terms because there’s such demand from lawyers). The cases are hard fought, and drag on for weeks. Last time I was called for jury duty I was considered for a civil case based on asbestos litigation; the judge estimated that it would take three to six weeks to hear all the arguments.
One of the reasons asbestos litigation is such a burden on the court system is that it’s difficult to prove causality for many cases. Though some conditions, such as mesothelioma, are strongly correlated with asbestos exposure, many claimants are bringing lawsuits based on less egregious illnesses and short-term exposures. In the case I almost sat on the jury for, the claim was for emphysema - and the plaintiff’s lawyer slyly asked potential jurors if the fact that the plaintiff was a lifelong smoker, as well as someone exposed to asbestos, would affect our ability to judge the origin of his disease. Multiply that by 10,000 and you have a huge, sclerotic burden on the court system, and the country.
There have been efforts at cleaning up the mess with legislation. The “Fairness in Asbestos Injury Resolution Act” would set up a $140 billion trust fund for asbestos claimants, but it hasn’t gotten out of Congress (the American Public Health Association, among others, opposes the bill).
And now there may be a solution through technology. A DNA test called msds1 promises to offer a clear, declarative read on causality for injuries from exposure to chemicals and substances such as asbestos. If it catches on, it could be a great impetus towards answering the causality question in so many asbestos suits (and it could give its inventors a nice slice of the massive litigation expenditures).
The test, devised by Bruce Gillis of the University of Illinois at his Cytokine Institute, measures gene expression, matching a gene up to specific chemical signatures across 36,000 parameters. The company claims it can provide “99.9% certainty if a person was injuriously exposed to a particular toxin.” It’s hard to suss out exactly how the test works, but it sounds like it works by comparing tissue from a diseased person claiming causal exposure to tissue from a healthy person. More specifically, someone claiming injury provides a tissue sample - and then that A sample is compared to a B sample from a healthy person who has been exposed to specific chemicals. If there’s a match in gene expression, odds are the change is due to the substance in question, and the injury claim is valid. The test costs a little over $6,000, according to a National Underwriter story.
Earlier this month, the Los Angeles-based Institute opened an office in Boston - a hub for asbestos-related litigation. The test has already been used in 20 California cases, and I expect it’ll start showing up in East Coast courtrooms now, too.
(Curiously, the UK press is all over this - The Times of London, the Independent, and the BBC all have stories. Some publicist must’ve been working the field over there).
Published by: tgoetz on September 18th, 2007 | Filed under Law, cancer, DNA
Brits Would Sooner Die Than Exercise
I love this story: a survey says that even the threat of death won’t get a majority of Britons to exercise and maintain their health. Just “38% of people questioned by YouGov said they would do more exercise if their life depended on it,” according to the BBC story. Says Mike Knapton, director of prevention and care at the British Heart Foundation: “For many people, exercise has become an ugly word, something to avoid at all costs.”
Wow. And people call Americans lazy. Reminds me of Idiocracy, the 1/3 great movie by Mike Judge that, tragically, nobody saw.
Published by: tgoetz on September 18th, 2007 | Filed under obesity
How Good is Public Health Research?
Yesterday morning I had one of those experiences journalists dread: I opened up the Sunday New York Times and there on the front page of the Times Magazine was a story I’ve been kicking around. Titled “Do We Really Know What Makes Us Healthy?”, the story by Gary Taubes is a thorough look at how so much scientific research in the name of public health gives us dodgy, even incorrect, results. Told through the vehicle of hormone replacement therapy, Taubes’ story is really a discerning look at the purpose and limits of epidemiology - which is our best way of establishing what sort of behaviors and interventions might be good for us, and which might be bad.
One thing I was expecting Taubes to mention was the killer study by John Ioannidis in the August 2005 issue of PLoS Medicine: “Why Most Published Research Findings Are False”. This erudite bit of statistical analysis pointed out that one-off results are often wrong - in fact, are more likely to be false than true. One of the most downloaded papers PLoS has put out, it’s the sort of clear, counter-intuitive and declarative paper that I wish were characteristic of more scientific research. I expect Taubes didn’t get into it because it’s full of talk of positive predictive values and power and bias. But still, if I had written the story - which I swear I had a version of on my to-do list - Ioannidis would’ve been on the source list.
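For the curious, the core of the Ioannidis argument fits in a few lines. What follows is my own minimal sketch, not code from the paper: it computes the positive predictive value (the chance that a claimed finding is actually true) using the paper's no-bias formula, PPV = (1 - β)R / (R - βR + α). The function name and the example numbers are my assumptions, picked only for illustration.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Chance that a statistically significant finding is actually true.

    prior_odds: R, the ratio of true to false relationships being tested
    power:      1 - beta, the chance a study detects a true relationship
    alpha:      the chance of a false positive when nothing is there
    Bias is ignored here, which flatters the result.
    """
    if prior_odds <= 0:
        raise ValueError("prior_odds must be positive")
    if not 0 < power <= 1:
        raise ValueError("power must be in (0, 1]")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    beta = 1.0 - power
    return (power * prior_odds) / (prior_odds - beta * prior_odds + alpha)


if __name__ == "__main__":
    # Hypothetical long-shot, underpowered study: 1 true idea per 10 false ones.
    print(round(positive_predictive_value(prior_odds=0.1, power=0.2), 3))
    # Hypothetical well-powered study of a 50/50 hypothesis.
    print(round(positive_predictive_value(prior_odds=1.0, power=0.8), 3))
```

With those made-up inputs, the long-shot study's "significant" result is true less than a third of the time, while the well-powered one is right about 94 percent of the time. That gap is the whole point: the same p-value means very different things depending on the odds going in.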
Published by: tgoetz on September 17th, 2007 | Filed under Media, Epidemiology
Is Disease Resistance Genetic? Of Course, But…
Yesterday a story caught my eye titled “Disease Resistance May Be Genetic” that heralded a breakthrough study in Evolution about the inheritance characteristics of genes conferring resistance to infectious disease. It sounded like a perfect link for Epidemix (even though, curiously, the study actually came out in the June issue). But as I began to write it up, I kept stumbling over the headline. Didn’t we already know that disease resistance may be genetic? I’m no geneticist, nor an infectious disease expert, but I did know that sickle cell anemia - which confers resistance to malaria - and other resistant traits were genetic and thus inherited conditions. And since the journal is not open access, I could only check out the journal’s press release and the abstract, which didn’t clarify matters at all.
So I dropped a note to the study’s author, Paul Schliekelman, a statistician at the University of Georgia. And I’m very glad I did. He kindly explained the actual import of his work and it’s very cool. Here’s the thread:
EPIDEMIX: Coverage of your research gets the headline: “Disease Resistance May Be Genetic” - but didn’t we already know that (sickle cell anemia, for instance)? Am I missing something, or is there a greater significance here?
SCHLIEKELMAN: You are right, this was already well known and not at all the point of the paper. I will give a try at a simple (if not short) explanation:
1) Disease resistance is well known to have a strong genetic component.
2) Suppose that a new mutation appears in a population that gives resistance to some potentially fatal infectious disease. Initially it will be rare in the population. The question that we are interested in is how quickly will it spread through the population. Understanding this will help us understand the evolutionary history of disease resistance genes such as the CCR5 locus that confers resistance to HIV.
3) Although initially rare in the population overall, it will be common within some families. This is because if a parent is carrying a copy then his/her children will each have a 1/2 chance of also carrying it. Most of the time, if you see one copy in a family you will see others.
4) We have a strong tendency to catch infectious diseases from our family members (if you have children you will know this personally). Therefore, if I have a gene that makes me less likely to become infected then it is beneficial to my entire family. This is because if I am less likely to get sick then my family members are less likely to catch something from me. Likewise, if any of them have the same gene, it makes me less likely to get sick.
5) Therefore, the resistance genes in a family all boost each other and the intensity of natural selection in their favor is thereby magnified. Kin selection is the name for the general phenomenon of genes in one individual benefitting genes in his/her relatives (hence the title of the paper: “Kin Selection and Evolution of Infectious Disease Resistance”).
In the paper I used a mathematical model to explore this effect and determine how important it is. The answer, as often is the case, is “it depends”. The effect can be quite dramatic, increasing the strength of selection by as much as two to three times. However, in other situations the effect is fairly small (e.g. 10-20%). The difference results from what you assume about how the resistance gene affects transmission probability. I would be happy to explain this further if you are interested. However, my answer is already rather long-winded so I will cut it off here.
EPIDEMIX: Is it right that one takeaway is that resistance to one disease is good for a family’s entire genepool - that is, by being resistant there’s a greater likelihood the entire family-line will flourish?
SCHLIEKELMAN: I would put it like this: When the resistance gene is rare then resistance will tend to be clustered in families. This means that to some degree families become the unit of natural selection rather than individuals. All of the genes in the family receive a selective boost rather than just the genes in an individual.
So that’s much more clear. The network effects of immunity are particularly intriguing to me. And I love the multidisciplinary nature of the work: This is a statistician doing genetics with public health import. And here’s the capper: Schliekelman got the idea for the study after he kept catching stomach flu from his young daughters.
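Schliekelman's point 3 is easy to check with a little arithmetic. This is my own back-of-the-envelope sketch, not the model from his paper: it assumes a single parent carrying one copy of the resistance gene, and that each child independently inherits it with probability 1/2. The function names and family sizes are made up for illustration.

```python
def prob_at_least_one_carrier(n_children, p_inherit=0.5):
    """Chance that at least one of n children inherits the parent's copy."""
    if n_children < 0:
        raise ValueError("n_children must be non-negative")
    if not 0 <= p_inherit <= 1:
        raise ValueError("p_inherit must be between 0 and 1")
    # Complement of "every child misses the copy".
    return 1.0 - (1.0 - p_inherit) ** n_children


def expected_carriers(n_children, p_inherit=0.5):
    """Average number of carrier children in a family of this size."""
    if n_children < 0:
        raise ValueError("n_children must be non-negative")
    return n_children * p_inherit


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, prob_at_least_one_carrier(n), expected_carriers(n))
```

Under those assumptions a carrier parent with three kids has a 1 - (1/2)^3 = 87.5 percent chance of sharing the gene with at least one of them, which is the clustering he describes: rare in the population, common around the dinner table.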
Published by: tgoetz on September 7th, 2007 | Filed under Disease, Genetics, Infection
The Popcorn Problem
The Pump Handle has a scathing and rather startling rundown on the risk of microwave popcorn, specifically a butter-flavoring chemical called diacetyl, and federal regulators’ failure to act to protect consumer health. The anecdotes in the comments are especially disconcerting. Makes me glad we’ve never owned a microwave (until last month). A great testament to the power of informed blogging.
Published by: tgoetz on September 6th, 2007 | Filed under Misc.
Aliens and the Venter/Watson Genomes
All the Venter hubbub today reminds me of a scenario that Oliver Morton offered at SciFoo:
Imagine the human race is wiped out. Aliens come down from the sky, and want to see who these humans were. All they’d have to go on would be the DNA of Craig Venter and James Watson. So at best, they could come up with a flock of Venter and Watson clones.
Oy.
Published by: tgoetz on September 4th, 2007 | Filed under Genome
Bipolar Syndrome: When Is a Disease Real?
Much frenzy about the rapid increase in diagnoses for bipolar syndrome in children. A new study in the Archives of General Psychiatry shows that there’s been a 40-fold increase in diagnoses over the past decade or so, with now fully 1 percent of all children being labeled “bipolar.” According to the DSM IV, the textbook for mental health and disease classification, bipolar disorder is a mood disorder characterized by manic episodes and major depressive episodes. The diagnosis is controversial because of a familiar riddle of epidemiology: Either the illness was underdiagnosed for decades, and we are now identifying a previously hidden epidemic. Or the illness is being overdiagnosed, as a condition already of flimsy status in adult populations gets extended into children.
As it happens, I was speaking last week with a prominent psychiatrist who was involved in the crafting of the DSM IV. He characterized bipolar disorder in children as one of the two “major issues”* in the psychiatric community and in the crafting of the new edition of the diagnostic manual. The issue is how applicable the disorder is to children, in whom mental health can be far more variable and transitory.
The big takeaway from my conversation with this psychiatrist was what he described as the “major problem” with the popular perception of the DSM. “People forget that these aren’t real diseases,” he said. “These are man made diseases. We’re just describing symptoms, we don’t really know what’s going on inside.”
This, of course, is a major difference between mental health diagnoses - which are efforts to describe symptoms as we see them - and pathological diagnoses - which are attempts to discern molecular and cellular processes inside the body. I wish more effort was made to make this point clear, among the mental-health community as well as among the media. This New York Times story does a nice job of referring to bipolar disorder as a “label”, but it’s surprising to me that there’s not more of an effort to explain the relativistic nature of this label (particularly for an article that’s really about the mushiness of mental health diagnoses). And as medicine starts to adopt this practice of defining diseases that are more conceptual than physical, I think this distinction must be maintained.
Interestingly, there’s one major exception to the “man-made” nature of the DSM - sleep disorders, which can be understood and, more importantly, measured in physical terms.
* The other being a sort-of flip side illness, Adult Attention Deficit Disorder, which represents a childhood condition being extended to adults.
Published by: tgoetz on September 4th, 2007 | Filed under Disease, Media, mental health
James Watson’s Genome, Annotated
One of the coolest things from SciFoo was a presentation by Lincoln Stein of Cold Spring Harbor Lab. Lincoln is an intriguing guy, a computational biologist who is extremely handy with code.
Stein’s presentation was titled something like: “Genomic Voyeurism”. He had spent the previous week (just one week!) whipping up a little website called the James Watson Genome Browser. Stein matched Watson’s genome, which was made public on May 31, to the reference genome from the Human Genome Project. He also input data from the Human HapMap Project, which indicates the position of common polymorphisms in the human genome, and OMIM associations that list common genomic variants associated with disease risk.
The result is a Genotype Viewer that lets you scan Watson’s sequence to see where his genome differs from the “reference” sequence, as well as to view some genes and potential disease associations. As Stein introduced the Browser, he made the point that this is almost entirely an academic scan, not a clinical one - at nearly 80, Watson has pretty much had happen whatever’s gonna happen to him. That is, whatever risks he might have, he’s clearly beaten the odds.
The browser is not an easy thing to navigate - most of the information in it is over my head, and scanning page by page or mousing over the popups, it’s a bit difficult to make sense of what exactly the genome is telling you. But there’s a nifty search function that lets you, say, enter a gene with a known association (Stein suggests trying “HTR2A”) and see whether Watson has said gene.
What’s amazing about this is that 1) Stein did this in a week and that 2) it’s the first time I’ve seen a hint at what information lies inside a genome on a practical level. As I say, it’s a mite impenetrable for the non-geneticist, but it does open the window onto all of our genetic futures.
[Duncan Hull has a thorough play by play of Stein’s talk here.]
Published by: tgoetz on August 17th, 2007 | Filed under Genome
Friday Fanboy: Look Over to Dover
I’ve long been a fan of Dover Books, the no-frills publisher of reissued and public-domain texts. The books are typically high quality - each book has an endpage with this text:
A DOVER EDITION DESIGNED FOR YEARS OF USE
We have made every effort to make this the best book possible. Our paper is opaque, with minimal show-through; it will not discolor or become brittle with age. Pages are sewn in signatures, in the method traditionally used for the best books, and will not drop out, as often happens with paperbacks held together with glue. Books open flat for easy reference. The binding will not crack or split. This is a permanent book.
What a wonderful ethos.
What I especially appreciate is how any time I want to go deep into the history of a subject, there’s bound to be a trove of Dovers that will catch me up. So when I indulged in New York City architectural history, Dover had terrific books on NY bridges, the 1939 World’s Fair, and a book on the city’s architectural holdouts (the recalcitrant landowners who refuse to sell to the big developer, and thus get built around with remarkable consequences - see some examples here).
What’s this have to do with public health? Well, needless to say, their catalog of books on medicine and public health is a joy to peruse. They have Robert Hooke’s Micrographia, the 1665 work that helped establish the microscope as a tool for medicine. There’s Florence Nightingale’s Notes on Nursing. Malthus’s Essay on the Principle of Population. Henry Mayhew’s great study of the London Underworld, circa 1840.
If you have a jones for history and esoteric knowledge, you could do worse than to buy a couple boxes of Dover books and spend a Sunday flipping through them.