The article first appeared in the St. Louis Beacon: Collecting court data may seem a dry business, but when Andrew Winship talks about the process, it becomes clear that the end product is far more remarkable than a simple pile of statistics on verdicts and rulings.
“We help lawyers predict the future,” said Winship, co-founder of Juristat, the local analytics company he helped bring to life last year with the idea of using raw data to project the likelihood of a given judge granting a motion or an attorney settling a case.
Juristat is just one of a new breed of enterprises rising across the nation, driven by what’s known as “big data,” a cross-disciplinary concept that seeks to marry an exponentially growing pool of available information with the mass computing power that can crunch all that knowledge into predictions on everything from consumer buying preferences to health-care treatment outcomes.
It is a field that has gained special attention locally with a number of entrepreneurial startups exploring the possible commercial potential. This week, St. Louis will also play host to StampedeCon, a national gathering for big data enthusiasts now in its second year here. The event, which includes a “lightning round” for entrepreneurs to present their technology, is set for today and Wednesday.
“You can tell by talking to the startups,” said Gary Stiehr, founder of the confab. “There is a lot of energy around, and there is a lot of interest just with learning about the data and how it can be brought into different organizations. At the same time, there are a number of organizations around St. Louis already using it.”
Court dates and hunting seasons
Most experts like to talk about what the industry refers to as big data’s “three v’s” – volume, variety and velocity. In short, humans are now creating and processing more information from more sources at a faster rate than at any time in history. The difficulty is not so much in collecting that knowledge but in knowing how to use it without drowning in a factual flood of irrelevant trivia. Even more challenging, programmers need predictive models that are able to discern which variables are actually causing an observed phenomenon and which merely mirror its effect.
“To get really competent with that, you need to figure out all the factors affecting the system,” said Winship. “The nice thing about a court system is there is typically a very finite number of actors.”
That may be, but the data often seem anything but finite. Juristat says on its website that it can analyze over 1.5 trillion statistics when examining a given case, allowing it to predict everything from the chances of a verdict being overturned on appeal to finding the best day of the week to argue before a particular judge.
If that latter point makes one do a double take, it really shouldn’t. Odd, seemingly insignificant factors are increasingly thought to play a bigger role in decision making than previously believed. For instance, evidence from an observational study in Israel two years ago found a distinct correlation between what time of day judges decided a parole case and the likelihood of the prisoner obtaining freedom. The jurists were most generous just after a snack break, granting 65 percent of requests. That number fell off throughout the day before jumping back up again – after the next meal.
While Juristat doesn’t mention the effect of hungry judges, it does look at decidedly non-legal factors in cases, including everything from flu outbreaks to Facebook sentiment to the onset of hunting season, which Winship said may have an effect on court scheduling in rural areas.
“We had someone ask us, could we figure out the number of jury members who were on antidepressants,” said Winship, who compares the whole process to sabermetrics, the baseball statistical wizardry made famous in the movie “Moneyball.”
While Winship doesn’t think the data will ever get granular enough to look at juror medications, Juristat’s attention to details beyond the normal parameters of the judicial system does bring up an interesting point about the future of big data. Predicting case outcomes means looking at things outside the courtroom and building algorithms that know how to pull patterns from the chaos.
He thinks making the court system more predictable will also lead to better-written insurance policies and may make settlements more common, freeing up court resources for the cases that really need to be tried.
“You are going to have gains cascading at each step of the process and a significant economic savings for a lot of actors,” he said.
Some of those gains may also lead to additional revenue sources, a thought that leaves many big data companies with a flexible plan for the future. Juristat, for instance, has data on every divorce in the state of New York for the last three decades.
“We’re joking that we could basically make a dating website,” Winship chuckled.
A more personalized tomorrow
Similar ideas about efficiency and better allocation of resources crop up repeatedly in conversations with local big data entrepreneurs. Like Winship, Steve Marciniak, co-founder and CEO of local startup TrakBill, which monitors lawmaking activity for clients, is quick with a great catchphrase for his own company.
“We are the ESPN of legislation,” said Marciniak, who helped create the operation last year. “We give you the play-by-play update of everything that’s happening both at the state and federal level of government.”
TrakBill is still mostly in the data-collection phase, but Marciniak foresees a day when it might help predict legislative outcomes to assist interest groups with successful wording of bills or other issues. Clients could include everyone from industry lobbyists to unions to watchdog groups to media outlets.
He also feels big data could spark a rebirth of citizen activism, as access to information helps people coalesce.
“A lot of social groups will form because of the different data that’s available,” he said. “You’ll start to see a lot of different communities pop up around these different big data areas.”
All of that means greater transparency, he said.
“A lot of what goes on in government is behind closed doors and we’re trying to bring a lot of that to light,” said Marciniak. “By providing this data that just about anyone can understand, a lot more people are able to interact with it, know what’s going on and advocate for their cause more effectively.”
In St. Louis, it may also mean creating a hub for big ideas about big data. Food Essentials, a company founded in Australia in 2008, moved to town a year ago after winning an Arch Grant. Anton Xavier, CEO of the enterprise, said the field seems to be burgeoning here amidst the presence of big companies and noteworthy universities.
“There seems to be a lot of educational infrastructure that really suits it, a lot of biotechnology and mathematics,” he said. “That provides the kind of infrastructure you really need to look at big data.”
Food Essentials gathers information from food labels to help analyze marketing possibilities as well as customer behavior. Xavier believes big data analytics could create highly personalized shopping experiences for consumers, who can get what they want more easily, and for retailers, who don’t need to worry about unused inventory sitting on shelves due to misallocation of resources or poor planning.
“This kind of data will help the industry to respond to consumers in a more intelligent way,” he said. “We see that the future of big data predictions for our industry is going to be about making the supply chain more effective and efficient, being able to predict where certain products should be. What that leads to is a client’s customers being treated to a more customized experience.”
Docs and data
If big data can make companies wealthier and customers happier, could it also make everyone healthier as well? Bill Shannon hopes so.
“The shift is going to go from a patient going to the hospital and a doctor saying this is how we treat this patient to the same patient going to the hospital where the IT people say let’s compare this patient with all the others we’ve ever seen and figure out how we treated those other patients to get the best outcome,” said Shannon, a professor of biostatistics at Washington University’s School of Medicine. “There is going to be a real improvement in treatment outcomes.”
Shannon is also founder and managing partner of BioRankings, an enterprise that offers big data analytics. He said companies are increasingly using big data to cut costs, improve operations and even invent new products and revenue streams. That includes hospitals and insurers, which can use aggregated health data to find better, more cost-effective methods of treatment.
A constant stream of information allows doctors and institutions to make changes on the fly and immediately see the results, Shannon said. Meanwhile, new cases can be understood in the context of a universe of previous treatments for similar patients, thus improving cure rates. Like Juristat’s lawyers, doctors can increasingly rely on more than just their own experience and instead depend on a database of collective knowledge.
“Big data allows us to test ideas we want to implement very quickly,” he said.
Like Juristat’s founders, Shannon also notes the possibility of cross-correlating information from different non-medical sources. Even weather data or statistics on demographics and location might play a role in medicine.
Lies, damn lies and statistics
Big data are at the heart of controversial National Security Agency intelligence gathering programs that have recently come to light and drawn fire from privacy advocates and civil libertarians. Moreover, there have long been concerns by individuals uncomfortable with the idea of being categorized or having large organizations, public or private, record their actions in areas from credit card purchases to social media posts.
But StampedeCon’s Stiehr notes that big data entrepreneurs aren’t creating information. They are analyzing what’s already there.
“The technology itself is neutral,” he said. “The fact that there is a lot of data out there is something that is enabled by all the services people use, not really by big data itself.”
Stiehr said that privacy laws still apply. In fact, one of StampedeCon’s speakers will be Anthony Martin, Walmart’s chief privacy and information security counsel.
The aggregated nature of big data means the main danger may not be privacy but poor interpretation in unskilled hands.
“There is the old saying about lies, damn lies and statistics,” said Winship. “The fact is that statistics aren’t lies. It’s that most people don’t understand statistics so they are easily fooled by bad ones.”
Shannon agrees. He sees no problem with privacy so long as data security protocols are observed. The threat may be from misinterpretation.
“The biggest problem is, if you don’t have qualified data scientists to talk about the proper ways of analyzing it you can very quickly get in trouble coming up with answers that are wrong,” he said.
Marciniak notes that as long as data are obtained ethically from legitimate sources, they can provide benefits that far outweigh any risks.
“Big data are definitely more good than bad, for sure,” he said. “Anything can be bad if you don’t handle the situation properly.”