Juan Sequeda and Daniel Miranker launched Capsenta, a start-up based on their research at the University of Texas at Austin which was recently acquired by data.world. Photo credit - Vivian Abagiu.

NEWS From the College of Natural Sciences by Marc G Airhart

It was 2006 when Juan Sequeda (BS '08, PhD '15), then a new UT Austin computer science transfer student, saw a fellow undergraduate drop a bunch of papers on the floor. When he bent over to help pick up the papers, he was surprised to see that they were research articles about an obscure subfield in computer science that Sequeda himself had recently become obsessed with: the Semantic Web.

The term referred to a proposed system of organizing and tagging all the information on the Internet so that autonomous agents could find and make sense of it. Thanks in part to that chance encounter in a University of Texas at Austin classroom, Sequeda and his advisor, professor Daniel Miranker, ultimately were able to invent a powerful new way to transform, if not all the Internet's information, at least some key data stored in different ways in different places, into a form that made it vastly easier to analyze.

And that in turn led to the launch of a start-up based on University of Texas at Austin research—Capsenta, a computer science spin-out recently acquired by another company. Although roughly 9 of 10 startups fail, Capsenta flourished. In June, data.world acquired it and named Sequeda its principal scientist and Miranker an adviser.

Juan Sequeda (BS '08, PhD '15), principal scientist at data.world. Photo credit - Vivian Abagiu.

No Easy Answers

Sequeda first became captivated by the Semantic Web when a visiting professor at his former school, the Universidad de Valle in Cali, Colombia, gave a lecture on it and described how its tools were making searches more powerful. A semantic approach would be able to determine, for example, whether a search for "Paris Hilton" was about a person or a hotel in the city of Paris, and if it was about a hotel, whether it was in Paris, France or Paris, Texas.

"It got me thinking about what the next generation of the web might look like, and I definitely wanted to be part of that," Sequeda said.

Sequeda learned that the student whose papers he helped pick up was doing research with Miranker, whom Sequeda then tracked down, asking to join his team. Miranker said yes—and that he had just the problem to tackle.

At the heart of the problem was helping businesses get answers to simple questions that prove nearly impossible to resolve when information is spread across multiple databases and recorded differently. A manager may wonder: How much income did we earn from sales last year? But if your Canadian sales team records sales in Canadian dollars in one database, while your U.S. sales team uses American dollars in another, it takes a lot of work to extract the appropriate data, make it consistent and find the answer. Or, if you're a scientist at a pharmaceutical company needing to compare data from different ongoing clinical studies, one database using metric units and another using imperial units becomes a problem.

"Software developers for a Big Data analysis project may spend as much as 90 percent of their time wrangling data and just 10 percent of their time analyzing the data," Miranker said.

Turning Data into Knowledge

Miranker envisioned an automated method for merging databases, unlike what existing—and often cumbersome—data integration software offered. First, it would translate each conventional database into something called a knowledge graph, and second, merge those graphs together into one big, seamless graph. Then, asking questions of the graph should be so intuitive that even business managers could do it.

"Business people are the ones asking the questions, but they can't use their own data," Sequeda said. "They have to go talk to their IT folks a lot and waste a bunch of time. The idea was: What if we could give the business users a view of their own data in a way that made sense to them so they could answer questions on their own?"

In a conventional database, information is stored in tables with columns and rows, kind of like a set of spreadsheets on steroids. In a knowledge graph, on the other hand, each chunk of data is linked to other chunks of data, like pushpins on a bulletin board connected by bits of string, and tagged with information explaining what each chunk of data is all about.
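To make the contrast concrete, here is a minimal sketch in Python of how rows in a table might be restated as linked, labeled facts. The sample sales data, the `rows_to_triples` helper and the "subject, predicate, object" naming are illustrative assumptions, not a description of the researchers' actual method.

```python
# Hypothetical example data: one row-and-column table of sales records.
sales_table = [
    {"id": 1, "region": "Canada", "amount": 1200, "currency": "CAD"},
    {"id": 2, "region": "US", "amount": 950, "currency": "USD"},
]


def rows_to_triples(table_name, rows, key="id"):
    """Restate every non-key cell as a (subject, predicate, object) triple."""
    triples = []
    for row in rows:
        # The "pushpin": one node per row, named after the table and its key.
        subject = f"{table_name}/{row[key]}"
        for column, value in row.items():
            if column != key:
                # The "string": a labeled link from the row to one of its values.
                triples.append((subject, column, value))
    return triples


def objects(triples, subject, predicate):
    """Follow one labeled link from a subject and return what it points to."""
    return [o for s, p, o in triples if s == subject and p == predicate]


triples = rows_to_triples("sale", sales_table)
print(objects(triples, "sale/2", "currency"))
```

Because each fact carries its own label, facts from a second table can be appended to the same list without first agreeing on a shared set of columns.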

Miranker challenged Sequeda, then an undergraduate in his junior year, to develop an automated method for translating a conventional database into a knowledge graph, enabling simpler data analytics than a conventional database offered.

Sequeda says he learned two important things working with Miranker.

"Number one, I learned to be comfortable in a sea of ambiguity," he said. "You start out lost, but eventually, you'll figure out what the problem is that needs to be solved. The second thing is how to communicate the problem to others. You can't solve it unless you can define it."

Daniel Miranker, professor of computer science at the University of Texas at Austin. Photo credit - Vivian Abagiu.

Wrap It Up

Though he was making progress, as he was finishing his bachelor's degree in 2008, Sequeda had more to do. He told Miranker he wanted to continue researching with him through graduate school—and to start a company based on their research. Miranker responded that a PhD wasn't needed to start a company.

"I said, 'That's true, but I want to be able to work on something that's a problem that's really hard that's never been solved before—and that's what you do in a PhD program,'" Sequeda recalled. "'And I want to make sure that problem is related to industry, because if we solve it, it's going to be a business opportunity.'"

Within a couple of years, he was demonstrating a new piece of software called Ultrawrap, based on the work he did as an undergraduate, which automated the process of integrating relational databases. Instead of moving the data, Ultrawrap creates a virtual wrapper around the databases that allows users to access the information as a graph.
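The idea of a virtual wrapper can be illustrated with a toy sketch in Python. This is not Ultrawrap's implementation; the `VirtualGraph` class, the `sale` table and its columns are hypothetical, and the sketch assumes only Python's standard `sqlite3` module. The point it shows is that graph-style lookups are answered by querying the database on demand, so nothing is copied out of it.

```python
import sqlite3

# Hypothetical relational database standing in for a company's existing data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [(1, "Canada", 1200.0), (2, "US", 950.0)],
)


class VirtualGraph:
    """Answers graph-style lookups by querying the table each time."""

    def __init__(self, conn, table, key, columns):
        self.conn = conn
        self.table = table
        self.key = key
        self.columns = columns  # column names exposed as link labels

    def objects(self, subject_id, predicate):
        # Only known column names are ever placed into the SQL text.
        if predicate not in self.columns:
            raise KeyError(predicate)
        sql = f"SELECT {predicate} FROM {self.table} WHERE {self.key} = ?"
        return [row[0] for row in self.conn.execute(sql, (subject_id,))]


graph = VirtualGraph(conn, "sale", "id", ["region", "amount"])

# The data stays in the database, so a later update is visible immediately.
conn.execute("UPDATE sale SET amount = 1300.0 WHERE id = 1")
print(graph.objects(1, "amount"))
```

Since the wrapper holds no copy of the rows, there is no second store to keep in sync with the original database.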

One of its first applications was for Constitute, a project involving Zachary Elkins, a professor in UT Austin's Department of Government. Elkins had been working with colleagues in Illinois and London on a project to gather the text of the world's constitutions in one place for academic research. He asked Miranker for help developing an online portal that would allow anyone to search the constitutions of the world's nations and compare similar passages side by side. With funding from Google, Miranker and Sequeda used Ultrawrap to translate the constitutions into a knowledge graph, bringing the information out of the domain of academics and making it accessible not only to anyone curious about differences in the world's governments and cultures, but also to leaders in countries looking to revise their constitutions.

Before long, the computer scientists were riding the success from Constitute to start Capsenta, a way to commercialize Ultrawrap. Miranker applied for a patent for the technology behind Ultrawrap and, when it was secured, negotiated a license with the university. For the first couple of years, he served as CEO. They were joined by Kyle Cox, former director of the UT Austin Technology Incubator, David Arnold, a consultant in healthcare IT, and later entrepreneur Wayne Heideman. Capsenta raised seed funding from LiveOak Venture Partners and other investors, and launched a series of projects, first in the healthcare industry and later in e-commerce, oil and gas, and elsewhere.

Best of Both Worlds

Earlier this year, when another Austin-based technology company, data.world, acquired Capsenta, it was seizing a chance to merge its own open, cloud-based platform—in which users can add, integrate, share and query data—with Capsenta's Ultrawrap technology, which allows users to keep their data in their own preferred databases, while accessing powerful analytical tools available for knowledge graphs.

"Our technology allows people to work in this graph world without having to move the data," Sequeda said.

Capsenta overcame steep odds to find success in the business world. Miranker, who continues to teach and research at UT Austin, gives Sequeda much of the credit.

"He has an entrepreneurial spark and an irrepressible drive," said Miranker.

For his part, Sequeda is grateful for his time at UT Austin and the mentorship he received from his professor.

"I am a scientist because of Dan," Sequeda said. "This whole career that I started is thanks to his training and mentorship. He is like another father figure."