Beyond If
Q&A May 10, 2022

Beyond data: Mining for more than meaning

A Q&A with members of our data science community

  • ~2.5 quintillion bytes of data are currently produced per day
  • 181 zettabytes of data projected worldwide by 2025 (Statista)

In a world where data is measured in sums that boggle the mind (many outlets report roughly 2.5 quintillion bytes of data are produced per day!), data scientists have their hands full simply pulling coherence out of the digital sea. But at Jacobs, our experts don't aspire just to make sense of the stories hidden in the data; they seek to transcend the zeros and ones and contextualize the data to help our clients make strategically sound decisions.

Meet some of the talented individuals who comprise the Jacobs data science community:

Dr. Jennifer Blum - Senior Director Data Scientist, Critical Mission Solutions, Cyber & Intelligence Business Unit

Can you tell us about a project you’re working on now for Jacobs and its clients?

I'm helping an organization with a “request for information”. They have realized on the cybersecurity side that they're not where they need to be. They want to know what advanced technologies and data science methodologies we can actually use to not just make them more secure, but also more advanced in terms of automation. They’ve told me, "We know we need help, and this is where we think we need it." It was really nice that they were working to address this and pulled in people like myself and my team to see if we can address these things. Right now, we're sharing with them Jacobs’ previous experience in this area and what we can do to help them.

What do you think is the biggest misconception people entertain about data science? And what are the realities?

I think the biggest misconception is when you have a vague problem and you bring in a data scientist and just say, "fix it," without any true understanding of what it is you actually want. People sometimes view data science as a fad or a hot term. They say, "We need a data scientist," but they don't actually have a plan; they want the data scientist to figure out the plan. We can, to an extent, but we need some guidance. I've come into multiple scenarios like that. The reality is that sometimes people, companies and clients don't know exactly what they want. They want to use machine learning, they throw in predictive analytics: "We want you to do this." Sometimes, it's not data science that they want. Sometimes, they just want something plotted or shown, and there's no real understanding of what data science actually is. It turns out they probably didn't need a data scientist for that.

On the other hand, if they say something like, "We want our networks to be secure," I can work with that. I can come up with a few ideas. I've done this for some government agency clients. I helped them map their networks and wrote flexible code that allowed them to monitor those networks. It was a machine learning thing. They didn't care that it was machine learning, just that it worked. What I have found is that many times, people need to ask slightly more detailed questions, rather than simply using hot terms.
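
For illustration, here's a minimal sketch of the kind of machine learning-based network monitoring described above, using unsupervised anomaly detection on per-host traffic summaries. The columns, values and contamination rate are hypothetical, not drawn from the actual engagement.

```python
# A minimal sketch of ML-based network monitoring of the kind described
# above. Everything here is illustrative: the per-host summaries, column
# names and contamination rate are hypothetical, not from a real network.
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical per-host connection summaries derived from network logs.
hosts = pd.DataFrame({
    "bytes_out": [1_200, 900, 1_100, 950, 880_000],   # last host looks odd
    "connections": [14, 9, 12, 11, 310],
    "distinct_ports": [3, 2, 3, 2, 57],
})

# Unsupervised anomaly detection: flag hosts whose traffic profile deviates
# from the bulk of the data. fit_predict() returns -1 for outliers.
model = IsolationForest(contamination=0.2, random_state=0)
hosts["anomaly"] = model.fit_predict(hosts)

print(hosts[hosts["anomaly"] == -1])  # hosts worth a closer look
```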

What do you see as the most significant element of data science development within the next few years, and what is the most overhyped?

One of the more exciting ones is anything to do with smart cities. I actually think it's going to be a game-changer. Any of the automation, any of the algorithms that deal with maintenance and sustainment, I think those are going to transform energy-efficient cities and improve the world overall. In the next few years it's going to be tremendous, with automation in general, and with artificial intelligence, robotics and the like. I know automated cars are trying to break out; I think that in the next 10 years or so they will be much more commonplace. I'm really excited to see whether that reduces the number of car crashes, improves efficiency and so on.

Regarding what is overhyped, I would say "terms" in general. When people say, "We need artificial intelligence," they automatically think robots, for instance, when really what they mean is a lot more basic than that. It's not necessarily one particular genre; it's just that the terms themselves are such hot buttons, and people throw them around without the terms having anything to do with what they need. Sometimes, you just need an Excel spreadsheet to plot something; they may say, "We need machine learning," but actually they don't need an algorithm to do what they want.

What is the most surprising thing you’ve learned, working in the field of data science?

One of the most pleasantly surprising things is how open people are to giving a data scientist free rein, within reason. I could say, for example, "Hey, there's this new technique I think would be really useful," and I found a lot of people are like, "Yes, if it works, we'll give it a shot." Sometimes, they need a bit more evidence, but I really like having free rein to explore.

I've also found the data science community is really nice and welcoming. When I first started, I was a research scientist, but at a certain point in my career I was labeled as a data scientist by a government agency I was working for. There were certain gaps in my knowledge, but everyone was just very, very nice. There was a wealth of knowledge, all really well organized. I was very pleasantly surprised. I came from another field where people were for the most part nice, but it could be a bit more cutthroat. I was very pleased that this wasn't the case for the data science culture. I would say those are the two most pleasant surprises.

Michael Brown - Global Technology Lead - Predictive Analytics, People & Places Solutions

Can you tell us about a project you’ve participated in that you’re especially proud of?

One of the recent projects I really enjoyed working on is the Climate Risk Manager (CRM) solution. It's a cloud-based platform that brings together global climate data, geospatial analysis and our own Jacobs subject matter experts in the areas of climate science and resiliency. We're able to perform risk and hazard assessments rapidly and visually, with compelling reporting tools. Climate model data is normally a very complicated space, but our CRM solution makes it much simpler for our clients to use and understand.

I have a background in climate model research from my days in school. It's been really fun to get to do that again, to work with our climate teams and to launch a product in our markets that can help our clients prepare for the uncertainties they're going to see with an evolving climate in the future.

Can you share an example you’ve seen where a data visualization helped a client make an important strategic decision? What was the story behind it?

One example was for a large federal client we were working with. We were doing analysis on the environmental remediation side, around things like vapor intrusion. They had a really large problem: assets all over the place, and they needed to figure out how to prioritize the higher-risk areas to target first in the remediation process. Previously, this was being done tabularly, on a case-by-case basis.

Our team had great algorithms that essentially performed risk assessment and quantified it very clearly for the client. We were able to come in, work with our vapor intrusion experts, take that algorithm and run it in batch across the entire organization. We visualized the output in a very simple dashboard, where the client could come in and see which regions carried more risk than others, right down to the specific assets where it was occurring. It was very clear: it showed them the top 20 assets to prioritize, a process that would otherwise have been painful, almost impossible.
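
As a rough illustration of that batch approach, here's a minimal sketch that scores an asset portfolio in one pass and surfaces the top 20 by risk. The file name, fields and weights are hypothetical stand-ins for the vapor intrusion experts' actual algorithm.

```python
# A minimal sketch of batch risk scoring across an asset portfolio.
# File name, columns and weights are hypothetical placeholders.
import pandas as pd

assets = pd.read_csv("assets.csv")  # hypothetical: one row per asset

def risk_score(row: pd.Series) -> float:
    # Placeholder for the domain experts' risk model; any function that
    # maps an asset's measurements to a comparable score works here.
    return 0.6 * row["contaminant_level"] + 0.4 * row["proximity_to_occupants"]

assets["risk"] = assets.apply(risk_score, axis=1)

# Roll up by region for the dashboard view, then surface the top 20 assets.
risk_by_region = assets.groupby("region")["risk"].mean().sort_values(ascending=False)
top_20 = assets.nlargest(20, "risk")[["asset_id", "region", "risk"]]
print(top_20)
```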

What do you see as the key to encouraging organizations to invest more in data analytics and data science research and development?

Organizations want to invest more in what returns more value. With data analytics and data science, it's no different. When you're starting out, you really need to do your homework and understand the organization's strategy. What's high value to them? What's the roadmap look like? Where do they feel behind? Where do they identify different growth opportunities?

You've really got to target those higher-value opportunities and showcase the value you can bring to the organization very quickly and early. You align those opportunities with the lower-hanging fruit: the ones where you're able to execute, you're confident in the data you have, you've got business leaders who understand the problem space, and your team knows the first principles. That's one of the big ways of being strategic as you're starting out, so that you can showcase high-value outcomes to the business early on.

The other element of encouraging investment comes through the role of a data liaison to the organization. It can be formal or informal. It's someone who understands the organization's business goals and is able to marry the discussions between business and data. Essentially, the data teams speak one language and the business speaks another, so how do we make sure that it all comes together? Otherwise, you won't be able to make an impact with the analytics and insights you have.

Those outcomes can be very complicated and rarely black and white. Having someone who can translate what the data scientist teams have found, and who can make the insights digestible and spoken in the language of the business, means that those insights and essentially the whole outcome of those data science projects are able to reach the audience and the decision makers in a much more cohesive way.

What is the most surprising thing you’ve learned, working in the field of data science?

The role of the data engineer is so important to these projects. The data scientist role is all the rage right now and that's what we tend to hear about, but the data engineer is really the unsung hero.

The data engineer is focused on the data pipeline: how we bring the data together, how we get it cleaned up and ready to be put into the hands of data scientists to work on. A lot of our projects benefit so much from that efficient utilization of data, not just the analytics of it. The analytics and the insights are great for sure, and they are a big deal, but the data engineers are a critical component. Streamlining the data from collection through quality control, analysis, visualization and reporting is how we deliver on it. Data engineers touch every aspect of the project lifecycle through how they influence the data and its usage on those projects.
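
As a small illustration of that collection-to-reporting flow, here's a minimal sketch of a pipeline built from composable cleaning steps; the file names, columns and rules are all hypothetical.

```python
# A minimal sketch of a data engineering pipeline: collect, clean,
# quality-control, then hand curated data to the data scientists.
# File names, columns and rules are hypothetical.
import pandas as pd

def drop_bad_records(df: pd.DataFrame) -> pd.DataFrame:
    # Basic quality control: discard records missing key fields.
    return df.dropna(subset=["site_id", "reading"])

def standardize_units(df: pd.DataFrame) -> pd.DataFrame:
    # Normalize mixed units into one canonical unit via a per-row factor.
    return df.assign(reading=df["reading"] * df["unit_factor"])

def flag_outliers(df: pd.DataFrame) -> pd.DataFrame:
    # Mark extreme readings for review rather than silently dropping them.
    return df.assign(suspect=df["reading"] > df["reading"].quantile(0.999))

raw = pd.read_csv("field_readings.csv")  # collection (hypothetical file)
curated = raw.pipe(drop_bad_records).pipe(standardize_units).pipe(flag_outliers)
curated.to_parquet("curated/field_readings.parquet")  # ready for analysis
```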

David Morgareidge - Director, Predictive Analytics, People & Places Solutions

Can you tell us about a project you’ve participated in that you’re especially proud of?

One that's underway is a clinical operations Digital Twin for the Mayo Clinic at their Phoenix location, focused on the emergency department (ED). This is one of the first true digital twins in healthcare. It's phenomenal that we're getting to do completely cutting-edge work with one of the best healthcare organizations in the world. I don't know anybody who's done this before, and we have a very creative partnership with a software developer who's got the simulation engine that drives the digital twin.

The data sources we are using come from their electronic health record and from their real-time, location-based service system (an indoor GPS that tracks the location of every patient, every staff member and every piece of equipment in the ED). We're building a unique data store that will blend those two disparate data sources to give a really robust understanding of three years of history about how patients have moved through the ED.
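
As a sketch of what blending those two sources might look like, here's a minimal example that attaches the nearest location ping to each clinical event for the same patient. The file names, columns and five-minute tolerance are illustrative assumptions, not the actual Mayo schema.

```python
# A minimal sketch of blending two disparate sources: clinical events from
# an EHR and indoor-location pings. All names are illustrative assumptions.
import pandas as pd

ehr = pd.read_parquet("ehr_events.parquet")        # patient_id, timestamp, event
pings = pd.read_parquet("location_pings.parquet")  # patient_id, timestamp, zone

# merge_asof requires both frames sorted on the join key.
ehr = ehr.sort_values("timestamp")
pings = pings.sort_values("timestamp")

# For each clinical event, attach the nearest location ping for the same
# patient within a 5-minute window, yielding one event-plus-location record.
blended = pd.merge_asof(
    ehr, pings,
    on="timestamp", by="patient_id",
    direction="nearest", tolerance=pd.Timedelta("5min"),
)
```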

Artificial intelligence (AI) and machine learning (ML) will be used not only to process that historical data, but also to analyze external events occurring in the community that impact patient demand on the ED. This will be one of the first times that Jacobs is taking advantage of the Azure Cloud to tap into the true AI/ML potential of our relationship with Microsoft. That's going to be a key part of this project.
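
Here's a minimal sketch of that kind of demand forecasting, combining lagged historical arrivals with external community signals; the data file and the two external feature columns are hypothetical.

```python
# A minimal sketch of forecasting ED demand from historical arrivals plus
# external community signals. Data file and feature columns are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

df = pd.read_parquet("hourly_arrivals.parquet")  # arrivals per hour, plus
                                                 # external-event columns

# Lagged demand (same hour yesterday, same hour last week) as inputs,
# alongside hypothetical external signals about the surrounding community.
df["lag_24h"] = df["arrivals"].shift(24)
df["lag_168h"] = df["arrivals"].shift(168)
df = df.dropna()

features = ["lag_24h", "lag_168h", "community_event", "weather_severity"]
model = GradientBoostingRegressor().fit(df[features], df["arrivals"])

estimate = model.predict(df[features].tail(1))  # demand for the latest hour
```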

What do you see as some of the top benefits that prescriptive data analytics can provide to clients?

Prescriptive analytics, enabled by the digital twin, is what we're doing with Mayo. That means that we're taking real-time data sets and not only forecasting future-state performance problems, but also prescribing solutions to them in advance. In a predictive model, no solutions are developed, just warnings.

For Jacobs, it is our relationships with Azure and AWS that will support that evolution. The benefit is that clients will have minute-by-minute, current-state visibility and continuous performance optimization, 24 hours a day, seven days a week, 365 days a year. The financial savings and quality improvements are off the charts.

There is a subtlety about digital twins that I should mention. With the more common form of digital twin, one that addresses equipment, the model actually makes changes to the real world. Pervasive sensor technology is linked by Internet of Things (IoT) infrastructure with AI and ML overlays to drive the twin's look-ahead capability. When the digital twin senses a potential performance degradation, it makes a change, without human intervention, to the manufacturing equipment, the HVAC system, a lighting system, a glazing envelope system, or whatever, to eliminate or mitigate that degradation. However, in people-based healthcare clinical operations like what we are pioneering with Mayo, we're using what's called a "digital shadow." The difference between a twin and a shadow is that for some things that are life-critical, you may not want the machine pulling the trigger on its own. Instead, the shadow displays recommended options to a director or a supervisor, and then a person actually makes the decision to change the clinical operational model. Over time, as healthcare digital twins mature, that caution may ratchet back a bit.
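
To make the distinction concrete, here is a minimal sketch, with entirely illustrative names, of the control-loop difference: a twin applies the corrective action autonomously, while a shadow only recommends it and waits for a human decision.

```python
# A minimal sketch of the twin-versus-shadow distinction described above.
# All names are illustrative; neither class models any real product.
from dataclasses import dataclass

@dataclass
class Action:
    description: str

def apply_to_real_world(action: Action) -> None:
    # Stand-in for actuating equipment or changing an operational model.
    print(f"Applied: {action.description}")

class DigitalTwin:
    """Closed loop: the model acts on the real world without human review."""
    def on_predicted_degradation(self, action: Action) -> None:
        apply_to_real_world(action)  # no human in the loop

class DigitalShadow:
    """Open loop: the model recommends; a supervisor makes the call."""
    def on_predicted_degradation(self, action: Action) -> None:
        answer = input(f"Recommended: {action.description}. Approve? [y/n] ")
        if answer.strip().lower() == "y":
            apply_to_real_world(action)  # a person pulled the trigger

# Example: a predicted ED bottleneck triggers a staffing recommendation.
DigitalShadow().on_predicted_degradation(Action("open a second triage lane"))
```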

How can one respond fast enough to advancements in data analytics? How do you stay current with the trends?  

It is not easy! The extent and pace of change hit me five years into my first job managing the network infrastructure and all design and engineering automation resources for The Haskell Company. I was preparing for a 60-hour weekend of non-stop effort to convert the network architecture and data storage technology. It occurred to me that if I knew, at that moment, only what I’d known when I was hired, I’d be unable to execute any aspect of the project. And if I were to interview that day for my job, with what I’d known when I joined the firm, I’d never get hired. If you are not an avid continuous learner, anything in the field of data is not for you.

Fortunately, in today's environment, one generally just needs to commit the time. The quantity and quality of free online resources are incredible: publications, research, webinars, some training, vendor product profiles and so on. Certification programs with moderate costs can often be completed online (for instance, I did a Lean Six Sigma Black Belt for Healthcare), and one-year specialization programs are available everywhere from community colleges to four-year institutions. In addition to those resources, I also invest between $500 and $1,000 per year in my own library of books. But the best way is when these resources are blended into your project work. It makes the content real, gives you an incentive, and lets you deliver an above-and-beyond product that keeps you and the client elated.

What is the most surprising thing you’ve learned, working in the field of data science?

When I put together my first business plan to do this kind of work, I estimated that somewhere between 3% and 5% of the labor required was going to be in data collection. In fact, it's between 20% and 25%, and it's been as high as 60%.

Everybody wants to be data-driven, but the fact is many organizations' data management systems are inadequate, and that can slow or stall data scientists' ability to deliver data-driven solutions.

I think the answer, and this is what I've seen with most of my clients, is that once they see how big the problem of data governance is, they want to fix it. People are trying to get it right, but most organizations are still a long way from it.

About the interviewees

Dr. Jennifer Blum has a Ph.D. in astrophysics, with expertise in cyber security algorithm development and advanced statistical data science methodologies. She has developed machine learning algorithms and cyber infrastructure for government clients and currently supports the Jacobs Insights team by providing cyber security technologies and business solutions.

Michael Brown provides strategy and oversight for how Jacobs leverages data and analytics on its projects, solutions and services. Michael has built a large community of practice within Jacobs focused on advancing machine learning, data science and data engineering in Jacobs' work and embedding them into all Jacobs offerings.

David Morgareidge has over 40 years’ experience in analyzing and optimizing the operational and financial performance of the built environment through the use of virtual integrated digital design tools. He is the Director of Predictive Analytics at Jacobs, overseeing and directing the application of advanced operational modeling and simulation tools and advanced data analytics.
