Hi, welcome. Thank you for joining us today for our continuation of the Data Intelligence webcast series. My name is Lance Dickerson. I am the North America field marketing manager for Erwin by Quest, and I am your host for our session today.
I'd like to introduce our presenter for today, Susan Laine. Susan is a sales director here at Erwin by Quest and she's also a DI thought leader. Susan, you can take it away.
Thank you, Lance. I'm excited to be here today. We have a full agenda for you all regarding data lineage and data quality, and how they serve up the needs for data intelligence. In this session, I'm going to walk through some diagrams on lineage and talk about how we can interpret them, as well as how our clients are using data lineage in their initiatives. What are the top initiatives where we see data lineage appearing? And then I'm going to end on some of the great new things that you can expect from the Erwin by Quest Data Intelligence Solution coming at the end of this year as well as the beginning of next year.
So, a full agenda today. Really excited to talk to you about data lineage. Slide. I think Dataversity says it best when they describe data as the story of your organization. It's really the underpinnings of your decision-making. It reveals the five W's of your data.
So who made that data? What did they do with the data? Where is it sourced from? When was it created? Why was it created? And how is it being manipulated? And how is it being integrated across your lines of business and throughout the organization?
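For illustration only, here is a minimal sketch of how those five W's might be captured as a provenance record attached to an asset; the class and field names are hypothetical, not the Erwin metadata model:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LineageRecord:
    """Hypothetical provenance record answering the five W's for one data asset."""
    asset: str            # what the data is (e.g., a column or table)
    created_by: str       # who made it
    source_system: str    # where it is sourced from
    created_at: datetime  # when it was created
    purpose: str          # why it was created
    transformation: str   # how it is being manipulated / integrated

record = LineageRecord(
    asset="customer.last_name",
    created_by="crm_ingest_job",
    source_system="Salesforce",
    created_at=datetime(2022, 3, 1),
    purpose="customer master data",
    transformation="trimmed and upper-cased during load",
)
print(record)
```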
Data lineage is really providing you with the facts and the context about your data. So it's making your data, and all aspects thereof, explainable. From both a technical and a business perspective, it authenticates your source and how it moves throughout your different environments. It ensures that it is aligned to your business rules and policies, thereby making it consistently accurate from a business context.
When you blend in data quality, it points to the accuracy of the data itself. It shows you where the good and the bad data is propagating. And it really helps each department be a little bit more accountable for sharing good data across the different business lines. Our clients are finding the value of data lineage in reliability: ensuring that data is trusted and that you're able to use it, because you can quickly validate everything that happened to the data.
You're also getting to root cause analysis much quicker than you ever have before, because you can follow the lineage back and find out exactly where that source of contention is. Maybe that technical rule didn't meet the business rule requirements. Maybe there's just a whole lot of complexity around certain pieces of the lineage. But it helps you indicate exactly where you need to go to do some profiling and where you need to do some forensics around that data.
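As a rough sketch of what that root cause trace looks like mechanically (assuming a toy lineage graph, not the Erwin implementation), you can walk upstream from the suspect report field:

```python
# Toy lineage graph: edges point from a target column back to the columns
# that feed it, so walking it takes you from a report back to the source.
upstream = {
    "report.total_sales": ["warehouse.fact_sales.amount"],
    "warehouse.fact_sales.amount": ["staging.orders.amount"],
    "staging.orders.amount": ["erp.orders.amt"],
}

def trace_to_source(asset, graph):
    """Return every upstream asset of `asset`, nearest hop first."""
    path, queue = [], list(graph.get(asset, []))
    while queue:
        node = queue.pop(0)
        if node not in path:
            path.append(node)
            queue.extend(graph.get(node, []))
    return path

print(trace_to_source("report.total_sales", upstream))
# ['warehouse.fact_sales.amount', 'staging.orders.amount', 'erp.orders.amt']
```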
Data observability-- being able to watch these pipelines and subscribe to them on a regular basis and know exactly when something changed and what is impacted by that change-- is also a key indicator of the value and usage of that data. And then finally, project scoping-- so when you're embarking on your latest and greatest data initiative, you know exactly what it's hitting down the line and all the impacts from a business and a technical perspective.
And I love how Stewart Bond compares this to what it takes to trust a new person. It takes understanding that new person, knowing that new person, knowing where that new person is coming from and what type of values they share. Are they coming from the same world that you are? Are you on the same plane about different things? What are the rules that they govern themselves by?
We need this for our data as well. However, we might not have that sort of time when it comes to data; we need it faster. We need to interpret and understand that data and get to data trust as quickly as we can.
From a customer perspective, organizations are relying on the data across the organization. In every major initiative that's going on, they're looking for more structured collaboration across these different teams and projects. And they need more guidance and best practices to really achieve efficiencies end to end and build more of an operational flow, so that they can improve the reliability, the reusability, and the repeatability of data. And they want to continually try to reduce the redundancy and the data bloat.
And they need to do this in a more structured, automated way. So it's really all about automation and operationalizing the data practices across these initiatives. There's a lot of complexity built into these initiatives because of the existing applications and technologies that are underpinning each one of them. The engineering community is becoming frustrated with the lack of data literacy across all these different environments when you have both legacy systems and modern environments to deal with. There's a lot of hand-offs across these different initiatives and applications. And there's a lot of reliance on subject matter expertise.
And this really can significantly delay the projects. There's a real barrier to success when we need to start harmonizing the data and the processes so these different teams can collaborate on this information. So why is producing data lineage difficult?
It's kind of been a hot potato inside of organizations as far as who owns the data lineage and who puts the data lineage together. And folks are frustrated with how long it takes. So some of the barriers to this challenge include a lack of support for these different hybrid environments. If you have a data lineage solution that's really tied to a particular ETL, you're going to fall short.
If your lineage tool only works in the legacy environment or it only works in the new modern big data lake environments, you're going to fall short. So you really need something that you can continue to produce end-to-end data lineage from so that you can read across these different environments.
A lot of clients are not releasing the information, or the data lineage, until it's 100% automated. A good data lineage tool is going to allow you to plug or stitch the gaps when you meet small challenges along the way, so you can get to the end result quicker and really look at that final outcome of where that data is sourced from and how it moved across these different environments, even if you have to stitch a gap. If you wait for 100% automation, you might be waiting six to nine months before you can produce anything and get to the big picture.
Lack of standards-- so when you're crossing all these different environments, the standards change from development team to development team. And there's not a whole lot of best practices that really bridge those teams together to provide that sort of collaboration. So you need a tool to really help you parse and understand each one of these different environments.
And what I feel is the number one reason why it's so difficult, and maybe why you're not getting value from the lineage that's produced, is that a lot of clients do data lineage for the sake of data lineage-- not tying it to a business outcome, not tying it to really understanding how lineage is going to help you with that initiative. Just doing it so that you can read the data inside of the data lineage, and plugging every hole, and making it automated is, I think, a real barrier to success for most of our clients.
So let's start talking about some of those use cases, as well as how we're digesting the lineage when we're looking at it. There are two main views that Erwin by Quest provides our clients. One view is a controlled business asset mind map. It's a knowledge graph that allows the business user, or a business analyst, or even a data engineer to come in, select their asset, and build it out according to how they view the world. They might view the world in the context of business rules and processes before they get to the actual data lineage. But it allows them to decide how to maneuver and create that view themselves from a business perspective.
And the second way is end-to-end data lineage that's showing you exactly how that information is traversing across the data landscape. So that business asset controlled mind map looks like this. You're starting in the center there on that business term. And you can see that it's PI information. So you know that information is attached to some business policies around your GDPR processes. So you can start to expand up and to the left on your business process and see exactly which GDPR processes are impacted.
And then you can expand on the business terms themselves. And on those business terms, you can see exactly which ones are restricted by that business process and your GDPR processes. And then when you feel like it, you can start to expand it to the actual columns and understand exactly which columns are restricted by that right-to-be-forgotten policy or that master data policy and understand what those restrictions and controls are around your customer data.
So clients are usually-- this is one of their wow factors-- able to have this blend of business processes, terms, and policies and then expand to those columns when they want to. So you're not quite at data lineage at this point. You're still in that knowledge graph around your business assets.
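A minimal sketch of that kind of business asset knowledge graph, with hypothetical relationship types and assets (not the Erwin mind map internals), might look like this; the point is that you expand only the relationship types you care about, when you care about them:

```python
# Toy knowledge graph: a business term links out to processes, policies,
# and columns, and the user chooses which relationship types to expand first.
graph = {
    "Customer PI": {
        "business_process": ["GDPR: Right to be Forgotten"],
        "policy": ["Master Data Policy"],
        "column": ["crm.customer.last_name", "crm.customer.email"],
    }
}

def expand(term, relationship_types):
    """Expand only the selected relationship types around a business term."""
    links = graph.get(term, {})
    return {rel: links.get(rel, []) for rel in relationship_types}

# Look at the business side first, then pull in the columns when you're ready.
print(expand("Customer PI", ["business_process", "policy"]))
print(expand("Customer PI", ["column"]))
```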
On this portion of the controlled mind map, we're looking at it in a little more structured view. So again, we have that customer PI information, and on the right-hand side, we see those business terms. We see exactly which glossaries they're stored in, what context is around those business terms, and which ones are restricted by that governance process.
We also see what technical environments they're in, what KPIs they are supporting, and which policies that customer PI information is showing up in. And when you're ready and you're done digesting the right-hand side, you can bring in the left-hand side. Now you can start to get to the whole technical aspect of what columns, what files, what schemas, what databases are supporting the data behind that particular policy and that particular PI information. So it's just a slow roll on how you can start to understand everything about that PI information from a high-level structured view.
So let's talk now about diving into the actual data lineage itself. On the first click, we're going to be able to see exactly how that data was authenticated. So you're getting into where the source of the information is coming from. When you hit click on this one-- Lance, there we go-- you can start to see whether your pipeline is coming from a trusted source or not.
The next thing that we're seeing on this particular pipeline is that there's some third-party data in here. So we're actually traversing through some third-party data. And you can start to ask yourself how are we using this third-party data? Are we capturing the privacy information from here? Are we using the third-party information according to the data policy?
Next, you're going to see how the data is actually changing, because you're bringing in the actual ETL transformations. You can see what aggregations, what calculations have been applied. And you can actually see the SQL-- because it's being parsed by the system behind the scenes-- what SQL, what technical rules are moving the information from that Northwind environment to the AdventureWorks environment. And then finally, over here on the right-hand side, when you get a lot of arrows going every which way, that's your indication that you have some complexity here and that you need to go in and really understand what's going on with that data. You might want to apply your forensics at this point.
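As an illustration of what that behind-the-scenes SQL parsing amounts to (a deliberately simplified sketch, not the product's parser), here is how the source and target tables could be pulled out of an INSERT ... SELECT that moves data from Northwind to AdventureWorks:

```python
import re

# Real lineage parsers handle full SQL grammars and expression-level mappings;
# this regex only covers this toy example, to show the idea.
sql = """
INSERT INTO AdventureWorks.dbo.Customer (FullName)
SELECT LastName + ', ' + FirstName
FROM Northwind.dbo.Customers
"""

target = re.search(r"INSERT INTO\s+([\w.]+)", sql, re.IGNORECASE).group(1)
sources = re.findall(r"FROM\s+([\w.]+)", sql, re.IGNORECASE)

print("target:", target)    # AdventureWorks.dbo.Customer
print("sources:", sources)  # ['Northwind.dbo.Customers']
```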
I love to talk now about the fact that as data moves out to the edge-- Stewart Bond, the industry analyst from IDC, is big on this-- he's telling us, and I think a lot of people are also taking this to heart, that as data is moving out to the edge, it tends to get more distorted. And you know that that's a place that's super important, to ensure that the data that you're getting on that final report is well understood and well defined, and that you know exactly what's happening to the data. That distortion can come as you're getting further and further away from the source of the information, which is another reason why you need a data lineage tool that can get you from end to end to really pick up on what is going on with this data. Next slide.
So here is another great example of that distortion on the edge. Let's zero in. What do we see here? We're seeing that the column name is actually changing from system to system: last name, to first name, to both names. And you're starting to see that it went from both names back to a single name again. So do we have the right integration inside of these hops? Again, not having this view down to the element level could really trip you up when it comes to validating that you're using the correct ETL that maps to your critical business policies and regulations.
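A tiny sketch of how that element-level drift could be flagged, assuming a hypothetical list of hops and the column name used at each one:

```python
# Each hop records the column name it uses; any change from the previous hop
# is flagged so someone can verify the mapping is intentional.
hops = [
    ("source_crm", "last_name"),
    ("staging", "first_name"),
    ("warehouse", "full_name"),
    ("report", "last_name"),
]

for (prev_sys, prev_col), (sys, col) in zip(hops, hops[1:]):
    if col != prev_col:
        print(f"name changed between {prev_sys} ({prev_col}) and {sys} ({col}) - verify the mapping")
```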
And you're going to have delays if you can't see this picture, because you're going to have to go to the individual SMEs for each one of these environments. They need to help you interpret what that actually means and then align it somehow to your business context. Being able to proactively monitor this pipeline is also really important, so that you know about any sort of impacts. Whether it's a business impact or a technical impact, what is that going to break down the line when you make these changes? And does it all still make sense when you go from system to system?
Let's talk about supporting data lineage for observability. We partnered with DQ Labs. They are completely embedded into our data quality and our data lineage initiative inside of Erwin by Quest for DI.
DQ Labs talks about the convergence of three different categories-- the bringing together of data discovery, data quality, and data observability. Here in this picture, you can see what blending the data quality results into the lineage really looks like. I can immediately see where the bad data is propagating. And I have the bad data circled there, because we're actually showing you the red, yellow, green embedded inside of the lineage, showing where data that has not met the threshold of integrity is impacting the data all the way down to your reports.
And with the Erwin and DQ Labs approach, we can monitor this pipeline for data drift. And we can start to capture trends inside of the actual ETL code itself. There are alerts, so when this data exceeds these thresholds or becomes an outlier, it's going to inform you. It's going to let you know that, hey, you need to go take a look and take a pulse on this data, and probably profile this data to find out exactly what's going on with it and how it's changed.
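For a feel of what a drift alert involves (a bare-bones sketch with made-up numbers, not the DQ Labs algorithm), you can compare the latest quality score for a pipeline against its recent history and alert on outliers:

```python
from statistics import mean, stdev

# Recent completeness scores for a pipeline, and the latest observation.
history = [0.97, 0.96, 0.98, 0.97, 0.96, 0.95, 0.97]
latest = 0.81
SIGMAS = 3  # alert when the latest score is more than 3 standard deviations off

baseline, spread = mean(history), stdev(history)
if abs(latest - baseline) > SIGMAS * spread:
    print(f"ALERT: score {latest} is an outlier vs baseline {baseline:.2f} - profile this data")
```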
So we're using AI to auto-curate this data, as well as auto-remediate the data itself, if you want to allow for it. It is going to start to say, I see a pattern, and you need to go in and curate this data. Here are some suggestions that I have to curate this data.
It's also going to save what we feel are the fixes to this data to a file that you can apply on your database, should you choose to do so, to remediate the data and the findings that we've found over time here. So what exactly is the use case here? At the end of the day, we're observing your business processes through the data that's produced from and within that process.
We're providing you with a validity check on these reports. So on a regular basis you can start to trust it. We check all the quality measures-- validity, fit for purpose, completeness, timeliness-- so that you can know and trust this critical data that's coming out of these critical data pipelines. Next slide.
At Erwin by Quest, we take it one step further. If you want to consider your data lineage a model, a logical model, and actually apply it as one, you can create code from this data lineage. And we have clients doing that today.
Reverse engineering is giving you that picture of exactly what's going on with the data and putting it all together for you, from your BI reports all the way back to the source. And I've already spoken to where we're using AI to interpret what's going on, to alert you, and to recognize the patterns that are going on inside of the lineage itself.
So let's take a quick look at how you can apply data lineage to some of those projects and programs that we talked about early on. So one of our most popular and traditional use cases is regulatory compliance.
So how can data lineage help? Let's take a data subject request and map it out. We use the Erwin mind map to understand, obviously, the policy and the processes around a particular data element and the request that was received. The lineage is going to tell you where that data is sourced, if it went through any third party, if it's leaving the house and going out to a third party. And we're going to be able to follow that pipeline.
Data quality is going to tell us the score of the data and whether it's met the threshold of integrity, and where that threshold of integrity might be minimizing the value of the data as it's going through the entire landscape. It's going to get you to any sort of root cause analysis that needs to be applied to this data and give you the impact analysis that will tell you the whole story of what is being impacted by that bad data, and whether you have any vulnerabilities to report. Next slide.
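As a simplified sketch of that story (hypothetical assets and scores, not the product's logic), you can combine a quality threshold check with a downstream walk of the lineage to list the impacted reports:

```python
# Flag columns that failed their quality threshold, then walk the lineage
# downstream to list every asset and report the bad data is feeding.
quality_scores = {"erp.orders.amt": 0.62, "erp.orders.qty": 0.98}  # share of rows passing the rule
THRESHOLD = 0.90

downstream = {
    "erp.orders.amt": ["staging.orders.amount"],
    "staging.orders.amount": ["warehouse.fact_sales.amount"],
    "warehouse.fact_sales.amount": ["report.total_sales", "report.regional_sales"],
}

def impacted(asset, graph):
    """Breadth-first walk downstream from a failing asset."""
    seen, queue = [], list(graph.get(asset, []))
    while queue:
        node = queue.pop(0)
        if node not in seen:
            seen.append(node)
            queue.extend(graph.get(node, []))
    return seen

for asset, score in quality_scores.items():
    if score < THRESHOLD:
        print(f"{asset} scored {score}: impacts {impacted(asset, downstream)}")
```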
Regulatory compliance for banking-- how many of your data engineers would love to have the regulatory processes and procedures in line while they're responding to or managing a regulatory compliance issue? How many privacy officers would like to be audit ready to have these pipelines and these controls already set up around the data so that you can quickly go and understand these impacts?
This visualization and this knowledge graph is really going to get you to this nirvana of understanding exactly what data pipelines are aligned to your regulations. Next slide.
Let's talk about data reliability and that use case itself. So we want to increase your data uptime, if you will. We want you to be alerted when you have data issues and which data hasn't met that threshold of integrity-- so using those data quality scores on the data inside of the reports themselves. There's an impact score as well.
This impact score is telling you to what degree is it worthwhile to go in and profile or curate this data? How much of an impact will it have on the quality of the data? So it's a separate measurement where you're understanding, do I need to spend time on this particular portion of my data pipeline? We can't monitor everything. But we can monitor where we'll have the most impact on the validity of that data pipeline.
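One illustrative way to think about such an impact score (this formula is hypothetical, not the product's) is to weight each pipeline element by the number and criticality of the reports it feeds, so profiling effort goes where it matters most:

```python
# Hypothetical consumers: each entry maps a pipeline element to the reports it
# feeds, with a made-up criticality weight per report.
consumers = {
    "warehouse.fact_sales.amount": [("report.total_sales", 5), ("report.regional_sales", 3)],
    "staging.misc.notes": [("report.adhoc_notes", 1)],
}

def impact_score(asset):
    """Sum the criticality weights of everything downstream of the asset."""
    return sum(criticality for _, criticality in consumers.get(asset, []))

for asset in consumers:
    print(asset, "impact score:", impact_score(asset))
```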
And then finally, a good Erwin Data Intelligence webinar would be remiss not to mention where data modeling fits in. From our perspective, it's all about accelerating your data intelligence program by carrying forward all those longstanding data modeling results into your data intelligence application.
So starting from a blank slate is never easy, and we want to leverage our logical modeling as much as we can. Being able to start with your PI information already identified, and your critical data elements already identified inside of your logical models, and putting them into the glossary so you can see that context right away, is really going to accelerate your program.
We're also going to know how your data is classified and the different domains that are supported. You're going to have that highly structured, categorized view of your data before you start. And it's also providing you that business context around your data lineage. Most of my clients, when they're doing data lineage, after they get through it and they have this great data pipeline and they see it all put together, are extremely happy.
But if it's all in technical names, it still doesn't mean much to your business analysts or even some of your engineers. So we're taking the logical business names and turning them on inside of the lineage itself, so you can read that pipeline. It's going to increase your data literacy that much more and be that much more effective if you can understand it from a business perspective right away. Next slide.
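A minimal sketch of turning on those business names, assuming a hypothetical glossary that maps technical column names to business terms:

```python
# Overlay glossary terms onto a technical pipeline so it reads in business
# language; columns without a glossary entry are called out explicitly.
glossary = {
    "cust_ln_nm": "Customer Last Name",
    "ord_amt": "Order Amount",
}

technical_pipeline = ["crm.cust_ln_nm", "staging.ord_amt", "warehouse.ord_amt"]

readable = [
    f"{node} ({glossary.get(node.split('.')[-1], 'no glossary term')})"
    for node in technical_pipeline
]
print(readable)
```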
So I love this quote from one of our clients out on the Gartner Peer Insights site. They are definitely emphasizing the fact that they don't have to do these investigations and impact analyses in a manual way anymore. And they're really speaking to the power of automation to expose and create the tagging of the sensitive data and align it to the data governance policies. They've had some really awesome results there at this health care company. Next slide.
And one thing that's really cool to point out is that we have grown over time to have over 120 different connectors, whether it's to the database schemas themselves or the parsing of the information from hop to hop. Out of the box with the core system, you receive the standard connectors for free. It is core to the environment to provide you with the actual schema metadata connectors to the databases for free. You are only paying for the connectors where we have to do the more difficult work of parsing out exactly what's happening to the code as it moves from hop to hop.
Where are we going with our data lineage? We're extremely excited to start to present our intelligent marketplace of data sets. It's a place where you can actually come and get the data set that you need for your latest and greatest business campaign, marketing initiative, or modernization effort. And you're tracking and understanding those data sets because the scores are rolling up into: hey, this is a well-used, well-curated, well-defined data set, and the quality is high on this particular data set. It's allowing you to reuse and shop and find those data sets much quicker than ever before.
So you're aligning your data set to your business need. We're allowing you to actually document what is the reason for this data set? How can I use this data set? Where does it make sense for me to use this data set? And what insights am I getting out of this data set that's going to help me with my next campaign in a proactive way?
You're going to be able to request access to this data set. And we're going to use the lineage to validate that this is the data set that has the most integrity for you. Next slide.
You're also going to be able to see these data sets inside of that knowledge graph, that mind map. So you're going to get the data set definition, the ratings of the data inside of that data set, any sort of classification tags that are on there, as well as the trust scores, in the details of that mind map.
And finally, I want to just draw your attention to the 2022 Data Catalog Leader assessment by IDC. We are definitely in that leader diagram. And Stewart Bond can definitely speak to how Quest is really helping our end users from a lineage and impact analysis perspective. We are growing from an AI automation perspective. We have our own automation department that's continuing to develop and create more and more features and functions around our data lineage. And it's really extending our Quest portfolio with some really strong best-of-breed approaches to data intelligence.
This is a quick view of what is included in our data intelligence portfolio. Everything is based on our data catalog which is providing you all the metadata from 120 different types of sources related to your legacy environments as well as your new big data modern environments. Everything is available to be profiled with the data quality technologies that we've embedded inside of the catalog itself.
And then finally, we're spending a lot of time on end user adoption and how to really collaborate on the actual data itself to increase the literacy of the data. Underneath everything is that support with our connectors for automation that's providing you with those data lineage pipelines. Next slide.
Also, please consider that we have support for business process modeling and of course, the tried and true Erwin Data Models where we're accelerating our data intelligence suite-- so complete integration between all three of these products. And then the catalog, the data quality, and the literacy and lineage of our data intelligence suite is available for you today.
Thank you so much for joining us today. There was a lot to unpack for you. I hope you got a lot out of it. My team is demoing the solution and showcasing it on a regular basis. We would love for you to contact us to really drill in and talk about these features as well as the use cases to help you provide that business value from a data perspective.
All right. Thank you, Susan. Great session. Just a reminder to you all: you can also rewatch, or watch if you haven't already, the previous sessions in this series. And also, look out for the continuation of this series; you'll receive an invite for that here soon. With that, we're done with this session. Thank you for joining. Thank you, Sue. We'll see you guys next time.