Dr. Adam McKinnon is the People Data and Analytics Lead at Reece Group in Australia. Over the course of his career, he’s focused on driving individual, team, and organizational performance through the development and implementation of tailored analytics frameworks and methodologies.
Martha Curioni is the Associate Director of HR at IQVIA in Italy. She leverages her technical skills and expertise to help businesses solve their problems using data analysis.
In this episode, Adam and Martha talk about the data analysis model presented in the article they co-wrote, "Automated Assessment of Employee Data Quality Using Machine Learning."
[0:00 - 6:07] Introduction
[6:08 - 15:04] What motivated you to start your project?
[15:05 - 27:42] What did you learn from the process?
[27:43 - 34:31] How do we realistically start using this new data quality model?
[34:32 - 38:08] Final Thoughts & Closing
Connect with Dr. Adam McKinnon:
Connect with Martha Curioni:
Connect with Dwight:
Connect with David:
Podcast Manager, Karissa Harris:
Production by Affogato Media
Resources:
Announcer: 0:02
Here's an experiment for you. Take passionate experts in human resource technology. Invite cross-industry experts from inside and outside HR. Mix in what's happening in people analytics today. Give them the technology to connect. Hit record for their discussions into a beaker. Mix thoroughly. And voila, you get the HR Data Labs podcast, where we explore the impact of data and analytics on your business. We may get passionate and even irreverent, but count on each episode challenging and enhancing your understanding of the way people data can be used to solve real-world problems. Now, here's your host, David Turetsky.
David Turetsky: 0:46
Hello, and welcome to the HR Data Labs podcast. I'm your host, David Turetsky. As always, we try and find fascinating people inside and outside the world of human resources to bring you the latest on what's happening with HR data, analytics, and HR processes. Today, we have with us two phenomenal thinkers, two brilliant people who woke me up and definitely got me intrigued when I saw a post on LinkedIn that they co-authored, and it wasn't just that they co-authored it on LinkedIn. It was brilliant. And we're gonna get to that topic in a minute. But I want to introduce Martha Curioni. She's the associate director of HR at IQVIA. And Dr. Adam McKinnon, who's the people analytics leader for the Reece Group. Hello, Martha. And hello, Adam.
Dr. Adam McKinnon: 1:30
Hey, hello.
David Turetsky: 1:31
How are both of you?
Dr. Adam McKinnon: 1:33
Warming up! It's 7am in Australia. Coffee is much needed.
David Turetsky: 1:38
And what time is it, Martha, in Italy?
Martha Curioni: 1:40
It is 11. And I am definitely at the end of a very long day. But I'm still looking forward to this.
David Turetsky: 1:47
Well, we're gonna take it easy on you. But what I first want you to do is after you take your next sip of coffee, tell us a little bit about yourselves and how you got to this place. Martha, why don't we start with you?
Martha Curioni: 1:59
Sure. So let's see. After working about eight years in HR, or a little bit before that, I learned about this thing called people analytics, HR analytics, all the various things people call it. And I was immediately fascinated. While I was at Deloitte, I had the opportunity to work on some strategy projects, whether it was gender related or just general talent strategy for the company, but I didn't really have the skills to do the data stuff. Eventually I moved to Italy, where I'm at now. I left Deloitte and said, you know what, I'm just gonna learn the data stuff. And so I trained myself as a data scientist. Shortly after, I started working at an HR tech startup based out of Australia, Flow of Work, where I was building a model for their talent marketplace. And that's how I met Adam. And now I'm at IQVIA, in a more traditional HR role, but at a company that really uses data in everything they do. So it kind of allows me to still be in touch with my data side.
David Turetsky: 3:08
Yeah, awesome. Adam, how about you?
Dr. Adam McKinnon: 3:12
I started my career as a psychologist. Throughout my entire career, I was working in a consulting capacity, either as an external consultant or as an internal consultant. And just over time, data became more and more prevalent in my roles, and I started morphing into more of a data scientist, or a people data scientist, as opposed to a psychologist. But I think having a foot in both camps enriched the picture and helped me develop a little bit of a niche. Having said that, I think people analytics has really taken off in recent years. And I'm very happy to be a part of that activity, that movement.
David Turetsky: 3:55
And we're happy to have you as people analytics thought leaders, because as I said, when we get to the topic, everybody will understand why the HR Data Labs was so fascinated to talk to both of you. But before we get there, the one fun thing about you that no one knows. Adam, we're going to start with you.
Dr. Adam McKinnon: 4:13
I would say my career has been a little bit like that of Forrest Gump. In that I've been an army officer. I've been a consultant. I've been a lifeguard. You know, I've collected rubbish. I feel like I've done everything at some point in time.
David Turetsky: 4:29
I thought you were gonna mention that your life has been like a box of chocolates.
Dr. Adam McKinnon: 4:33
I probably should have started with that. That's really good. Yeah, it's been a lot.
David Turetsky: 4:38
You've been at the right place at the right time doing lots of different stuff.
Dr. Adam McKinnon: 4:41
Yeah, I'd say one of the things I enjoy most is every New Year's Eve speaking with my wife and reflecting on the year that we've just passed, and looking back thinking, I wouldn't have thought that would happen.
David Turetsky: 4:53
There you go.
Dr. Adam McKinnon: 4:54
Yeah, unpredictable. I like it.
David Turetsky: 4:57
Except for COVID, I'm sure. You would have wished for anything other than that.
Dr. Adam McKinnon: 5:01
Yes, exactly. Well said.
David Turetsky: 5:03
Martha, how about you?
Martha Curioni: 5:04
So my little fun or interesting fact is that I met my husband, decided to marry him three weeks later, and married him shortly thereafter. So we married about a month after meeting. We've now been married just about nine years, and we have two kids!
David Turetsky: 5:22
Wow. That's great. That's wonderful. That's good news to hear, having been unlucky twice myself. So there you go. That being said, fun things about both of you. Now, let's get into the topic. Today's topic is, as I said, very near and dear to the hearts of all the people who've listened to the HR Data Labs podcast from the very beginning, which is employee data. The article you wrote was entitled "Automated Assessment of Employee Data Quality Using Machine Learning." And so it's gonna be fun and fascinating. And we're gonna start now. So our first question is, what motivated you to start this project? What was the spark?
Dr. Adam McKinnon: 6:15
I would say, in two words, necessity and curiosity.
David Turetsky: 6:22
That's three, by the way.
Dr. Adam McKinnon: 6:23
Wow. Fair of you to raise that. Technically correct.
David Turetsky: 6:27
Sorry, sorry.
Dr. Adam McKinnon: 6:28
Yeah, thank you, thank you for calling me out on that.
David Turetsky: 6:31
Necessity and curiosity are two phenomenal things. Because you're a scientist by background, as you know, you try and figure out why people do what they do. Right? So that's awesome. Tell me a little bit more about that.
Dr. Adam McKinnon: 6:44
I'd say the necessity piece is just by virtue of the context in which I'm operating. I had been employed by Reece Group, and it was experimental for them, you know, it was an opportunity to build a new capability. And for the last 12 months, I've been on my own in my role, standing up this new capability, and you get drawn in many different directions as a consequence. One of the things that I've noticed, both where I'm working and in other environments, is that the quality of the data is so central to what you do. It can often undermine the quality of the outputs that you provide people, and as a consequence, the conversations that you have with stakeholders, and the actions that you take on the back of those conversations. So being able to improve the data quality just seems to enrich the entire lifecycle of activity. That said, as a one-person operator, you're trying to do everything. And the conversations I've had with Martha were, how do I scale myself in a way that enables me to do that, but also to do some other fun stuff as well, and make the entire ecosystem of work that much more productive? So it was really about trying to do more with the little that I had available. That was the necessity piece.
Martha Curioni: 8:01
So he finds a mentee who becomes free labor who is willing to code for him.
David Turetsky: 8:07
Isn't that how it always is Martha?
Dr. Adam McKinnon: 8:10
Yeah, that's, that is true.
Martha Curioni: 8:13
Hey, Martha, I have an idea. What do you mean?
David Turetsky: 8:16
Exactly?
Dr. Adam McKinnon: 8:17
Yeah. And that's where the curiosity piece kicked in. You know, we were looking at things, and Martha and I talk quite regularly. I think one of the things that characterizes our respective curiosity is that we're looking further afield. You know, I love what's happening in people analytics, but we also look into other domains: finance, customer relations, marketing, however you want to term it. In fact, more often than not, I steal ideas quite readily from other domains, domains that are a bit more advanced, and bring them back home into HR and go, how can we apply that?
David Turetsky: 8:56
Of course, because they've been doing it for longer. They've dealt with these issues. And for many of the things that they do they have similar problems, like the transactions aren't meant to be measured. So they've had to deal with these data quality issues for a long time.
Dr. Adam McKinnon: 9:12
Yep, exactly. Yeah. So they're the two motives for me.
David Turetsky: 9:17
So Martha, you got involved with it, and, you know, Adam sells it to you. And so you are not necessarily free labor, but obviously, you guys are working toward a common goal, right? So how does this thing that you're creating take shape? Where does it go from, hey, I've got an interesting idea, I've got a problem I have to solve? How does it grow into this project?
Martha Curioni: 9:44
So I'm in a different role from Adam, right? Adam's in a position where he's essentially the one providing the data. He's in charge of the quality of the data and so forth. Both when I was at the HR tech startup and in my role now, I'm more of, I don't want to say the end consumer of the data, but I'm taking the data from someone like Adam, and then doing my analysis or my dashboards or my model, or whatever it may be. And so I was in the position of really looking at the quality of the data and, to Adam's point, making sure that it's right, because otherwise your model is no good, or your analysis is no good. So when he pitched the idea, I thought, well, this is actually really interesting. And then, being the data geek that I am, I said I'd love to just dive in and see what this looks like. So that's how it evolved.
Dr. Adam McKinnon: 10:38
Can I make one other point on this topic?
David Turetsky: 10:40
Sure.
Dr. Adam McKinnon: 10:41
And that is that there's just so little conversation about data quality, unfortunately, and it's such a central point. We thought there was an opportunity to have an opinion on a topic that is central but rarely spoken about.
David Turetsky: 10:57
And I'll say this the nicest way I can. Whenever we talk about these things in the context of people analytics, a lot of times it's a, who cares? So what? You know, the data is generated by an HR transaction system, or a payroll transaction system, or other things, and it was never intended to be measured, never intended to be perfect. In fact, a lot of times these technology companies, no offense to them at all, actually get paid more for mistakes and errors, because you have to call them up and, you know, rerun payroll, or make a change to payroll, and therefore there's an expense. So having dirty data doesn't always seem like a bad thing to them. But when I ran a technology shop doing people analytics for people, we actually gave them a lot of tools to be able to look at it. And unfortunately, it was kind of a so what. You know, what's my motivation? What am I doing here? I know how to clean my data, I just don't have the time to do it. Right? So
Martha Curioni: 12:07
It's also not fun. Nobody, nobody wants to do it! Go out and, you know, audit it, update it. It's very robotic, time consuming.
David Turetsky: 12:17
And that's the point. I think the problem is, it's so time consuming. Like if you were looking at your job architecture, and you said, well, we've just had three M&As, we've hired a bunch of people, we've grown organically, terribly big, and therefore that job architecture doesn't fit us anymore. Cleaning that thing up is expensive, right? How do we get there? Or trying to change history? Those are expensive. So what do we do? Even if they had the motivation to fix their bad data, it's getting off that dime to do it, that's the bigger problem. So, before we get to the next question, the one thing I want to ask is: you obviously had a thought or a hypothesis when you were creating this project. What was your hypothesis? What did you go in, as good scientists, expecting to find, or hoping to find?
Dr. Adam McKinnon: 13:09
I think for me, personally, it was an opportunity to create something that we could automate, naturally, by virtue of what we've written, but then also incrementally chip away at that big problem. You know, we could have this running ideally on a weekly or fortnightly basis, identifying the most extreme cases of poor data quality and going, let's just clean that up. And in doing that, we incrementally get better and better. And over time, we start to factor in other processes, like when people come into the company: we tighten that up, make that capture so much better, because everyone's happy during the honeymoon phase to give you good data. It's after the fact, which is where we've aimed our work, where we're trying to clean up the mess. So I suppose that was it, at least from my perspective. Yeah.
Martha Curioni: 14:03
And I think what I would add to that is, it was really important for both of us to keep in mind the end user and think about how can we do this in a way that ultimately we can hand something over to somebody who doesn't understand Python, or R, that they can actually take that and do something with it. And so we always had that in the back of our mind as well. And we weren't sure what the output was going to look like until we started digging in and trying it out.
David Turetsky: 14:31
And so, like good data scientists, you documented the hell out of this thing. And you placed it in GitHub so other people can leverage it, which is unbelievably appreciated by all of us. Because, like good data scientists, we want to be able to take it and use it, test your theories, but also be able to add to it hopefully, or at least comment back on it.
Announcer: 14:55
Like what you hear so far? Make sure you never miss a show by clicking subscribe. This podcast is made possible by Salary.com. Now, back to the show.
David Turetsky: 15:05
Our next question is, what did you learn from the process? You had a stated goal, you had a hypothesis going in, you had a reason for doing it, and they're all very altruistic. But what did you actually learn as you were doing the project?
Martha Curioni: 15:24
I'm smiling, because the first thing Adam wants to say is that when he pitched this project, it was, it'll be a quick, easy thing, we'll put it together, and then we'll move on to the next thing. And, you know, that's not usually the case when you go into these things. I should have known better, shame on me. So that's probably the main one. But I think for me, as I was really getting into the code, I literally sat there with a notepad, because I'm still very much a write-things-down kind of person. Before I wrote the loop to go through it over and over, I would run it, write everything down, and run it again. And I would look at the visual to see, as an HR person, does this make sense? Is the output making sense? Because if it doesn't, it's not going to work. And just over and over, before I automated it, it was really interesting to look at how it all worked.
Dr. Adam McKinnon: 16:23
I think it was a unique perspective for me, doing this project with Martha, because this was all done in Python, and I code in R. So I wasn't as au fait with what was happening, and as a consequence, it was one of those projects where I almost felt like a stakeholder for a change. We'd have these weekly meetings, and Martha would be walking through the code, and I didn't get it as well as I would have had it been written in R. As a consequence, I was very dependent on the outputs, and there were times where I needed her to dumb it down for me. And that put us in a good position to ask, how do we create outputs that non data scientists can effectively use? Because we're not going to task data scientists to go clean data. Yeah? That's not our audience. So in that regard, it was a very unique experience for me, where I'm usually creating something and trying to anticipate the needs of the consumer, whereas this time I felt like the consumer.
David Turetsky: 17:28
And I gotta be honest with you, that's a really incredible result. Because a lot of times we tend to think, like you said, that we are above it. But then we actually have to put ourselves in those user shoes and say, wait a minute, what is this telling us? And how are we actually going to utilize this once we get out of this? And so I applaud you for being able to think like that. Because a lot of times when we're in the midst of a data science issue, especially on people, we don't actually think about the audience as much as we should. We go, oh my God, look at this really cool algorithm I created. How are you going to explain that to somebody? Yeah, that's your job, not mine. So, but you're both data scientists. And you chose Python, obviously, because Martha, you've got some skill in Python. And R can do a lot of the same things, because I'm an R guy, you know, like Adam is. But I imagine Python is a better tool for the code that has to keep running over and over again. The word is escaping me. But I imagine Python's the better tool, the better hammer.
Martha Curioni: 18:38
We've had this conversation quite a bit. In another project we did, I wrote the code in Python, because it was a little bit more efficient, I guess you can say. But we found that the visuals were much better in R than the options I had available in Python. So we actually split it up: I did the first part, converted it, and sent it over to him to do the output. So I think each language, in my opinion, has its pros and cons. And obviously, they each have their separate libraries and so forth. I think that's the nice thing about partnering with somebody who codes in a different language: you can pass things back and forth and leverage the strengths of each of them.
David Turetsky: 19:27
You mentioned that one of the outputs was just text. And if anybody wants to see it, they should reference the LinkedIn link that we have at the bottom of the podcast. But I would applaud you for how you created that very simplistic, by-employee list of the most impactful data element that's missing, or
Martha Curioni: 19:49
needs to be checked. Yeah,
David Turetsky: 19:50
needs to be checked. So I applaud you for that. Because a lot of times, what do we do? We try and make graphs out of the stuff, right? Whether it's a word graph or, you know, some kind of chart. I applaud you for not doing that.
Martha Curioni: 20:03
Well, and that goes back to how we are going to hand it off, right? And with Python, you can very easily save it as a CSV. So from there, you just export it as a CSV, somebody opens it in Excel, because everything we do inevitably ends up in Excel no matter where you start. And so that was kind of the motivation for keeping it that simple.
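The hand-off Martha describes, flagged records exported to a CSV that opens straight in Excel, can be sketched in a few lines of pandas. The column names and file name below are illustrative assumptions, not taken from the article's actual code:

```python
import pandas as pd

# Hypothetical review output: one row per flagged employee, with the
# suspicious fields listed next to the ID so a non-coder can action it.
flagged = pd.DataFrame({
    "employee_id": ["E1024", "E2048"],
    "fields_to_check": ["hire_date, age", "job_level, salary_band"],
})

# The hand-off step: write a plain CSV that opens directly in Excel.
flagged.to_csv("data_quality_review.csv", index=False)
```

Keeping the output this flat is the design choice discussed here: no plotting library, no dashboard, just a table an HR colleague can open and work through.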
Dr. Adam McKinnon: 20:23
But I think, in fairness to the question, we did start with a graph. And we looked at it, and I suppose, as a consumer this time around, I was thinking to myself, how do I use this? And how do I use this at scale? Imagine we're asking someone to correct 100 records each week, or each month. I don't want to sift through 100 graphs. It just didn't feel plausible. There wasn't enough utility in the graph. And as a consequence, we went back to the drawing board and went, how do we refine this? How do we make it easier? How do we make it more usable?
David Turetsky: 21:00
Sometimes the simplest is the answer. A lot of times, we try and overcomplicate it just because, you know, there are people who love consuming numbers in a graph, and there are some people who just love a table. I happen to be one that likes both, and I know the place of both. And when I looked at your output, I was like, this is awesome, because it elegantly shows the employee in the row with the impacted information next to it, by the element, by the field. And it just spoke to me. You know, I'm sure there are other people who would rather see a histogram or something. But did you get any reaction from people who said, why didn't you do a graph?
Dr. Adam McKinnon: 21:46
I think the motivation for the simple output that we created was just that thinking of, how do I give this to somebody who has no knowledge of the process that we've been through, but whom we want to help us with the data quality improvement? And it was just that simplicity of being able to say, this is the record, here's the employee ID, and here are the collective fields that we feel are indicating that this is an anomaly, that there's a problem with this record. Basically, how can I give this to you, David, without you being involved, and just narrow your focus to, look at these fields? Just check these fields for me. And I think, as a consequence, the type of feedback that you've provided us is consistent with what we've heard from the other professionals in our network: yeah, I could use this. And that was ultimately the goal.
David Turetsky: 22:43
So before we get to the last question, the one last thing I want to ask in this area is, has anybody reached out and said, I'm using it, and I found it helpful, or I've added to it, or I've changed it in some way?
Martha Curioni: 22:59
I'm gonna do a very shameless name drop here. Not only do I have the privilege of talking to Adam regularly, I also talk and connect with the head of people analytics for NASA on a regular basis. On one of my calls with him, I mentioned this project that I was doing and said, I'd love you to review the code for me. And he emailed me and said, you'll never guess, I just came out of a meeting where they asked me, can we figure out a way to identify anomalies in our people data? Really, wow.
David Turetsky: 23:35
Just so happens!
Martha Curioni: 23:38
Let me send you the GitHub now; I was going to send it tomorrow. He's also an R guy, so apparently I'm the anomaly in this equation. But he said that he's already copied it, replicated it, and now he's just waiting until he has a good chunk of time to convert it to R and share that code when it's available. So yeah, he's hoping to be able to use it.
David Turetsky: 24:07
Then Adam will be able to read it!
Dr. Adam McKinnon: 24:09
Yes, exactly. Exactly.
David Turetsky: 24:12
And me! But that's fascinating. And yes, you can name drop all you want, it's totally okay. Because, you know, when we're dealing at scale, especially with organizations the size of NASA, they have a ton of problems, because they have a ton of people. And being able to go through all that data must be just unbelievable. So this is going to be a godsend for companies of all sizes, but mostly for companies who have dedicated people analytics teams or data scientists who are able to actually use the code and identify these issues.
Dr. Adam McKinnon: 24:48
Well, to the point you're making about organizations with scale, there's a real opportunity there. I think, in today's day and age, organizations have reached such a level of complexity that there's this really natural marriage between HR and machine learning. Because machine learning is a great opportunity to synthesize massive complexity and distill it to a form that becomes usable, but also elevates, I suppose, the level of work that our HR colleagues can perform. Because I can't write an algorithm that does empathy. But my colleagues in HR do that perfectly, sure. And by doing some of this work, we get to remove them from this minutiae, from the weeds, and elevate their level of operation. I think there's just this natural marriage between people analytics and HR. It just works. And it should work.
David Turetsky: 25:47
But I think the beauty of what you've designed is you're trying to solve a problem that can be dealt with with machine learning. And, sorry to use a NASA metaphor, we're always trying to shoot for the stars and say, why don't we try and solve this problem up here? When, unless you solve this problem down here, which is data quality, you're never getting there, and we always overshoot the mark. By the way, I hope the new NASA project does not overshoot the mark. Sorry to mention that. But you know what I'm saying? This is perfect for machine learning, enabling us.
Dr. Adam McKinnon: 26:27
Yeah, yeah. And I hope it will also be illustrative to people that we probably don't need another employee churn model. You know, we've done plenty of those. What we need is a little more imagination in the way we apply machine learning in the context of HR. And I hope this will, maybe inspire is too strong a word, but open their eyes to the art of what's possible.
David Turetsky: 26:52
Well, you've given them the example. I mean, opening your eyes is actually just looking at the results, either on GitHub or from LinkedIn. And I think they're gonna get there. All they need is the training that we all gained; we didn't grow up knowing R or Python. So all they have to do is get that bit of knowledge, and then they should be able to do this for themselves. Right?
Dr. Adam McKinnon: 27:15
I hope so. I hope they improve upon it.
Martha Curioni: 27:18
Yes, please. The GitHub repository is there, please go in and tear me up if you need to, because that's how I will learn as well.
David Turetsky: 27:28
But I like to see it as being able to improve, not tear. Improve. So the next question is really kind of the key for a lot of us, which is, how do we use this model? Realistically, how do we get started? How would we, you know, download this and just get going? Where do people start?
Dr. Adam McKinnon: 28:00
Get in touch with NASA.
Martha Curioni: 28:05
He's gonna kill me. So, you know, I have shared it with some other colleagues, I guess you can say. And some of the feedback that I received from them was along the lines of, Adam, you'll probably have to help me articulate this better, but essentially: you do have to have a certain level of data quality before you can use it, right? Because if your data is still a complete mess, and you don't know where some of the data issues are coming from, you have, I guess, lower hanging fruit, and this is still a little bit too sophisticated. Once you get to the point where, generally speaking, your data is pretty good, this is where you take data quality to the next level and say, okay, now we really want to catch the things that are not correct and catch these anomalies.
David Turetsky: 29:06
So would you say it's starting with the generic employee demographic data first, before you even get to the job table or action reason codes or department hierarchy or other things? Is it really just the flat employee record, taking a look at that once it's gone through that prima facie review to make sure that, you know, the i's are dotted and most of the t's are crossed?
Dr. Adam McKinnon: 29:32
I think you could certainly make that argument, yeah. I feel, to the point Martha was making, this is a piece of a much broader ecosystem of data collection and curation. You know, I'm a big fan of doing it right up front, when people are in that honeymoon phase, and collecting it well there. I see this as being another step in the process, maybe a more sophisticated step. But prior to that, and this is the point Martha was making, we need some basic rules around data quality and capture. You know, if somebody's over 100, maybe their age is too high, and we need to check that. And that's a very basic rule. That's the point Martha was making in terms of the feedback she'd received. Because in the work that we're doing, we're actually training a model. And if you're training it with bad data, you're gonna get bad outcomes. So if we've got those steps in place prior to the implementation of this work, we're only going to get a better outcome as a consequence.
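The basic rules Adam describes, the checks that should pass before any model is trained, can be sketched as a simple validation pass. The field names, thresholds, and the cross-field rule below are illustrative assumptions, not the article's actual rules:

```python
import pandas as pd

def basic_rule_check(df: pd.DataFrame) -> pd.DataFrame:
    """Return the records that break simple plausibility rules."""
    issues = pd.DataFrame(index=df.index)
    # Single-field range rule, like Adam's "over 100" example.
    issues["age_out_of_range"] = ~df["age"].between(16, 100)
    issues["tenure_negative"] = df["tenure_years"] < 0
    # Cross-field sanity rule: tenure cannot exceed working-age years.
    issues["tenure_exceeds_age"] = df["tenure_years"] > (df["age"] - 16)
    return df[issues.any(axis=1)]

employees = pd.DataFrame({
    "employee_id": ["E1", "E2", "E3"],
    "age": [34, 103, 30],
    "tenure_years": [5, 10, 20],
})

# E2 fails the age rule; E3 (age 30, 20 years' tenure) fails the
# cross-field rule, like Martha's example later in the episode.
print(basic_rule_check(employees)["employee_id"].tolist())  # ['E2', 'E3']
```

Running checks like these first means the anomaly model is trained on data that has already passed a basic QA gate.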
David Turetsky: 30:32
So are you training on 20%? 80/20? Which part of the data set are you training on, so that you can then do the work on the rest of the data?
Martha Curioni: 30:49
It's not a test-and-train split type of model, necessarily. Essentially, it reviews all of the data, and would start to find, for example, this person is only 30, but they've been working at this company for 20 years. That doesn't quite make sense, right? Those are some of the types of things that would come up. And so, to the point Adam was making, if all of your data is really bad, and you don't have some basic checks and balances in place, it's going to be really hard for the model to catch some of these inconsistencies.
David Turetsky: 31:28
So it has to have already passed basic QA before you get to the model and are able to utilize it effectively.
Dr. Adam McKinnon: 31:37
Yeah, ideally, yeah. Yeah.
David Turetsky: 31:39
Okay. So that being said, a lot of the people who are listening will go, how do I know where my data is? Because a lot of them are either just starting out with an HRIS, or they're in HR looking at their data and they know there are holes. They can actually run a demographic report, pivot table it, and say, you know, where are the blanks to start? Where are the inconsistencies? You brought up the age problem; I've seen that many times. I've also seen hire dates, rehire dates, service dates, all inconsistent. No term date. So, just starting from the beginning, do they have to really just go pivot their data first, before they start this?
Dr. Adam McKinnon: 32:21
Look, I think there are some good packages that I know of within the R ecosystem, and I'm sure Martha knows some in the Python domain, where you can just create some basic rules for your data. Then you can write a script that will go in, check those rules for you, and elevate the cases that don't meet the rules. So I think simple checks like that are a great place to start. And then the work that we've done, instead of looking at variables one at a time, effectively looks at variables in a multidimensional sense. For example, a bit like what Martha was saying earlier: somebody has been with us for 30 years and they're still on the minimum wage, or they've been with us for 30 years and never had a promotion. They're very simplistic examples, but when you start combining things, and you'd see this from the output in the article, we were essentially showing five or more variables that make a case anomalous. So we're trying to replicate a human looking at something and going, this doesn't seem right when I look at the overarching picture.
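The article's output lists the variables that make a case anomalous. As a rough, hypothetical approximation of that idea (not the authors' method, and with made-up columns), one simple way to show which fields of a flagged record sit far from the rest of the data is a per-variable z-score:

```python
# Illustrative sketch: for a record already flagged as anomalous, list
# the variables whose values are unusually far from the column mean,
# approximating "showing the variables that make a case anomalous".
# Columns and the threshold are hypothetical choices for illustration.
import pandas as pd

employees = pd.DataFrame({
    "age":          [45, 52, 38, 30, 47],
    "tenure_years": [10, 22,  8, 20, 12],
    "salary":       [70_000, 90_000, 65_000, 32_000, 75_000],
})

# Standardize every column (mean 0, standard deviation 1).
z = (employees - employees.mean()) / employees.std()

def unusual_variables(row_index, threshold=1.2):
    """Return the columns whose absolute z-score exceeds the threshold."""
    row = z.loc[row_index].abs()
    return list(row[row > threshold].index)

print(unusual_variables(3))
```

For record 3 (age 30, salary 32,000), both age and salary stand out against the rest of the table, so a reviewer sees at a glance which fields to question, the same way a human scans the overarching picture.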
Martha Curioni: 33:34
Yeah, and the model really is meant to catch the things that, even if you do the pivot table, right, you're not going to see. You do a pivot table, you're going to know that these are blanks and you can fill them in, or that there's somebody over 100 and you can fix that. But you're not going to catch the combination of variables that makes a record inaccurate. And that's where the model comes in, to raise that level of sophistication.
David Turetsky: 34:01
A lot of the people that we're talking to will really appreciate this, because what they're gonna do right after they get off this is go take their data, dump it to Excel, and start running those checks. And Adam, what I'm going to ask you to do, though, is to help me with those R models that you're talking about for employee demographic data.
Dr. Adam McKinnon: 34:19
Yep. I'll circle back. I think that's fair.
David Turetsky: 34:22
All right, great. So we've talked about why you did the study. We've talked about what you learned during the study, and surprised yourself a little bit. And we've talked about where to get started. Is there anything else that you'd want to bring to light before we close?
Dr. Adam McKinnon: 34:46
What do you think Martha?
Martha Curioni: 34:49
I think we've covered everything.
David Turetsky: 34:51
So is it really tough learning to be a data scientist?
Dr. Adam McKinnon: 34:55
No, no, I think it was really enjoyable. A bit like my response to the first question, it comes back to curiosity. That's the thing that gets me excited about my work: either when I'm learning to do something new, or when I'm finding something new about the organization that I can go to somebody and say, hey, did you know this? And see not only the eyes open, but the thought bubble appear.
Martha Curioni: 35:22
And I think the curiosity piece is key, right? Obviously, you have to be interested in data, or it's probably not the right path to go down. But the curiosity piece, I've heard time and time again from people who are data science leaders and so forth, that without it, and I don't mean this in a bad way, you're just an engineer, right? You should go and develop software or something. But curiosity matters because you need to ask questions: you get an answer, and you need to ask another question, and keep digging deeper and deeper until you get to a point where you say, okay, wow, this is actually something interesting.
David Turetsky: 36:08
And I love that moment, the aha moment you talked about, Adam. I love that moment where you've double clicked, you've gotten to that nugget, you've identified it, and you go and show it to the people who you know are going to be curious as well, or that you can explain it to. And they say, ooh, yeah, that is pretty cool, and their eyes light up too. The reason I'm going there is because you two have that together. It's obvious you have a chemistry, which makes it really cool that you can bounce that stuff off each other. So I applaud both of you for not only doing the research, publishing it, and making it known, but also really getting into it and allowing us to kind of live it with you. So thank you very much.
Dr. Adam McKinnon: 36:54
I appreciate the kind words, but also thank you for the opportunity to publicize the work and get it out there. I appreciate that.
David Turetsky: 37:02
So at HR Data Labs, we love this stuff. We are data geeks; we live for conversations about data quality, and a lot of the people who will be listening to this will want to get to you guys. We want to hear your feedback, because one of the things we love is being able to understand how to do things better, and hearing ideas from brilliant people like you about how to make our lives easier. So from all of the listeners of HR Data Labs, thank you very much, Martha. And thank you, Adam.
Dr. Adam McKinnon: 37:33
Thank you!
Martha Curioni: 37:34
Thank you for having us!
David Turetsky: 37:36
Our pleasure. Thank you for listening, take care and stay safe.
Announcer: 37:40
That was the HR Data Labs podcast. If you liked the episode, please subscribe. And if you know anyone that might like to hear it, please send it their way. Thank you for joining us this week, and stay tuned for our next episode. Stay safe.
In this show we cover topics on Analytics, HR Processes, and Rewards with a focus on getting answers that organizations need by demystifying People Analytics.