Category: evaluation

The right stuff

Well that was unexpected.

When I hit the Publish button on Not our job, I braced myself for a barrage of misunderstanding and its evil twin, misrepresentation.

But it didn’t happen. On the contrary, my peers who contacted me about it were downright agreeable. (A former colleague did politely pose a comment as a disagreement, but I happened to agree with everything she stated.)

I like to think I called a spade a spade: we’re responsible for learning & development; our colleagues are responsible for performance; and if they’re willing to collaborate, we have value to add.

[Image: bar graph showing that the impact of your ideas inside your brain is much lower than the impact of your ideas when you put them out there.]

The post was a thought bubble that finally precipitated after one sunny day, a long time ago, when Shai Desai asked me why I thought evaluation was so underdone by the L&D profession.

My post posited one reason – essentially, the inaccessibility of the data – but there are several other reasons closer to the bone that I think are also worth crystallising.

1. We don’t know how to do it.

I’m a Science grad, so statistical method is in my blood, but most L&D pros are not. If they haven’t found their way here via an Education or HR degree, they’ve probably fallen into it from somewhere else à la Richard in The Beach.

Which means they don’t have a grounding in statistics, so concepts such as regression and analysis of variance are alien and intimidating.

Rather than undertake the arduous journey of learning it – or worse, screw it up – we leave it well alone.

2. We’re too busy to do it.

This is an age-old excuse for not doing something, but in an era of furloughs, restructures and budget freezes, it’s all too real.

Given our client’s ever-increasing demand for output, we might be forgiven for prioritising our next deliverable over what we’ve already delivered.

3. We don’t have to do it.

And it’s a two-way street. The client’s ever-increasing demand for output also means they prioritise our next deliverable over what we’ve already delivered.

If they don’t ask for evaluation, it’s tempting to leave it in the shadows.

4. We fear the result.

Even when all the planets align – we can access the data and we’ve got the wherewithal to use it – we may have a sneaking suspicion that the outcome will be undesirable. Either no significant difference will be observed, or worse.

This fear will be exacerbated when we design a horse, but are forced by the vagaries of corporate dynamics to deliver a camel.


The purpose of this post isn’t to comment on the ethics of our profession nor lament the flaws of the corporate construct. After all, it boils down to human nature.

On the contrary, my intention is to expose the business reality for what it is so that we can do something about it.

Previously I’ve shared my idea for a Training Evaluation Officer – an expert in the science of data analysis, armed with the authority to make it happen. The role builds a bridge that connects learning & development with performance, keeping those responsible for each accountable to one another.

I was buoyed by Sue Wetherbee’s comment proposing a similar position:

…a People & Culture (HR) Analyst Business Partner who would be the one to funnel all other information to across all aspects of business input to derive “the story” for those who order it, pay for it and deliver it!

Sue, great minds think alike ;-)

And I was intrigued by Ant Pugh’s Elephant In The Room in which he challenges the assumption that one learning designer should do it all:

Should we spend time doing work we don’t enjoy or excel at, when there are others better equipped?

Just because it’s the way things are, doesn’t mean it’s the way things should be.

I believe a future exists where these expectations are relinquished. A future where the end result is not dictated by our ability to master all aforementioned skills, but by our ability to specialise on those tasks we enjoy.

How that will manifest, I don’t know (although I do have some ideas).

Ant, I’m curious… is one of those ideas an evaluation specialist? Using the ADDIE model as a guide, that same person might also attend to Analysis (so a better job title might be L&D Analyst) while other specialists focus on Design, Development and Implementation.

Then e-learning developers mightn’t feel the compulsion to call themselves Learning Experience Designers, and trainers won’t be similarly shamed into euphemising their titles. Specialists such as these can have the courage to embrace their expertise and do what they do best.

And important dimensions of our work – including evaluation – won’t only be done. They’ll be done right.

Not our job

Despite the prevailing rhetoric for the Learning & Development function to be “data driven”, data for the purposes of evaluating what we do is notoriously hard to come by.

Typically we collect feedback from happy sheets (which I prefer to call unhappy sheets) and confirm learning outcomes via some form of assessment.

In my experience, however, behavioural change is reported much less often, and anything to do with business metrics less often still. While I recognise multiple reasons for the latter in particular, one of them is simply the difficulty we mere mortals have in accessing the numbers.

Which has been a long-standing mystery to me. We’re all on the same team, so why am I denied the visibility of the information I need to do my job?

I’ve always suspected the root cause is a combination of human foibles (pride, fear, territoriality), substandard technology (exacerbated by policy) and a lack of skill or will to use the technology even when it is available.

Notwithstanding these ever-present problems, it’s been dawning on me that the biggest blocker to our ability to work with the numbers is the fact that, actually, it’s not our job.


Consider a bank that discovers a major pain point among its customers is the long turnaround time on their home loan applications. To accelerate throughput and thus improve the customer experience, the C-suite makes a strategic decision to invest in an AI-assisted processing platform.

I contend the following:

  • It’s the job of the implementation team to ensure the platform is implemented properly.
  • It’s the job of the L&D team to build the employees’ capability to use it.
  • It’s the job of the service manager to report the turnaround times.
  • It’s the job of the CX researchers to measure the customer experience.
  • It’s the job of the C-suite to justify their strategy.

In this light, it’s clear why we L&D folks have so much trouble trying to do the other things on the list that don’t mention us. Not only are we not expected to do them, but those who are don’t want us to do them.

In short, we shouldn’t be doing them.


At this juncture I wish to caution against conflating learning & development with performance consulting.

Yes, learning & development is a driver of performance, and an L&D specialist may be an integral member of a performance centre, but I urge anyone who’s endeavouring to rebrand their role as such to heed my caveat.

My point here is that if you are responsible for learning & development, be responsible for it; and let those who are responsible for performance be responsible for it.


Having said that, there is plenty we should be doing within the bounds of our role to maximise the performance of the business. Ensuring our learning objectives are action oriented and their assessment authentic are two that spring to mind.

And I don’t wish to breathe air into the juvenile petulance that the phrase “not my job” can entail. On the contrary, we should be collaborating with our colleagues on activities related to our remit – for example training needs analysis, engineering the right environmental conditions for transfer, and even Level 4 evaluation – to achieve win-win outcomes.

But do it with them, not for them, and don’t let them offload their accountability for it being done. If they don’t wish to collaborate, so be it.

Essentially it boils down to Return on Expectation (ROE). In our quest to justify the Return on Investment (ROI) of our own service offering, we need to be mindful of what it is our financiers consider that service to be.

Anything beyond that is an inefficient use of our time and expertise.

Scaling up

In Roses are red, I proposed definitions for oft-used yet ambiguous terms such as “competency” and “capability”.

Not only did I suggest a competency be considered a task, but also that its measurement be binary: competent or not yet competent.

As a more general construct, a capability is not so readily measured in a binary fashion. For instance, the question is unlikely to be whether you can analyse data, but the degree to which you can do so. Hence capabilities are preferably measured via a proficiency scale.


Of course, numerous proficiency scales already exist, and no doubt each aligns to the purpose for which it was defined. So I wonder if a scale for the purpose of organisational development might align to the Kirkpatrick Model of Evaluation:

Level | Label            | Evidence
------|------------------|---------------------
0     | Not Yet Assessed | None
1     | Self Rater       | Self rated
2     | Knower           | Passes an assessment
3     | Doer             | Observed by others
4     | Performer        | Meets relevant KPIs
5     | Collaborator     | Teaches others

Table 1. Tracey Proficiency Scale (CC BY-NC-SA)

I contend that such a scale simplifies the measurement of proficiency for L&D professionals, and is presented in a language that is clear and self-evident for our target audience.

Hence it is, ahem, scalable across the organisation.
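
Because the levels are ordered, the scale maps neatly onto code. Here’s a minimal sketch in Python – the class name and constant names are my own encoding of Table 1, not part of the scale itself – using an IntEnum so that proficiency levels can be compared directly:

```python
from enum import IntEnum

class Proficiency(IntEnum):
    """Hypothetical encoding of the Tracey Proficiency Scale (Table 1)."""
    NOT_YET_ASSESSED = 0  # No evidence
    SELF_RATER = 1        # Self rated
    KNOWER = 2            # Passes an assessment
    DOER = 3              # Observed by others
    PERFORMER = 4         # Meets relevant KPIs
    COLLABORATOR = 5      # Teaches others

# Ordered levels mean proficiency comparisons come for free:
print(Proficiency.DOER > Proficiency.KNOWER)  # True
```

An ordered scale like this also makes it trivial to report, say, the proportion of staff at Doer level or above for a given capability.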

Unhappy sheets

Few things in the L&D profession attract as much disdain as the Level 1 evaluation survey.

Known colloquially as the “happy sheet”, the humble questionnaire is typically considered too onerous, too long, too shallow, and it gives whiners a platform to complain about the sandwiches or the air conditioning.


And yet a long time ago, someone told me something that has stuck with me ever since: you don’t want to not do it.

Why? Well first of all, it’s easy. Given the availability of online forms these days, rustling one up is like falling off a log, and the back-end compilation of the results is rather impressive.

And the survey shouldn’t be too long; that’s the fault of its design, not of the concept. In fact, I advocate only two questions…

  1. The net promoter: How likely are you to recommend this learning experience to a colleague?
  2. The open-ended: How might we improve this learning experience?

The net promoter score (NPS) has itself been criticised, but I like it because it’s a simple number that’s easy to track and report. By no means an in-depth analysis, it’s a summary indicator to keep an eye on. The standard to which it adheres is quite high – a promoter is a 9 or a 10 – and a negative score is a sure-fire sign you’re not hitting the mark.

The open-ended question shores up the NPS by enabling the respondent to explain their rating. If there’s a problem, this is where it will appear.
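
The NPS arithmetic itself is simple: the percentage of promoters (9–10) minus the percentage of detractors (0–6), with passives (7–8) counting in the total but in neither camp. A minimal sketch in Python – the function name and the sample ratings are invented for illustration:

```python
def net_promoter_score(ratings):
    """Compute NPS from 0-10 'likely to recommend' ratings.

    Promoters score 9-10, detractors 0-6; passives (7-8) count in
    the total but not in either camp. NPS = %promoters - %detractors.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return round(100 * (promoters - detractors) / len(ratings))

# A cohort of 10 respondents: 4 promoters, 4 passives, 2 detractors.
print(net_promoter_score([10, 9, 9, 9, 8, 8, 7, 7, 5, 4]))  # 20
```

Note how high the bar sits: four passives scoring a respectable 7 or 8 contribute nothing to the score, which is why a negative NPS is such a loud alarm.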

Indeed, the sandwiches and the air conditioning are favourite punching bags, but we L&D pros do bang on a lot about the learning environment. You don’t get much more environmental than shelter and sustenance, so why not turn the dial and mix up the menu?


I also advocate the following…

  • Resist asking the participants to hand their feedback to you in person – or worse, to share it aloud. This will probably result in platitudes which – unless that’s what you’re really after – are effectively useless.
  • Allow the participants to submit their feedback anonymously. Again, you want them to be honest. If they still lie, your organisational culture has way bigger issues to address!
  • Allow the provision of feedback to be voluntary. You need actionable insights, so invite only the feedback that someone feels strongly enough is worthwhile actioning.
  • Invite feedback less frequently over time. We all suffer from survey fatigue, so after you’ve got a good gauge of what’s going on, keep your finger on the pulse with the occasional spot check.

If we look at Level 1 evaluation through this lens, we consider the feedback form less a “happy sheet” and more an “unhappy sheet”. It exposes hidden ailments that you can subsequently remedy.

It’s not the be-all-and-end-all of evaluation, nor is it meant to be. Rather, it’s the canary in the coal mine that alerts you to a risk before it gets any worse.

The unscience of evaluation

Evaluation is notoriously underdone in the corporate sector.

And who can blame us?

With ever-increasing pressure bearing down on L&D professionals to put out the next big fire, it’s no wonder we don’t have time to scratch ourselves before shifting our attention to something new – let alone measure what has already been and gone.

Alas, today’s working environment favours activity over outcome.

Pseudo echo

I’m not suggesting that evaluation is never done. Obviously some organisations do it more often than others, even if they don’t do it often enough.

However, a secondary concern I have with evaluation goes beyond the question of quantity: it’s a matter of quality.

As a scientist – yes, it’s true! – I’ve seen some dodgy pseudoscience in my time. From political gamesmanship to biased TV and clueless newspaper reports, our world is bombarded with insidious half-truths and false conclusions.

The trained eye recognises the flaws (sometimes) but of course, most people are not science grads. They can fall for the con surprisingly easily.

The workplace is no exception. However, I don’t see it as employees trying to fool their colleagues with creative number crunching, so much as those employees unwittingly fooling themselves.

If a tree falls in the forest

The big challenge I see with evaluating learning in the workplace is how to demonstrate causality – i.e. the link between cause and effect.

Suppose a special training program is implemented to improve an organisation’s flagging culture metric. When the employee engagement survey is run again later, the metric goes up.

Congratulations to the L&D team for a job well done, right?

Not quite.

What actually caused the metric to go up? Sure, it could have been the training, or it could have been something else. Perhaps a raft of unhappy campers left the organisation and were replaced by eager beavers. Perhaps the CEO approved a special bonus to all staff. Perhaps the company opened an onsite crèche. Or perhaps it was a combination of factors.

If a tree falls in the forest and nobody hears it, did it make a sound? Well, if a few hundred employees undertook training but nobody measured its effect, did it make a difference?

Without a proper experimental design, the answer remains unclear.


Evaluation by design

To determine with some level of confidence whether a particular training activity was effective, the following eight factors must be considered…

1. Isolation – The effect of the training in a particular situation must be isolated from all other factors in that situation. Then, the metric attributed to the staff who undertook the training can be compared to the metric attributed to the staff who did not undertake the training.

In other words, everything except participation in the training program must be more-or-less the same between the two groups.

2. Placebo – It’s well known in the pharmaceutical industry that patients in a clinical trial who are given a sugar pill rather than the drug being tested sometimes get better. The power of the mind can be so strong that, despite the pill having no medicinal qualities whatsoever, the patient believes they are doing something effective and so their body responds in kind.

As far as I’m aware, this fact has never been applied to the evaluation of corporate training. If it were, the group of employees who were not undertaking the special training would still need to leave their desks and sit in the classroom for three 4-hour stints over three weeks.


Because it might not be the content that makes the difference! It could be escaping the emails and phone calls and constant interruptions. It could be the opportunity to network with colleagues and have a good ol’ chat. It might be seizing the moment to think and reflect. Or it could simply be an appreciation of being trained in something, anything.

3. Randomisation – Putting the actuaries through the training and then comparing their culture metric to everyone else’s sounds like a great idea, but it will skew the results. Sure, the stats will give you an insight into how the actuaries are feeling, but it won’t be representative of the whole organisation.

Maybe the actuaries have a range of perks and a great boss; or conversely, maybe they’ve just gone through a restructure and a bunch of their mates were made redundant. To minimise these effects, staff from different teams in the organisation should be randomly assigned to the training program. That way, any localised factors will be evened out across the board.
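
The random assignment described above takes only a few lines of Python – the helper function and staff names below are hypothetical, purely for illustration:

```python
import random

def randomise_groups(staff, seed=None):
    """Randomly split staff into a training group and a control group,
    so that localised quirks (a great boss, a recent restructure)
    are spread across both groups rather than concentrated in one."""
    rng = random.Random(seed)  # seed only for reproducible demos
    pool = list(staff)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

# Hypothetical staff drawn from different teams across the organisation.
training, control = randomise_groups(
    ["Ali", "Bea", "Cy", "Dee", "Eve", "Flo"], seed=42)
```

The key point is that the shuffle, not a manager’s convenience, decides who trains first.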

4. Sample size – Several people (even if they’re randomised) cannot be expected to represent an organisation of hundreds or thousands. So testing five or six employees is unlikely to produce useful results.
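
A toy simulation (with invented numbers) illustrates why: the spread of a sample average shrinks roughly with the square root of the sample size, so a handful of employees gives a very noisy estimate of the whole organisation:

```python
import random
from statistics import mean, stdev

# Simulate a synthetic 'culture metric' scored out of 100 for a
# 10,000-person organisation (invented data for illustration).
rng = random.Random(7)
population = [rng.gauss(70, 10) for _ in range(10_000)]

# Draw 500 samples at each size and see how much the averages wobble.
for n in (6, 60, 600):
    sample_means = [mean(rng.sample(population, n)) for _ in range(500)]
    print(n, round(stdev(sample_means), 2))
```

The spread at n=6 dwarfs the spread at n=600 – which is the statistical reason a six-person pilot can’t speak for the organisation.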

5. Validity – Calculating a few averages and generating a bar graph is a sure-fire way to go down the rabbit hole. When comparing numbers, statistically valid methods such as Analysis of Variance are required to determine whether the differences are significant.
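
If Analysis of Variance feels out of reach, a permutation test is a statistically valid alternative for comparing two groups that needs nothing beyond Python’s standard library. The sketch below – with invented culture-metric scores – asks how often randomly shuffled group labels produce a gap as large as the one observed:

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_iter=10_000, seed=0):
    """Two-sample permutation test on the difference of means.

    A stdlib stand-in for ANOVA in the two-group case: if shuffled
    labels routinely produce a gap as large as the observed one,
    the 'effect' is indistinguishable from noise.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    return hits / n_iter  # approximate p-value

# Invented culture-metric scores for a trained group and a control group.
trained = [72, 75, 78, 74, 77, 79, 73, 76]
control = [68, 70, 69, 71, 67, 72, 70, 69]
p = permutation_test(trained, control)
```

A small p-value says the gap between the groups is unlikely to be a fluke of who landed in which group – exactly the question a bar graph of averages cannot answer.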

6. Replication – Even if you were to demonstrate a significant effect of the training for one group, that doesn’t guarantee the same effect for the next group. You need to do the test more than once to establish a pattern and negate the suspicion of a one-off.

7. Subsets – Variations among subsets of the population may exist. For example, the parents of young children might feel aggrieved for some reason, or older employees might feel like they’re being ignored. So it’s important to analyse subsets to see if any clusters exist.

8. Time and space – Just because you demonstrated the positive effect of the training program on culture in the Sydney office, doesn’t mean it will have the same effect in New York or Tokyo. Nor does it mean it will have the same effect in Sydney next year.

Weird science

Don’t get me wrong: I’m not suggesting you need a PhD to evaluate your training activity. On the contrary, I believe that any evaluation – however informal – is better than none.

What I am saying, though, is for your results to be more meaningful, a little bit of know-how goes a long way.

For organisations that are serious about training outcomes, I go so far as to propose employing a Training Evaluation Officer – someone who is charged not only with getting evaluation done, but with getting it done right.