
Scaling up

In “Roses are red”, I proposed definitions for oft-used yet ambiguous terms such as “competency” and “capability”.

Not only did I suggest a competency be considered a task, but also that its measurement be binary: competent or not yet competent.

As a more general construct, a capability is not so readily measured in a binary fashion. For instance, the question is unlikely to be whether you can analyse data, but the degree to which you can do so. Hence capabilities are preferably measured via a proficiency scale.

Feet on scales

Of course, numerous proficiency scales already exist.

No doubt each of these scales aligns to the purpose for which it was defined. So I wonder if a scale for the purpose of organisational development might align to the Kirkpatrick Model of Evaluation:

Level | Label            | Evidence
------|------------------|----------------------
0     | Not Yet Assessed | None
1     | Self Rater       | Self rated
2     | Knower           | Passes an assessment
3     | Doer             | Observed by others
4     | Performer        | Meets relevant KPIs
5     | Collaborator     | Teaches others

Table 1. Tracey Proficiency Scale (CC BY-NC-SA)

I contend that such a scale simplifies the measurement of proficiency for L&D professionals, and is presented in a language that is clear and self-evident for our target audience.

Hence it is, ahem, scalable across the organisation.
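If you wanted to record these levels in an LMS or a spreadsheet-backed tool, the scale maps neatly onto a simple data structure. Here’s a minimal sketch in Python; the class name, field names and example capability are my own illustrative choices, not part of the scale itself:

```python
from enum import IntEnum

class Proficiency(IntEnum):
    """Levels of the Tracey Proficiency Scale, with the evidence noted in comments."""
    NOT_YET_ASSESSED = 0   # Evidence: none
    SELF_RATER = 1         # Evidence: self rated
    KNOWER = 2             # Evidence: passes an assessment
    DOER = 3               # Evidence: observed by others
    PERFORMER = 4          # Evidence: meets relevant KPIs
    COLLABORATOR = 5       # Evidence: teaches others

# Hypothetical record of one employee's proficiency in one capability
record = {"capability": "Data analysis", "level": Proficiency.DOER}

print(record["level"].name, int(record["level"]))  # DOER 3
```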

The unscience of evaluation

Evaluation is notoriously underdone in the corporate sector.

And who can blame us?

With ever increasing pressure bearing down on L&D professionals to put out the next big fire, it’s no wonder we don’t have time to scratch ourselves before shifting our attention to something new – let alone measure what has already been and gone.

Alas, today’s working environment favours activity over outcome.

Pseudo echo

I’m not suggesting that evaluation is never done. Obviously some organisations do it more often than others, even if they don’t do it often enough.

However, a secondary concern I have with evaluation goes beyond the question of quantity: it’s a matter of quality.

As a scientist – yes, it’s true! – I’ve seen some dodgy pseudoscience in my time. From political gamesmanship to biased TV and clueless newspaper reports, our world is bombarded with insidious half-truths and false conclusions.

The trained eye recognises the flaws (sometimes) but of course, most people are not science grads. They can fall for the con surprisingly easily.

The workplace is no exception. However, I don’t see it as employees trying to fool their colleagues with creative number crunching, so much as those employees unwittingly fooling themselves.

If a tree falls in the forest

The big challenge I see with evaluating learning in the workplace is how to demonstrate causality – i.e. the link between cause and effect.

Suppose a special training program is implemented to improve an organisation’s flagging culture metric. When the employee engagement survey is run again later, the metric goes up.

Congratulations to the L&D team for a job well done, right?

Not quite.

What actually caused the metric to go up? Sure, it could have been the training, or it could have been something else. Perhaps a raft of unhappy campers left the organisation and were replaced by eager beavers. Perhaps the CEO approved a special bonus to all staff. Perhaps the company opened an onsite crèche. Or perhaps it was a combination of factors.

If a tree falls in the forest and nobody hears it, did it make a sound? Well, if a few hundred employees undertook training but nobody measured its effect, did it make a difference?

Without a proper experimental design, the answer remains unclear.


Evaluation by design

To determine with some level of confidence whether a particular training activity was effective, the following eight factors must be considered…

1. Isolation – The effect of the training in a particular situation must be isolated from all other factors in that situation. Then, the metric attributed to the staff who undertook the training can be compared to the metric attributed to the staff who did not undertake the training.

In other words, everything except participation in the training program must be more-or-less the same between the two groups.

2. Placebo – It’s well known in the pharmaceutical industry that patients in a clinical trial who are given a sugar pill rather than the drug being tested sometimes get better. The power of the mind can be so strong that, despite the pill having no medicinal qualities whatsoever, the patient believes they are doing something effective and so their body responds in kind.

As far as I’m aware, this fact has never been applied to the evaluation of corporate training. If it were, the group of employees who were not undertaking the special training would still need to leave their desks and sit in the classroom for three 4-hour stints over three weeks.

Why?

Because it might not be the content that makes the difference! It could be escaping the emails and phone calls and constant interruptions. It could be the opportunity to network with colleagues and have a good ol’ chat. It might be seizing the moment to think and reflect. Or it could simply be an appreciation of being trained in something, anything.

3. Randomisation – Putting the actuaries through the training and then comparing their culture metric to everyone else’s sounds like a great idea, but it will skew the results. Sure, the stats will give you an insight into how the actuaries are feeling, but it won’t be representative of the whole organisation.

Maybe the actuaries have a range of perks and a great boss; or conversely, maybe they’ve just gone through a restructure and a bunch of their mates were made redundant. To minimise these effects, staff from different teams in the organisation should be randomly assigned to the training program. That way, any localised factors will be evened out across the board.

4. Sample size – Several people (even if they’re randomised) cannot be expected to represent an organisation of hundreds or thousands. So testing five or six employees is unlikely to produce useful results.

5. Validity – Calculating a few averages and generating a bar graph is a sure-fire way to go down the rabbit hole. When comparing numbers, statistically valid methods such as Analysis of Variance are required to determine whether the differences are significant or merely noise (see the sketch after this list).

6. Replication – Even if you were to demonstrate a significant effect of the training for one group, that doesn’t guarantee the same effect for the next group. You need to do the test more than once to establish a pattern and negate the suspicion of a one-off.

7. Subsets – Variations among subsets of the population may exist. For example, the parents of young children might feel aggrieved for some reason, or older employees might feel like they’re being ignored. So it’s important to analyse subsets to see if any clusters exist.

8. Time and space – Just because you demonstrated a positive effect of the training program on culture in the Sydney office doesn’t mean it will have the same effect in New York or Tokyo. Nor does it mean it will have the same effect in Sydney next year.
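To make factors 1, 3 and 5 a little more concrete, here is a minimal sketch in Python of how staff might be randomly assigned to a training group and a placebo group, and how their culture-metric scores might then be compared with a one-way Analysis of Variance. The staff list and the scores are fabricated purely for illustration; it’s a sketch of the experimental design, not a definitive recipe.

```python
import random
from scipy.stats import f_oneway  # one-way Analysis of Variance

# Hypothetical staff list (illustrative only)
staff = [f"employee_{i}" for i in range(200)]

# 3. Randomisation: assign staff from across the organisation at random
random.seed(42)
random.shuffle(staff)
training_group = staff[:100]   # undertake the special training
placebo_group = staff[100:]    # attend a 'filler' session instead (see factor 2)

# Pretend these are the culture-metric scores collected after the program
training_scores = [random.gauss(7.2, 1.0) for _ in training_group]
placebo_scores = [random.gauss(6.9, 1.0) for _ in placebo_group]

# 5. Validity: compare the groups with ANOVA rather than eyeballing averages
f_stat, p_value = f_oneway(training_scores, placebo_scores)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")

# A small p-value (conventionally < 0.05) suggests the difference between the
# groups is unlikely to be chance alone -- which is the comparison that
# '1. Isolation' is trying to set up.
```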

Weird science

Don’t get me wrong: I’m not suggesting you need a PhD to evaluate your training activity. On the contrary, I believe that any evaluation – however informal – is better than none.

What I am saying, though, is that for your results to be more meaningful, a little bit of know-how goes a long way.

For organisations that are serious about training outcomes, I go so far as to propose employing a Training Evaluation Officer – someone who is charged not only with getting evaluation done, but with getting it done right.

LATI: A better way to measure influence on Twitter?


I’ve never been comfortable with attributing digital influence to the number of followers someone has on Twitter.

To me, it’s more a measure of your longevity on the platform. The longer you have been on Twitter, the more followers you will have collected over the years.

Sure, the quality of your tweets and other variables will have an effect, but simply comparing the raw number of followers among tweeps is not really comparing apples with apples.


I was ruminating over this when it dawned on me: why not divide the number of followers by the number of years the person has been on the platform? That will remove the variance due to longevity from the equation.

For example, I currently have 831 followers and my Twitter age is 2.1 years, so my Longevity Adjusted Twitter Influence (LATI) is:

831 / 2.1 ≈ 396

According to conventional wisdom, someone who has 1500 followers is much more influential than I am. In absolute terms that may be true, but if their Twitter age is 4 years, their LATI is 375 – which suggests I am relatively more influential than they are. That means I’m on track to become more influential overall.

Compare that to someone who joins Twitter and attracts 200 followers in 3 months. That’s a LATI of 800 (200 divided by 0.25 years), which blows both of us out of the water.
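Since the calculation is just followers divided by Twitter age, it’s trivial to script. A minimal sketch in Python, using the three examples above (the function name is my own):

```python
def lati(followers: int, years_on_twitter: float) -> float:
    """Longevity Adjusted Twitter Influence: followers per year on the platform."""
    return followers / years_on_twitter

print(round(lati(831, 2.1)))    # 396 (my own profile)
print(round(lati(1500, 4.0)))   # 375 (more followers, but an older account)
print(round(lati(200, 0.25)))   # 800 (200 followers in 3 months)
```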


In the short term I imagine a typical person’s follower count would follow an s-curve: they take a while to attract followers in the beginning, then ramp up as word spreads, then plateau as their target demographic is exhausted. Over time, their LATI will decline as the years rack up without significantly more followers.

In contrast, truly influential people will continue to attract followers indefinitely, so their LATI will remain high.


Now I’m no mathematician, so my logic may be all screwed up. But to me it’s more meaningful because it levels the playing field.

Of course, the metric doesn’t recognise who is following you. Someone with 10,000 followers won’t be very influential if those people have neither the means nor the inclination to act upon their pearls of wisdom.

Conversely, someone with only 3 followers will be incredibly influential if those people happen to be the President of the United States, Rupert Murdoch and the Head of the European Central Bank.

So notwithstanding complicated and opaque measures like Klout, LATI provides an open and convenient snapshot of digital influence. At the very least it’s fun to toy around with.