A combination of three different insights

If I am right, this should have:

Better sample efficiency (reward factorization gives a stronger learning signal)
A less biased value estimator (due to target value learning)
Unsupervised skill discovery (due to information encoding from clock-DIAYN)
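
For the skill-discovery point, it may help to see the kind of signal involved. The sketch below shows only the standard DIAYN pseudo-reward, log q(z | s) - log p(z), with a uniform prior over skills. How the clock variant changes this is not described here, so nothing clock-specific is included. The function names and the plain-list discriminator logits are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def diayn_reward(discriminator_logits, skill, num_skills):
    """Standard DIAYN pseudo-reward for one state (hypothetical helper).

    discriminator_logits: unnormalized scores q(. | s) over skills,
        assumed to come from a learned skill discriminator.
    skill: index of the skill the policy was conditioned on.
    num_skills: number of skills; the prior p(z) is assumed uniform.
    """
    q = softmax(discriminator_logits)
    log_q_z_given_s = math.log(q[skill])
    log_p_z = math.log(1.0 / num_skills)
    return log_q_z_given_s - log_p_z


# The reward is positive when the discriminator can identify the active
# skill from the state better than chance, and negative when it does worse.
reward_when_identifiable = diayn_reward([5.0, 0.0, 0.0, 0.0], skill=0, num_skills=4)
reward_when_confused = diayn_reward([5.0, 0.0, 0.0, 0.0], skill=1, num_skills=4)
```

The reward is zero when the discriminator is at chance, so states that reveal nothing about the skill contribute no learning signal.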