Reliability tells us how much error may be present in an outcome measure. It is often reported as inter-rater reliability and intra-rater reliability; this is informative, but may not be immediately usable in day-to-day clinical decision making. Another way to estimate reliability is to calculate the standard error of measurement (SEM), which can then be used to calculate the minimal detectable change (MDC) of a measure; this value is expressed in the units of the outcome measure itself. The MDC tells us how much change is needed before we can be confident that a real change has occurred and not just measurement error.
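As a worked illustration, the SEM and MDC can be computed directly from a reliability coefficient (e.g., an ICC) and the sample standard deviation. The following minimal Python sketch uses the common formulation MDC95 = 1.96 × √2 × SEM; the example numbers are hypothetical:

    import math

    def sem(sd: float, reliability: float) -> float:
        """Standard error of measurement: SD * sqrt(1 - reliability)."""
        return sd * math.sqrt(1.0 - reliability)

    def mdc95(sd: float, reliability: float) -> float:
        """Minimal detectable change at the 95% confidence level.

        MDC95 = 1.96 * sqrt(2) * SEM; the sqrt(2) accounts for error in
        both the baseline and the follow-up measurement.
        """
        return 1.96 * math.sqrt(2.0) * sem(sd, reliability)

    # Hypothetical outcome measure scored in points: SD = 4.5, ICC = 0.90
    print(sem(4.5, 0.90))    # ~1.42 points
    print(mdc95(4.5, 0.90))  # ~3.94 points: a change smaller than this may
                             # be measurement error rather than real change

In this hypothetical case, an observed change of less than roughly 4 points would not, on its own, justify concluding that the patient has truly changed.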
Researchers frequently assert that there is a high degree of agreement between raters, but give only cursory consideration to how that agreement was assessed and what it implies. When agreement is quantified, a single measure of agreement is usually reported (most often Cronbach's α), together with a statement that the magnitude observed justifies considering only the average rating for each subject. As noted above, using average ratings in subsequent analyses may lead to conclusions that are misleading or even meaningless if individual differences are both substantial and potentially informative.

Kappa is similar to a correlation coefficient in that it cannot exceed +1.0 or fall below -1.0. Because it is used as a measure of agreement, only positive values would be expected in most situations; negative values would indicate systematic disagreement. Kappa can only reach very high values when agreement is good and the rate of the target condition is near 50% (because it includes the base rate in the calculation of joint probabilities). Several authorities have offered "rules of thumb" for interpreting the level of agreement, many of which agree in substance even though the wording is not identical.

By contrast, the joint probability of agreement is the simplest and least robust measure. It is estimated as the percentage of the time raters agree in a nominal or categorical rating system. It does not take into account the fact that agreement may happen solely by chance. There is some question as to whether chance agreement should be "corrected" for; some suggest that any such adjustment should, in any case, be based on an explicit model of how chance and error affect raters' decisions.

For continuous ratings, the Bland-Altman limits-of-agreement approach examines the differences between the two raters' observations. If the raters tend to agree, the differences will be near zero. If one rater is usually higher or lower than the other by a consistent amount, the bias will differ from zero. If the raters tend to disagree, but without a consistent pattern of one rating higher than the other, the mean difference will be near zero. Confidence limits (usually 95%) can be calculated for both the bias and for each of the limits of agreement.
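To make these measures concrete, the following Python sketch (with hypothetical ratings; the function names are our own) computes the joint probability of agreement, Cohen's kappa for two raters' categorical judgements, and the Bland-Altman bias with its 95% limits of agreement for continuous ones:

    import numpy as np

    def percent_agreement(a, b):
        """Joint probability of agreement: fraction of items rated identically."""
        a, b = np.asarray(a), np.asarray(b)
        return np.mean(a == b)

    def cohens_kappa(a, b):
        """Cohen's kappa: chance-corrected agreement between two raters."""
        a, b = np.asarray(a), np.asarray(b)
        categories = np.union1d(a, b)
        p_o = np.mean(a == b)                        # observed agreement
        p_e = sum(np.mean(a == c) * np.mean(b == c)  # agreement expected by chance
                  for c in categories)
        return (p_o - p_e) / (1.0 - p_e)

    def limits_of_agreement(x, y):
        """Bland-Altman bias and 95% limits of agreement for paired ratings."""
        d = np.asarray(x, float) - np.asarray(y, float)
        bias = d.mean()
        half_width = 1.96 * d.std(ddof=1)
        return bias, bias - half_width, bias + half_width

    # Hypothetical categorical ratings from two raters
    r1 = [1, 2, 2, 3, 1, 2, 3, 3]
    r2 = [1, 2, 3, 3, 1, 2, 3, 2]
    print(percent_agreement(r1, r2))  # 0.75
    print(cohens_kappa(r1, r2))       # ~0.62: lower, after correcting for chance

    # Hypothetical continuous scores from two raters
    print(limits_of_agreement([10.2, 9.8, 11.0, 10.5], [10.0, 10.1, 10.7, 10.4]))

Note how kappa falls well below the raw percent agreement for the same data: that gap is exactly the chance agreement that the joint probability ignores.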
The question, which was presented on screen throughout the experiment, was "In this particular photo, how X does the person look?", with X referring to the specific trait. Five of the traits were sex (1 – very feminine, 7 – very masculine), age (1 – very young, 7 – very old), attractiveness (1 – very unattractive, 7 – very attractive), trustworthiness (1 – very untrustworthy, 7 – very trustworthy) and dominance (1 – very low, 7 – very high). The final trait was resemblance to a parent, posed as "In this particular photo, how much does the person look like your mom/dad?" The wording ("mom" or "dad") was matched to the sex of the face in the image, and the rating scale was labelled "1 – not at all, 7 – very much". To avoid spurious similarity ratings driven solely by age (i.e. everyone with young parents would rate old faces as dissimilar), participants rating images in which the identity was older or younger than their parents were instructed to imagine how much the person on the screen would resemble their parent when that parent was of a similar age.