The evaluation of the results achieved in WP 3 and WP 6 started in April 2009 and was finished in the middle of May. In total, 72 respondents took part in the evaluation by filling in the forms online. The results were analyzed from several aspects and are presented below in Fig. 1-14 with the corresponding comments. First, the results were considered across the different user groups in order to investigate how well their needs were satisfied. Each respondent expressed an opinion by assigning a score, an integer ranging from 0 (poor or not available) to 4 (excellent). Secondly, the average scores given by the respondents to the separate questions, WPs, Categories and the Main Criteria used for testing quality are shown in Fig. 6-10. The averages were compared with each other, and statistical inference was made on the basis of approximate 0,95 confidence intervals fitted to the estimators of quality. With 0,95 confidence we can confirm that the developed Tools are the strongest property of the ENRICH results when compared with other properties such as digital Objects, Processing or the use of the Repository as a whole. Also with 0,95 confidence we can state that Interoperability and Adaptability are the best of the ENRICH results when compared with other Criteria such as Multilinguality or Usability.
I. The Results across the Different Target User Groups

Fig. 1. The 72 respondents belonged to four different groups: content providers (information managers), technical personnel (supporting staff), scholars (researchers in historical documents), and general or end-users with general interests.

Fig. 2. The seven questions concerning quality in WP 3 and WP 6, evaluated (with scores from 0 to 4) by the different user groups. The average score of each question is given numerically on the blue line.

Fig. 3. The spread of opinions among the user groups is fairly even for the WP 3 questions and shows much larger variation for the WP 6 quality evaluation questions.

Fig. 4. The averages of the scores assigned to each of the four questions reflecting the quality of the WP 6 results, as evaluated by the different user groups. The average on the right of the diagram shows the total evaluation of quality in WP 6 by the different users.

Fig. 5. The averages of the scores assigned to each of the three questions reflecting the quality of the WP 3 results, as evaluated by the different user groups. The average on the right of the diagram shows the total evaluation of quality in WP 3 by the different users.
II. The Scores in WPs, Categories, and Main Criteria

Fig. 6. The average scores of the 3 questions on quality in WP 3, as evaluated by the different users.

Fig. 7. The average scores of the 4 questions on quality in WP 6, as evaluated by the different users.

Fig. 8. The average total scores reflecting quality in WP 3 and WP 6, as evaluated by the different users. The quality of the results achieved in WP 3 appears to be higher than in WP 6, in the opinion of the 72 respondents involved in the evaluation.
In order to test properly the statistical hypothesis that the WP 3 results are better than those of WP 6, let us calculate the total averages given by the various users to WP 3 and WP 6. These quantities are 3,52 and 2,66, respectively. Let us fix the standard significance level 0,05, corresponding to the 0,95 confidence level, in the following interval estimator: $p^* \pm 1{,}96\,[p^*(1-p^*)/(4nk)]^{1/2}$. Here $p^*$ is the average quality estimator divided by 4 (the maximum possible value), $n$ is the number of respondents, and $k$ is the number of questions used for the evaluation. Applying this formula, we obtain the following confidence intervals:
WP 3: [3,49 – 3,55]; WP 6: [2,62 – 2,70].
Because these confidence intervals do not overlap, we can conclude that the difference in quality between WP 3 and WP 6 is significant, and with probability 0,95 we can confirm that the results in WP 3 are better than in WP 6.
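The interval estimator and the overlap check used above can be sketched in a few lines of Python. This is only an illustration of the formula as quoted: the function names are hypothetical, the denominator is read as 4·n·k, and the limits are returned on the 0-1 scale of p*. How the report maps the limits back to the 0-4 score scale is not stated, so the printed numbers are not expected to match the intervals above digit for digit.

```python
import math


def confidence_interval(mean_score, n_respondents, n_questions,
                        max_score=4.0, z=1.96):
    """Approximate 0.95 limits for p* = mean_score / max_score.

    Evaluates p* +/- z * sqrt(p* (1 - p*) / (4 n k)) as quoted in the text.
    Reading the denominator as 4*n*k is an assumption.
    """
    p = mean_score / max_score
    half_width = z * math.sqrt(p * (1.0 - p) / (4.0 * n_respondents * n_questions))
    return p - half_width, p + half_width


def intervals_overlap(a, b):
    """Return True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Averages and question counts taken from the text: 72 respondents,
# 3 questions for WP 3 and 4 questions for WP 6.
wp3 = confidence_interval(3.52, n_respondents=72, n_questions=3)
wp6 = confidence_interval(2.66, n_respondents=72, n_questions=4)

print("WP 3 limits for p*:", wp3)
print("WP 6 limits for p*:", wp6)
print("Intervals overlap:", intervals_overlap(wp3, wp6))
```

Under these assumptions the two intervals are disjoint, which agrees with the conclusion drawn in the text.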
The most important goal is to derive the average scores across the categories (digital objects, tools developed in the ENRICH project, processing, and the repository as a whole) and across the Main Criteria of quality. This is done in what follows.

Fig. 9. The average scores of the four categories as components of quality in the ENRICH project results achieved in WP 3 and WP 6, as evaluated by the 72 respondents. The maximum possible score is 4; the “Tools” category comes closest to it.

Fig. 10. The average scores of the four Main Criteria reflecting the quality of the ENRICH project results achieved in WP 3 and WP 6, as evaluated by the 72 respondents. Interoperability and Adaptability are rated higher than Usability and Multilinguality. Are those differences significant?
This question is answered in the next section.
III. Confidence Limits of Estimators

Fig. 11. The percentages of the four categories as components of quality in the ENRICH project results achieved in WP 3 and WP 6, as evaluated by the 72 respondents. The approximate upper and lower confidence limits were derived for each category in order to investigate whether the differences in the obtained estimates are significant.

Fig. 12. The percentages of the four categories as components of quality in the ENRICH project results achieved in WP 3 and WP 6, as evaluated by the 72 respondents. The approximate upper and lower confidence limits were derived for each category.
The results displayed in Fig. 12 show that we can conclude with 0,95 confidence that the developed Tools are the strongest property of the ENRICH results when compared with other properties such as digital Objects, Processing or the use of the Repository as a whole. In general, all these aspects were evaluated well: the results 64; 63,5; 70,5 assigned to those categories (out of a maximum of 100 points) are rather good. According to the Methodology for Evaluation described in D 7.1, a score in the interval 0 – 25 means that the result is low, 26 – 75 that it is rather good (satisfactory), and 76 – 100 that it is very good. Therefore the Tools received a very good evaluation from the ENRICH partners and other related institutions.
The results displayed in Fig. 13 show that we can conclude with 0,95 confidence that Interoperability and Adaptability are the best of the ENRICH results when compared with other Criteria such as Multilinguality or Usability. In general, all the Criteria involved received high estimates; even the lowest results, 66,56 and 74,5 (out of a maximum of 100 points), are rather good. Interoperability and Adaptability received a very good evaluation, that is, an excellent mark, from the ENRICH partners and other related institutions.
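The D 7.1 scale quoted above can be written as a small lookup, shown here as a Python sketch. The function names are hypothetical. Two points are assumptions rather than statements from the methodology: percentages are taken to be the average score divided by the maximum of 4 and multiplied by 100, and fractional values that fall between two bands (for example 75,4) are assigned to the lower band.

```python
def to_percent(score, max_score=4.0):
    """Convert an average score on the 0-4 scale to points out of 100.

    Assumption: percentages are score / max_score * 100.
    """
    return 100.0 * score / max_score


def quality_label(percent):
    """Map a 0-100 score to the D 7.1 bands quoted in the text.

    Assumption: values strictly between two bands go to the lower band.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must lie between 0 and 100")
    if percent < 26.0:
        return "low"
    if percent < 76.0:
        return "rather good (satisfactory)"
    return "very good"


# The three category results quoted in the text all fall in the middle band.
for value in (64.0, 63.5, 70.5):
    print(value, "->", quality_label(value))
```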

Fig. 13. The percentages of the four Main Criteria reflecting the quality of the ENRICH project results achieved in WP 3 and WP 6, as evaluated by the 72 respondents. The approximate upper and lower confidence limits were derived for each Criterion. Adaptability and Interoperability received very high evaluations, which differ significantly from those of Multilinguality and Usability.
IV. ENRICH Partners' Involvement in the Evaluation Process

Fig. 14. The total number of respondents is 72: the ENRICH partners who submitted their evaluations of WP 3 and WP 6. The displayed data show the state on 14th May 2009, when the evaluation of WP 3 and WP 6 was finished in principle.
All project partners have to participate in the evaluation process. Thank you for your efforts in evaluating the first results obtained in the WPs up to December 2008.