Updated Results of Evaluation and Testing in WP 3, WP 5 and WP 6

 

 

Summary. The evaluation of the results achieved in the ENRICH project Work Packages (WP) WP 3 and WP 6, which started in April 2009, was finished in mid-May. WP 5 was evaluated from 15 May to 18 June. In total, 116 respondents took part in the evaluation by filling in the evaluation forms online. Each respondent expressed his or her own opinion on each specified result by assigning a score, an integer from 0 (poor or not available) to 4 (excellent). The profile of the respondents and their general opinions on the results achieved during the 18 months of project work are illustrated in Fig. 1 and Fig. 2. The evaluation results were analyzed from several aspects and are presented in this report in Fig. 3-19 with corresponding comments. First, the results were compared across the different user groups in order to investigate how well their needs were satisfied (Fig. 2-12). Second, the average scores given by the respondents to the separate questions on the WP results, related to the Categories and quality Criteria used for testing, are shown in Fig. 13-19. The estimates, expressed as proportions of the maximum possible average, were compared with each other, and statistical inferences were drawn from approximate 0,95 confidence intervals fitted to the quality estimators. With 0,95 confidence, the Processes and Tools created are the strongest properties of the ENRICH results achieved during the 18 months of work. Similarly, with 0,95 confidence, Interoperability and Adaptability are the best-rated ENRICH results when compared with the other quality Criteria, such as Multilinguality or Usability.

 

I. The Results across the Different Target User Groups

 

Fig. 1. The total number of respondents was 116. Four target user groups were considered: content providers, i.e. information managers (53); technical personnel, i.e. supporting staff (38); scholars, i.e. researchers in historical documents (10); and general users, i.e. end-users with general interests (15).

 

Fig. 2. Project results at month 18 of the ENRICH project (6 months before the project end), evaluated by the target user groups and compared with the maximum possible score of 4.

 

Fig. 3. Ten properties (10 questions) concerning quality in WP 3, WP 5, and WP 6, evaluated on average by the target user groups (individual scores ranged from 0 to 4). The average score of each property is given numerically on the blue line.

 

Fig. 4. The spread of opinions among the user groups is fairly even for the WP 3 and WP 5 questions and varies much more for the last four questions (the WP 6 quality evaluation). The smallest value, 1,33, is the average score assigned to WP6-d by scholars.

 

Fig. 5. The average scores assigned by the different user groups to each of the three questions reflecting the quality of WP 3 results. The WP 3 average (at the right of the diagram) shows the spread of opinions on quality in WP 3 across the target users. The total average of WP 3 (at month 18) is 3,32.

 

The WP 3 questions were rather technical and certainly difficult for general users, and sometimes for scholars (researchers in the historical documentary heritage), to assess. A closer look at Fig. 5 shows four extreme values equal to 4, given to WP3-a and WP3-b by scholars and general users, which can significantly affect the WP 3 average. To exclude these extreme values, let us apply stratified sampling and evaluate the WP 3 results only by the real experts: content providers and information managers (15 respondents in WP 3) and technical personnel and supporting staff (17 respondents in WP 3). Fortunately, the experts formed the majority of this sample (compared with only 2 scholars and 4 general users).
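The expert-only (stratified) average can be sketched in a few lines of Python. This is a minimal illustration, assuming each response is stored as a (user group, question, score) tuple and that a WP average is the plain mean of all individual scores; the group labels, the function name `wp_average`, and the toy scores are hypothetical and are not the ENRICH survey data.

```python
from statistics import mean

# Hypothetical labels for the two expert strata kept in the restricted sample.
EXPERT_GROUPS = {"content provider", "technical staff"}


def wp_average(responses, groups=None):
    """Mean of all individual 0..4 scores, optionally restricted to some user groups.

    responses: iterable of (user_group, question_id, score) tuples (assumed layout).
    groups: set of user groups to keep; None keeps every respondent.
    """
    scores = [score for group, _question, score in responses
              if groups is None or group in groups]
    if not scores:
        raise ValueError("no responses in the selected groups")
    return mean(scores)


# Illustrative toy data only: two expert answers and two extreme values of 4.
TOY_RESPONSES = [
    ("content provider", "WP3-a", 3),
    ("technical staff", "WP3-a", 3),
    ("scholar", "WP3-a", 4),
    ("general user", "WP3-b", 4),
]

print(wp_average(TOY_RESPONSES))                 # all users
print(wp_average(TOY_RESPONSES, EXPERT_GROUPS))  # experts only
```

Restricting by group rather than by individual score keeps whole strata intact, which matches the way the report drops scholars and general users rather than single answers.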

 

Fig. 6. The average scores assigned only by the expert user groups to each of the three questions reflecting the quality of WP 3 results. The averages at the right of the diagram show the totals assigned by content providers and technical staff; the total WP 3 average given by the experts is 3,26.

 

Fig. 7. The average scores assigned by the target user groups to each of the three questions reflecting the quality of WP 5 results. The average across the WP 5 questions shows the overall evaluation of quality in WP 5. The total WP 5 average is 3,32.

 

Fig. 8. The average scores assigned by the target user groups to each of the four questions reflecting the quality of WP 6 results. The average at the right of the diagram shows the overall evaluation of quality in WP 6 by the different user groups. The total WP 6 average is 2,66.

 


II. The Scores in WPs, Categories, and Main Criteria

 

Fig. 9. The average scores of the 3 questions on quality in WP 3, evaluated by the full sample of users (38 respondents). The four extreme values equal to 4, assigned by scholars and general users, are located in WP3-a and WP3-b.

 

Fig. 10. The average scores of the 3 questions on quality in WP 3, evaluated by a stratified sample that includes only the expert users (32 respondents). The extreme values affecting the final average were excluded. According to the experts, the total WP 3 average is 3,26 (compared with 3,32 for all users).

 

Fig. 11. The average scores of the 3 questions on quality in WP 5, evaluated by all 41 WP 5 respondents.

 

Fig. 12. The average scores of the 4 questions on quality in WP 6, evaluated by all users (37 respondents). The scholars' opinions are the most widely spread, whereas the general users' opinions are almost uniform.

 

Fig. 13. The average total scores reflecting quality in WP 3, WP 5, and WP 6, as evaluated by the different user groups. The quality of the results achieved in WP 3 and WP 5 appears higher than in WP 6; this is the combined opinion of the 72 respondents involved in this evaluation.

 

Let us compare the total averages given by the various users to WP 3, WP 5, and WP 6: 3,32 (or 3,26 when evaluated only by experts), 3,32, and 2,66, respectively. Are these estimates significantly different? The problem of statistical inference, for example testing the hypothesis that WP 3 results are better than those of WP 6, is addressed in Section III (see Fig. 16).

Now let us derive the average scores across the Categories (digital objects, tools developed in the ENRICH project, processing, and the repository as a whole) and across the Main quality Criteria.

 

Fig. 14. The average scores of the four Categories as components of quality in the ENRICH project results achieved in WP 3, WP 5, and WP 6, as evaluated by 116 respondents. The maximum possible score is 4; the "Process" category comes closest to it.

 

Fig. 15. The average scores of the four Main Criteria reflecting the quality of the ENRICH project results achieved in WP 3, WP 5, and WP 6, as evaluated by 116 respondents. Interoperability and Adaptability appear to be rated higher than Usability and Multilinguality. Are these differences significant?

 

The question of how to compare these evaluation results correctly is answered in the following section.


III. Confidence Limits of Estimators

 

To test the statistical hypothesis that WP 3 results are better than those of WP 6 correctly, let us fix the standard significance level 0,05, corresponding to the 0,95 confidence level (and to the critical value 1,96 in the normal approximation of the statistics used). The following approximate 0,95 confidence interval is then fitted:

$$p^* \pm 1{,}96\,\sqrt{\frac{p^*\,(1-p^*)}{4nk}}$$

Here $p^*$ is the average quality estimator divided by 4 (its maximum possible value), $n$ is the number of respondents, and $k$ is the number of questions used for evaluation. Applying this formula gives the confidence intervals for the WPs investigated, summarized in the following table.
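As a worked illustration of the formula (a recomputation from figures quoted elsewhere in this report, not a value copied from the table), take the all-user WP 3 average 3,32 and assume $n = 38$ respondents (Fig. 9) and $k = 3$ questions:

$$
\begin{aligned}
p^* &= 3{,}32 / 4 = 0{,}83, \qquad 4nk = 4 \cdot 38 \cdot 3 = 456,\\
1{,}96\,\sqrt{\frac{0{,}83 \cdot 0{,}17}{456}} &= 1{,}96\,\sqrt{0{,}000309} \approx 0{,}0345,\\
\text{interval} &\approx [\,0{,}796;\ 0{,}864\,].
\end{aligned}
$$

Read as points out of 100, this is roughly 79,6 to 86,4.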

Because the confidence intervals for WP 3 (both cases) and WP 5 overlap, we conclude that there is no significant difference among them. A different conclusion follows, however, when the quality of WP 3 (or WP 5) is compared with that of WP 6: the difference is significant, and with 0,95 confidence we can confirm that the results in WP 3 are better than those in WP 6.
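The comparison can be reproduced with a short Python sketch. It assumes that $n$ is the number of respondents per WP reported in Fig. 9-12 (38, 32, 41, and 37), that the factor 4 in $4nk$ comes from treating each 0-4 score as four Bernoulli trials, and that two results are called significantly different when their intervals do not overlap; the function names are illustrative, not part of the evaluation methodology.

```python
from itertools import combinations
from math import sqrt

Z_95 = 1.96  # two-sided critical value of the standard normal at the 0.95 level


def confidence_interval(average, n, k, max_score=4):
    """Approximate 0.95 interval for p* = average / max_score.

    Assumption: each 0..4 score counts as max_score Bernoulli trials,
    so n respondents x k questions give max_score * n * k trials (4nk).
    """
    p = average / max_score
    half_width = Z_95 * sqrt(p * (1 - p) / (max_score * n * k))
    return p - half_width, p + half_width


def overlap(a, b):
    """True if the closed intervals a = (low, high) and b share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


# (average score, respondents n, questions k), as quoted in this report.
CASES = {
    "WP 3 (all users)": (3.32, 38, 3),
    "WP 3 (experts)": (3.26, 32, 3),
    "WP 5": (3.32, 41, 3),
    "WP 6": (2.66, 37, 4),
}

intervals = {name: confidence_interval(*args) for name, args in CASES.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.3f}, {high:.3f}]")
for a, b in combinations(intervals, 2):
    verdict = "overlap" if overlap(intervals[a], intervals[b]) else "differ significantly"
    print(f"{a} vs {b}: {verdict}")
```

With these inputs the WP 6 interval (about 0,627 to 0,703) lies entirely below the WP 3 and WP 5 intervals, while the WP 3 and WP 5 intervals overlap. Non-overlap of two 0,95 intervals is a sufficient (conservative) criterion for a significant difference, which is why the sketch only labels non-overlapping pairs as different.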

 

Fig. 16. The average evaluation (72 respondents) of quality in the ENRICH project results achieved in WP 6, WP 5, and WP 3, computed in two ways: by experts only and by all users. Approximate upper and lower confidence limits (CL) are derived for each case. The results relate to those in Fig. 13 and show that the only significant difference is the lower quality of WP 6; the other results are equally good.

 

Fig. 17. The percentages of the four Categories as components of quality in the ENRICH project results achieved in WP 3, WP 5, and WP 6, as evaluated by 116 respondents. Approximate upper and lower confidence limits (CL) are derived for each Category.

 

The results displayed in Fig. 17 show, with 0,95 confidence, that the developed Processing and Tools are the strongest properties of the ENRICH results, especially when compared with the Repository as a whole. The Object evaluation is also rather good but has a larger standard deviation. Overall, all these aspects were evaluated very well: the results 82,25, 79,25, and 75,75 assigned to these categories (out of a maximum of 100 points) are all very good. According to the Methodology for Evaluation described in D 7.1, a score in the interval 0-25 means that the result is low, 26-75 that it is rather good (satisfactory), and 76-100 that it is very good. Therefore Tools and Processing received a very good evaluation from the ENRICH partners and other related institutions.
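The D 7.1 scale can be applied mechanically once an average score is expressed in points out of 100 (average / 4 x 100). The Python sketch below assumes upper-inclusive boundaries at 25 and 75 so that non-integer results such as 75,75 fall into a band; this boundary handling and the function names are assumptions of the sketch, not part of D 7.1.

```python
def to_points(average, max_score=4):
    """Express an average 0..4 score as points out of 100."""
    return 100 * average / max_score


def rating_band(points):
    """Map 0..100 points to the D 7.1 verbal scale.

    D 7.1 states integer bands 0-25, 26-75, 76-100; treating 25 and 75
    as upper-inclusive cut-offs for fractional scores is an assumption.
    """
    if not 0 <= points <= 100:
        raise ValueError("points must lie in [0, 100]")
    if points <= 25:
        return "low"
    if points <= 75:
        return "rather good (satisfactory)"
    return "very good"


# Category and Criterion results quoted in this report.
for points in (82.25, 79.25, 75.75, 69.75):
    print(points, rating_band(points))
```

Under this assumption the three Category results fall in the "very good" band and the Multilinguality result of 69,75 falls in the "rather good" band.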

 

Fig. 18. The percentages of the four Main Criteria reflecting the quality of the ENRICH project results achieved in WP 3, WP 5, and WP 6, as evaluated by 116 respondents. Approximate upper and lower confidence limits are derived for each Criterion. Adaptability and Interoperability received very high evaluations, significantly different from those of Multilinguality and Usability.

 

The results displayed in Fig. 18 show, with 0,95 confidence, that Interoperability and Adaptability are the best-rated ENRICH results when compared with the other Criteria, such as Multilinguality or Usability. Overall, all the Criteria received high estimates; even the lowest result, 69,75 for Multilinguality (out of a maximum of 100 points), is rather good. Interoperability, Adaptability, and Usability received a very good evaluation, or an excellent mark, from the ENRICH partners and other related institutions.

 

Fig. 19. The percentage satisfaction of the target groups with the quality of the ENRICH project results achieved in WP 3, WP 5, and WP 6, as evaluated by 116 respondents. The results relate to Fig. 2.

 

The approximate upper and lower confidence limits derived for each user group show no significant difference among the groups' opinions: almost all intervals overlap. A difference can be stated only between the opinions of content providers and general users: the general users are more satisfied with the ENRICH project results than the expert content providers.


IV. Conclusions

 

  1. A measure evaluating the quality of the results achieved in the ENRICH project has been created; it reflects the satisfaction of users in the target groups with the results developed in the project: digital objects, tools, processing, and usability of the whole repository.
  2. The numerical evaluation results are comparable with each other and show the weak and strong points of the activities and other aspects.
  3. The summarized results of the first 18 months of work in the project are as follows:
    • All results are rated rather high: general users were the most optimistic and content providers the least (Fig. 2), but statistically their opinions do not differ much (Fig. 19).
    • Processes and Tools were evaluated significantly better than Objects and Repository (Fig. 17).
    • Adaptability and Interoperability received very high scores, significantly different from those of Usability and Multilinguality (Fig. 18).


 

Evaluation and Testing Applied to ENRICH WP 3, WP 5, and WP 6

 

All project partners have to participate in the evaluation process. Thank you for your efforts in evaluating the first results derived in the WPs up to December 2008.

 
