Problem
The product needs an official evaluation foundation for response quality, tool use, and safety rather than ad hoc scoring.
Scope
- Track where the Microsoft.Extensions.AI.Evaluation core package and its Quality, NLP, Safety, and Reporting companion packages each fit (see the sketch after this list)
- Cover evaluation targets such as relevance, groundedness, completeness, task adherence, tool-call accuracy, and safety
- Keep the evaluation foundation broad enough to serve both coding and non-coding agents
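
A minimal sketch of the default quality path, assuming the published Microsoft.Extensions.AI.Evaluation and Microsoft.Extensions.AI.Evaluation.Quality APIs; the judge client, grounding text, and metric lookup shown here are illustrative, not a mandated shape:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;

public static class QualityEvaluationSketch
{
    public static async Task RunAsync(
        IChatClient judgeClient,                // any IChatClient used as the LLM judge
        IEnumerable<ChatMessage> conversation,  // prompt/history under evaluation
        ChatResponse response,                  // model response being scored
        string groundingContext)                // source text groundedness is judged against
    {
        // Compose the LLM-judged quality evaluators; TaskAdherenceEvaluator and
        // ToolCallAccuracyEvaluator from the same package follow the same pattern.
        IEvaluator evaluators = new CompositeEvaluator(
            new RelevanceEvaluator(),
            new CompletenessEvaluator(),
            new GroundednessEvaluator());

        // Quality evaluators use an LLM as judge, so they need a ChatConfiguration.
        var chatConfiguration = new ChatConfiguration(judgeClient);

        EvaluationResult result = await evaluators.EvaluateAsync(
            conversation,
            response,
            chatConfiguration,
            additionalContext: [new GroundednessEvaluatorContext(groundingContext)]);

        // Each evaluator reports named metrics (typically numeric scores on a 1-5 scale).
        NumericMetric relevance = result.Get<NumericMetric>(RelevanceEvaluator.RelevanceMetricName);
        Console.WriteLine($"Relevance: {relevance.Value}");
    }
}
```

The Safety evaluators slot into the same IEvaluator shape but are backed by the Azure AI Foundry Evaluation service rather than a caller-supplied LLM judge, which is part of why the package split in this list matters.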
Out of scope
- Vendor-specific evaluation stacks outside the approved default
- Final scorecard UI behavior beyond the evaluation foundation itself
Implementation notes
- Prefer the official Microsoft evaluation libraries as the default path
- Keep evaluation compatible with transcript replay and telemetry (see the replay sketch after this list)
- Separate evaluation foundation from scorecard presentation concerns
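
One way the replay requirement can hold together, sketched against the Microsoft.Extensions.AI.Evaluation.NLP and Microsoft.Extensions.AI.Evaluation.Reporting packages; the storage path, scenario and execution names are placeholders, and the context constructors reflect a reading of the published API rather than a verified contract:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.NLP;
using Microsoft.Extensions.AI.Evaluation.Reporting;
using Microsoft.Extensions.AI.Evaluation.Reporting.Storage;

public static class TranscriptReplaySketch
{
    public static async Task RunAsync(
        IEnumerable<ChatMessage> transcriptMessages, // conversation replayed from a stored transcript
        ChatResponse recordedResponse,               // the recorded model response
        string referenceAnswer)                      // expected answer for the NLP metrics
    {
        // NLP evaluators (BLEU, F1) score text against references algorithmically,
        // so replayed transcripts can be evaluated without any live model calls.
        ReportingConfiguration reportingConfiguration = DiskBasedReportingConfiguration.Create(
            storageRootPath: "./eval-results",                    // assumption: local results folder
            evaluators: [new BLEUEvaluator(), new F1Evaluator()],
            executionName: "transcript-replay");                  // groups runs in the generated report

        // Each scenario run is persisted under storageRootPath for later reporting.
        await using ScenarioRun scenarioRun =
            await reportingConfiguration.CreateScenarioRunAsync("replay.sample-scenario");

        EvaluationResult result = await scenarioRun.EvaluateAsync(
            transcriptMessages,
            recordedResponse,
            additionalContext:
            [
                new BLEUEvaluatorContext(referenceAnswer),
                new F1EvaluatorContext(referenceAnswer)
            ]);
    }
}
```

Because results land on disk per scenario run, scorecard presentation can be generated separately (for example with the library's companion reporting tool), which preserves the foundation/presentation split above.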
Definition of Done
- The issue defines which official evaluation packages belong in the product and why
- The issue makes the approved evaluation direction explicit for implementers
Verification
- Review the issue against the feature spec and the repo's AGENTS updates
Dependencies