Background: Text produced by generative AI can contain biases and noise, and reducing them is central to improving output quality. Purpose: Use ROUGE metrics to evaluate generated text objectively and to compare models. Methods: Run simulated ROUGE experiments in Python and analyze the resulting scores. Results: ROUGE scores reveal notable biases and noise in the generated text. Conclusions: ROUGE is effective for assessing generative AI model quality and for guiding its improvement.
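To make the Methods statement concrete, here is a minimal sketch of how ROUGE-N and ROUGE-L scores can be computed in Python. The abstract does not say which ROUGE implementation or data the experiments used, so everything below is an illustrative assumption: the function names (`rouge_n`, `rouge_l`), the whitespace tokenization, and the example sentences are hypothetical, not the authors' setup.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _prf(overlap, ref_total, cand_total):
    """Turn an overlap count into precision, recall, and F1."""
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rouge_n(reference, candidate, n=1):
    """ROUGE-N: clipped n-gram overlap between reference and candidate.

    Assumption: lowercased whitespace tokenization, no stemming.
    """
    ref = _ngrams(reference.lower().split(), n)
    cand = _ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # Counter & keeps the minimum counts
    return _prf(overlap, sum(ref.values()), sum(cand.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L: sentence-level score based on the longest common subsequence."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return _prf(_lcs_length(ref, cand), len(ref), len(cand))


if __name__ == "__main__":
    # Hypothetical example pair, not data from the study.
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    for name, scores in [
        ("ROUGE-1", rouge_n(reference, candidate, 1)),
        ("ROUGE-2", rouge_n(reference, candidate, 2)),
        ("ROUGE-L", rouge_l(reference, candidate)),
    ]:
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

A lower ROUGE-2 than ROUGE-1 on the same pair indicates that the candidate shares individual words with the reference but not their ordering, which is one way such scores can flag noisy output. Because ROUGE measures surface overlap only, low scores are a signal to investigate rather than proof of bias on their own.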