You are optimizing a large language model (LLM) for deployment on edge devices with limited computational resources. To reduce the model size and improve efficiency without significantly compromising performance, which of the following quantization techniques is most appropriate for this scenario?
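For context, one widely used option for this scenario is post-training dynamic quantization, which stores weights in int8 and requires no retraining. Below is a minimal sketch using PyTorch's `torch.quantization.quantize_dynamic`; the small stand-in network is hypothetical and takes the place of a real LLM for brevity:

```python
import torch
import torch.nn as nn

# Small stand-in for the linear layers that dominate an LLM's size.
model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))

# Post-training dynamic quantization: weights are stored as int8 and
# activations are quantized on the fly at inference time, so no retraining
# or calibration data is needed.
quantized = torch.quantization.quantize_dynamic(
    model,              # model to quantize
    {nn.Linear},        # layer types to target
    dtype=torch.qint8,  # weight precision
)

print(quantized)  # Linear layers are replaced by dynamically quantized versions
```

Dynamic quantization is often the lowest-effort route to smaller, faster edge inference; static or quantization-aware approaches can preserve more accuracy at low precision, at the cost of calibration data or additional training.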
Which of the following stopping criteria can help in generating coherent and well-structured text without cutting off mid-sentence or continuing unnecessarily?
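As a concrete illustration, generation libraries typically combine a hard token budget, the end-of-sequence token, and user-defined stop sequences. Here is a minimal sketch of a custom stop-sequence criterion with Hugging Face transformers; the stop string `"\n\n"` and the `model`/`tokenizer` names in the commented usage are assumptions for illustration:

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSequence(StoppingCriteria):
    """Halts generation once the output ends with a chosen token sequence."""

    def __init__(self, stop_ids: list[int]):
        self.stop_ids = stop_ids

    def __call__(self, input_ids: torch.LongTensor, scores, **kwargs) -> bool:
        # Check whether the most recently generated tokens match the stop sequence.
        tail = input_ids[0, -len(self.stop_ids):].tolist()
        return tail == self.stop_ids

# Example wiring (assumes `model`, `tokenizer`, and `inputs` are already set up):
# stop_ids = tokenizer.encode("\n\n", add_special_tokens=False)
# outputs = model.generate(
#     **inputs,
#     max_new_tokens=200,                    # hard token budget
#     eos_token_id=tokenizer.eos_token_id,   # natural end of the sequence
#     stopping_criteria=StoppingCriteriaList([StopOnSequence(stop_ids)]),
# )
```

The combination matters: the token budget prevents runaway generation, while the end-of-sequence token and stop sequences let the model finish at a natural boundary instead of being cut off mid-sentence.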
In the context of Tuning Studio in IBM watsonx, what is one of the key benefits of using Capacity Unit Hours (CUHs) during the fine-tuning process?
Which of the following best describes the role of embeddings in the RAG process?
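To make the retrieval step concrete, the sketch below uses the sentence-transformers library: embeddings map the query and the documents into one vector space, and the nearest document by cosine similarity becomes the context handed to the generator. The model name and the tiny document list are illustrative assumptions, not part of the question:

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical document store and user query, for illustration only.
docs = [
    "Reset your password from the account settings page.",
    "Refunds are processed within 5 business days.",
    "Our API rate limit is 100 requests per minute.",
]
query = "How long do refunds take?"

model = SentenceTransformer("all-MiniLM-L6-v2")

# Embeddings turn semantic similarity into geometric closeness, so
# retrieval reduces to a nearest-neighbor search over vectors.
doc_vecs = model.encode(docs, convert_to_tensor=True)
query_vec = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_vec, doc_vecs)[0]
best = int(scores.argmax())
print(docs[best])  # the passage passed to the generator as grounding context
```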
In the context of IBM watsonx and generative AI models, you are tasked with designing a model that needs to classify customer support tickets into different categories. You decide to experiment with both zero-shot and few-shot prompting techniques. Which of the following best explains the key difference between zero-shot and few-shot prompting?
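For a concrete contrast, the sketch below shows what the two prompt styles might look like for this ticket-classification task. The category labels and ticket texts are hypothetical; the only structural difference is that the few-shot prompt includes labeled demonstrations, with no parameter updates or fine-tuning in either case:

```python
# Zero-shot: instruction only; the model must infer the task format itself.
zero_shot_prompt = (
    "Classify the support ticket into one of: Billing, Technical, Account.\n"
    "Ticket: My invoice shows a duplicate charge.\n"
    "Category:"
)

# Few-shot: the same instruction plus a few labeled examples in the prompt,
# conditioning the model on the expected input/output pattern.
few_shot_prompt = (
    "Classify the support ticket into one of: Billing, Technical, Account.\n"
    "Ticket: I cannot log in to my account.\nCategory: Account\n"
    "Ticket: The app crashes on startup.\nCategory: Technical\n"
    "Ticket: My invoice shows a duplicate charge.\n"
    "Category:"
)
```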