Pre-training Is Expensive, but Fine-tuning Is Comparatively Cheap
training_data = """
Your training data goes here.
This can be a collection of articles, books, or any other relevant text.
"""
Fine-tuning the Model
To fine-tune the model with your training data, use the openai.FineTune.create method from the OpenAI Python library (versions before 1.0). Specify the uploaded training file, the base model name, and any additional hyperparameters you wish to set.
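import openai

# Upload the JSONL files first: FineTune.create expects file IDs, not local paths.
training_upload = openai.File.create(file=open(training_file, "rb"), purpose="fine-tune")
validation_upload = openai.File.create(file=open(validation_file, "rb"), purpose="fine-tune")

# model_engine, n_epochs, batch_size, and learning_rate are assumed to be defined earlier.
# max_tokens is a generation setting, not a fine-tuning parameter, so it is not passed here.
fine_tuning_job = openai.FineTune.create(
    model=model_engine,
    n_epochs=n_epochs,
    batch_size=batch_size,
    learning_rate_multiplier=learning_rate,
    training_file=training_upload["id"],
    validation_file=validation_upload["id"],
)
job_id = fine_tuning_job["id"]
print(f"Fine-tuning job created with ID: {job_id}")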
You can use the OpenAI API to monitor the progress of your fine-tuning job. The following code snippet shows how to fetch the status of the fine-tuning job:
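import time

while True:
    # Fetch the current state of the job by its ID.
    fine_tuning_status = openai.FineTune.retrieve(id=job_id)
    status = fine_tuning_status["status"]
    print(f"Fine-tuning job status: {status}")
    # Stop polling once the job reaches a terminal state.
    if status in ["succeeded", "failed", "cancelled"]:
        break
    time.sleep(60)

# Set only when the job succeeded; None otherwise.
fine_tuned_model_id = fine_tuning_status["fine_tuned_model"]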
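A polling loop like the one above keeps running if the job never reaches a terminal state. The following is a minimal sketch of a bounded variant, not part of the OpenAI library: wait_for_job, fetch_status, and terminal_states are hypothetical names introduced here for illustration, and the caller supplies the function that actually queries the API.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    fetch_status: Callable[[], str],
    terminal_states: Iterable[str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns a terminal state.

    Raises TimeoutError if the accumulated wait reaches timeout_seconds first.
    The sleep function is injectable so the loop can be exercised without
    real delays.
    """
    terminal = set(terminal_states)
    waited = 0.0
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still '{status}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the pre-1.0 client used in this post, fetch_status could plausibly be a small lambda that retrieves the job by job_id and returns its "status" field; which states count as terminal is left to the caller.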
# Use the fine-tuned model for text generation
def generate_text(prompt, model_id, max_tokens=50):
    response = openai.Completion.create(
        model=model_id,  # fine-tuned models are addressed through the model parameter
        prompt=prompt,
        max_tokens=max_tokens,
        n=1,
        stop=None,
        temperature=0.5,
    )
    # Return the text of the first (and only) completion.
    return response.choices[0].text.strip()

prompt = "Your example prompt goes here."
generated_text = generate_text(prompt, fine_tuned_model_id)
print(f"Generated text: {generated_text}")
Training Data
{"prompt": "What is the capital of France?", "completion": "Paris"}
{"prompt": "Which gas do plants absorb from the atmosphere?", "completion": "Carbon dioxide"}
{"prompt": "What is the largest mammal on Earth?", "completion": "Blue whale"}
{"prompt": "Which element has the atomic number 1?", "completion": "Hydrogen"}
Validation Data
{"prompt": "What is the chemical formula for water?", "completion": "H2O"}
{"prompt": "What is the square root of 81?", "completion": "9"}
{"prompt": "Who wrote the play 'Romeo and Juliet'?", "completion": "William Shakespeare"}
{"prompt": "What is the freezing point of water in Celsius?", "completion": "0 degrees Celsius"}
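Both files above are JSON Lines: one JSON object per line, each with a "prompt" and a "completion" string. A malformed line is easier to catch locally than after an upload, so here is a small sketch of a pre-upload check. The function name load_prompt_completion_jsonl is hypothetical, and the rules it enforces (both keys present, non-empty strings) are an assumption based on the sample records shown here, not an official schema.

```python
import json
from typing import Dict, List


def load_prompt_completion_jsonl(text: str) -> List[Dict[str, str]]:
    """Parse JSONL text and verify every record has 'prompt' and 'completion'.

    Blank lines are skipped. A ValueError is raised for invalid JSON, for a
    line that is not a JSON object, or for a missing or empty field.
    """
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)  # json.JSONDecodeError is a ValueError
        if not isinstance(record, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        for key in ("prompt", "completion"):
            value = record.get(key)
            if not isinstance(value, str) or not value:
                raise ValueError(f"line {line_no}: missing or empty '{key}'")
        records.append(record)
    return records
```

To check a file on disk, read it as UTF-8 text and pass the contents to the function; the returned list also gives a quick count of usable examples.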
References:
https://medium.com/@smitkumbhani080/how-to-train-a-pre-trained-large-language-model-llm-in-python-using-openai-easy-27680c92fc3d