-- Living Mobile --: Training a pre-trained LLM using OpenAI

Sunday, July 9, 2023

Training a pre-trained LLM using OpenAI

Pre-training is expensive, but fine-tuning is comparatively cheap.

training_data = """
Your training data goes here.
This can be a collection of articles, books, or any other relevant text.
"""

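Fine-tuning the Model

To fine-tune an OpenAI base model with your training data, use the openai.FineTune.create function from the OpenAI Python library (the pre-1.0 interface). This endpoint accepts GPT-3 base models such as curie and davinci. Specify the training data, the model name, and any additional parameters you wish to include.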

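import openai

# Upload the JSONL files first; the fine-tunes endpoint expects file IDs, not local paths.
training_upload = openai.File.create(file=open(training_file, "rb"), purpose="fine-tune")
validation_upload = openai.File.create(file=open(validation_file, "rb"), purpose="fine-tune")

# model_engine, n_epochs, batch_size and learning_rate are assumed to be defined earlier.
fine_tuning_job = openai.FineTune.create(
    model=model_engine,  # a base model name such as "curie" or "davinci"
    n_epochs=n_epochs,
    batch_size=batch_size,
    learning_rate_multiplier=learning_rate,  # multiplier applied to the pretraining learning rate
    training_file=training_upload["id"],
    validation_file=validation_upload["id"],
)
job_id = fine_tuning_job["id"]
print(f"Fine-tuning job created with ID: {job_id}")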
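Hyperparameters are optional, and it can help to send only the ones you actually set. The sketch below assembles the keyword arguments for a job. It assumes the legacy fine-tunes parameter names (training_file, validation_file, model, n_epochs, batch_size, learning_rate_multiplier) and that the files were already uploaded and are referred to by ID. The helper build_fine_tune_params, the file ID, and the model choice are hypothetical and used only for illustration.

```python
def build_fine_tune_params(training_file_id, model, validation_file_id=None,
                           n_epochs=None, batch_size=None,
                           learning_rate_multiplier=None):
    """Collect fine-tuning arguments, leaving out any option that was not set."""
    if not training_file_id:
        raise ValueError("a training file ID is required")
    optional = {
        "validation_file": validation_file_id,
        "n_epochs": n_epochs,
        "batch_size": batch_size,
        "learning_rate_multiplier": learning_rate_multiplier,
    }
    params = {"training_file": training_file_id, "model": model}
    # Keep only the options the caller actually provided.
    params.update({name: value for name, value in optional.items()
                   if value is not None})
    return params


# Placeholder ID and model name (assumptions); real IDs come from uploading the JSONL files.
params = build_fine_tune_params("file-TRAINING_ID", model="curie", n_epochs=4)
print(params)
```

The resulting dictionary could then be unpacked into the job-creation call as keyword arguments, so unset options fall back to the service defaults.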

You can use the OpenAI API to monitor the progress of your fine-tuning job. The following code snippet shows how to fetch the status of the fine-tuning job:

import time

while True:
    fine_tuning_status = openai.FineTune.retrieve(id=job_id)
    status = fine_tuning_status["status"]
    print(f"Fine-tuning job status: {status}")
    # Stop polling once the job reaches a final state.
    if status in ["succeeded", "failed", "cancelled"]:
        break
    time.sleep(60)

# Set only when the job succeeded; otherwise this is None.
fine_tuned_model_id = fine_tuning_status["fine_tuned_model"]
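A polling loop like the one above runs forever if the job never reaches a final state. A minimal sketch of a bounded version follows. The function wait_for_job, its parameters, and the stub fake_fetch are hypothetical names for illustration; the set of final states passed in is an assumption about what the service reports.

```python
import time


def wait_for_job(fetch_status, terminal_states, poll_seconds=60,
                 timeout_seconds=3600, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until a terminal state is seen or the timeout expires.

    fetch_status: zero-argument callable returning a dict with a "status" key.
    Returns the last job record; raises TimeoutError if time runs out first.
    """
    deadline = clock() + timeout_seconds
    while True:
        job = fetch_status()
        if job["status"] in terminal_states:
            return job
        if clock() >= deadline:
            raise TimeoutError(
                f"job still {job['status']!r} after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)


# Stub standing in for the API call so the sketch runs offline.
_responses = iter([
    {"status": "pending"},
    {"status": "running"},
    {"status": "succeeded", "fine_tuned_model": "example-model-id"},
])


def fake_fetch():
    return next(_responses)


final = wait_for_job(
    fake_fetch,
    terminal_states={"succeeded", "failed", "cancelled"},
    poll_seconds=0,
)
print(final["status"])
```

In real use, fetch_status would wrap the status call shown above, and the caller would check the returned status before reading the fine-tuned model ID.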

# Use the fine-tuned model for text generation
def generate_text(prompt, model_id, max_tokens=50):
    response = openai.Completion.create(
        model=model_id,  # fine-tuned models are selected with the model argument
        prompt=prompt,
        max_tokens=max_tokens,
        n=1,
        stop=None,
        temperature=0.5,
    )
    # Return the first (and only) completion without surrounding whitespace.
    return response.choices[0].text.strip()

prompt = "Your example prompt goes here."
generated_text = generate_text(prompt, fine_tuned_model_id)
print(f"Generated text: {generated_text}")

Training Data

{"prompt": "What is the capital of France?", "completion": "Paris"}
{"prompt": "Which gas do plants absorb from the atmosphere?", "completion": "Carbon dioxide"}
{"prompt": "What is the largest mammal on Earth?", "completion": "Blue whale"}
{"prompt": "Which element has the atomic number 1?", "completion": "Hydrogen"}

Validation Data

{"prompt": "What is the chemical formula for water?", "completion": "H2O"}
{"prompt": "What is the square root of 81?", "completion": "9"}
{"prompt": "Who wrote the play 'Romeo and Juliet'?", "completion": "William Shakespeare"}
{"prompt": "What is the freezing point of water in Celsius?", "completion": "0 degrees Celsius"}
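Both files are in JSON Lines format: one JSON object per line, each with a prompt and a completion key. A minimal sketch of writing and checking such a file with the standard library follows. The helpers write_jsonl and validate_jsonl and the file location are illustrative assumptions, not part of the OpenAI library.

```python
import json
import os
import tempfile

REQUIRED_KEYS = {"prompt", "completion"}


def write_jsonl(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def validate_jsonl(path):
    """Return (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((line_number, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_number, "not a JSON object"))
                continue
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                problems.append((line_number, f"missing keys: {sorted(missing)}"))
            elif not all(isinstance(record[key], str) for key in REQUIRED_KEYS):
                problems.append((line_number, "prompt and completion must be strings"))
    return problems


training_records = [
    {"prompt": "What is the capital of France?", "completion": "Paris"},
    {"prompt": "Which element has the atomic number 1?", "completion": "Hydrogen"},
]

# Hypothetical file location, used only for this demonstration.
training_path = os.path.join(tempfile.mkdtemp(), "training_data.jsonl")
write_jsonl(training_records, training_path)
print(validate_jsonl(training_path))
```

Running a check like this before uploading catches blank lines and malformed records early, which is cheaper than discovering them after a job has been queued.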

References:

https://medium.com/@smitkumbhani080/how-to-train-a-pre-trained-large-language-model-llm-in-python-using-openai-easy-27680c92fc3d
