Sunday, July 9, 2023

Training a Pre-trained LLM Using OpenAI

Pre-training is expensive, but fine-tuning is comparatively cheap.


training_data = """

Your training data goes here.

This can be a collection of articles, books, or any other relevant text.

"""


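Fine-tuning the Model

To fine-tune a model with your training data, call openai.FineTune.create from the OpenAI Python library (the pre-1.0 interface). Specify the training data, the base model name, and any additional parameters you wish to include. This endpoint accepts the GPT-3 base models (ada, babbage, curie, davinci) rather than the GPT-3.5 chat models.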



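import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Placeholder settings; adjust them for your own data.
model_engine = "curie"
n_epochs = 4
batch_size = 4
learning_rate = 0.1
training_file = "training_data.jsonl"
validation_file = "validation_data.jsonl"


def upload(path):
    # The fine-tunes endpoint expects uploaded file IDs, not local paths.
    with open(path, "rb") as handle:
        return openai.File.create(file=handle, purpose="fine-tune")["id"]


fine_tuning_job = openai.FineTune.create(
    model=model_engine,
    n_epochs=n_epochs,
    batch_size=batch_size,
    learning_rate_multiplier=learning_rate,
    training_file=upload(training_file),
    validation_file=upload(validation_file),
)

job_id = fine_tuning_job["id"]
print(f"Fine-tuning job created with ID: {job_id}")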



You can use the OpenAI API to monitor the progress of your fine-tuning job. The following code snippet shows how to fetch the status of the fine-tuning job:


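import time

# Poll once a minute until the job reaches a terminal state.
while True:
    fine_tuning_status = openai.FineTune.retrieve(id=job_id)
    status = fine_tuning_status["status"]
    print(f"Fine-tuning job status: {status}")

    if status in ["succeeded", "failed", "cancelled"]:
        break

    time.sleep(60)

# Name of the resulting model; it is only set when the job succeeded.
fine_tuned_model_id = fine_tuning_status["fine_tuned_model"]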
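The polling loop above keeps running if the job never reaches a terminal state. A minimal sketch of a reusable variant with a timeout might look like the following. The names `wait_for_job`, `fetch_status`, and `TERMINAL_STATUSES` are hypothetical and not part of the OpenAI library; `fetch_status` stands in for whatever call returns the job's status string, and the set of terminal statuses is an assumption to adjust for your API version.

```python
import time
from typing import Callable

# Assumed terminal states; adjust to the statuses your API version reports.
TERMINAL_STATUSES = {"succeeded", "completed", "failed", "cancelled"}


def wait_for_job(
    fetch_status: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_seconds} seconds")
        sleep(poll_seconds)


if __name__ == "__main__":
    # Stand-in for the real API call: a canned sequence of statuses.
    fake_statuses = iter(["pending", "running", "running", "succeeded"])
    final = wait_for_job(lambda: next(fake_statuses), poll_seconds=0.0)
    print(f"Final status: {final}")
```

Passing the status lookup in as a function keeps the loop independent of any particular client version and makes it easy to exercise without network access.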


# Use the fine-tuned model for text generation
def generate_text(prompt, model_id, max_tokens=50):
    response = openai.Completion.create(
        model=model_id,  # fine-tuned models are selected with `model`
        prompt=prompt,
        max_tokens=max_tokens,
        n=1,
        stop=None,
        temperature=0.5,
    )
    return response.choices[0].text.strip()


prompt = "Your example prompt goes here."
generated_text = generate_text(prompt, fine_tuned_model_id)
print(f"Generated text: {generated_text}")
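One way to check whether fine-tuning helped is to score the model on held-out validation examples. The sketch below computes exact-match accuracy. `exact_match_accuracy` and `normalize` are hypothetical helpers introduced here, and `generate` is assumed to be any function that maps a prompt to text, such as generate_text above with the model ID bound.

```python
from typing import Callable, Iterable, Mapping


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(text.lower().split())


def exact_match_accuracy(
    examples: Iterable[Mapping[str, str]],
    generate: Callable[[str], str],
) -> float:
    """Fraction of examples whose generated text equals the expected completion."""
    total = 0
    correct = 0
    for example in examples:
        total += 1
        predicted = generate(example["prompt"])
        if normalize(predicted) == normalize(example["completion"]):
            correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    validation_examples = [
        {"prompt": "What is the chemical formula for water?", "completion": "H2O"},
        {"prompt": "What is the square root of 81?", "completion": "9"},
    ]
    # Stand-in for the fine-tuned model: a lookup table with one wrong answer.
    canned = {
        "What is the chemical formula for water?": " h2o ",
        "What is the square root of 81?": "nine",
    }
    score = exact_match_accuracy(validation_examples, lambda p: canned[p])
    print(f"Exact-match accuracy: {score:.2f}")
```

With the real model, `generate` could be `lambda p: generate_text(p, fine_tuned_model_id)`. Exact match is a strict metric; it suits short factual answers like the ones in this post but would undercount longer free-form completions.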



Training Data


{"prompt": "What is the capital of France?", "completion": "Paris"}

{"prompt": "Which gas do plants absorb from the atmosphere?", "completion": "Carbon dioxide"}

{"prompt": "What is the largest mammal on Earth?", "completion": "Blue whale"}

{"prompt": "Which element has the atomic number 1?", "completion": "Hydrogen"}

Validation Data


{"prompt": "What is the chemical formula for water?", "completion": "H2O"}

{"prompt": "What is the square root of 81?", "completion": "9"}

{"prompt": "Who wrote the play 'Romeo and Juliet'?", "completion": "William Shakespeare"}

{"prompt": "What is the freezing point of water in Celsius?", "completion": "0 degrees Celsius"}
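The fine-tuning job reads these examples from JSONL files, with one JSON object per line. A minimal sketch for writing and sanity-checking such files with the standard library follows. The helper names `write_jsonl` and `load_jsonl` and the file name `training_data.jsonl` are placeholders chosen for illustration.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_jsonl(path: str, examples: List[Dict[str, str]]) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for example in examples:
            handle.write(json.dumps(example, ensure_ascii=False) + "\n")


def load_jsonl(path: str) -> List[Dict[str, str]]:
    """Read a JSONL file, skipping blank lines and checking the expected keys."""
    examples = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            if set(record) != {"prompt", "completion"}:
                raise ValueError(
                    f"line {line_number}: expected 'prompt' and 'completion' keys"
                )
            examples.append(record)
    return examples


if __name__ == "__main__":
    training_examples = [
        {"prompt": "What is the capital of France?", "completion": "Paris"},
        {"prompt": "Which element has the atomic number 1?", "completion": "Hydrogen"},
    ]
    path = os.path.join(tempfile.mkdtemp(), "training_data.jsonl")
    write_jsonl(path, training_examples)
    print(f"Wrote and re-read {len(load_jsonl(path))} examples")
```

Reading the file back before uploading it catches malformed lines or missing keys locally, which is cheaper than discovering them after a job has been submitted.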



References:

https://medium.com/@smitkumbhani080/how-to-train-a-pre-trained-large-language-model-llm-in-python-using-openai-easy-27680c92fc3d
