OpenAI Fine-tuning Usage

OpenAI provides a fine-tuning feature for customizing its models with your own training data.

Prerequisites

  1. Register an OpenAI account and obtain an API key.

  2. Install the toolkit. This post uses Node.js as an example:

    npm i openai
    

Generate Dataset

Organize the data as JSONL, with one training example (a JSON object) per line. A single line looks like this:

    {"messages": [{"role": "system", "content": "Given a sports headline, provide the following fields in a JSON dict, where applicable: \"player\" (full name), \"team\", \"sport\", and \"gender\"."}, {"role": "user", "content": "Sources: Colts grant RB Taylor OK to seek trade"}, {"role": "assistant", "content": "{\"player\": \"Jonathan Taylor\", \"team\": \"Colts\", \"sport\": \"football\", \"gender\": \"male\" }"}]}

A complete sample file:

https://github.com/alanhe421/express-demo/blob/c36ea133139b5d611845628a0c6ba5d6e995c209/test/trainai.jsonl

OpenAI officially recommends 50-100 examples for a fine-tuning dataset and requires at least 10.

For the reasoning behind this data format requirement, see the official explanation: https://platform.openai.com/docs/guides/fine-tuning/data-formatting
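To make the format concrete, here is a minimal sketch of generating such a file's contents in Node.js. The helper names (toExample, toJsonl) are hypothetical and only illustrate the idea; the "messages" layout and the sample headline are taken from the example line above.

```javascript
// Hypothetical helpers; only the "messages" layout comes from the example above.
const SYSTEM_PROMPT =
  "Given a sports headline, provide the following fields in a JSON dict, " +
  'where applicable: "player" (full name), "team", "sport", and "gender".';

// Turn one (headline, expected fields) pair into a single training example.
function toExample(headline, fields) {
  return {
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: headline },
      // The assistant answer is itself a JSON string, so it is stringified first.
      { role: "assistant", content: JSON.stringify(fields) },
    ],
  };
}

// JSONL means one JSON object per line, with no enclosing array.
function toJsonl(examples) {
  return examples.map((example) => JSON.stringify(example)).join("\n") + "\n";
}

const jsonl = toJsonl([
  toExample("Sources: Colts grant RB Taylor OK to seek trade", {
    player: "Jonathan Taylor",
    team: "Colts",
    sport: "football",
    gender: "male",
  }),
]);
```

The resulting string can be written to a file such as test/trainai.jsonl. Letting JSON.stringify do the escaping avoids hand-writing the nested quotes in the assistant message.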

Validate Data

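The openai command-line tool used here ships with OpenAI's Python package (pip install openai), not with the npm package installed above.

    openai tools fine_tunes.prepare_data -f ./test/trainai.jsonl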

Validation can fail with an error such as:

    OpenAI error:

    missing pandas

This happens because large libraries such as numpy and pandas are not installed by default. Some features of the library need them, but plain API calls usually do not. If you hit a MissingDependencyError, install the missing package:

    pip install pandas
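If you would rather do a quick sanity check without the Python tooling, the structure of the file can also be checked in Node.js. The following is a simplified sketch, not a replacement for the official tool: validateJsonl is a hypothetical helper, it only covers the chat-style "messages" layout shown earlier, and the role list and the minimum of 10 examples are the only rules it assumes.

```javascript
// Simplified structural check; the official tool performs more checks than this.
const ALLOWED_ROLES = new Set(["system", "user", "assistant"]);
const MIN_EXAMPLES = 10; // minimum number of examples stated earlier in this post

// Takes the full text of a JSONL file and returns a list of problems found.
function validateJsonl(text) {
  const errors = [];
  const lines = text.split("\n").filter((line) => line.trim() !== "");

  lines.forEach((line, index) => {
    const where = `line ${index + 1}`;
    let record;
    try {
      record = JSON.parse(line);
    } catch (err) {
      errors.push(`${where}: not valid JSON`);
      return;
    }
    const messages = record && record.messages;
    if (!Array.isArray(messages) || messages.length === 0) {
      errors.push(`${where}: "messages" must be a non-empty array`);
      return;
    }
    const badMessage = messages.find(
      (m) => !m || !ALLOWED_ROLES.has(m.role) || typeof m.content !== "string"
    );
    if (badMessage !== undefined) {
      errors.push(`${where}: every message needs a known role and string content`);
    }
    if (!messages.some((m) => m && m.role === "assistant")) {
      errors.push(`${where}: no assistant message to learn from`);
    }
  });

  if (lines.length < MIN_EXAMPLES) {
    errors.push(`only ${lines.length} example(s); at least ${MIN_EXAMPLES} are required`);
  }
  return errors;
}
```

An empty result means the file passed these basic checks. Pass in the file contents as a string, for example the result of reading test/trainai.jsonl.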

Create Model

    openai api fine_tunes.create -t test/trainai.jsonl -m davinci

Once the job is created successfully, the command line prints the model name and the training cost.
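The same step can be sketched with the Node.js package from the prerequisites. This assumes version 4 or later of the openai npm package, where files.create and fineTuning.jobs.create are available. Note that this is the newer fine-tuning jobs API, not the fine_tunes command shown above, so treat it as an illustration rather than an exact equivalent. startFineTune is a hypothetical helper, and the client is passed in so the function does not depend on how it was constructed.

```javascript
// Sketch only: assumes an openai v4+ client object is passed in.
// trainingFile is a readable stream of the JSONL file;
// baseModel is the name of the model to fine-tune.
async function startFineTune(client, trainingFile, baseModel) {
  // Upload the training data first; the API refers to it by file id afterwards.
  const uploaded = await client.files.create({
    file: trainingFile,
    purpose: "fine-tune",
  });

  // Create the fine-tuning job that trains on the uploaded file.
  const job = await client.fineTuning.jobs.create({
    training_file: uploaded.id,
    model: baseModel,
  });
  return job;
}
```

In a real script the client would come from new OpenAI() (which reads the OPENAI_API_KEY environment variable) and the file from fs.createReadStream("test/trainai.jsonl"). The returned job object carries an id that can be used to check on training progress.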

Use Model

Once training succeeds, the model is ready to use. Besides calling the API directly, you can run quick tests in the official playground: the model dropdown lists your successfully trained models, so you can try them right away.
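For the direct API route, a call from Node.js might look like the sketch below. It assumes version 4 or later of the openai npm package (chat.completions.create) and a model trained on the chat-style data shown earlier. extractFields is a hypothetical helper, and the model name is whatever the training step reported.

```javascript
// Same system prompt as in the training data, so the model sees a familiar request.
const HEADLINE_SYSTEM_PROMPT =
  "Given a sports headline, provide the following fields in a JSON dict, " +
  'where applicable: "player" (full name), "team", "sport", and "gender".';

// Sketch only: assumes an openai v4+ client object is passed in.
async function extractFields(client, modelName, headline) {
  const response = await client.chat.completions.create({
    model: modelName, // the fine-tuned model name reported after training
    messages: [
      { role: "system", content: HEADLINE_SYSTEM_PROMPT },
      { role: "user", content: headline },
    ],
  });
  // The model was trained to answer with a JSON string; JSON.parse throws
  // if it returns anything else, so callers may want to catch that.
  return JSON.parse(response.choices[0].message.content);
}
```

Reusing the exact system prompt from the training set keeps the request in line with what the model saw during fine-tuning.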

https://static.1991421.cn/2023/2023-03-26-232507.jpeg

Update Model?

OpenAI currently supports further fine-tuning of a model that has already been fine-tuned.

Pricing

Model training costs money. For specific pricing, see: https://openai.com/pricing#language-models

Final Thoughts

With fine-tuning, building applications such as documentation assistants or customer-service bots becomes much simpler.