Create a dataset file
Your dataset should be a `.jsonl` file (a file containing one JSON object per line), following the same format as the OpenAI fine-tuning data format. The first message should be a system message; afterwards, roles should alternate between user and assistant:
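For example, a single line of such a file could look like the following (kept on one line in the actual file; the message contents are purely illustrative):

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}]}
```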
Start a Finetuning Run
To start a finetuning run, click “Train a model” in Haven’s dashboard.
- Model Name: The Huggingface repository the fine-tuned model will be uploaded to, in the format `your-hf-username/your-model-name`.
- Training Dataset: Your training dataset file.
- Base model: Model you want to finetune. We suggest HuggingFaceH4/zephyr-7b-beta.
- Learning Rate: Can be high, medium, or low. In general, smaller datasets with fewer than 500 chats should use high.
- Number of Epochs: The number of training iterations over your full dataset. This value should normally be in the range of one to five.
- Huggingface Token: A Huggingface access token with write permissions, used to upload your model. To obtain an access token, see here.
Testing your trained model
Once training has finished, you will be able to see your model repository by clicking the Huggingface button in Haven’s dashboard.
You can run your fine-tuned model in several ways, including:
- Loading it with the transformers / peft library
- Self-Hosting on AWS / GCP with vllm
- Running the model on your laptop using llama.cpp
- Deploying a Huggingface Inference Endpoint
- Running on Haven (coming soon)
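As a sketch of the transformers / peft option, the snippet below loads the fine-tuned model from the Huggingface Hub and generates one reply. The repository name and chat messages are placeholders; it assumes the trained model was uploaded as a PEFT adapter and that torch, transformers, and peft are installed.

```python
def chat(model_id: str, messages: list) -> str:
    """Load a fine-tuned PEFT adapter from the Hub and generate one reply."""
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Downloads the adapter along with its base model (e.g. zephyr-7b-beta).
    model = AutoPeftModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Same chat format as the training data: system first, then user/assistant.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)


# Placeholder values -- replace with the Model Name you entered in the dashboard.
example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say hello."},
]

if __name__ == "__main__":
    print(chat("your-hf-username/your-model-name", example_messages))
```

The messages passed in should follow the same alternating role structure as the training dataset, so the model sees prompts in the format it was trained on.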