A Step-by-Step Coding Guide to Efficiently Fine-Tune Qwen3-14B Using Unsloth AI on Google Colab with Mixed Datasets and LoRA Optimization

Fine-tuning LLMs often requires extensive compute, time, and memory, which can hinder rapid experimentation and deployment. Unsloth AI addresses this by enabling fast, efficient fine-tuning of recent models such as Qwen3-14B with minimal GPU memory, using techniques such as 4-bit quantization and LoRA (Low-Rank Adaptation). In this tutorial, we walk through a practical implementation on Google Colab that fine-tunes Qwen3-14B on a combination of reasoning and instruction-following datasets, pairing Unsloth's FastLanguageModel utilities with trl.SFTTrainer so that users can achieve strong fine-tuning performance on consumer-grade hardware.
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    # Outside Colab: a plain install pulls in all dependencies
    !pip install unsloth
else:
    # On Colab: install pinned packages without resolving their dependencies
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
    !pip install --no-deps unsloth
We install all the libraries required to fine-tune the Qwen3 model with Unsloth AI. The cell installs dependencies conditionally based on the environment, using a lightweight approach on Colab to ensure compatibility and reduce overhead. Key components such as bitsandbytes, trl, xformers, and unsloth_zoo are included to enable 4-bit quantized training and LoRA-based optimization.
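The environment check in the install cell can be tried on its own. The sketch below wraps the same test in a small helper; the name `running_on_colab` is hypothetical (it is not part of Unsloth or Colab), and the environment is passed in as a parameter so the logic can be exercised outside a notebook. The variable names in the two sample environments are made up for illustration.

```python
import os

def running_on_colab(environ=None):
    """Return True if any environment variable name contains "COLAB_".

    Mirrors the check in the install cell. `running_on_colab` is an
    illustrative helper name, not a library function.
    """
    env = os.environ if environ is None else environ
    return "COLAB_" in "".join(env.keys())

# Sample environments (illustrative variable names)
print(running_on_colab({"COLAB_GPU": "1", "HOME": "/root"}))  # True
print(running_on_colab({"HOME": "/home/user"}))               # False
```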
from unsloth import FastLanguageModel
import torch
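model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-14B",
    max_seq_length = 2048,    # context length used for training
    load_in_4bit = True,      # 4-bit quantization to reduce memory use
    load_in_8bit = False,
    full_finetuning = False,  # no full fine-tuning; LoRA adapters are added next
)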
We load the Qwen3-14B model using FastLanguageModel from the Unsloth library, which is optimized for efficient fine-tuning. The call initializes the model with a context length of 2,048 tokens and loads it in 4-bit precision, significantly reducing memory usage. Full fine-tuning is disabled, which makes the setup suitable for lightweight parameter-efficient techniques such as LoRA.
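To get a feel for why 4-bit loading matters, here is a rough back-of-the-envelope estimate of the memory taken by the weights alone. It assumes a round figure of 14 billion parameters (read off the "14B" in the model name) and ignores quantization metadata, activations, optimizer state, and the LoRA adapters, so real usage will be higher. The helper name `weight_memory_gb` is made up for this sketch.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: roughly 14 billion parameters, as the "14B" in the model name suggests.
NUM_PARAMS = 14e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f" 4-bit weights: ~{int4_gb:.0f} GB")
print(f"reduction: {fp16_gb / int4_gb:.0f}x")
```

Under these assumptions the weights shrink from about 28 GB to about 7 GB, which is the difference between not fitting and fitting on a typical single Colab GPU.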
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
We apply LoRA (Low-Rank Adaptation) to the Qwen3 model using FastLanguageModel.get_peft_model. It injects trainable adapters into specific transformer layers (e.g., q_proj, v_proj) with a rank of 32, enabling efficient fine-tuning while keeping most model weights frozen. Setting use_gradient_checkpointing to "unsloth" further reduces memory usage, making the approach suitable for training large models on limited hardware.
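The saving from a rank-32 adapter can be made concrete with a little arithmetic. LoRA replaces the update to a weight matrix of shape d_out × d_in with the product of two small matrices, B (d_out × r) and A (r × d_in), scaled by lora_alpha / r when use_rslora is False. The sketch below counts parameters for one such matrix; the 4096 × 4096 shape is purely illustrative and is not taken from the Qwen3-14B configuration, and `lora_param_count` is a hypothetical helper name.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns an update B @ A, with A of shape (r, d_in) and B of shape (d_out, r).
    """
    return r * d_in + d_out * r

# Illustrative dimensions only, not the real Qwen3-14B layer shapes.
D_IN, D_OUT = 4096, 4096
R, LORA_ALPHA = 32, 32  # values from the get_peft_model call

full = D_IN * D_OUT
adapter = lora_param_count(D_IN, D_OUT, R)
scaling = LORA_ALPHA / R  # standard LoRA scaling (use_rslora=False)

print(f"full matrix:  {full:,} parameters")
print(f"LoRA adapter: {adapter:,} parameters ({adapter / full:.2%} of full)")
print(f"update scaling alpha/r = {scaling}")
```

For this example shape the adapter holds 262,144 trainable parameters against 16,777,216 frozen ones, and with r equal to lora_alpha the adapter update is applied with a scaling factor of 1.0.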
from datasets import load_dataset
reasoning_dataset = load_dataset("unsloth/OpenMathReasoning-mini", split="cot")
non_reasoning_dataset = load_dataset("mlabonne/FineTome-100k", split="train")
We load two pre-curated datasets from the Hugging Face Hub using the datasets library. reasoning_dataset contains chain-of-thought (CoT) problems from Unsloth's OpenMathReasoning-mini, intended to strengthen the model's logical reasoning. non_reasoning_dataset pulls general instruction-following data from mlabonne's FineTome-100k, which helps the model learn a wider range of conversational and task-oriented skills. Together, these datasets support a well-rounded fine-tuning objective.
def generate_conversation(examples):
    problems = examples["problem"]
    solutions = examples["generated_solution"]
    conversations = []
    for problem, solution in zip(problems, solutions):
        conversations.append([
            {"role": "user", "content": problem},
            {"role": "assistant", "content": solution},
        ])
    return {"conversations": conversations}
This function, generate_conversation, converts raw question-answer pairs from the reasoning dataset into a chat-style format suitable for fine-tuning. For each problem and its corresponding generated solution, it builds a conversation in which the user asks the question and the assistant provides the answer. The output is a dictionary holding a list of conversations in the structure expected by chat-based language models, ready to be rendered with the chat template before tokenization.
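To see the transformation concretely, the function can be run on a tiny hand-written batch. The two problems below are made up for illustration and are not drawn from OpenMathReasoning-mini; the dictionary-of-lists layout is the column-oriented form a batched map passes to the function.

```python
def generate_conversation(examples):
    # Same logic as the tutorial's function: one user/assistant pair per row
    conversations = []
    for problem, solution in zip(examples["problem"], examples["generated_solution"]):
        conversations.append([
            {"role": "user", "content": problem},
            {"role": "assistant", "content": solution},
        ])
    return {"conversations": conversations}

# Made-up batch in the column-oriented layout a batched map passes in
toy_batch = {
    "problem": ["What is 2 + 3?", "Solve x + 1 = 4."],
    "generated_solution": ["2 + 3 = 5.", "Subtract 1 from both sides: x = 3."],
}

result = generate_conversation(toy_batch)
for conversation in result["conversations"]:
    for turn in conversation:
        print(f'{turn["role"]:>9}: {turn["content"]}')
    print("---")
```

Each row becomes a two-turn list, so the returned "conversations" column has the same number of entries as the input batch.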
# Build the "conversations" column from the raw problem/solution pairs first
reasoning_conversations = tokenizer.apply_chat_template(
    reasoning_dataset.map(generate_conversation, batched=True)["conversations"],
    tokenize=False,
)
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(non_reasoning_dataset)
non_reasoning_conversations = tokenizer.apply_chat_template(
    dataset["conversations"],
    tokenize=False,
)
import pandas as pd
chat_percentage = 0.75
non_reasoning_subset = pd.Series(non_reasoning_conversations).sample(
    int(len(reasoning_conversations) * (1.0 - chat_percentage)),
    random_state=2407,
)
data = pd.concat([
    pd.Series(reasoning_conversations),
    pd.Series(non_reasoning_subset)
])
data.name = "text"
We prepare the fine-tuning dataset by converting the reasoning and instruction datasets into a consistent chat format and then combining them. The tokenizer's apply_chat_template first converts each structured conversation into a plain string that can be tokenized later. The standardize_sharegpt function normalizes the instruction dataset into a compatible structure. The mix is then created by sampling non-reasoning (instruction) conversations, with the sample size set to 25% of the number of reasoning conversations (1.0 - chat_percentage), and combining them with the reasoning data. By example count this gives roughly an 80-20 split between reasoning and instruction data. The blend exposes the model to both logical reasoning and general instruction-following tasks, improving its versatility during training. The final combined data is stored as a single pandas Series named "text".
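The sampling rule is easy to misread, so it helps to work the numbers through. The sketch below reproduces only the size calculation from the cell above, using a hypothetical count of 1,000 reasoning conversations and placeholder strings instead of real data. It uses the standard library's random.sample as a stand-in for pandas' Series.sample, so it will not pick the same rows as the tutorial code; `mix_sizes` is an illustrative helper name.

```python
import random

def mix_sizes(n_reasoning, chat_percentage):
    """Return (n_sampled, reasoning_share, chat_share) for the tutorial's sampling rule."""
    n_sampled = int(n_reasoning * (1.0 - chat_percentage))
    total = n_reasoning + n_sampled
    return n_sampled, n_reasoning / total, n_sampled / total

# Hypothetical dataset size chosen for round numbers
n_sampled, reasoning_share, chat_share = mix_sizes(n_reasoning=1000, chat_percentage=0.75)
print(f"sampled instruction conversations: {n_sampled}")
print(f"reasoning share: {reasoning_share:.0%}, instruction share: {chat_share:.0%}")

# Placeholder data standing in for the rendered chat strings
reasoning = [f"reasoning-{i}" for i in range(1000)]
chat_pool = [f"chat-{i}" for i in range(5000)]

rng = random.Random(2407)  # fixed seed for a repeatable sample
combined = reasoning + rng.sample(chat_pool, n_sampled)
print(f"combined size: {len(combined)}")
```

With 1,000 reasoning conversations the rule draws 250 instruction conversations, so the combined set of 1,250 is 80% reasoning and 20% instruction data by count.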
from datasets import Dataset
combined_dataset = Dataset.from_pandas(pd.DataFrame(data))
combined_dataset = combined_dataset.shuffle(seed=3407)
from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=combined_dataset,
    eval_dataset=None,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=30,
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        report_to="none",
    ),
)
We take the preprocessed conversations, wrap them in a Hugging Face Dataset (ensuring the data is in a consistent format), and shuffle the dataset with a fixed seed for reproducibility. We then initialize the fine-tuning trainer using TRL's SFTTrainer and SFTConfig. The trainer is set up to use the combined dataset (with the text column named "text"), and the configuration defines training hyperparameters such as batch size, gradient accumulation, the number of warmup and training steps, learning rate, optimizer, weight decay, and a linear learning-rate scheduler. This configuration is geared toward efficient fine-tuning while keeping runs reproducible and external logging disabled (report_to="none").
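Two consequences of these hyperparameters are worth spelling out: the effective batch size per optimizer step, and the shape of the learning-rate curve. The sketch below assumes a single GPU and models the schedule as linear warmup from zero followed by linear decay to zero, which is the usual shape of a "linear" scheduler with warmup; the trainer's exact step indexing may differ by one. The constant and function names are chosen for this illustration.

```python
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 5
MAX_STEPS = 30
BASE_LR = 2e-4
NUM_DEVICES = 1  # assumption: a single Colab GPU

# Sequences contributing to each optimizer update
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
# Total sequences processed over the whole run
sequences_seen = effective_batch * MAX_STEPS

def linear_lr(step):
    """Linear warmup from 0 to BASE_LR, then linear decay to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR * max(0.0, (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS))

print(f"effective batch size: {effective_batch}")
print(f"sequences seen in {MAX_STEPS} steps: {sequences_seen}")
for step in (0, 1, 5, 15, 30):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```

With these settings each update averages gradients over 8 sequences, and the 30-step run touches 240 sequences in total, which makes it a short demonstration run and not a full pass over the combined dataset.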
trainer.train()
Calling trainer.train() starts the fine-tuning process for the Qwen3-14B model using SFTTrainer. It trains the model on the prepared mixed dataset of reasoning and instruction-following conversations, optimizing only the LoRA adapter parameters thanks to the underlying Unsloth setup. Training proceeds according to the configuration specified earlier (e.g., max_steps=30, per_device_train_batch_size=2, learning_rate=2e-4), and progress is printed at every logging step. This final command launches the actual model adaptation based on your custom data.
model.save_pretrained("qwen3-finetuned-colab")
tokenizer.save_pretrained("qwen3-finetuned-colab")
We save the fine-tuned model and tokenizer to the "qwen3-finetuned-colab" directory. By calling save_pretrained(), the adapted weights and tokenizer configuration can be reloaded later for inference or further training, either locally or after uploading to the Hugging Face Hub.
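Reloading could look roughly like the sketch below. It assumes the directory holds the adapter saved above, that Unsloth is installed, and that a GPU is available; `load_finetuned` is a hypothetical wrapper written for this example, and the import sits inside the function so the definition itself can be read or imported on a machine without a GPU.

```python
def load_finetuned(path="qwen3-finetuned-colab", max_seq_length=2048):
    """Reload the saved directory for inference (needs a GPU and unsloth installed)."""
    # Imported lazily so this definition does not require unsloth at import time.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=path,                # local directory written by save_pretrained
        max_seq_length=max_seq_length,  # same context length as during training
        load_in_4bit=True,              # load the base weights in 4-bit again
    )
    FastLanguageModel.for_inference(model)  # switch to Unsloth's inference mode
    return model, tokenizer
```

In a Colab session with the saved directory present, `model, tokenizer = load_finetuned()` would be the entry point for generating text with the adapted model.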
In conclusion, with the help of Unsloth AI, fine-tuning large LLMs such as Qwen3-14B on limited resources is feasible, efficient, and accessible. This tutorial demonstrated how to load a 4-bit quantized version of the model, apply a structured chat template, mix multiple datasets for better generalization, and train using TRL's SFTTrainer. Whether you are building a custom assistant or a specialized domain model, Unsloth's tools significantly lower the barrier to fine-tuning at scale. As the open-source fine-tuning ecosystem evolves, Unsloth continues to lead the way in making LLM training faster, cheaper, and more practical.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform known for in-depth coverage of machine learning and deep learning news that is both technically sound and easily understood by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among its audience.