A Private Q&A with Your Diary
Remember the feeling of flipping through a dusty diary, a forgotten treasure trove of memories? Wouldn’t it be amazing if that diary could talk back?
Imagine being able to ask questions about your life and get answers drawn from your own reflections, all while your data stays completely private and offline. This tutorial will guide you through using a bit of AI magic to unlock the wisdom hidden in your diary entries.
Here’s a glimpse into how it works:
You ask: “Which hostel did I live in?”
It answers: “According to the diary entry on May 6, 2017, you lived in Pampa Hostel during your time at IIT Madras.”
Pre-requisites:
- Python3 and Ollama installed
- A machine that can run llama3 locally (if not, check the Appendix for an alternative)
- Python dependencies (install the following through pip)
# Install appropriate Python libraries
# LlamaIndex (to store documents in embedded vector space)
pip install llama-index
# LlamaIndex Ollama LLM bridge
pip install llama-index-llms-ollama
# LlamaIndex Huggingface embeddings to transform the query to vector space
# This avoids calling OpenAI embeddings
pip install llama-index-embeddings-huggingface
Tools used:
LlamaIndex
LlamaIndex is a framework for building context-augmented LLM applications. Context augmentation refers to any use case that applies LLMs on top of your private or domain-specific data. We will be using the Python library to make life simpler for us.
Ollama
Ollama is a tool for downloading and running open LLMs on your own machine. We will use Ollama to run the llama3 model locally.
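If you haven’t downloaded llama3 yet, pulling it once ahead of time avoids a long delay on the first query:
# Download the llama3 model so it is available locally
ollama pull llama3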
Hugging Face Embeddings by BAAI
Hugging Face is a platform where the machine learning community collaborates on models, datasets, and applications. BAAI has released free-to-use embeddings on Hugging Face that we will use in place of OpenAI embeddings. We use bge-small here, but you can use bge-large, bge-base, or any other embedding model for that matter.
git clone https://huggingface.co/BAAI/bge-small-en-v1.5
Import appropriate libraries
from llama_index.core import Document, VectorStoreIndex, Settings, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
Read all diary pages as documents.
Instead of a directory of text files, you can read PDF, EPUB, and 10+ other formats through SimpleDirectoryReader. There are also 20+ other connectors, like the Google Docs reader, webpage reader, remote file reader, or database readers, that you can explore based on where your data is stored.
# Load documents and build index
# This directory contains pages of my life, each page representing one date.
# It may look like
#
# ----- 2024-06-09.txt -----
# Hasit's Personal Diary - June 9, 2024
#
# It was a lazy sunday. I was playing around with LlamaIndex, and felt I should
# write a very basic and easy to use tutorial on how to use
# Large language model to search and get answers from your diary.
documents = SimpleDirectoryReader(
    "./diary"  # Replace with your directory path (this contains all diary pages)
).load_data()
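As a quick sanity check, you can confirm that every diary page was picked up:
# Each text file becomes one Document
print(f"Loaded {len(documents)} diary pages")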
Set embedding model
We are using BAAI’s small model for embeddings, but you can use another embedding model as well (check the Appendix).
# EMBED_MODEL_NAME refers to model for embedding document to vector space
# This is the same model we cloned, if you cloned bge-large-en-v1.5,
# change this accordingly
EMBED_MODEL_NAME = "bge-small-en-v1.5"
# Global Settings for embedding model
# setting this ensures this embedding is used in your Vector store
Settings.embed_model = HuggingFaceEmbedding(
    model_name=EMBED_MODEL_NAME
)
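Optionally, you can verify the embedding model loads correctly before building the index; bge-small-en-v1.5 produces 384-dimensional vectors:
# Optional check: embed a sample sentence and inspect the vector size
sample_vector = Settings.embed_model.get_text_embedding("A lazy Sunday at the hostel")
print(len(sample_vector))  # 384 for bge-small-en-v1.5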
Create local in-memory vector store index
# Create a vector store index
index = VectorStoreIndex.from_documents(documents=documents)
Build query engine from index
We are using llama3 via Ollama, but you can use OpenAI or any other LLM here.
# Instantiate query engine
# We are using a query engine for simplicity;
# it does not keep any context from previous conversations,
# so each query is a fresh query
engine = index.as_query_engine(
    llm=Ollama(model="llama3", request_timeout=120.0)
)
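Before wiring up a loop, you can sanity-check the engine with a single query (the example from the top of this post):
# One-off query against the diary index
response = engine.query("Which hostel did I live in?")
print(response)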
The last and easiest part: run a loop to query your past
# Query the index
while True:
    # Get user query
    query = input("Ask a question about Hasit's diary (or type 'quit' to exit): ")
    # Exit condition
    if query.lower() == 'quit':
        break
    # Query the engine and print results
    result = engine.query(query)
    print(result)
Final Notes:
- This is far from complete, but it’s a start. We can use LlamaIndex’s ingestion pipeline to keep the index updated, and the index should ideally be persisted rather than kept in memory as we do here (a persistence sketch follows these notes).
- This is built as a plain query engine: it keeps no context from past queries, and we haven’t tuned the prompt template, temperature, or how multiple retrieved documents are merged into an answer.
- If you can’t run Llama locally due to compute or memory constraints, you can switch the LLM to a different provider: OpenAI (paid), Hugging Face inference endpoints (free), or any other LLM. Keep in mind that you’ll then be sharing your data with the respective organization(s).
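As mentioned in the first note, the index above lives only in memory. Here is a minimal sketch of persisting it to disk and loading it back on later runs (the persist_dir is just an example path):
from llama_index.core import StorageContext, load_index_from_storage
# After building the index once, write it to disk
index.storage_context.persist(persist_dir="./diary_index")
# On later runs, load it back instead of re-embedding every page
# (keep Settings.embed_model set before loading so queries use the same embeddings)
storage_context = StorageContext.from_defaults(persist_dir="./diary_index")
index = load_index_from_storage(storage_context)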
Appendix
Using the Llama-3 inference API by Hugging Face as the LLM
from llama_index.llms.huggingface import HuggingFaceInferenceAPI
llm = HuggingFaceInferenceAPI(
    # any model that you want to use (and have access to)
    model_name="meta-llama/Meta-Llama-3-8B",
    token='hf_XXX'
)
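This assumes the matching bridge package (likely llama-index-llms-huggingface) is installed; the resulting llm is then passed to the query engine the same way as the Ollama one:
# Use the Hugging Face inference API instead of the local Ollama model
engine = index.as_query_engine(llm=llm)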
Using OpenAI embedding
import os
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
os.environ["OPENAI_API_KEY"] = "sk-..."
embed_model = OpenAIEmbedding(embed_batch_size=10)
Settings.embed_model = embed_model
Using OpenAI as LLM
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
llm = OpenAI(temperature=0.2, model="gpt-4")
# Either set it globally, or pass it to index.as_query_engine(llm=llm)
Settings.llm = llm