2026-03-04 17:21:41
Imagine asking:
"What's the best luxury safari in Maasai Mara?"
and instantly getting personalized travel recommendations powered by
AI.
That's exactly what I built --- an AI Tourism Intelligence Assistant
that helps travelers discover the best travel packages in Kenya based on
their budget, travel style, duration, and preferred destination.
In this article, I'll walk you through:
• The idea behind the project
• How I built the AI recommendation system
• The RAG architecture powering it
• How vector search makes travel discovery smarter
• Deployment with Streamlit
Kenya is one of the world's most beautiful tourism destinations.
But planning trips can be frustrating because:
• Travel packages are scattered across multiple websites
• Platforms rarely provide personalized recommendations
• Comparing destinations based on budget or style is difficult
So I decided to build an AI-powered tourism assistant that could:
✔ Understand traveler preferences
✔ Retrieve relevant travel packages
✔ Generate intelligent recommendations
Users simply input their preferences (budget, number of days, travel style, and destination), and the system returns relevant travel packages from a tourism database.
Example query:
Budget: $2000
Days: 5
Style: Relaxing
Destination: Diani
The assistant responds with recommended travel packages matching those
criteria.
• Python
• PostgreSQL
• pgvector
• Mistral AI embeddings
• Retrieval-Augmented Generation (RAG)
• Playwright
• BeautifulSoup
• SQLAlchemy
• Streamlit
• Streamlit Cloud
• Neon PostgreSQL
Tourism Websites
│
▼
Web Scraping (Playwright)
│
▼
PostgreSQL Database
│
▼
Embedding Generation (Mistral AI)
│
▼
Vector Database (pgvector)
│
▼
Recommendation Engine
│
▼
Streamlit Web Application
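The embedding-generation step in the middle of this pipeline can be sketched roughly as follows. All function and field names here are illustrative, not the project's actual code, and the API call assumes the v1 `mistralai` Python SDK:

```python
"""Sketch: flatten each scraped package into one text blob, then embed it
with Mistral so it can be stored in a pgvector column."""

def package_to_text(pkg: dict) -> str:
    """Combine the fields that matter for semantic search into one string."""
    return (
        f"{pkg['name']} in {pkg['destination']}: "
        f"{pkg['days']}-day {pkg['style']} trip, about ${pkg['price']}. "
        f"{pkg.get('description', '')}"
    ).strip()

def embed_packages(packages: list[dict], api_key: str) -> list[list[float]]:
    """Call Mistral's embedding endpoint (assumes the mistralai v1 SDK)."""
    from mistralai import Mistral  # pip install mistralai
    client = Mistral(api_key=api_key)
    resp = client.embeddings.create(
        model="mistral-embed",  # returns 1024-dimensional vectors
        inputs=[package_to_text(p) for p in packages],
    )
    return [item.embedding for item in resp.data]
```

Embedding the whole package as a single descriptive sentence (rather than individual fields) is what later lets a free-text query like "affordable safari" match structured rows.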
The project uses Retrieval‑Augmented Generation (RAG) to deliver
intelligent responses.
Instead of the AI guessing answers, it retrieves real travel packages
from the database first.
Pipeline:
User Query
│
▼
Convert Query → Embedding
│
▼
Vector Similarity Search
│
▼
Retrieve Relevant Travel Packages
│
▼
Generate Personalized Response
This ensures the AI responds with real tourism data rather than
hallucinations.
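Under stated assumptions (a `travel_packages` table with an `embedding` vector column, and psycopg for database access), the retrieve-and-ground steps might look like this sketch; table and column names are illustrative:

```python
"""Sketch of the retrieval + grounding steps of the RAG pipeline."""

RETRIEVAL_SQL = """
    SELECT name, destination, price_usd, duration_days,
           1 - (embedding <=> %(qvec)s::vector) AS similarity
    FROM travel_packages
    WHERE price_usd <= %(budget)s
    ORDER BY embedding <=> %(qvec)s::vector   -- <=> is pgvector cosine distance
    LIMIT 5;
"""

def to_pgvector(vec) -> str:
    """Format a Python list as a pgvector literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in vec) + "]"

def retrieve_packages(conn, query_embedding, budget):
    """Top-5 packages under budget, ranked by semantic similarity.
    `conn` is an open psycopg connection."""
    with conn.cursor() as cur:
        cur.execute(RETRIEVAL_SQL,
                    {"qvec": to_pgvector(query_embedding), "budget": budget})
        return cur.fetchall()

def build_prompt(question: str, rows) -> str:
    """Ground the LLM in retrieved rows so it cannot invent packages."""
    context = "\n".join(
        f"- {name} ({dest}): ${price}, {days} days"
        for name, dest, price, days, _sim in rows
    )
    return ("Recommend travel packages using ONLY the options below.\n"
            f"{context}\n\nTraveler's request: {question}")
```

The prompt explicitly restricts the model to the retrieved rows, which is what makes the "real data, not hallucinations" guarantee hold in practice.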
The database stores travel information in structured tables such as:
travel_packages
destinations
Each travel package record stores the fields travelers filter on (price, duration, travel style, destination) along with a text description used for embedding.
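The two tables might look like the following illustrative DDL. The column names are my guesses based on the filters the article describes, not the project's actual schema, and the `VECTOR` column requires `CREATE EXTENSION vector` (pgvector):

```python
"""Illustrative schema for the tourism database (create destinations first,
since travel_packages references it)."""

DESTINATIONS_DDL = """
CREATE TABLE destinations (
    id   SERIAL PRIMARY KEY,
    name TEXT UNIQUE NOT NULL        -- e.g. 'Diani', 'Maasai Mara'
);
"""

TRAVEL_PACKAGES_DDL = """
CREATE TABLE travel_packages (
    id             SERIAL PRIMARY KEY,
    destination_id INTEGER REFERENCES destinations(id),
    name           TEXT NOT NULL,
    price_usd      NUMERIC(10, 2),
    duration_days  INTEGER,
    travel_style   TEXT,             -- e.g. 'Relaxing', 'Adventure'
    description    TEXT,
    embedding      VECTOR(1024)      -- mistral-embed output size
);
"""
```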
Traditional search relies on keywords.
Vector search understands meaning and context.
For example, if a user searches:
"Affordable safari in Kenya"
The system can still return:
• Budget Maasai Mara packages
• Lake Nakuru safari deals
• Amboseli wildlife tours
Even if those exact words were not used.
The frontend is built using Streamlit, which makes it easy to create
interactive data apps.
Users can:
✔ Enter travel preferences
✔ Browse travel packages
✔ Receive AI‑powered recommendations
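A minimal sketch of that preference form is shown below. Widget labels and helper names are mine, not the app's actual code:

```python
"""Sketch of the Streamlit sidebar form for the four traveler preferences."""

def render_sidebar() -> dict:
    """Streamlit widgets for the preferences; returns them as a dict.
    Only runs under `streamlit run`, hence the local import."""
    import streamlit as st  # pip install streamlit
    return {
        "budget": st.sidebar.number_input("Budget (USD)", min_value=100, value=2000),
        "days": st.sidebar.slider("Days", 1, 14, 5),
        "style": st.sidebar.selectbox("Travel style",
                                      ["Relaxing", "Adventure", "Safari"]),
        "destination": st.sidebar.text_input("Destination", "Diani"),
    }

def preferences_to_query(prefs: dict) -> str:
    """Turn the structured form inputs into the free-text query we embed."""
    return (
        f"A {prefs['style'].lower()} {prefs['days']}-day trip to "
        f"{prefs['destination']} for under ${prefs['budget']}"
    )
```

Converting the form inputs into one natural-language sentence keeps the rest of the pipeline identical whether the query comes from the form or from a free-text box.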
The application is deployed using:
• Streamlit Cloud for hosting the web app
• Neon PostgreSQL for the managed database
This allows the project to run fully online.
The project successfully delivers:
✔ AI-powered tourism recommendations
✔ Semantic search using vector embeddings
✔ A fully deployed web application
✔ Personalized travel package discovery
Many travel websites load content dynamically, which required
Playwright.
Scraped data often contained:
• Missing prices
• Duplicate packages
• Inconsistent destination names
Embedding generation triggered API rate limits, requiring retry
logic.
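A minimal retry-with-backoff wrapper of the kind that handles such rate limits is sketched below; the project may well use a library like `tenacity` instead:

```python
"""Retry a flaky call (e.g. an embedding API request) with exponential
backoff plus jitter, so bursts of requests back off instead of hammering
the rate limiter."""
import random
import time

def with_retries(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Call fn(); on exception, wait base_delay * 2^attempt (with jitter)
    and try again, up to max_attempts total attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
```

Usage is just `with_retries(lambda: embed_batch(texts))`; the jitter keeps parallel workers from retrying in lockstep.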
Deployment required careful setup of environment variables, including API keys and the database connection string.
Future versions of the system could include:
• AI itinerary generation
• Social media tourism trend analysis
• Integration with booking APIs
• User accounts and saved trips
Combining vector databases, AI retrieval systems, and interactive web
apps opens powerful opportunities for building intelligent data
products.
This project demonstrates how AI can improve tourism discovery and
travel planning.
2026-03-04 17:18:50
"Cooking videos are great, but following along in the kitchen is a pain. You're elbow-deep in dough and suddenly need to rewind for that one ingredient you missed."
So I built a small pipeline that takes any YouTube cooking video, pulls the audio, sends it to Amazon Transcribe, and gives me a clean text file of the entire recipe.
No paid tools. No complex setup. Just AWS services and a few Python scripts.
YouTube Video
↓
Download Audio (yt-dlp)
↓
Upload to S3
↓
Amazon Transcribe
↓
recipe.txt
Four steps. That's it.
I used yt-dlp to pull just the audio from the video. No need to download the full video.
yt-dlp \
--extract-audio \
--audio-quality 0 \
--output "output/audio.%(ext)s" \
"https://youtu.be/YOUR_VIDEO_ID"
One thing I ran into — ffmpeg was not installed on my machine, so the mp3 conversion failed. But Amazon Transcribe supports webm format natively, so I skipped the conversion entirely and uploaded the raw .webm file. Saved time.
BUCKET_NAME="recipe-transcribe-$(date +%s)"
aws s3 mb s3://$BUCKET_NAME --region us-east-1
aws s3 cp output/audio.webm s3://$BUCKET_NAME/audio.webm
Using date +%s as a suffix keeps the bucket name unique without any extra thinking.
import boto3

BUCKET_NAME = "your-bucket-name"
JOB_NAME = "recipe-job-01"
REGION = "us-east-1"
MEDIA_URI = f"s3://{BUCKET_NAME}/audio.webm"

client = boto3.client("transcribe", region_name=REGION)
client.start_transcription_job(
    TranscriptionJobName=JOB_NAME,
    Media={"MediaFileUri": MEDIA_URI},
    MediaFormat="webm",
    LanguageCode="en-US",
    OutputBucketName=BUCKET_NAME,
    OutputKey="transcript.json",
)
Amazon Transcribe picks up the file from S3 and writes transcript.json back to the same bucket once done.
import json
import time

import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")
s3 = boto3.client("s3")

# Poll until the transcription job finishes
while True:
    response = transcribe.get_transcription_job(TranscriptionJobName=JOB_NAME)
    status = response["TranscriptionJob"]["TranscriptionJobStatus"]
    print(f"Status: {status}")
    if status == "COMPLETED":
        break
    if status == "FAILED":
        raise RuntimeError("Job failed")
    time.sleep(15)

# Download and extract plain text
s3.download_file(BUCKET_NAME, "transcript.json", "output/transcript.json")
with open("output/transcript.json") as f:
    data = json.load(f)
text = data["results"]["transcripts"][0]["transcript"]
with open("output/recipe.txt", "w") as f:
    f.write(text)
The script checks every 15 seconds. For a 10-minute video, the job finished in about a minute.
Here's what came out for a Guntur Chicken Masala video:
Readable. Accurate. Ready to use in the kitchen.
IAM Permissions You Need
{
"Effect": "Allow",
"Action": [
"s3:CreateBucket",
"s3:PutObject",
"s3:GetObject",
"transcribe:StartTranscriptionJob",
"transcribe:GetTranscriptionJob"
],
"Resource": "*"
}
What I'd Build Next
The full code is on GitHub:
(https://github.com/robindeva/Extracting-a-Recipe)
2026-03-04 17:18:33
A few weeks ago, something stressful happened.
I needed urgent access to one of my GitHub repositories.
And I couldn’t access it.
It wasn’t a dramatic outage.
It wasn’t the end of the world.
But it was enough to make me uncomfortable.
As developers, our repositories are everything.
Our code.
Our projects.
Our work.
Sometimes even our income.
And that day I thought:
“If I had my repositories automatically mirrored somewhere else, I wouldn’t be worried right now.”
That’s when the idea for Nexora was born.
Most developers host their projects on GitHub.
Some use GitLab.
Some use both.
But very few automatically sync their repositories between platforms.
If something happens — access issue, account limitation, private repo problem — you suddenly realize:
You are fully dependent on one provider.
Manual mirroring exists.
Git remotes exist.
But it’s not simple.
And honestly, most of us don’t set it up.
What if your repositories were mirrored automatically, without you having to think about it?
That's Nexora: a simple tool that keeps your repositories synced between GitHub and GitLab, automatically.
No more "what if I lose access?"
I’m a developer.
I’ve deployed apps.
I’ve worked with VPS.
I’ve built APIs.
I know how fragile infrastructure can feel sometimes.
Nexora isn’t built from theory.
It’s built from that small moment of stress when I couldn’t access my code.
Sometimes products don’t start from huge market research.
Sometimes they start from:
“This annoyed me. I’m fixing it.”
If your repositories matter, redundancy matters.
Nexora is currently in early stage.
I’m validating the idea and gathering early adopters.
If this sounds useful to you, you can join the waitlist here:
👉 Join the Nexora Waitlist: https://nexoravitrine.vercel.app
Do you currently mirror your repositories?
Would automatic sync between GitHub and GitLab be useful to you?
What would make you trust such a tool?
I’m building this in public — so your feedback matters.
Thanks for reading 🚀
2026-03-04 17:15:59
Creating software for a niche market presents its own challenges and opportunities. When we began developing Dance Master Pro, we had a simple goal: reduce the administrative workload for dance studios so they can focus on teaching and building their community.
Almost immediately we learned that dance studios function differently than many other small businesses.
Most studios run structured programs that charge tuition monthly, have enrollment and registration seasons, and produce large recitals involving hundreds of students and family members. Yet many studios still run their businesses on spreadsheets, email threads, and manual payment tracking.
The back-end operations for a studio have tremendous complexity.
Studio owners have to manage student enrollment, track costs, collect tuition on a regular (usually monthly) cycle, communicate with parents, plan and execute recital logistics, and build parent-student class schedules.
Many studio owners can easily spend 10 to 15 hours per week on the administrative side of their business.
This influenced the direction of our product development strategy.
Rather than developing an all-purpose booking system, we concentrated on workflows specific to dance studios: managing tuition through recurring payments, tracking students through their enrollments, and streamlining recital production.
The most surprising realization from operating in such a narrowly defined market is how vital it is to intimately understand the day-to-day realities of the users you serve.
Generic software solutions frequently fall short because they try to serve every customer.
Niche SaaS products, by contrast, tend to perform well because they solve very specific daily problems in a business's operations.
Another lesson we learned is that niche markets usually create very strong communities. Dance studios, for example, depend on customer referrals, long-term relationships with families, and the continuing education of their students.
Technology in this area must enhance relationships — not replace them.
Our goal has been clear from day one. We want to minimize administrative friction for studio owners so they can spend as much time as possible on what they enjoy most: teaching dance and building community.
For anyone developing a niche SaaS product, one of the greatest advantages is clarity. When you truly understand your users' daily operational challenges, building meaningful solutions becomes much easier.
If you are curious about the operational processes of dance studios, we have provided additional explanations of how they operate in this post:
2026-03-04 17:15:42
In this video I demonstrate the new OpenTelemetry injector.
It's a mechanism to automatically inject OTEL into your code with zero code or startup-script changes. It's potentially a great way to gain observability of non-Kubernetes workloads like VMs. It leverages LD_PRELOAD: you add the LD_PRELOAD instruction to your VM at startup and the rest happens automatically.
Do heed the warnings towards the end of the video though since it's still early days for this tool!
2026-03-04 17:11:29
If you’ve had your HTTP request blocked despite using correct headers, cookies, and clean IPs, there’s a chance you are running into one of the simplest forms of blocking, and one of the most confusing for beginners.
Chances are, you will recognise the problem. You found the hidden API, and your request works perfectly in Postman... but it fails instantly within your Python code.
It’s called TLS fingerprinting. But the good news is, you can solve it. In fact, when I showed this to some developers at Extract Summit, they couldn’t believe how straightforward it was to fix.
CAPTION: “I copied the request -> matching headers, cookies and IP, but it still failed?”
Your TLS fingerprint
Let’s start with a question. How do the servers and websites know you’ve moved from Postman to making the request in Python? What do they see that you can’t? The key is your TLS fingerprint.
To use an analogy: We’ve effectively written a different name on a sticker and stuck it to our t-shirt, hoping to get past the bouncer at a bar.
Your nametag (headers) says "Chrome."
But your t-shirt logo (TLS handshake) very obviously says "Python."
It’s a dead giveaway. This mismatch is spotted immediately. We need to change our t-shirt to match the nametag.
To understand how they spot the "logo", we need to look at the initial "Client Hello" packet. Among the key pieces of information exchanged here are the TLS version, the ordered list of cipher suites, and the list of extensions the client supports.
These differ between clients because Python's requests library uses OpenSSL, while Chrome uses Google's BoringSSL. While they share some underlying logic, their signatures are notably different. And that's the problem.
The root cause of this mismatch lies in the underlying libraries.
Python’s requests library relies on OpenSSL, the standard cryptographic library found on almost every Linux server. It is robust, predictable, and remarkably consistent.
Chrome, however, uses BoringSSL, Google’s own fork of OpenSSL. BoringSSL is designed specifically for the chaotic nature of the web and it behaves very differently.
The biggest giveaway between the two is a mechanism called GREASE (Generate Random Extensions And Sustain Extensibility).
[
"TLS_GREASE (0xFAFA)",
....
]
Chrome (BoringSSL) intentionally inserts random, garbage values into the TLS handshake - specifically, in the cipher suites and extensions lists. It does this to "grease the joints" of the internet, ensuring that servers don't crash when they encounter unknown future parameters.
These GREASE values come from a reserved pattern (0x0a0a, 0x1a1a, and so on, up to 0xfafa), so their presence and random placement are part of Chrome's signature.
So, when an anti-bot system sees a handshake claiming to be "Chrome 120" but lacking these random GREASE values, it knows instantly that it is dealing with a script. It’s not just that your shirt has the wrong logo; it’s that your shirt is too clean.
Anti-bot companies take all that handshake data and combine it into a single string called a JA3 fingerprint.
Salesforce invented this years ago to detect malware, but it found its way into our industry as a simple, effective way to fingerprint HTTP requests. Security vendors have built databases of these fingerprints.
It is relatively straightforward to identify and block any request coming from Python’s default library because its JA3 hash is static and well-known.
Running this snippet with plain requests yields the JSON response below.
import json

import requests

def get_ja3_info():
    url = "https://tls.peet.ws/api/clean"
    with requests.Session() as session:
        response = session.get(url)
        response.raise_for_status()
        data = response.json()
        print(json.dumps(data))
Note the lack of akamai_hash
{
  "ja3": "771,4866-4867-4865-49196-49200-49195-49199-52393-52392-49188-49192-49187-49191-159-158-107-103-255,0-11-10-16-22-23-49-13-43-45-51-21,29-23-30-25-24-256-257-258-259-260,0-1-2",
  "ja3_hash": "a48c0d5f95b1ef98f560f324fd275da1",
  "ja4": "t13d1812h1_85036bcba153_375ca2c5e164",
  "ja4_r": "t13d1812h1_0067,006b,009e,009f,00ff,1301,1302,1303,c023,c024,c027,c028,c02b,c02c,c02f,c030,cca8,cca9_000a,000b,000d,0016,0017,002b,002d,0031,0033_0403,0503,0603,0807,0808,0809,080a,080b,0804,0805,0806,0401,0501,0601,0303,0301,0302,0402,0502,0602",
  "akamai": "-",
  "akamai_hash": "-",
  "peetprint": "772-771|1.1|29-23-30-25-24-256-257-258-259-260|1027-1283-1539-2055-2056-2057-2058-2059-2052-2053-2054-1025-1281-1537-771-769-770-1026-1282-1538|1||4866-4867-4865-49196-49200-49195-49199-52393-52392-49188-49192-49187-49191-159-158-107-103-255|0-10-11-13-16-21-22-23-43-45-49-51",
  "peetprint_hash": "76017c4a71b7a055fb2a9a5f70f05112"
}
Putting the above JA3 hash into ja3.zone clearly shows this is a python3 request, using urllib3:
As mentioned, simply changing headers and IP addresses won’t make a difference, as these are not part of the TLS handshake. We need to change the ciphers and Extensions to be more like what a browser would send.
The best way to achieve this in Python is to swap requests for a modern, TLS-friendly library like curl_cffi or rnet.
Here is how easy it is to switch to curl_cffi:
import json

from curl_cffi import requests  # note the import and the impersonate argument below

def get_ja3_info():
    url = "https://tls.peet.ws/api/clean"
    with requests.Session() as session:
        response = session.get(url, impersonate="chrome")
        response.raise_for_status()
        data = response.json()
        print(json.dumps(data))
This time the response includes an Akamai HTTP/2 fingerprint:
"akamai_hash": "52d84b11737d980aef856699f885ca86"
CAPTION: Note - I searched via the akamai_hash here as the fingerprint from the JA3 hash wasn’t in this particular database.
By adding that impersonate parameter, you are effectively putting on the correct t-shirt.
Make curl_cffi or rnet your default HTTP library in Python. This should be your first port of call before spinning up a full headless browser.
A simple change (which brings benefits like async capabilities) means you don't fall foul of TLS fingerprinting. curl_cffi even has a requests-like API, meaning it's often a drop-in replacement.