2026-03-14 09:30:33
Azure Firewall is a cloud-based network security service in Microsoft Azure that protects your virtual network resources by filtering and controlling traffic between your Azure resources and the internet or other networks.
Scenario:
Your organization requires centralized network security for the application virtual network. As application usage increases, more granular application-level filtering and advanced threat protection will be needed. The application is also expected to need continuous updates from Azure DevOps pipelines. You identify these requirements:
- Azure Firewall is required for additional security in the app-vnet.
- A firewall policy should be configured to help manage access to the application.
- A firewall policy application rule is required. This rule will allow the application access to Azure DevOps so the application code can be updated.
- A firewall policy network rule is required. This rule will allow DNS resolution.
Skilling Tasks:
- Create an Azure Firewall.
- Create and configure a firewall policy.
- Create an application rule collection.
- Create a network rule collection.
Architecture diagram:
Task 1: Create the Azure Firewall subnet in the existing virtual network.
i. In the search box at the top of the portal, enter Virtual networks. Select Virtual networks in the search results.
iv. Select + Subnet and configure the subnet. The subnet must be named AzureFirewallSubnet.
v. Select Save.
Note: Leave all other settings at their defaults.
Task 2: Create an Azure Firewall.
i. In the search box at the top of the portal, enter Firewall. Select Firewall in the search results.
iii. Use the values provided in your deployment guide to create the firewall.

v. Select Review + create and then select Create.
Task 3: Update the firewall policy.
i. In the portal, search for and select Firewall Policies.
Task 4: Add an application rule.
i. In the Rules blade, select Application rules and then Add a rule collection.

ii. Configure the application rule collection and then select Add.
Note: The AllowAzurePipelines rule allows the web application to access Azure Pipelines, that is, the Azure DevOps service and the Azure website.
Task 5: Add a network rule.
i. In the Rules blade, select Network rules and then Add a rule collection.
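For reference, the two rule collections described above can be sketched as plain data. All concrete values below (FQDNs, address ranges, DNS resolvers) are illustrative assumptions, not values from the deployment guide:

```python
# Illustrative shapes of the two firewall-policy rules described above.
# The FQDNs, CIDR ranges, and DNS servers are assumptions for illustration.
app_rule = {
    "name": "AllowAzurePipelines",
    "rule_type": "ApplicationRule",
    "source_addresses": ["10.1.0.0/23"],                       # assumed app-vnet range
    "protocols": [{"protocol_type": "Https", "port": 443}],
    "target_fqdns": ["dev.azure.com", "azure.microsoft.com"],  # assumed DevOps endpoints
}

network_rule = {
    "name": "AllowDns",
    "rule_type": "NetworkRule",
    "ip_protocols": ["UDP"],
    "source_addresses": ["10.1.0.0/23"],
    "destination_addresses": ["1.1.1.1", "1.0.0.1"],  # example public DNS resolvers
    "destination_ports": ["53"],                      # standard DNS port
}

print(app_rule["name"], network_rule["name"])
```

The application rule filters by FQDN at layer 7, while the network rule filters by address, protocol, and port, which is why DNS resolution belongs in a network rule.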
2026-03-14 09:24:22
If you've ever tried to pull financial data from the SEC's EDGAR system, you probably know where this is going.
I wanted to build a stock screener. Simple idea — just show me revenue trends for a few companies. Should take an afternoon, right?
Nope. First you need a company's CIK number (Apple is 0000320193 — don't ask me why it's zero-padded to 10 digits). Then you download their XBRL filings, which are nested XML with namespaces like us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax. Then you realize different companies use slightly different tags for the same concept. Then you start questioning your life choices.
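For the curious, the first step of that raw flow looks roughly like this. The zero-padded CIK and the data.sec.gov companyfacts endpoint are SEC's documented public interface; treat the snippet as a sketch:

```python
# Sketch of the raw EDGAR lookup described above.
# data.sec.gov keys everything off the CIK, zero-padded to 10 digits.
def padded_cik(cik: int) -> str:
    return str(cik).zfill(10)

def company_facts_url(cik: int) -> str:
    # XBRL "company facts" endpoint (every reported concept, every filing)
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{padded_cik(cik)}.json"

print(company_facts_url(320193))  # Apple's padded CIK: 0000320193
```

The JSON that URL returns is where the tag-name archaeology begins.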
I spent way too long on this, so I wrapped the whole thing in a REST API. Pass a ticker, get JSON back. Done.
Company info:
GET /v1/company/AAPL
{
  "cik": "320193",
  "ticker": "AAPL",
  "name": "Apple Inc.",
  "sic": "3571",
  "sic_description": "Electronic Computers",
  "fiscal_year_end": "0926",
  "state": "CA"
}
No CIK lookup. Just the ticker.
Financial statements:
GET /v1/financials/AAPL?statement=income_statement&limit=3
{
  "ticker": "AAPL",
  "name": "Apple Inc.",
  "statements": {
    "income_statement": [
      {
        "concept": "revenue",
        "label": "Revenue from Contract with Customer",
        "unit": "USD",
        "data": [
          {
            "fiscal_year": 2025,
            "fiscal_period": "FY",
            "value": 383285000000.0,
            "filed": "2025-10-31",
            "form": "10-K"
          }
        ]
      }
    ]
  }
}
Revenue, net income, EPS, operating expenses — all the stuff that took me hours to extract from XBRL, now in one call.
import requests

url = "https://sec-edgar-data-api.p.rapidapi.com/v1/financials/AAPL"
headers = {
    "x-rapidapi-key": "YOUR_API_KEY",
    "x-rapidapi-host": "sec-edgar-data-api.p.rapidapi.com"
}
params = {"statement": "income_statement", "limit": 5}

response = requests.get(url, headers=headers, params=params)
data = response.json()

for item in data["statements"]["income_statement"]:
    if item["concept"] == "revenue":
        for year in item["data"]:
            print(f"FY{year['fiscal_year']}: ${year['value']:,.0f}")
FY2025: $383,285,000,000
FY2024: $394,328,000,000
FY2023: $365,817,000,000
That stock screener I mentioned? Finally built it.
Search companies (fuzzy matching, so typos are fine):
GET /v1/company?q=tesla
Filing history — 10-K, 10-Q, 8-K, whatever:
GET /v1/filings/TSLA?form=10-K&limit=5
Balance sheet and cash flow:
GET /v1/financials/MSFT?statement=balance_sheet
GET /v1/financials/MSFT?statement=cash_flow
Covers all 10,000+ SEC-registered public companies. The data comes directly from SEC EDGAR (data.sec.gov), so it updates as companies file.
Fair question. I looked at the alternatives — most are either $50+/month for basic access, or they give you 47 endpoints when you just need 4. I wanted something I could hand to a junior dev and they'd figure it out in 5 minutes. That's basically the design principle: if it needs documentation longer than a README, it's too complicated.
Free tier on RapidAPI — 100 requests/month, no credit card:
SEC EDGAR Data API on RapidAPI
I'm also running a bot at @SECEdgarBot that tweets when notable filings drop and flags interesting financial signals. Still early, but it's been fun to build.
All data sourced from SEC EDGAR (data.sec.gov) — publicly available US government data.
2026-03-14 09:18:33
"Please analyze this document"
Have you ever encountered a situation where, when you submitted a Google Document URL to the latest AI models like Gemini 3.1 Pro or Claude Opus 4.6, you received a response saying, "I cannot directly access the URL and read its contents"?
While AI is a powerful tool for text generation and summarization, it cannot retrieve content from authenticated internal document URLs as-is. This results in inefficient manual tasks such as copying and pasting lengthy meeting minutes or proposals, or downloading files as PDF/DOCX formats before uploading them.
This article explains how to solve the issue of AI not being able to read Google Documents using Google Drive API and OAuth 2.0. With proper configuration and a few lines of Python code, you can securely obtain document content as input for AI.
Because AI cannot read the URL directly, users resort to manually copying and pasting content or downloading and uploading files. For lengthy documents, this is time-consuming and risks copy errors or layout issues leading to missing information.
One might think setting the document to "public on the web" would allow AI to read it directly. However, this is highly dangerous from a security perspective. Exposing confidential internal documents publicly can lead to severe data leaks.
Even when attempting to use Google Drive API, incorrect permission settings can result in a googleapiclient.errors.HttpError: <HttpError 403: Insufficient Permission> error. Without understanding the necessary scopes, navigating Google Cloud Console can lead to getting stuck.
The core solution to resolve errors and balance security with convenience is "secure permission delegation via OAuth 2.0" and "scope configuration based on the principle of least privilege."
OAuth 2.0 is a mechanism that allows an application (Python script) to securely access Google accounts without knowing the user's password, using temporary "access tokens" issued by Google.
The "scope" defines which operations the access token permits. For retrieving document text, set the following scope:
https://www.googleapis.com/auth/drive.readonly
drive.readonly is the minimal and safest permission, allowing only read access to Google Drive files. This eliminates the risk of accidental deletion or modification.
In the Google Cloud Console, add the scope .../auth/drive.readonly on the OAuth consent screen, then save. Download the OAuth client file as credentials.json, placing it in the working directory. Install the required libraries. Here's an example using the fast package manager uv (regular pip works too).
uv pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib
Save the following Python code as main.py.
import os.path

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]

def get_google_doc_content(doc_id):
    """Function to retrieve Google Document content as text"""
    creds = None
    # Load existing token if available
    if os.path.exists("token.json"):
        creds = Credentials.from_authorized_user_file("token.json", SCOPES)
    # Re-authenticate if token is missing or invalid
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                "credentials.json", SCOPES
            )
            creds = flow.run_local_server(port=0)
        # Save token for future use
        with open("token.json", "w") as token:
            token.write(creds.to_json())

    try:
        service = build("drive", "v3", credentials=creds)
        # Google Documents are not binary files, so use the export_media method
        # to retrieve them as plain text (MIME type text/plain)
        request = service.files().export_media(
            fileId=doc_id,
            mimeType="text/plain"
        )
        # Download and decode the content
        response = request.execute()
        text_content = response.decode("utf-8")
        print(f"--- Content retrieved successfully for document ID: {doc_id} ---")
        return text_content
    except HttpError as err:
        print(f"Error occurred: {err}")
        return None

if __name__ == "__main__":
    # Target Google Document ID to process
    # For URL https://docs.google.com/document/d/abc123xyz.../edit, the ID is abc123xyz...
    TARGET_DOC_ID = "Your document ID here"
    content = get_google_doc_content(TARGET_DOC_ID)
    if content:
        print("\n=== Document content ===\n")
        print(content[:1000] + "...\n(Truncated)")
        # Additional processing with the retrieved content (e.g., sending to an AI API)
When the script is executed for the first time, a browser window opens displaying the Google login screen. Select your account and grant the requested permissions to complete authentication. Simultaneously, token.json is generated, enabling subsequent runs without browser authentication.
The key point of the retrieval logic is the use of the service.files().export_media method. While regular files (images, PDFs, etc.) are downloaded using get_media, Google Document format does not have an actual file, so export_media must be used to convert it to the specified format (in this case, text/plain).
Note: The content exported via export_media is limited to 10MB. Extreme caution is needed when handling extremely large documents.
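One small practical point: the retrieval function expects the bare document ID, not the full URL. A hypothetical helper (the regex assumes the standard /document/d/&lt;id&gt;/ URL shape) can extract it:

```python
import re

def extract_doc_id(url: str):
    """Extract the document ID from a docs.google.com URL.

    e.g. https://docs.google.com/document/d/abc123xyz/edit -> abc123xyz
    """
    m = re.search(r"/document/d/([a-zA-Z0-9_-]+)", url)
    return m.group(1) if m else None

print(extract_doc_id("https://docs.google.com/document/d/abc123xyz/edit"))  # abc123xyz
```

Returning None for non-matching URLs lets the caller fail early with a clear error instead of sending a malformed ID to the API.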
Once the text is retrieved via API, it can be directly passed to services like Gemini API or Claude API.
Additionally, if you have a local PC environment equipped with a high-end GPU such as the RTX 5090 (32GB VRAM), you can load the retrieved text into local LLMs like Gemma 3 or NVIDIA Nemotron. This enables a completely offline environment for document retrieval, allowing secure data analysis of sensitive information.
Following the steps introduced here provides the following benefits:
Leveraging APIs to build an environment where AI can operate efficiently is crucial for future automation workflows. Be sure to obtain credentials.json and try streamlining your document processing.
2026-03-14 09:18:30
If you, like me, were thrilled by the explosive responsiveness of Google Gemini 2.5 Flash and dreamed of running it locally without privacy concerns, this article is for you. As a lawyer and auditor, I work daily with vast XBRL data and PDF documents, building a self-evolving AI system. My goal is clear: to construct a local LLM system that surpasses, or at least matches, Gemini 2.5 Flash in reasoning capability and speed, enabling it to achieve 80% accuracy on the bar exam multiple-choice section and flawless case handling in essays.
However, reality was harsh. The PC I used, a high-performance ASUS gaming rig with an RTX 5070 Ti and 16GB VRAM, was purchased on the assumption it could handle 32B-class models. Yet when I attempted to run such models, inference slowed to a crawl. Even 7B models were sluggish, and 32B models overflowed VRAM, forcing layers to be offloaded to system RAM. Token generation took minutes, with a spinning hourglass while pages loaded, a despair that eroded my development motivation. I felt trapped by the "intelligence wall," contemplating giving up.
Yet, through dialogue with Gemini, I found a breakthrough. This article details how I escaped three specific traps to reach the conclusion of "RTX 5090 + 32GB VRAM" as the optimal configuration, with step-by-step instructions for replicating a Gemini Flash-equivalent local LLM environment in 5 minutes, tailored for intermediate engineers.
No more suffering from slow inference speeds. Your local LLM environment can transform into a "knowledge lab" today.
My project goal was to run a model with Gemini 2.5 Flash-level "intelligence" locally. I believed a minimum of 32B (32 billion parameters) was necessary. However, with my RTX 5070 Ti (16GB VRAM), this goal was physically impossible.
7B models ran, but complex queries or long text generation caused delays of seconds to tens of seconds. Attempting 32B models like DeepSeek-R1-Distill-Qwen-32B caused VRAM overflow, offloading parts to system RAM. This resulted in inference speeds over 10x slower. The bottleneck was PCIe bus bandwidth (max ~64GB/s for Gen4 x16) versus GPU internal memory bandwidth (hundreds of GB/s to 1TB/s). The overhead of data transfer between layers caused questions to take minutes to answer, breaking my thought cycle.
After failures and trials, I reached a clear conclusion: "VRAM abundance is justice," and "RTX 5090 (32GB VRAM) is the only choice for local Gemini 2.5 Flash-equivalent performance."
The biggest bottleneck was 32B models exceeding VRAM and spilling into main memory. To solve this fundamentally, 32B models must fit entirely in VRAM. RTX 5090's 32GB VRAM was the decisive solution.
Loading a 32B model in FP16 (16-bit floating point) requires ~64GB VRAM—insufficient even for RTX 5090. However, 4-bit quantization (AWQ, GPTQ, GGUF) is standard, reducing model size to ~1/4. A 32B model becomes ~18-20GB. Adding KV cache for context length, RTX 4090's 24GB VRAM leaves only ~4GB after loading, causing OOM with long contexts. RTX 5090's 32GB VRAM provides >10GB headroom, handling thousands of tokens smoothly and enabling RAG tasks comfortably.
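Those estimates are easy to reproduce with a few lines. The 10% packing overhead below is my assumption, and the KV cache adds to the total on top of the weights:

```python
def model_vram_gb(params_billions: float, bits_per_weight: int,
                  overhead: float = 1.1) -> float:
    """Rough weights-only VRAM estimate; excludes KV cache and activations."""
    return params_billions * (bits_per_weight / 8) * overhead

print(f"32B @ FP16 : {model_vram_gb(32, 16, overhead=1.0):.0f} GB")  # ~64 GB
print(f"32B @ 4-bit: {model_vram_gb(32, 4):.1f} GB")                 # ~17.6 GB + KV cache
```

Subtracting the 4-bit figure from 24GB vs 32GB of VRAM is exactly the headroom difference between the RTX 4090 and RTX 5090 described above.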
I chose PC Kobo's LEVEL-R789-LC285K-XK1X model. The decisive factors were:
### Step 1: Installing WSL2 and the CUDA Toolkit
Open Windows PowerShell with administrator privileges and run the following command to install WSL2 and Ubuntu:
wsl --install -d Ubuntu-24.04
After installation, open the WSL2 terminal and keep the system up to date:
sudo apt update && sudo apt upgrade -y
sudo apt install build-essential git curl wget -y
In WSL2, installing NVIDIA drivers on the Windows side is sufficient for GPU recognition. No driver installation is needed on the WSL2 side. Install only the CUDA Toolkit (version 13.1):
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-1
Verify GPU recognition:
nvidia-smi
Success is confirmed when "NVIDIA GeForce RTX 5090" and "32768MiB" (32GB VRAM) are displayed.
### Step 2: Building Python Environment (uv)
Use the fast `uv` package manager to create a clean Python virtual environment:
```
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uv venv llm-env --python 3.11
source llm-env/bin/activate
```
### Step 3: Installing Inference Libraries
Install core libraries for LLM inference. Follow PyTorch's official guide for your CUDA version, then add:
```
uv pip install transformers accelerate bitsandbytes sentencepiece protobuf scipy
uv pip install vllm  # High-speed inference engine
```
### Step 4: Running LLM Model (Transformers Version)
Use Hugging Face's `transformers` library to load and run the model. Save the following code as `run_llm.py`. This configuration leverages 4-bit quantization to efficiently utilize 32GB VRAM:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Model ID (example: DeepSeek-R1 32B distilled model)
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

# 4-bit quantization settings
# Leverages the RTX 5090's capacity to save VRAM and handle long contexts
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16  # RTX 50 series natively supports bfloat16
)

print(f"Loading model: {model_id}...")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # Automatically assigns to GPU
    trust_remote_code=True
)

print("Model loaded successfully!")
print(f"Current VRAM usage: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")

def generate_text(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.1
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print("\n--- Start Chat (type 'exit' to quit) ---")
while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        break
    prompt = f"User: {user_input}\nAssistant:"
    response = generate_text(prompt)
    print(f"AI: {response.split('Assistant:')[-1].strip()}")
```

Run the script:

python run_llm.py
### Step 5: High-Speed Inference with vLLM
While transformers offers ease of use, vLLM delivers superior performance for production-level speed. By combining the RTX 5090's expansive VRAM with the PagedAttention algorithm, throughput can increase severalfold.
# Load a 32B model with 4-bit quantization (AWQ) and start the server
# Note: the model must be provided in AWQ format.
python -m vllm.entrypoints.openai.api_server \
  --model casperhansen/deepseek-r1-distill-qwen-32b-awq \
  --quantization awq \
  --dtype half \
  --gpu-memory-utilization 0.9 \
  --port 8000

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "casperhansen/deepseek-r1-distill-qwen-32b-awq",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7
  }'
This response speed will let you experience the true power of the RTX 5090. The sight of tokens flowing like a waterfall is breathtaking.
Even with the best hardware, software issues can stall development. I've compiled common pitfalls I've encountered or heard about, along with avoidance strategies.
When new GPUs are released, older drivers or CUDA versions may not work properly. Especially PyTorch libraries are closely tied to specific CUDA versions. The RTX 5090 (Blackwell generation) may not function with older CUDA (e.g., 11.x series).
Avoidance Strategy: Always install the latest stable NVIDIA drivers and CUDA Toolkit. Check the PyTorch official site for the CUDA version compatible with your PyTorch version and install accordingly. As in this article, use CUDA 13.1 and develop the habit of checking driver version with nvidia-smi and CUDA version with nvcc -V.
Attempting to load a model that 'should fit' in VRAM can cause CUDA Out Of Memory errors, crashing the process. Alternatively, offloading to shared GPU memory (main RAM) can cause extreme slowdowns.
While the RTX 5090 (32GB VRAM) provides ample space for 32B models, 70B-class models require optimization.
・Use quantization: 4-bit (AWQ, GPTQ) or trending EXL2 formats.
・Limit context length: Infinite conversations cause KV cache bloat. Use max_model_len to restrict.
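To see why limiting context length matters, here is a rough per-token KV-cache estimate. The layer and head counts below are assumptions modeled on a Qwen2.5-32B-style GQA layout with an FP16 cache, not measured values:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 tensors (K and V) per layer, cached for every token."""
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token_bytes * seq_len / 1024**3

# Assumed Qwen2.5-32B-like shape: 64 layers, 8 KV heads (GQA), head_dim 128
print(f"{kv_cache_gb(64, 8, 128, 32768):.1f} GB at 32k context")
```

Several gigabytes of cache on top of ~18-20GB of quantized weights is exactly the pressure that max_model_len relieves.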
The RTX 5090, while highly performant, can exceed 450W-500W power consumption. Insufficient power supply can cause sudden shutdowns under high load.
Select a PC with a minimum 1000W, preferably 1200W+ 80PLUS PLATINUM certified power supply. Ensure secure 12VHPWR cable connections and regularly check that connectors are fully seated.
2026-03-14 09:18:27
As a practical testing ground for verifying inference optimization and model handling, I first started working with an open-source shogi engine in January 2026.
As a result, I reached rank 1 on Floodgate (an online server for computer shogi), playing over 200 games with a rating exceeding 4500. Considering I only started programming in December 2025, this came roughly two months after first touching the OSS.
This article is not a how-to guide on implementation, but rather discusses what was learned through shogi AI and how it can be applied to LLM research from the perspective of an LLM/RAG researcher.
In LLM research, one frequently encounters challenges such as inference optimization and model selection. However, LLM evaluation can be ambiguous: whether an answer is "good" often involves subjectivity. In contrast, shogi AI has clear wins, losses, and ratings, allowing immediate numerical verification of a strategy's effectiveness.
Additionally, skill sets such as CUDA/TensorRT builds and batch-processing optimization are common to both LLMs and shogi AI. Shogi AI serves as an ideal experimental ground for verifying these techniques through a strict win/loss feedback loop.
The constructed system has a 3-layer architecture.
Phase 1: Book (Opening Database) — Immediate move via Python dictionary lookup. No C++ engine startup, zero GPU/CPU load.
Phase 2: MCTS + DL Model — Inference of a large 40-block ResNet using TensorRT. Quantized to fit within RTX 5090's 32GB VRAM.
Phase 3: α-β + NNUE — Fast position evaluation via CPU search. Handles endgame reading victories.
A Python wrapper manages phase switching and protocol communication, selecting engines based on position characteristics. This design philosophy of "winning with the entire architecture rather than a single model" is fundamentally the same as RAG system composition (search → ranking → generation multi-stage pipeline).
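A minimal sketch of that dispatch layer follows; the function, phase names, and threshold are hypothetical illustrations, not the actual wrapper:

```python
def select_phase(sfen: str, move_number: int, book: dict,
                 endgame_threshold: int = 80) -> str:
    """Decide which engine answers this position (hypothetical dispatch)."""
    if sfen in book:
        return "book"      # Phase 1: dictionary hit, no engine startup
    if move_number >= endgame_threshold:
        return "nnue"      # Phase 3: CPU alpha-beta + NNUE endgame search
    return "dl_mcts"       # Phase 2: TensorRT ResNet + MCTS

opening_book = {"startpos": "7g7f"}
print(select_phase("startpos", 1, opening_book))  # book
```

The analogy to RAG is direct: a cheap lookup stage short-circuits the expensive model whenever it can.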
I forked two OSS engines (DL and NNUE) and removed unnecessary features at the source level.
In the DL engine, features such as multi-GPU support, multiple backend branching, and various mate search were removed to specialize for RTX 5090 × TensorRT. USI options were reduced from 63 to 43 (-32%).
In the NNUE engine, test commands, book generation commands, and learning-related code were compiled out, reducing binary size from 916KB to 514KB (44% reduction).
This "cutting" work directly applies to LLM operations. Instead of adding functionality via LoRA or Fine-Tuning to distilled models, reduce unnecessary branches and control via prompts — a policy fully aligned with the article "An Era Without LoRA or FT: How to Approach Distilled Models."
We manage a database of approximately 7 million book positions on the Python side. Book loading has been accelerated.
A notable feature is the real-time rewriting of the book during matches. After a loss, the early-game branching points are identified, and the book is modified to select different moves in the next match. The book is continuously refined as matches accumulate.
This "updating the database from experience and reflecting it in subsequent reasoning" cycle is identical to the feedback loop in RAG. The structure is the same as improving search result quality from dialogue logs.
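The rewrite step can be sketched as a weight update over book moves. The weighting scheme here is a hypothetical illustration of the idea, not the actual implementation:

```python
def penalize_lost_line(book: dict, played: list, factor: float = 0.5) -> None:
    """After a loss, down-weight the book moves we actually played so the
    next game branches differently (illustrative weighting scheme)."""
    for sfen, move in played:
        weights = book.get(sfen)
        if weights and move in weights:
            weights[move] *= factor

book = {"startpos": {"7g7f": 1.0, "2g2f": 1.0}}
penalize_lost_line(book, [("startpos", "7g7f")])
print(book["startpos"])  # {'7g7f': 0.5, '2g2f': 1.0}
```

As weights shift game after game, the book converges away from losing branches, which is the same shape as reweighting retrieval results from dialogue logs in RAG.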
During development, I used Claude Opus as a coding partner. For niche specialized tools like dlshogi and YaneuraOu, LLM hallucinations frequently occur. Blindly trusting confidently generated code can lead to incorrect modifications that not only don't work but also lower shogi strength.
The lesson here is that "LLM is translation, not reasoning." The correct usage is to perform calculations with specialized engines (e.g., search engines for shogi AI, domain-specific logic for business) and use LLM for natural language translation of inputs/outputs. This aligns with RAG design principles: "Don't give LLM knowledge, but generate based on facts obtained from external sources."
After organizing insights from two months of shogi AI development:
This shogi AI experience has been returned to LLM research, and LLM research insights have been applied to shogi AI architecture design. This cycle is the greatest value of venturing into different fields. Currently, I'm back to researching local LLMs (building systems using NVIDIA's Nemotron models), but I'll participate again when the GPU is free. It was very enjoyable.
Hardware Used:
2026-03-14 09:18:24
For modern engineers, LLMs (Large Language Models) like Gemini and ChatGPT are more than mere tools; they are a "second brain." From daily coding, debugging, and architectural considerations to career advice, we entrust a massive amount of our thought processes to AI. However, we face a critical issue here: "Is this valuable dialogue data truly ours?"
When buried in browser histories and practically unsearchable, past insights cannot be utilized. Moreover, if standard features like Google Takeout fail to work as expected, our intellectual assets are at risk of disappearing. Furthermore, even if you acquire powerful hardware like the latest RTX 5090 (32GB VRAM), you cannot maximize its performance without the appropriate data and workflows.
This article is a practical guide for engineers who extensively use Gemini, covering everything from techniques to export easily scattered conversation histories, to building a knowledge base using Google Workspace, and developing applications combining local LLMs and RAG (Retrieval-Augmented Generation).
From gritty hacks to automation scripts, all code is designed to work. Use this as a reference to shift your AI utilization from "consumption" to "assetization."
Dialogues with Gemini reflect your thoughts. Let's start by securing this data locally. However, there are some points to note here.
Google has a data export feature called "Google Takeout," but its behavior can be unstable regarding Gemini history. When attempting to export hundreds of chat histories, I once had a perplexing experience where the downloaded Zip file was "less than 1MB" and completely empty inside.
Particularly when using Google Workspace (enterprise accounts) or when API usage is mixed in, chat histories on the Web UI may not be archived correctly. Even if exported, they are in a complex JSON structure, which is not in a human-readable format as is.
If you successfully obtain GeminiChat.json via Google Takeout, you need to convert it into highly readable Markdown or CSV. The following Python script parses the nested JSON structure, formats the dates and titles, and outputs them.
import json
import os
import csv
from datetime import datetime

def format_timestamp(ts_str):
    """Format ISO 8601 timestamps"""
    try:
        if ts_str.endswith('Z'):
            ts_str = ts_str[:-1] + '+00:00'
        dt_object = datetime.fromisoformat(ts_str)
        return dt_object.strftime("%Y-%m-%d %H:%M:%S")
    except ValueError:
        return ts_str

def process_gemini_json(json_file_path, output_dir="exported_gemini_chats"):
    """Read GeminiChat.json and output Markdown and CSV"""
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    all_chat_data = []
    try:
        with open(json_file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
    except Exception as e:
        print(f"Error: {e}")
        return

    # Normalize data structure (ensure it's a list)
    if isinstance(data, dict):
        data = [data]

    print(f"Processing {len(data)} chat entries...")

    for i, chat_entry in enumerate(data):
        title = chat_entry.get('title', f"Untitled_Chat_{i+1}")
        # Remove characters unusable in filenames
        safe_title = "".join(c for c in title if c.isalnum() or c in (' ', '-', '_')).strip()
        if not safe_title:
            safe_title = f"chat_{i+1}"

        created_at = format_timestamp(chat_entry.get('create_time', 'Unknown'))

        # Generate Markdown
        md_filename = os.path.join(output_dir, f"{safe_title}.md")
        full_text = ""
        with open(md_filename, 'w', encoding='utf-8') as md_f:
            md_f.write(f"# {title}\n\n")
            md_f.write(f"Date: {created_at}\n\n")

            conversations = chat_entry.get('conversations', [])
            if not conversations and 'content' in chat_entry:
                # Fallback for different structures
                conversations = [{"speaker": "AI", "text": chat_entry.get('content')}]

            for convo in conversations:
                speaker = convo.get('speaker', 'Unknown')
                text = convo.get('text', '')
                md_f.write(f"## {speaker}\n{text}\n\n")
                full_text += f"{speaker}: {text}\n"

        all_chat_data.append({
            'title': title,
            'created_at': created_at,
            'summary': full_text.replace('\n', ' ')[:100] + '...',
            'file': md_filename
        })

    # CSV output
    csv_file = os.path.join(output_dir, "summary.csv")
    with open(csv_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['title', 'created_at', 'summary', 'file'])
        writer.writeheader()
        writer.writerows(all_chat_data)

    print(f"Done: Saved to {output_dir}.")

if __name__ == "__main__":
    # Specify the JSON file path here
    # json_path = "takeout/Gemini/GeminiChat.json"
    # process_gemini_json(json_path)
    print("Please specify a JSON path to run")
If Google Takeout does not work, or if the "Gemini Apps" item itself does not appear, an approach to directly save the information displayed in the browser is effective. By using Chrome extensions such as "ChatExporter for Gemini," you can extract text directly from the DOM and save it in Markdown format.
This method does not rely on server-side issues and can reliably save what is currently visible, making it highly effective as a backup. Even if there is a massive amount of history, it is crucial to select a tool that sequentially retrieves data in conjunction with browser scrolling.
It would be a waste to leave the exported data as is. Next, we will rebuild this as a searchable "knowledge base" within Google Workspace. By combining Google Apps Script (GAS) and AppSheet, you can create a secure AI assistant that requires no server management.
By keeping "Google Sheets (database)," "GAS (logic)," and "AppSheet (UI)" loosely coupled, this system achieves high maintainability.
The following code is a GAS function that receives a request from AppSheet, calls the API of the latest inference model, Gemini 3.1 Pro, generates an answer, and appends the result to a spreadsheet.
const GEMINI_API_KEY = PropertiesService.getScriptProperties().getProperty('GEMINI_API_KEY');
const SHEET_ID = 'YOUR_SPREADSHEET_ID';

function callGemini(prompt, historyContext) {
  // Use Gemini 3.1 Pro Preview
  const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-pro-preview:generateContent?key=${GEMINI_API_KEY}`;

  // Append past context to the prompt if available (simple RAG)
  const finalPrompt = historyContext
    ? `Please answer based on the following past context.\nContext: ${historyContext}\n\nQuestion: ${prompt}`
    : prompt;

  const payload = {
    "contents": [
      {
        "parts": [{"text": finalPrompt}]
      }
    ]
  };

  const options = {
    "method": "post",
    "contentType": "application/json",
    "payload": JSON.stringify(payload),
    "muteHttpExceptions": true
  };

  try {
    const response = UrlFetchApp.fetch(url, options);
    const json = JSON.parse(response.getContentText());
    if (json.candidates && json.candidates[0].content) {
      return json.candidates[0].content.parts[0].text;
    } else {
      return "Error: Could not generate a response.";
    }
  } catch (e) {
    return `Communication Error: ${e.toString()}`;
  }
}

// Function executed via Webhook or Trigger from AppSheet
function onUserQuery(e) {
  // * In reality, assumes receiving arguments from AppSheet Automation, etc.
  const userPrompt = e ? e.prompt : "Test Question";
  const sheet = SpreadsheetApp.openById(SHEET_ID).getSheetByName('Log');

  // Logic to search related information from past logs (simplified)
  const lastRow = sheet.getLastRow();
  let context = "";
  if (lastRow > 1) {
    context = sheet.getRange(lastRow, 3).getValue(); // Use the previous answer as context
  }

  const aiResponse = callGemini(userPrompt, context);

  // Save timestamp, prompt, and response
  sheet.appendRow([new Date(), userPrompt, aiResponse]);
  return aiResponse;
}
This script works by creating it as a GAS project from Google Sheets extensions and setting the API key in script properties. This allows all your conversations to accumulate in a spreadsheet, making it a searchable database.
Once the knowledge base is ready, the next step is output automation. Here, we leverage the high-end "RTX 5090" GPU to generate content specialized for professional fields such as law and technology, and to automate output to Google Slides.
The most significant feature of the RTX 5090 is its vast 32GB of VRAM and the high inference performance of the Blackwell architecture. This allows mid-to-large-scale LLMs such as Gemma 3 (27B) and Qwen2.5-32B to be loaded fully with only light quantization.
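The "fits in 32GB" claim can be sanity-checked with rough arithmetic: at 4-bit quantization each weight occupies about half a byte, plus some headroom for the KV cache and activations. A back-of-envelope sketch (the 20% overhead factor is an assumption for illustration):

```python
def estimate_vram_gb(num_params: float, bits_per_weight: int, overhead: float = 0.2) -> float:
    """Rough VRAM estimate: weight storage plus a fractional overhead for KV cache/activations."""
    weight_bytes = num_params * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9

# Gemma 3 27B at 4-bit: 13.5 GB of weights, ~16.2 GB with overhead -- well under 32 GB
print(round(estimate_vram_gb(27e9, 4), 1))   # → 16.2
# The same model at 16-bit would need ~64.8 GB and would not fit on a single card
print(round(estimate_vram_gb(27e9, 16), 1))  # → 64.8
```

This is why 4-bit loading (seen in the Unsloth code below as load_in_4bit) is the practical choice for a 27B model on a single consumer GPU.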
In specialized tasks involving complex logic, general-purpose API models tend to hallucinate. An effective countermeasure is to fine-tune a local LLM on the RTX 5090 using the Unsloth library.
from unsloth import FastLanguageModel
import torch

def train_local_model():
    # Load the model, leveraging the RTX 5090's 32GB VRAM
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "unsloth/gemma-3-27b-it",  # Gemma 3 27B model
        max_seq_length = 4096,
        dtype = None,         # Auto setup
        load_in_4bit = True,  # 4-bit quantization to run the 27B model comfortably
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r = 16,
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
        lora_alpha = 16,
        lora_dropout = 0,
        bias = "none",
    )
    # Dataset loading and Trainer configuration go here
    print("Training started: RTX 5090 VRAM 32GB Environment")
Answers and explanations generated by the local LLM or the Gemini API are output as structured JSON, which Google Apps Script then reads to generate slides automatically.
function generateSlidesFromData() {
  // Slide data to generate (assumed to be received as JSON from Python)
  const slidesData = [
    {
      title: "Results",
      body: "Achieved accuracy comparable to Gemini 3.1 Pro in specialized interpretation.",
      points: ["Faster inference", "Lower cost"]
    }
  ];

  const presentation = SlidesApp.create("AI Generated Report_RTX5090 Verification");
  const slides = presentation.getSlides();
  if (slides.length > 0) slides[0].remove();

  slidesData.forEach(data => {
    const slide = presentation.appendSlide(SlidesApp.PredefinedLayout.TITLE_AND_BODY);
    const titleShape = slide.getShapes().find(s => s.getPlaceholderType() === SlidesApp.PlaceholderType.TITLE);
    if (titleShape) titleShape.getText().setText(data.title);

    const bodyShape = slide.getShapes().find(s => s.getPlaceholderType() === SlidesApp.PlaceholderType.BODY);
    if (bodyShape) {
      let textContent = data.body + "\n\n";
      data.points.forEach(p => textContent += `- ${p}\n`);
      bodyShape.getText().setText(textContent);
    }
  });

  Logger.log("Slide generation complete: " + presentation.getUrl());
}
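On the Python side, the structured JSON that the script above consumes could be assembled and validated like this. A minimal sketch: the title/body/points field names simply mirror the slidesData example, and build_slide is a hypothetical helper, not part of any library:

```python
import json

def build_slide(title: str, body: str, points: list[str]) -> dict:
    """Assemble one slide record in the shape generateSlidesFromData() expects."""
    if not title or not body:
        raise ValueError("title and body are required")
    return {"title": title, "body": body, "points": list(points)}

slides = [build_slide(
    "Results",
    "Achieved accuracy comparable to Gemini 3.1 Pro in specialized interpretation.",
    ["Faster inference", "Lower cost"],
)]
payload = json.dumps(slides, ensure_ascii=False)
print(payload)
```

Validating the structure on the Python side before handing it to Apps Script keeps malformed model output from producing half-built decks.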
This automation significantly reduces document creation time. Humans can focus on structure and content review, freed from the simple task of layout adjustments.
Lastly, we'll build a real-time dashboard useful for daily data analysis and monitoring. With the Python framework "Streamlit", you can create web apps in a few lines of code, with no HTML or CSS knowledge required. Integrating the Gemini API gives you a dashboard where AI interprets the meaning of your data in real time.
The following code takes user input, performs sentiment analysis using Gemini 2.0 Flash (characterized by fast responses), and graphs the results in real-time.
import streamlit as st
import google.generativeai as genai
import os
import pandas as pd
import altair as alt
import json

# Set the API key (reading it from an environment variable is recommended)
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    st.error("API key is not set.")
    st.stop()
genai.configure(api_key=GOOGLE_API_KEY)

# Use a fast model suitable for real-time processing
model = genai.GenerativeModel('gemini-2.0-flash')

def main():
    st.set_page_config(layout="wide", page_title="Gemini AI Dashboard")
    st.title("Gemini Real-time Analysis Dashboard")

    # Initialize session state
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Sidebar: display history
    with st.sidebar:
        st.header("Analysis History")
        for msg in reversed(st.session_state.messages):
            st.text(f"{msg['role']}: {msg['content'][:20]}...")

    # Main area: input and display
    user_input = st.text_input("Enter text to analyze:")
    if st.button("Start Analysis") and user_input:
        st.session_state.messages.append({"role": "User", "content": user_input})
        with st.spinner("Gemini is thinking..."):
            try:
                # Prompt forcing JSON output
                prompt = f"""
                Perform sentiment analysis on the following text and output the percentages of positive, negative, and neutral in JSON format.
                Set the key to 'sentiment' and return a list format with {{"category": "...", "percentage": ...}}.
                Text: {user_input}
                """
                response = model.generate_content(prompt)
                response_text = response.text

                # Strip Markdown code fences around the JSON (simple processing)
                json_str = response_text.replace("```json", "").replace("```", "").strip()
                data = json.loads(json_str)

                # Draw the graph
                if "sentiment" in data:
                    df = pd.DataFrame(data["sentiment"])
                    chart = alt.Chart(df).mark_arc().encode(
                        theta=alt.Theta(field="percentage", type="quantitative"),
                        color=alt.Color(field="category", type="nominal"),
                        tooltip=['category', 'percentage']
                    ).properties(title="Sentiment Analysis Results")
                    st.altair_chart(chart, use_container_width=True)
                    st.success("Analysis Complete")
                st.session_state.messages.append({"role": "AI", "content": str(data)})
            except Exception as e:
                st.error(f"An error occurred: {e}")

if __name__ == "__main__":
    main()
This dashboard can be applied to various uses, such as analyzing customer feedback or detecting the mood in internal chats. Gemini's fast response and Streamlit's convenience accelerate prototyping speeds.
In this article, we introduced four phases to utilize the Gemini API.
These technologies show their true value when combined rather than used alone. A powerful hardware environment like the RTX 5090 serves as a foundation for freely manipulating AI locally.
Start by exporting your history. Within it, you should find useful insights that trace your own thought processes.