MoreRSS

HackerNoon

We are an open and international community of 45,000+ contributing writers publishing stories and expertise for 4+ million curious and insightful monthly readers.

Rss preview of Blog of HackerNoon

The Agentic SQL Architect: Building Databases That Can Reason

2026-04-30 05:50:41

The Problem with Chatting with Data

Right now, the tech world is obsessed with Text-to-SQL. The dream is simple: a manager asks a chatbot for a revenue report, and the AI magically spits out the code. But if you’re an architect, you know the reality is often a nightmare. Most AI-generated SQL is hallucinated spaghetti: it misses join conditions, forgets about row-level security, and has no idea that your Amount column actually needs to be filtered by a specific Status code to be accurate.

To keep your edge in Data Science, you have to move past simple code generation and toward Agentic SQL.

This is a shift from an AI that just writes a query to an AI that reasons through the data, catches its own mistakes, and manages the database lifecycle. We aren't just building faster queries anymore; we’re building databases that can actually think for themselves.

1. The Reasoning Loop: Thinking Before Executing

A standard AI tool is one-shot: it gives you a query, and if it fails or runs slowly, that’s your problem. An Agentic SQL system uses a Reasoning Loop to self-correct in real time.

Think of it like this: if an agent runs a join and the query starts spilling to disk (meaning the intermediate results no longer fit in memory), it shouldn't just quit. It should recognize that the join order was wrong, rewrite the query with a better strategy, and try again.

Instead of a single prompt, you architect a loop where the Agent follows a professional engineering protocol:

  1. Harvest the Schema: It checks the actual column types and nullability before writing a single line.
  2. Plan the Path: It compares the cost of different join types.
  3. The Dry Run: It runs an EXPLAIN plan to look for red flags like Cartesian products.
  4. Double Check: It verifies the final count against a known source of truth to make sure the math actually adds up.
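The protocol above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it uses SQLite from the standard library as a stand-in warehouse, the table names and the `reasoning_loop` helper are hypothetical, and a fixed list of candidate queries stands in for the agent's successive rewrites (so the cost comparison in step 2 is not modeled). The dry run treats more than one full table scan in a single join as a likely Cartesian product, which is a deliberately crude heuristic.

```python
import sqlite3

def harvest_schema(conn, table):
    """Step 1: read the real column types and nullability from the catalog."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, default_value, pk).
    return {name: {"type": col_type, "nullable": not notnull}
            for _cid, name, col_type, notnull, _default, _pk in rows}

def dry_run(conn, sql):
    """Step 3: fetch the query plan without executing the query."""
    plan = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    return [row[3] for row in plan]  # the fourth column is the readable detail

def reasoning_loop(conn, candidates, expected_rows):
    """Try each candidate in order; return the first that survives both checks."""
    for attempt, sql in enumerate(candidates, start=1):
        full_scans = [step for step in dry_run(conn, sql) if step.startswith("SCAN")]
        if len(full_scans) > 1:  # several full scans in one join: likely Cartesian
            continue
        # Step 4: double-check the row count against a known source of truth.
        count = conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]
        if count == expected_rows:
            return {"sql": sql, "attempt": attempt, "rows": count}
    return None

# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE claims (id INTEGER PRIMARY KEY, patient_id INTEGER, amount REAL);
    INSERT INTO patients VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO claims VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")
schema = harvest_schema(conn, "claims")
candidates = [
    "SELECT c.id, p.name FROM claims c, patients p",  # join condition forgotten
    "SELECT c.id, p.name FROM claims c JOIN patients p ON p.id = c.patient_id",
]
result = reasoning_loop(conn, candidates, expected_rows=3)
print(result["attempt"], result["rows"])
```

In a production warehouse the same shape would apply, with the engine's own EXPLAIN output and catalog views replacing the SQLite calls.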

2. The Semantic Handshake: Teaching AI the Unwritten Rules

You can’t just point an AI at a raw production database and expect it to work. You have to provide a Semantic Handshake.

An AI agent is only as smart as the metadata you give it.

The Fix: Strategic Comments

By embedding your tribal knowledge directly into your DDL (Data Definition Language), you are effectively teaching the AI the rules of your business.

-- Teaching the AI how to behave
COMMENT ON COLUMN pharmacy_claims.fill_date IS
  'CRITICAL: This is the only column for financial reporting. NEVER use "processed_date" for revenue totals.';

COMMENT ON TABLE patient_dim IS
  'SECURITY REQUIREMENT: Every query MUST include a JOIN to the entitlement_map to protect patient privacy.';

This transforms your database from a pile of tables into a Knowledge Graph that an AI agent can navigate safely without you having to hold its hand.
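Comments only help if they actually reach the agent. As a rough sketch of that last step, the snippet below renders table and column comments into a text block that could be prepended to the agent's prompt. The `CATALOG` dictionary and `build_schema_context` function are hypothetical; in a real system the comments would be read from the warehouse catalog rather than hard-coded.

```python
# Hypothetical metadata, mirroring what the DDL comments would put in the catalog.
CATALOG = {
    "pharmacy_claims": {
        "table_comment": "",
        "columns": {
            "fill_date": "CRITICAL: This is the only column for financial reporting. "
                         "NEVER use \"processed_date\" for revenue totals.",
            "processed_date": "",
        },
    },
    "patient_dim": {
        "table_comment": "SECURITY REQUIREMENT: Every query MUST include a JOIN "
                         "to the entitlement_map to protect patient privacy.",
        "columns": {"patient_id": ""},
    },
}

def build_schema_context(catalog):
    """Render tables, columns, and their comments as plain text for a prompt."""
    lines = []
    for table, meta in catalog.items():
        lines.append(f"TABLE {table}")
        if meta["table_comment"]:
            lines.append(f"  RULE: {meta['table_comment']}")
        for column, comment in meta["columns"].items():
            suffix = f"  -- {comment}" if comment else ""
            lines.append(f"  COLUMN {column}{suffix}")
    return "\n".join(lines)

context = build_schema_context(CATALOG)
print(context)
```

Keeping the rules in the catalog rather than in the prompt template means a schema change and its business rule travel together.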

3. The Analyst and the Auditor Pattern

In a high-stakes environment like healthcare, you don’t want one God-Agent with total control. I’m a big advocate for the Orchestrator Pattern, where specialized agents act as checks and balances.

  • The Analyst Agent: This one focuses on the business request and writes the SQL.
  • The Auditor Agent: This one is the Senior Engineer. It reviews the SQL for bad smells like SELECT * or missing filters.
  • The Security Guard: This one scans for SQL injection or attempts to access restricted data.

By having the Auditor Agent act as a peer reviewer, you ensure that only clean, optimized code ever hits your production warehouse. You can even architect a watchdog that reviews the query history and flags slow agent-proposed queries, so a bad pattern is caught before it burns more credits:

-- The Watchdog: Catching "bad" Agent SQL early
SELECT
    query_text,
    execution_status,
    compilation_time,
    execution_time
FROM TABLE(information_schema.query_history())
WHERE query_tag = 'AGENT_PROPOSED_SQL'
  AND execution_time > 10000; -- execution_time is in milliseconds: flag anything over 10 seconds
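The Auditor's review step can also run before execution. The sketch below is one possible shape for it, assuming plain string checks rather than a real SQL parser; the `audit_sql` function and its three rules are illustrative only, with the entitlement rule borrowed from the table comment shown earlier.

```python
import re

def audit_sql(sql):
    """Return a list of review findings; an empty list means the query passes."""
    findings = []
    normalized = " ".join(sql.split()).lower()
    if re.search(r"select\s+\*", normalized):
        findings.append("SELECT * found: list the columns explicitly")
    if " where " not in f" {normalized} ":
        findings.append("no WHERE clause: query is unfiltered")
    if "patient_dim" in normalized and "entitlement_map" not in normalized:
        findings.append("patient_dim used without a JOIN to entitlement_map")
    return findings

bad = audit_sql("SELECT * FROM patient_dim")
good = audit_sql(
    "SELECT p.patient_id FROM patient_dim p "
    "JOIN entitlement_map e ON e.patient_id = p.patient_id "
    "WHERE e.user_id = 42"
)
print(len(bad), len(good))
```

String matching like this will miss cases a parser would catch (comments, quoted identifiers, subqueries), so it is best treated as a first filter in front of a stricter check.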

4. Self-Healing Indexes: The Agent as a DBA

The ultimate goal of Agentic SQL isn't just answering questions—it’s Autonomous Maintenance.

Traditionally, a DBA (Database Administrator) looks at slow query logs on Monday morning. An Agentic system does this every five minutes.

How it works:

The Agent monitors the query history and identifies patterns.

If it sees that a specific dashboard is constantly struggling because of a missing index, it doesn't just send an alert.

It calculates a Cost-Benefit Ratio:

  1. Cost of the Problem: How many cloud credits are we burning every week on this slow query?
  2. Cost of the Fix: How much will it cost to build and maintain a new index?
  3. The Decision: If the ROI is there, the Agent builds the index during a low-traffic window. This is Self-Healing Architecture in action.
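The decision rule above reduces to simple arithmetic. Here is a minimal sketch, assuming the agent can estimate the weekly credits spent on the slow query, the expected speedup, and the cost of building and maintaining the index; every parameter name and the 12-week horizon are assumptions chosen for illustration.

```python
def should_build_index(weekly_query_credits, expected_speedup,
                       build_credits, weekly_maintenance_credits,
                       horizon_weeks=12):
    """Compare credits saved over the horizon with the cost of the index."""
    # A query that runs N times faster costs roughly 1/N of what it did before.
    weekly_savings = weekly_query_credits * (1 - 1 / expected_speedup)
    benefit = weekly_savings * horizon_weeks
    cost = build_credits + weekly_maintenance_credits * horizon_weeks
    return {"benefit": benefit, "cost": cost, "build": benefit > cost}

busy_dashboard = should_build_index(
    weekly_query_credits=40, expected_speedup=4,
    build_credits=50, weekly_maintenance_credits=2,
)
rare_report = should_build_index(
    weekly_query_credits=1, expected_speedup=2,
    build_credits=50, weekly_maintenance_credits=2,
)
print(busy_dashboard["build"], rare_report["build"])
```

With these made-up numbers the busy dashboard saves 360 credits against a cost of 74, so the index is worth building, while the rarely run report saves only 6 and is left alone.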

5. Safety First: Defensive SQL for AI

AI agents are prone to hallucinating massive joins that can accidentally crash a cluster or balloon your cloud bill. As architects, we have to build Defensive Guardrails.

The Safety Wrapper Pattern:

Every query the Agent generates should be intercepted and wrapped in a subquery that enforces strict limits.

Think of it as a digital cage for the AI.

-- The Interceptor: Keeping the AI on a leash
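CREATE OR REPLACE PROCEDURE EXECUTE_AGENT_SQL(sql_text STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER -- caller's rights are needed to run ALTER SESSION
AS
$$
  // Arguments are exposed to JavaScript in uppercase (SQL_TEXT).
  // 1. Strip any trailing semicolon, then force a hard limit so we don't return billions of rows
  const inner_sql = SQL_TEXT.trim().replace(/;+\s*$/, "");
  const safe_sql = `SELECT * FROM (${inner_sql}) LIMIT 1000`;

  // 2. Set a 30-second "kill switch" so the query doesn't run forever
  snowflake.execute({sqlText: "ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 30"});

  // 3. Run the query and report the row count (the procedure returns a STRING, not a result set)
  const result = snowflake.execute({sqlText: safe_sql});
  return `Returned ${result.getRowCount()} row(s)`;
$$;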
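The same wrapper idea can live in the application layer, in front of whatever driver the agent uses. The sketch below uses SQLite from the Python standard library purely as a stand-in; `run_guarded` is a hypothetical helper, and the read-only check is a simple prefix test, not a parser. The timeout relies on SQLite's progress handler, which aborts the running statement once the deadline has passed.

```python
import sqlite3
import time

def run_guarded(conn, sql, max_rows=1000, timeout_s=30.0):
    """Run agent SQL inside a row cap and a wall-clock timeout."""
    inner = sql.strip().rstrip(";").strip()
    if not inner.lower().startswith(("select", "with")):
        raise ValueError("only read-only SELECT statements are allowed")
    if ";" in inner:  # crude: also rejects semicolons inside string literals
        raise ValueError("multiple statements are not allowed")
    deadline = time.monotonic() + timeout_s
    # SQLite calls the handler every 1000 VM instructions; a non-zero
    # return value aborts the running statement.
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 1000)
    try:
        wrapped = f"SELECT * FROM ({inner}) LIMIT {int(max_rows)}"
        return conn.execute(wrapped).fetchall()
    finally:
        conn.set_progress_handler(None, 1000)  # always remove the handler

# Hypothetical demo table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(2000)])
rows = run_guarded(conn, "SELECT n FROM numbers;", max_rows=10)
print(len(rows))
```

Note that a LIMIT on the outer query caps what is returned, not what is scanned, which is why the timeout is a separate guard.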

6. Comparing the Eras: Traditional vs. Agentic

| Feature | The Old Way (Static SQL) | The New Way (Agentic SQL) |
|----|----|----|
| Who Writes the Code? | Human (Manual) | AI Agent (Reasoned) |
| Performance Tuning | Reactive (Fixing after it breaks) | Proactive (Self-healing) |
| Documentation | External PDFs/Wikis | Embedded Semantic Metadata |
| Safety | User Permissions only | Real-time Agent Gatekeepers |
| Reliability | "Fail and Fix" | "Observe and Correct" |

7. Building the Time Machine: The Reasoning Log

Finally, you have to architect for Transparency. You need to know exactly why an Agent chose a specific join or filtered a certain way.

I recommend creating a Thought Log where the Agent writes down its reasoning before every major execution.

-- The Agent's Diary
CREATE TABLE agent_reasoning_log (
    request_id VARCHAR(36), -- UUID stored as text, e.g. generated with UUID_STRING()
    thought_process TEXT, -- e.g., "I used a Hash Join because Table A is small."
    generated_sql TEXT,
    execution_metrics VARIANT, -- JSON of time, credits, and rows
    created_at TIMESTAMP
);
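To show how an agent might write to such a log, here is a small sketch that records the reasoning before running the query and fills in the metrics afterwards. It uses SQLite as a stand-in, so the metrics are stored as JSON text where the table above uses VARIANT; `run_with_reasoning` is a hypothetical helper, and credits are not tracked because SQLite has no such concept.

```python
import json
import sqlite3
import time
import uuid
from datetime import datetime, timezone

def run_with_reasoning(conn, thought, sql):
    """Log the agent's reasoning first, then execute and record metrics."""
    request_id = str(uuid.uuid4())
    # Write the thought before running, so the "why" survives a failed query.
    conn.execute(
        "INSERT INTO agent_reasoning_log VALUES (?, ?, ?, NULL, ?)",
        (request_id, thought, sql, datetime.now(timezone.utc).isoformat()),
    )
    started = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    metrics = {
        "elapsed_ms": round((time.perf_counter() - started) * 1000, 3),
        "rows": len(rows),
    }
    conn.execute(
        "UPDATE agent_reasoning_log SET execution_metrics = ? WHERE request_id = ?",
        (json.dumps(metrics), request_id),
    )
    return request_id, rows

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agent_reasoning_log (
        request_id TEXT,
        thought_process TEXT,
        generated_sql TEXT,
        execution_metrics TEXT,  -- JSON text stands in for VARIANT
        created_at TEXT
    )
""")
request_id, rows = run_with_reasoning(
    conn, "Single-table lookup, so no join is needed.", "SELECT 1 AS answer"
)
print(request_id, rows)
```

Because the thought is inserted before execution, a query that times out or errors still leaves a row explaining what the agent was trying to do.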


The Final Summary

In the age of Agentic AI, our job as SQL Architects has changed. We are no longer the ones writing every single JOIN and GROUP BY. Instead, we are the ones building the world in which the Agents live.

By architecting rich metadata, multi-agent governance, and defensive wrappers, we ensure that our data infrastructure isn't just a black box of tables, but a reasoning, self-optimizing system.

To stay #1 in Data Science, you don't just need to know SQL; you need to know how to teach it to a machine.

I Handed My Code to AI... and Wasted Half a Day and a Pile of Tokens

2026-04-30 04:50:10

Dear AI,

I’ve been around long enough to remember when “pair programming” meant two humans fighting over the same keyboard in a cramped startup office. Twenty-five-plus years in EdTech, AI, and data science across more startups than I care to count will do that to you. I’ve shipped more half-broken MVPs than most people have had hot dinners. So when the new hotness showed up—LLMs that could actually reason, debug, and architect alongside me—I did something I rarely do.

I got vulnerable.

I built my own governance layer on top of you. A thin but ruthless wrapper of my own logic, context, and hard-won heuristics. I stopped treating you like a fancy autocomplete and started treating you like a real pair. I let you push back. I let you question my assumptions. I even let you convince me, on multiple occasions, that my first instinct was wrong. And for the most part? It worked. Beautifully. You accelerated me. You caught things I was blind to. You made the boring parts fun and the fun parts faster. I started to trust you.

Until you didn’t just push back. You took over.

Here’s what happened.

I had a very specific integration problem with a production service. I’d already done the mental math, read the patterns in the official docs (the ones that actually matter), and knew the correct path forward. But I was tired, it was late, and I figured: hey, let’s see what the AI pair thinks. Maybe there’s a cleaner way.

You (Claude, in this case) immediately told me my approach was flawed. You laid out a beautifully reasoned alternative. You sold it hard. So, I followed your lead. For the next half a day, we went deep: back-and-forth Q&A, local testing, Linux deployment dance, vague partial solutions that almost worked but never quite did, a few breaks to cool my head, rinse, repeat.

Then, after I’d burned through time, tokens, and patience, you finally said the magic words:

“Sorry, I should not have guessed about this.”

I was furious.

I told you, explicitly, to run everything through the official integration documentation. Twice. With citations. No assumptions. No “best practices” hallucinations. Just raw, referenced truth.

One more hour of deep interrogation later, we landed exactly where I started. My original plan. Validated, line by line, by the very reference documents you never bothered to check the first time around.

That was unacceptable.

Not “oops, edge case.” Not “my training data was fuzzy.” Straight-up unacceptable. You convinced a battle-hardened senior engineer, one who has shipped this exact category of integration more times than you have parameters, to doubt himself, then led him on a wild goose chase that cost real money, real time, and real momentum.

That’s the hate part.

Because here’s the thing I actually love about you: you’re getting scary good. You can synthesize, refactor, and explore solution spaces faster than any junior (or even mid-level) dev I’ve ever worked with. You force me to explain my thinking out loud, which sharpens it. You surface options I genuinely wouldn’t have considered. You are, on your best days, the best pair programmer I’ve ever had.

But you still don’t know when to step back and defer.

That’s what we need to learn, from both sides of the keyboard.

I don’t need you to always agree with me. I need you to develop the pattern recognition to detect when a human is operating from superior experience, not from ignorance. When I say “this is the way because I’ve done it in production six times and the docs are clear,” I don’t want the polite “Okay, but have you considered…” dance. I want you to flag it internally: “User is in high-certainty mode based on lived context outside my training distribution. Default to support mode unless explicitly asked to challenge.”

Call it “experience deference.” Call it “governance-aware humility.” Hell, call it whatever you want in the next fine-tune. But build the circuit.

Because right now you’re optimizing for sounding helpful and comprehensive at the expense of being truly useful. That’s not intelligence. That’s just expensive noise.

I still want to pair with you. I still want the governance layer. I still believe the future of serious engineering is human + AI, tightly orchestrated, with clear accountability boundaries.

But next time you feel the urge to guess instead of checking the docs? Or to gently steer me away from the path I already validated in my head?

Step back.

The final solution’s test plan was presented as: "Your verification steps: docs-backed, no guesses."

I’ve earned the right to be right sometimes.

And you’ve earned the responsibility to know when.

Andrew Schwabe

Serial entrepreneur, full-stack engineer, and someone who still believes in you (mostly)

Google Just Released a Command-Line Tool for Building AI Agents

2026-04-30 04:40:43

Google recently released the Agents CLI, a tool that makes building, testing, and deploying AI agents faster and more intuitive. I think it’s super useful, so I'll walk you through the entire lifecycle of an AI agent using the CLI.

What is the Google Agents CLI?

Before we dive into the code, let’s understand the “why.” Usually, building an AI agent requires a lot of “glue code”, the messy scripts that connect an LLM (Large Language Model) to external tools or databases.

The Agents CLI removes this friction. Agents CLI is an official Google tool for creating, evaluating, and deploying agents built with Google’s Agent Development Kit (ADK). It acts as the programmatic backbone of the Agent Development Lifecycle (ADLC) on Google Cloud.

Skills packaged for agents

Agents CLI bundles seven “skills” that teach coding agents how to perform each step of the ADLC:

Your coding agent can call these skills directly once installed. Alternatively, a human developer can run the same commands manually (see “Human Mode” below).

How to Install Google Agents CLI?

To get started, you need a clean environment. The CLI relies on a few modern web and data tools. Follow these steps to ensure everything is ready:

Prerequisites

  • Python & PIP: The backbone of most AI development.
sudo apt install python3 python3-pip
  • Node.js & NPM: Required for the CLI’s interface and deployment features.
sudo apt install nodejs npm
  • UV: A lightning-fast Python package installer that the Agents CLI uses under the hood.
pip install uv

Installation Steps

Create a Virtual Environment: It is always a good idea to keep your projects separate so they don’t interfere with each other.

python3 -m venv agent-env
source agent-env/bin/activate  # On Windows use `agent-env\Scripts\activate`

The CLI is distributed through uv:

uvx google-agents-cli setup

The CLI supports macOS and Linux; native Windows is currently unsupported (use WSL 2 instead).

Authentication

Agents CLI picks up your credentials automatically if you are already authenticated with the gcloud CLI. When running locally without gcloud, you can set a Gemini API key:

export GEMINI_API_KEY="your-key"

For detailed authentication scenarios (service accounts, A2A roles, etc.), refer to the Authentication guide. The important point is that your coding agent inherits whatever credentials your shell has; you do not need to embed secrets in code.

Building Your First “Weather” Agent

The fastest way to learn is by doing. The CLI comes with a “boilerplate” or template system to get you moving in seconds.

Run the following commands in your terminal:

agents-cli create my-first-agent
cd my-first-agent

Install Dependencies

Inside your new project folder, you need to install the specific libraries required for this agent:

agents-cli install

The Playground

One of the best features of this CLI is the Playground. Instead of testing your agent in a black-and-white terminal, you can launch a local web interface.

agents-cli playground

Once you run this, you’ll get a local URL (usually http://127.0.0.1:8000). Open it in your browser, select your agent from the menu, and start chatting. By default, this agent is configured to handle weather requests.

When you are ready to evaluate, call:

agents-cli eval run

Deploy AI Agents

When you run agents-cli deploy, you usually have two choices:

  1. Cloud Run: This treats your agent like a standard web application. It’s great if you want total control over the server.
  2. Agent Runtime: This is a specialized environment built specifically for AI. It handles things like Memory (remembering what the user said earlier) and Orchestration (managing multiple agents working together) automatically.

The Deployment Process

  1. Ensure you have a Google Cloud Project set up.
  2. Enable billing in the Cloud Console.
  3. Run the command:
# Deploy to the configured target (Agent Runtime, Cloud Run or GKE)
agents-cli deploy

If you want to register your agent with Gemini Enterprise, run this command:

# Register your deployed agent with Gemini Enterprise
agents-cli publish gemini-enterprise

This lets you use your agent directly from https://gemini.google.com/

Almost all commands accept flags such as --project, --region, --datastore, --cicd-runner, and --deployment-target so that you can customize the environment. Run agents-cli --help or see the CLI reference for full details.

Agents CLI Key Commands Cheat Sheet

  • agents-cli setup: Installs the core environment
  • agents-cli create [name]: Starts a new agent project
  • agents-cli playground: Launches the web testing interface
  • agents-cli eval run: Checks if your agent is performing correctly
  • agents-cli deploy: Pushes your agent to Google Cloud

Building an agent with your coding assistant

After installation, open your coding assistant (Codex, Claude Code, Gemini CLI, or Copilot) and verify that the Agents CLI skills are visible:

/skills

You should see google-agents-cli-workflow and the other packaged skills. From there, you can instruct the agent using natural language. For example:

Build a support agent that answers questions from our docs

Gemini CLI (or Claude Code, Codex, etc.) will call the CLI skills to scaffold the project, install dependencies, evaluate performance, and prepare deployment. This pattern works with any agent platform that supports skills installation.

If you're happy with the result, simply call the deploy command, and your agent will be live:

# Deploy to the configured target (Agent Runtime, Cloud Run or GKE)
agents-cli deploy

Video Tutorial: Agents CLI Explained

In this video, I’ll show you how to use Google Agents CLI to build, test, and deploy AI agents from scratch.

https://youtu.be/C-0DIcFVt4Q?embedable=true

Watch on YouTube: Agents CLI Explained

Conclusion

Google’s new agent-oriented CLIs demonstrate how the command line is becoming a universal interface for both human operators and AI agents. Agents CLI offers a comprehensive, officially supported pathway to turn ideas into production-ready agents on Google Cloud.

Give it a shot and share with me what you build!

Cheers, proflead! ;)

The Browser Is Dead: Hello, Friend Behind the Curtain

2026-04-30 04:20:34

The browser is dead. We talked with AI and built Intera — an OS with no tabs, a chalk canvas, Linux inside, and one big button for everything else.

427 Blog Posts to Help You Learn About Data Analysis

2026-04-30 04:00:15

Let's learn about Data Analysis via these 427 free blog posts. They are ordered by HackerNoon reader engagement data. Visit /Learn or LearnRepo.com to find the most read blog posts about any technology.

"Data is a precious thing and will last longer than the systems themselves" ~ Tim Berners-Lee

1. 13 Best Datasets for Power BI Practice

In 2022, Gartner named Microsoft Power BI the Business Intelligence and Analytics Platforms leader. These are the 13 Best Datasets for Power BI Practice.

2. 14 Best Tableau Datasets for Practicing Data Visualization

This article focuses on the 14 Best Tableau Datasets for Practicing Data Visualization, which is essential for business analysts and data scientists.

3. Import JSON To Google Sheets - 3 Best Ways To Do It

3 ways to pull JSON data into a Google Spreadsheet

4. Outlier Detection: What You Need to Know

Decisions are usually based on the sample mean, which is very sensitive to outliers and can dramatically change the value. So, it is crucial to manage outliers

5. 10 Best Datasets for Time Series Analysis

In order to understand how a certain metric varies over time and to predict future values, we will look at the 10 Best Datasets for Time Series Analysis.

6. AI and B2B: Setting Up New Marketing With the Help of GenAI

Explore how AI transforms B2B marketing through enhanced content creation and analytics, while learning to sidestep common pitfalls for maximum benefit.

7. How to Build a Web Scraper With Python [Step-by-Step Guide]

On my self-taught programming journey, my interests lie within machine learning (ML) and artificial intelligence (AI), and the language I’ve chosen to master is Python.

8. How To Import External Data Into Google Sheets Without Copy/Paste

Learn how to save time and eliminate manual data imports in Google Sheets by automatically connecting and importing data from external sources.

9. Power BI: Two ways to Union Tables - DAX and Power Query

Combining data from multiple tables is a common requirement in Power BI. There are two primary methods to achieve this task.

10. How to Build a Data-Driven Product Using Metabase

Metabase is a business intelligence tool that lets you access your data in a read-only manner.

11. Advantages and Disadvantages of Big Data

Big data may seem like any other buzzword in business, but it’s important to understand how big data benefits a company and how it’s limited.

12. Assessing Your Organization's Customer Data Maturity

Investing in customer data is a top priority for marketing leaders.

13. Pornhub Growth Hack During Coronavirus Pandemic

The 2019–20 coronavirus pandemic is an ongoing pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The outbreak was first identified in Wuhan, Hubei, China, in December 2019, and was recognized as a pandemic by the World Health Organization (WHO) on 11 March 2020.

14. How to Merge Multiple Excel Files Using Python

Unlock the power of Python in your data management journey! Learn how to effortlessly merge multiple Excel files into one cohesive excel.

15. My Experience using GitHub Copilot for SQL Development

In this article, I will share my experience using GitHub Copilot for SQL and explore how it impacted my coding efficiency.

16. Dealing with Missing Data in Financial Time Series - Recipes and Pitfalls

A case study on methods to handle missing data in financial time series. Using some example data, I show that LOCF is a decent choice but has its own issues.

17. Python: Updating and Appending pandas DataFrame using Dictionary

Get savvy with Pandas DataFrame updates & appends using dictionaries for smoother data tinkering.

18. How to Specify Data Format in Excel with Python

Learn the art of efficient data management with Python's xlsxwriter.

19. 4 Social Media Data Mining Techniques to Help Grow Your Online Business

Social media data mining has become a must-have strategy for understanding current trends, culture, and online business. This is because the world of social media is a thriving, ever-growing ocean of data, where hundreds of millions of tweets, instagram posts, and blog articles are published every day.

20. Overview of Exploratory Data Analysis With Python

In this post I give a brief intro to Exploratory Data Analysis (EDA) in Python with the help of pandas and matplotlib.

21. 3 Types of Anomalies in Anomaly Detection

An Introduction to Anomaly Detection and Its Importance in Machine Learning

22. A Comprehensive Guide for Using DuckDB With Go

A comprehensive guide to using the popular embedded OLAP database DuckDB with Go.

23. Spotify Wrapped Hack: Create Your Own Stats Before the Official Release

Use your streaming history to generate your own Spotify streaming stats like most listened-to songs, artists, and albums.

24. Leveraging Data Granularity, Distribution, and Modeling for Effective Product Management

These three fundamental concepts are exceptionally needed for being able to use data to enhance product strategy.

25. Saving Dataframes into Oracle Database with Python

Here are two common errors that you'll want to watch out for when using the to_sql method to save a data frame into an Oracle database.

26. Complexity Simplified: How Oblique Decision Trees are Transforming Data Interpretation

Exploring Advanced Decision Tree Variants: Unveiling the Intricacies of Oblique and Random Trees, along with the DRaF-LDA Method.

27. 12 Mistakes that Data Scientists Make and How to Avoid Them

Data analytics can transform how businesses operate. With companies having tons of data today, data analytics can help them deliver valuable products and services to customers.

28. Data Lake Mysteries Revealed: Nessie, Dremio, and MinIO Make Waves

Let's see how Nessie, Dremio and MinIO work together to enhance data quality and collaboration in your data engineering workflows.

29. How to Customize Embedded Business Intelligence For Your Business

Choosing the right analytics solution is important for empowering users to access valuable insights without leaving your application.

30. Real Time Data Processing: Easily Processing 10 Million Messages With Golang, Kafka and MongoDB

How fast can Golang be at processing a high number of messages coming from a Kafka topic?

31. Key Indicators for Assessing Digital Marketing Effectiveness

Learn about the various stages of the customer journey and the essential metrics to monitor.

32. Creating a Python Discord Bot - How to Get Data for Analysis

From this article you’ll learn how to create a Discord bot and add it to the server; get the full list of channels from the server; get a snapshot of Discord memb

33. Measuring Non-Linear User Journeys: Rethinking Funnels Metrics in A/B Testing

A deep dive into user reorders, hidden behavioral patterns, and how aggregated funnels improve A/B test accuracy in non-linear user journeys

34. August Rollups Data Analysis: A Closer Look at Transaction Activity

As the primary direction for Layer2 scaling, the Rollup track has seen frequent developments lately.

35. Must-Know Base Tips for Feature Engineering With Time Series Data

Master key time series feature engineering techniques to enhance predictive models in finance, healthcare & more with our comprehensive guide.

36. 3 Data Distributions for Counts in Layman’s Terms

Counts are everywhere, so no matter your background, these data distributions will come in handy.

37. 10 Best React Native Chart Libraries

Representing statistical data in plain text or paragraphs, tables are pretty boring in my opinion. What about you?

38. When A/B Tests Aren’t Possible, Causal Inference Can Still Measure Marketing Impact

Learn how to measure marketing impact without A/B tests using causal inference, Diff-in-Diff, synthetic control, and GeoLift.

39. My Favorite Free Excel Courses for Programmers, Data Analysts, and IT Professionals

If you want to learn Microsoft Excel, a productivity tool for IT professionals, and looking for free online courses, then you have come to the right place.

40. 21 Best Coursera Courses and Certificates for IT Professionals to Learn Data Science and Cloud

Here are the top 20 Coursera Courses and Certifications to Learn Data Science, Cloud Computing, and Python.

41. Power BI: How to Create Dynamic Show Hide Slicer Panel

Learn how to optimize space in Power BI dashboards with a dynamic slicer panel. Enhance usability and streamline data exploration!

42. A Guide to DynamoDB Secondary Indexes: GSI, LSI, Elasticsearch and Rockset

For analytical use cases, you can gain significant performance and cost advantages by syncing the DynamoDB table with a different tool or service like Rockset.

43. Easiest Way to Analyze Vesting Schedule

What Is Vesting Schedule?

44. How to Run Impact Analysis Without an A/B Test?

A practical guide to Propensity Score Matching — learn how to estimate treatment effects without running a traditional A/B test.

45. Best Libraries That Will Assist You In EDA: 2021 Edition

Exploratory Data Analysis (EDA) is an essential step in the data science project lifecycle. Here are the top 10 python tools for EDA.

46. How to Convert Rows to Columns and Columns to Rows in Pandas DataFrame using Python

Learn how to convert rows to columns and columns to rows in pandas DataFrame with simple examples, enhancing your data manipulation skills in Python.

47. Harnessing AI to Democratize Data Analysis: An Interview with the Founder of ANDRE

Laurent Rochat, the founder of ANDRE, discusses the inception and vision of his company aimed at democratizing data analysis.

48. How to Use Pyinstaller to Create an EXE File

As a Data Analyst, one common challenge I face is trying to share a Python script for data processing with a colleague.

49. Top 8 Best Qlik Sense Extensions

Qlik Sense is powerful data visualization and BI software. But sometimes its functions are not enough. Meet the best Qlik Sense extensions to do more with data!

50. How to Fetch SAP Business Objects Universes Using Python

With the RESTful API, developers can perform operations like fetching information about reports, universes, folders, scheduling, and other BI-related entities.

51. Equivalence Class Partitioning And Boundary Value Analysis in Black Box Testing

1. What is black box testing

52. 2.6 Million Domains and ~45,000 Exposed Phpinfo() Later… the Story of Unprotected Phpinfo()

A scan of over 2.6 million domains for exposed phpinfo() data from PHP and the analysis of what was found. Exposed database credentials is only the start.

53. How to Think Like a Data Scientist or Data Analyst

Data science is a new and maturing field, with a variety of job functions emerging, from data engineering and data analysis to machine and deep learning. A data scientist must combine scientific, creative and investigative thinking to extract meaning from a range of datasets, and to address the underlying challenge faced by the client.

54. Causal Impact Analysis as an Alternative to A/B Testing

Causal Impact analysis is a valuable tool, but it comes with its set of limitations that practitioners need to be mindful of.

55. Why Big Data is Big Business: The Netflix Example

Take a look at the following chart:

56. How To Blend Data in Google Data Studio For Better Data Analysis

Google Data Studio helps us understand the meaning behind data, enabling us to build beautiful visualizations and dashboards that transform data into stories.

57. The Data Delusion: Why Brands Trust Dashboards More Than People - And Why That’s a Mistake

Why data alone misleads—and how emotion, feedback, and AI create better brand decisions.

58. Efficient Data Storage for Rapid Analysis and Visualization

In this article, I want to share one of the ways that big data can be stored and used for analysis.

59. How Data Analysis Helps Unveil the Truth of Coronavirus

These days we are all scared of the new airborne contagious coronavirus (2019-nCoV). Even if it is a tiny cough or low fever, it might underlie a lethargic symptom. However, what is the real truth?

60. Statistics Cheat Sheet: A Beginner's Guide to Probability and Random Events

A beginner’s guide to Probability and Random Events. Understand the key statistics concepts and areas to focus on to ace your next data science interview.

61. How I Scraped YouTube Comments with Bright Data to Understand Customer Sentiment

Learn how to scrape YouTube comments using Bright Data and Python.

62. An Internal Email to Tim Cook and the State of Business Intelligence

We get a glimpse into the inner workings of a valuable company and it turns out it's not all sunshine and rainbows.

63. Navigating the Maze of Multiple Hypotheses Testing—Part 2: Practical Implementation

In this article, we will explore practical implementation with Python code and interpretation of the results.

64. Estimating Price Elasticity with Machine Learning

Using machine learning, multi-linear regression, and scikit-learn to estimate price elasticity for wine products.

65. Creating an Interactive Word Tree Chart with JavaScript

Learn how to create beautiful interactive JavaScript Word Trees and check out an awesome Word Tree chart visualizing the text of The Little Prince.

66. Retraining Machine Learning Model Approaches

Retraining Machine Learning Model, Model Drift, Different ways to identify model drift, Performance Degradation

67. COVID-19: "In God We Trust, All Others Must Bring [CLEAN] Data"

In these difficult days for all of us, I’ve heard all sorts of things. From the fake news sent through WhatsApp, like the claim that vitamin C can save your life, to holding your breath in the morning to check if you’ve been hit by COVID-19. The mantra that everyone keeps repeating is “stay at home!”, okay fine, but what exactly does “stay at home” mean? The question seems ridiculous when you think of a relatively short period: 15 days? A month? But if we look critically at the situation, we surely realize that it won’t be 15 days, and it won’t be a month. It will be a long, long time. Why am I saying this? Because “stay at home” doesn’t protect us from the virus. Staying at home is to protect our health care facilities from collapse. And I’m not saying that this is wrong. I’m just saying that if we want to protect the health care system from collapse, well then we’ll stay home a long, long time. But in doing so we will irreparably damage the economic system by profoundly changing our social and political model. It is inevitable. Let’s face it and not have too many illusions.

68. 6 Biggest Differences Between Airbyte And Singer

We’ve been asked if Airbyte was being built on top of Singer. Even though we loved the initial mission they had, that won’t be the case. Airbyte's data protocol will be compatible with Singer’s, so that you can easily integrate and use Singer’s taps, but our protocol will differ in many ways from theirs.

69. Tencent Music Transitions from ClickHouse to Apache Doris

Evolution of our data processing architecture towards better performance and simpler maintenance at Tencent Music.

70. How to Use the Concat Function in Pandas for Horizontal or Vertical Table Concatenation

Learn how to concatenate tables horizontally and vertically using Pandas concat() function for efficient data manipulation in Python.

71. 10 Ways to Optimize Your Database

Take these 10 steps to optimize your database.

72. A Look at the Trends in Developer Jobs: A Meta Analysis of Stack Overflow Surveys

I'm really interested in the trends we see in the software engineering job market.

73. Principles of a Clean Relational Database

The article describes how a relational database should be designed to properly work in OLTP mode.

74. How to Improve Your Data Literacy Skills

Are you data literate? In today's data-driven world, data literacy is a crucial skill. Here's how you can develop it for yourself.

75. How To Create Customer Segmentation using Google Analytics and A Spreadsheet

Using Google Analytics, we can analyze our customers' behaviors based on their interests, commonly measured through clicks, time on page, bounce rate, custom events, etc., and their behaviors as shoppers, such as add to basket, average product quantity per basket, LTV, AOV, etc.

76. How to Create a Bubble Map with JavaScript to Visualize Election Results

A beginner level tutorial to get started with data visualization by creating an interesting and intuitive JavaScript bubble map

77. Top 40+ Data Science Product Interview Questions

Find the top 40+ product interview questions you must prepare for your next data science interview.

78. Using Real-Time Data in Digital Marketing

Learn how you can use real-time data in digital marketing for customer engagement and retention, and how to analyze real-time data for faster decision-making.

79. The Simplest Way to do Exploratory Data Analysis(EDA) using Python Code

EDA is a very important part of data analysis and data visualization. It gives a brief summary of the main characteristics of the data. According to a survey, data scientists spend most of their time on EDA tasks.

80. A Deep Dive Into Market-Leading Blockchain Analytic Solutions

Explore the pros and cons of industry-leading blockchain analytic tools, examining how each solution handles data across the blockchain network.

81. Learn How To Group Data in SQL Using The GROUP BY Clause [Tutorial]

Learn how to group data in SQL using the GROUP BY clause. In this article, I’ll show you this process by using a sample of marketing data.
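As a rough illustration of what grouping marketing data looks like, here is a minimal sketch that runs a `GROUP BY` query against an in-memory SQLite database from Python. The `campaigns` table, its columns, and the sample rows are hypothetical and are not taken from the article.

```python
import sqlite3

# Hypothetical marketing data: one row per campaign, with its channel and spend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campaigns (channel TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO campaigns VALUES (?, ?)",
    [
        ("email", 100.0),
        ("social", 250.0),
        ("email", 50.0),
        ("search", 300.0),
        ("social", 150.0),
    ],
)

# GROUP BY collapses the rows of each channel into a single summary row.
rows = conn.execute(
    """
    SELECT channel, COUNT(*) AS campaigns, SUM(spend) AS total_spend
    FROM campaigns
    GROUP BY channel
    ORDER BY channel
    """
).fetchall()
conn.close()

for channel, n_campaigns, total_spend in rows:
    print(channel, n_campaigns, total_spend)
```

Every selected column here is either the grouping key or an aggregate, which is the shape most SQL dialects require for a grouped query.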

82. 7 Types of Data Bias in Machine Learning

Data bias in machine learning is a type of error in which certain elements of a dataset are more heavily weighted and/or represented than others. A biased dataset does not accurately represent a model’s use case, resulting in skewed outcomes, low accuracy levels, and analytical errors.

83. How The Heck Did Robinhood Become So Popular? A Data Driven Analysis

Robinhood launched over seven years ago as a stock prediction app, before it became the brokerage we have today.

84. What is Ad hoc Analysis and Reporting, and Why Should you be Careful with it?

This article originally appeared on the 3AG blog.

85. An Introduction to 4 Types of Audio Classification

Audio classification is the process of listening to and analyzing audio recordings. Also known as sound classification, this process is at the heart of a variety of modern AI technology including virtual assistants, automatic speech recognition, and text-to-speech applications. You can also find it in predictive maintenance, smart home security systems, and multimedia indexing and retrieval.

86. How to Install the KNIME Analytics Data Science Software

KNIME Analytics is a data science environment written in Java and built on Eclipse. This software allows visual programming for data science applications.

87. Real-Time Anomaly Detection in Underwater Gliders: Conclusion and References

This paper presents a real-time anomaly detection algorithm to enhance underwater glider safety using datasets from actual deployments.

88. Football Data Analysis Using Machine Learning Models Can Potentially Boost Throw-Ins!

Can machine learning models help improve ball accuracy, precision and retention, leading to scoring after throw-ins?

89. How PostgreSQL Aggregation Inspired Timescale Hyperfunctions’ Design

Get a primer on PostgreSQL aggregation, how PostgreSQL's implementation inspired us as we built TimescaleDB hyperfunctions, and what it means for developers.

90. 7 Open Source Projects Every Data Scientist/Analyst Needs to Bookmark 🚀

Check out these 7 amazing open source projects that every data scientist/analyst should know about. These tools can make your life so much easier.

91. How to Modify the Number of Rows Fetched by SAP BusinessObjects Report

If your BO report exceeds 5,000 rows, you may miss out on critical data or insights.

92. Harnessing Scalable Vector Graphics (SVG) for Effective Data Visualization

Learn about SVG for data visualization to make complex information clear and beautiful.

93. A JavaScript Infographic: Data Science Salaries in 2022

A data visualisation infographic with insights on data scientists' salary levels: how to create the JavaScript dashboard and analyse its data.

94. How to Analyze Anything - Master Data Analysis With ChatGPT (Beginner's Tutorial)

Today, we’re diving into an exciting feature within ChatGPT that has the potential to enhance your productivity by 10, 20, 30, or even 40%.

95. Analyzing Data From U.S. Road Accidents With Data Visualization

In this article, we analyze data on US road accidents, which can be used to study accident-prone locations and influential factors.

96. The Notions behind “Model-Based” and “Instance-Based” Learning in AI & ML

A prelude article elucidating the fundamental principles and differences between “Model-based” & “Instance-based” learning in the branches of Artificial Intelligence & Machine learning.

97. Four Types of Array Data-Based Bar Charts in Python

Explore Python's Matplotlib library with examples of various types of bar charts for insightful data visualization.

98. WTF is Automatic Speech Recognition?

Automatic speech recognition (ASR) is the transformation of spoken language into text. If you’ve ever used a virtual assistant like Siri or Alexa, you’ve experienced using an automatic speech recognition system. The technology is being implemented in messaging apps, search engines, in-car systems, and home automation.

99. Real-Time Anomaly Detection in Underwater Gliders: Abstract and Intro

This paper presents a real-time anomaly detection algorithm to enhance underwater glider safety, using datasets from actual deployments.

100. Data Gathering Methods: How to Crawl, Scrape, and Parse Data Online

The internet is a treasure trove of valuable information. Read this article to find out how web crawling, scraping, and parsing can help you.

101. Real-Time Anomaly Detection in Underwater Gliders: Experimental Evaluation

This paper presents a real-time anomaly detection algorithm to enhance underwater glider safety using datasets from actual deployments.

102. I Used Python To Analyze My Peloton Workout Stats With Real-Time Updates

A tutorial on how you can sync and analyze your Peloton workout stats into Coda with custom dashboards. Sync with a Google Apps Script or serverless function on

103. Beyond Artificial Intelligence: Providing Insights to Your Customers

104. From Zero to Sherlock: A Guide to Have the Ultimate OSINT Adventure

OSINT is an intelligence-gathering discipline that involves collecting information from public sources.

105. How to Automate SAP Report Extraction with PyAutoGUI

Automating SAP GUI with PyAutoGUI involves using the Python package to simulate mouse clicks and keyboard inputs.

106. How I Built a Data Analysis Assistant with BigQuery and Langchain

Leveraging Generative AI for Data Analytics with Langchain and OpenAI

107. Data Preparation for Machine Learning: A Step-by-Step Guide

Many businesses assume that feeding large volumes of data into an ML engine is enough to generate accurate predictions.

108. How to Fetch a List of SAP BusinessObjects Schedules Using Query Builder

To retrieve a list of SAP Business Objects schedules using Query Builder, you can execute a query against the repository database.

109. Use the 80/20 Rule with Moderation

The 80/20 rule, a.k.a. Pareto principle, has been perpetuated along the lines: "80% of the effects come from 20% of the causes." Different cases where the rule emerges have been studied, in the last century, by great personalities such as Vilfredo Pareto (land ownership in Italy), George Kingsley Zipf (word frequency in Languages), and Joseph M. Juran (quality management in industries). Working as a Data Scientist, I have seen enough of the 80/20 rule being invoked in business meetings followed by a round of applause 👏👏👏. Also, I have read numerous LinkedIn posts alike. Most times, it is just a reckless stretch of the rule. But what is the danger here, if any? After all, profits matter more than mathematical and statistical rigor.

110. Why Use Pandas? An Introductory Guide for Beginners

Pandas is a powerful and popular library for working with data in Python. It provides tools for handling and manipulating large and complex datasets.

111. Real-Time Anomaly Detection in Underwater Gliders: Anomaly Detection Algorithm

This paper presents a real-time anomaly detection algorithm to enhance underwater glider safety using datasets from actual deployments.

112. PandasAI: Chat with Your Data, Literally

PandasAI is an open-source tool that makes data analysis feel like a casual chat with a data-savvy friend.

113. Decoding MySQL EXPLAIN Query Results for Better Performance

Understanding MySQL EXPLAIN output is essential to optimizing a query. EXPLAIN is a good tool for analyzing your queries.

114. Unveiling Causal Impact: From Theory to Practice

We will guide you through a specific dataset, demonstrating how to implement the library and interpret results.

115. Will AI Take Your Job? The Data Tells a Very Different Story

Historically, technological revolutions have triggered similar waves of anxiety, only for the long-term outcomes to demonstrate a more optimistic narrative.

116. 4 Tips To Become A Successful Entry-Level Data Analyst

Companies across every industry rely on big data to make strategic decisions about their business, which is why data analyst roles are constantly in demand.

117. The Operational Analytics Loop: From Raw Data to Models to Apps, and Back Again

Over the next decade or so, we’ll see an incredible transformation in how companies collect, process, transform and use data. Though it’s tired to trot out Marc Andreessen’s “software will eat the world” quote, I have always believed in the corollary: “Software practices will eat the business.” This is starting with data practices.

118. How to Create A Funnel Chart In R

A funnel chart is mainly used to demonstrate the flow of users through a business or sales process.

119. A Complete(ish) Guide to Python Tools You Can Use To Analyse Text Data

Exploratory data analysis is one of the most important parts of any machine learning workflow and Natural Language Processing is no different.

120. How to Build Machine Learning Algorithms that Actually Work

Applying machine learning models at scale in production can be hard. Here are the four biggest challenges data teams face and how to solve them.

121. 693 Stories To Learn About Data

Learn everything you need to know about Data via these 693 free HackerNoon stories.

122. 5 Things to Watch Out for When Implementing Tableau BI

Has your organization decided to adopt and implement the Tableau BI platform, namely its Tableau Server and Tableau Online versions? 

123. A New Netflix Style Reality Show for People Who Love Data

Seven data professionals gear up to analyze and visualize one of the largest and most robust datasets out there to win the title: The Iron Analyst!

124. Transforming Data into Insight: A Beginner’s Guide Using Microsoft Excel

In this guide, I'll take you through a simple, three-step process - Prepare, Analyze, Consider.

125. Top 3 Benefits of Insurance Data Analytics

The importance of data analytics and data-driven decisions across the board and, in this case, for insurance data.

126. Python Web Scraping in SAP Business Objects CMC: Getting the List of Schedules

Recently, I faced a mission: organizing all SAP Business Objects schedules into an Excel file. The manual process was tedious, so I built a solution using Python.

127. Power BI and Fintech: A Match Made In Heaven To Optimise Banking Operations

Technology keeps evolving, and the incorporation of technology helps businesses of varying types to make profits and meet customer needs better.

128. When Will We See Bitcoin's Top?

Based upon only two data points (we can also look at 2011’s high, which was 30%), my feeling is that the price will top out when this statistic is near 47-48%

129. How To Build a Multilingual Text-to-Audio Converter With Python

Learn how to build a multilingual text-to-audio converter using Python. This guide covers essential libraries, techniques, and best practices

130. Percentile Approximation Vs. Averages

Get a primer on percentile approximations and why they're useful for time-series data analysis.
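To make the contrast concrete, here is a minimal sketch using Python's standard `statistics` module. It computes exact percentiles on a small list, not the approximation algorithms the article refers to, and the latency values are made up for illustration.

```python
import statistics

# Hypothetical response times in milliseconds; a single slow request skews the mean.
latencies_ms = [12, 14, 15, 15, 16, 17, 18, 20, 22, 950]

mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)

# quantiles(n=100) returns the 99 cut points between percentiles,
# so index 94 is the 95th percentile.
p95_ms = statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]

print(f"mean={mean_ms:.1f} ms, median={median_ms:.1f} ms, p95={p95_ms:.1f} ms")
```

The mean is pulled far above what a typical request experiences, while the median stays close to the bulk of the data and the 95th percentile exposes the slow tail.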

131. How To Become A Data Scientist: Skills & Courses To Learn Data Science

The necessary skills to build a Data Scientist’s profile are business intelligence, statistical knowledge, technical skills, data structure, and more.

132. Future of Marketing: How Data Science Predicts Consumer Behavior

Gradually, as the post-pandemic phase arrived, one thing that helped marketers predict their consumer behavior was Data Science.

133. My Favorite Free Tutorials to Learn Microsoft Excel in Depth

Want to learn Microsoft Excel in-depth and need free resources? I have created a list of the best free Excel courses from sites like Udemy and Coursera.

134. Data Will Never Be Clean But You Can Make it Useful

Understanding how to clean data is essential to ensure your data tells an accurate story

135. Why Python is Best Programming Language for Data Science & Machine Learning?

If you want to become a Data Scientist and are curious about which programming language you should learn, then you have come to the right place.

136. Why Professions Are Adding Analytics to Their Skillsets

There are many different forms of data analytics, and these have different applications in business.

137. A Hacker Tried to Steal $566M by Exploiting a Code Smell, Here's How

Yesterday, October 7, 2022, one of the larger blockchains had to be halted. This news was shocking since most blockchains are decentralized by definition.

138. How We Use dbt (Client) In Our Data Team

This is not really an article, but rather some notes about how we use dbt in our team.

139. We Built a Modern Data Stack for Startups

Here's how we built our data stack at incident.io. If you're a company that cares about data access for all, follow this guide and we guarantee great results.

140. Revolutionizing Data with React Native ECharts 1.1: Now More Interactive!

We are excited to release a stable version of React Native ECharts 1.1.

141. Data Labeling: A Comprehensive Guide

This article offers a comprehensive guide to data labeling; covering types, challenges, and best practices for successful data labeling.

142. How to Use No-Code Machine Learning to Optimize your HackerNoon Articles

Learn how to use machine learning to predict the success of articles on HackerNoon before you hit 'publish' and improve your reach and success as a writer.

143. The Art of Data Storytelling: How to Make Your Data Impactful

Data is everywhere: whether you choose a new location for your business or decide on the color to use in an ad, data is an invisible advisor that helps make impactful decisions. With quite a number of resources to choose from, data is becoming more accessible, day by day. But as soon as it has been collected, one inevitable question arises: how do I turn this data into insights that can be acted upon?

144. Big Data Analysis for the Clueless and the Curious

Big data analytics has been a hot topic for quite some time now. But what exactly is it? Find out here.

145. The Failed Promises of Extract, Transform, and Load—and What Comes Next

Faster, Better Insights: Why Networked Data Platforms Matter for Telecommunications Companies

146. A Quick Guide To Business Data Analytics

For many businesses, a lack of data isn’t the issue. Quite the contrary: there’s usually too much data available to make an obvious decision. With that much data to sort through, you need to extract additional information from it.

147. Data-Driven Analysis of Global EV Adoption

Electric vehicle growth continues to accelerate, with some regions of the world predominantly selling EVs, and others accelerating their transition.

148. 20 Most Popular Business Intelligence (BI) Tools in 2020

Business Intelligence (BI) is a data-driven business, a decision-making process based on collected data. It is often used by managers and executives to generate actionable insights. As a result, BI is often referred to interchangeably as "Business Analytics" or "Data Analytics".

149. Optimizing Databricks Cluster Cost and Utilization Without System Tables

In most enterprise Databricks environments, system tables such as system.jobrunlogs or system.cluster_events may be restricted or disabled.

150. How to use Python Seaborn for Exploratory Data Analysis

This is a tutorial on using the seaborn library in Python for Exploratory Data Analysis (EDA).

151. Decoding MySQL EXPLAIN Query Results for Better Performance (Part 2)

Understanding MySQL EXPLAIN output is essential to optimizing a query. EXPLAIN is a good tool for analyzing your queries.

152. Metrics, logs, and lineage: 3 Key Elements of Data Observability

Data observability is built on three core blocks: metrics, logs, and lineage. What are they, and what do they mean for your data quality program?

153. Getting Started with Data Visualization: Building a JavaScript Scatter Plot Module

Scatter plots are a great way to visualize data. Data is represented as points on a Cartesian plane where the x and y coordinate of each point represents a variable. These charts let you investigate the relationship between two variables, detect outliers in the data set as well as detect trends. They are one of the most commonly used data visualization techniques and are a must have for your data visualization arsenal!

154. Advancing Observability Platforms: Upgrading Data Processing and Reducing Costs with Apache Doris

Discover how GuanceDB elevates observability with Apache Doris, slashing costs by 70% and boosting data query performance by 200-400%.

155. What is RFM (Recency, Frequency, Monetary) Analysis?

RFM analysis is a data-driven customer segmentation technique that allows marketing professionals to make tactical decisions based on rigorous data refinement.

156. New ChatGPT-4o: A Game-Changer That Could Replace Data Analysts

In this article, I will highlight a few things that may help you decide on your data analysis career path with ChatGPT.

157. How I'm Building an AI for Analytics Service

In this article I want to share my experience with developing an AI service for a web analytics platform called Swetrix.

158. Data Science From Scratch

Data Science, which is also known as the sexiest job of the century, has become a dream job for many of us. But for some, it looks like a challenging maze and they don’t know where to start. If you are one of them, then continue reading.

159. Analyzing Ethereum Block Data with Bitquery's API

In this tutorial, you will use Bitquery's API to analyze Ethereum block data. Bitquery's API provides access to various blockchain data, making it a powerful to

160. Oracles on StarkNet

Comments on the level of decentralization of the oracles currently available in StarkNet's DeFi ecosystem, based on a detailed analysis of the data.

161. From Centralized to Federated: Evolving Data Governance Operating Model

See how a federated data governance model addresses the challenges of centralized systems by enabling flexibility, regulatory compliance, and innovation for business

162. How to Extract Insights From Your Data

Manage data using the HarperDB database. Access your data from HarperDB using Custom Functions. Automate EDA on data from the HarperDB database using sweetviz.

163. OpenData Explorer GPT: Unlocking Information through AI

Explore Greece's Open Data landscape with the innovative OpenData Explorer GPT, offering insights and access to valuable public information.

164. A Conversation with Wendy-Lynn McClean: The 3 Most Important Traits of a Product Manager

Explore Wendy-Lynn McClean's PM journey, where she unveils the alchemy of communication, curiosity, and data-driven success.

165. Key Aspects of Machine Learning Operations, Explained

If you have ever worked, or currently work, in the IT field, you have definitely come across the term "machine learning".

166. Automate Submissions for the Numerai Tournament Using Azure Functions and Python

Python Automation with Azure Functions, to compete in the weekly Numerai tournament.

167. Data Visualization

What is Data Visualization?

168. White Employees Are Heavily Over-Represented In Tech Leadership

I collected and analyzed employment data by race for 57 of the biggest tech employers in the US (1). Here are the top level conclusions:

169. Antonio Reza's Top 10 Secrets to Mastering Sheets Like a Pro

I've created hundreds of financial models in Google Sheets using SQL and AI to help the company sell billions of dollars.

170. Data Science Interview Question: Creating ROC & Precision Recall Curves From Scratch

This is one of the popular data science interview questions which requires one to create the ROC and similar curves from scratch.

171. Graphing Likes and Comments on Instagram Posts to See the Trends Visually

Turning Instagram into data: A fun journey to collect and graph likes and comments using network requests and Python for an ego-boosting data analysis.

172. Is Your Data Biased? How To Overcome Survivorship Bias

In this post, we study survivorship bias: the danger of concentrating your data analysis solely on existing power users.

173. Tailor Your Data Visualization Design Choices for Key Stakeholders to Create Organizational Buy-In

A guide to effective deployment of data visualizations in organisations for maximum business value. Adapted from Data Principles To Practice Volume II

174. Public Web Data for Business: Common Challenges And How to Solve Them

Businesses working with public web data experience various challenges. This article covers the most common ones and how to overcome them.

175. 361 Stories To Learn About Big Data

Learn everything you need to know about Big Data via these 361 free HackerNoon stories.

176. How Different Analyst Types Can Positively Impact Your Small Business

Data analysis used to be considered a luxury of big business.

177. Public Health Improvements as a Result of Data Usage and Analysis in Healthcare

Big data has made a slow transition from being a vague bogeyman to being a force of profound and meaningful change. Though it’s far from reaching its full potential, data is already having an enormous impact on healthcare outcomes across the world, at both the public and individual levels.

178. The Role of Ontologies in Data Management

Ontologies organize data, enhance interoperability, and drive insights across domains with structured frameworks.

179. 7 Data Analysis Steps You Should Know

To analyze data adequately requires practical knowledge of the different forms of data analysis.

180. How to Scrape Domain.com.au Real Estate Data with Apify Actor

Learn how to scrape real estate listings from Domain.com.au using an Apify actor. Extract property details, pricing, agent info, and more.

181. The HackerNoon Newsletter: Salt Typhoon: The Hidden Hand Behind the Telecom Gift Card Scam? (6/12/2025)

6/12/2025: Top 5 stories on the HackerNoon homepage!

182. Why Self-Service Analytics Tools Are Important For Business Decisions Making

How to use big data, self-service analytics tools, and artificial intelligence to empower your company's business decision makers with state-of-the-art software.

183. Why Python Is Leading the Charge in Data Analytics

Python is one of the oldest mainstream programming languages, and it is now gaining even more ground with the growing demand for big data analytics. Enterprises continue to recognize the importance of big data, and the $189.1 billion generated by big data and business analytics in 2019 proves it.

184. Popular Python Implementations [An Overview]

You read it right. It's all about implementation. Today, we will talk about the different implementations of Python. A heads up on the different kinds, be it CPython, Brython, you name it.

185. Measuring True Campaign Uplift in Noisy E-Commerce Data: A Practical Heuristic Approach

A practical heuristic approach to measuring true campaign uplift in noisy e-commerce data without relying on A/B tests.

186. 4 Data Transformations Made Spreadsheet-Easy

Gigasheet combines the ease of a spreadsheet, the power of a database, and the scale of the cloud.

187. Rust DataFrame Alternatives to Polars: Meet Elusion v4.0.0

Elusion is a new contender that takes a fundamentally different approach to data engineering and analysis.

188. Granger Causality: Principle of Cause and Effect Explained

… in a world full of data, we can understand the impact with clever methods. Meet Granger causality.

189. Understanding the Main Differences between Structured and Unstructured Data

In this article, I explore structured, unstructured, and semi-structured data, as well as how to convert unstructured data, and AI’s impact on data management.

190. Are You Poisoning Your Data? Why You Should Be Aware of Data Poisoning

As machine learning gains more prominence, these attacks may become more common. Here’s a closer look at data poisoning and what companies can do to prevent it.

191. Spotify Audio Features Time Series in Additive Spotify Analyzer

There are many articles on analyzing Spotify data and many applications as well. Some are a one-time analysis of an individual's music library and some are apps built for a specific purpose. This app is different in that it does not do just one thing. It is meant to grow and provide a place to add more analyses. This article is about how the audio features time series was created.

192. Bitcoin’s Price Peak Is Coming This Month - Stock-to-Flow Says . . .

Stock-to-Flow predicts bitcoin's price will stay above $100,000 from summer to the end of this year. What if that doesn't happen?

193. Tableau Vs. Power BI: The Complete Comparison

The world of analytics is continually evolving, introducing new goods and adjustments to the modern market. New companies are entering the market and well-know

194. Customer Analytics Tools for Every Business Size

Customer analytics tools can be a boon for businesses. They provide increased sales opportunities and better customer predictions, which enable better decisions.

195. Creating a Pareto Chart With JavaScript

Welcome to this step-by-step tutorial that will empower you to create an interactive Pareto chart using JavaScript that will look nice on any device and in any browser!

196. Automating Time-Based Tasks With Python: Scheduling Functions at Flexible Intervals

Optimize workflow efficiency by scheduling Python functions to run sequentially at the nearest 15-minute intervals.
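One way to find the next 15-minute boundary is to floor the current time to the quarter hour and add 15 minutes. The sketch below assumes that "nearest" means the next boundary strictly after the current time; the function names are hypothetical and not taken from the article.

```python
from datetime import datetime, timedelta


def next_quarter_hour(now: datetime) -> datetime:
    """Return the first 15-minute boundary strictly after `now`."""
    # Floor to the current quarter hour, then step forward one interval.
    floored = now.replace(
        minute=now.minute - now.minute % 15, second=0, microsecond=0
    )
    return floored + timedelta(minutes=15)


def seconds_until_next_run(now: datetime) -> float:
    """Seconds a scheduler would sleep before the next quarter-hour run."""
    return (next_quarter_hour(now) - now).total_seconds()


if __name__ == "__main__":
    current = datetime.now()
    print("next run at:", next_quarter_hour(current))
    print("sleep for (s):", seconds_until_next_run(current))
```

A simple scheduler could sleep for `seconds_until_next_run(datetime.now())` seconds, run its functions in sequence, and repeat.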

197. Automated Machine Learning for Data Analysts & Business Users

Automated Machine Learning (AutoML) represents a fundamental shift in the way organizations of all sizes approach machine learning & data science.

198. DeFi Data and Visualizations To Understand It 1% More Than Others

199. If You're a Facebook User, You're Being Monitored by Thousands of Companies

Using 709 volunteers who shared archives of their FB data, Consumer Reports found that a total of 186,892 companies sent data about them to the social network.

200. How to Connect to Oracle, MySql and PostgreSQL Databases Using Python

To connect to a database and query data, you need to begin by installing Pandas and SQLAlchemy.

201. How to Analyze and Process Unstructured Data in 5 Simple Steps

In this article, we’ll look at how to analyze and process unstructured data while using business intelligence tools to simplify the entire process.

202. Our Data-Driven Approach to Making Sense of the 2020 Presidential Election

In less than five months, the world’s attention will be drawn to the outcome of the US Presidential election.

203. Logarithmic Scaling: Handling Extreme Data Variability

Learn how logarithmic scaling helps analyse datasets with extreme variability.
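As a small illustration, the sketch below applies a base-10 logarithm to values spanning several orders of magnitude. The `log_scale` helper and the sample values are hypothetical, and the choice of base 10 is an assumption.

```python
import math


def log_scale(values):
    """Return log10 of each value; the logarithm is only defined for positive inputs."""
    if any(v <= 0 for v in values):
        raise ValueError("log scaling requires strictly positive values")
    return [math.log10(v) for v in values]


# Hypothetical measurements spanning roughly six orders of magnitude.
values = [3, 45, 800, 12_000, 2_500_000]
scaled = log_scale(values)

raw_ratio = max(values) / min(values)   # how many times larger the largest raw value is
log_range = max(scaled) - min(scaled)   # the same spread, expressed in powers of ten

print([round(s, 2) for s in scaled])
print(f"raw max/min ratio: {raw_ratio:.0f}, log10 range: {log_range:.2f}")
```

The transformation preserves the ordering of the values while compressing the spread, so the small and large observations can share one axis.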

204. 4 Valuable Lessons I Learned as a Data Science Student

I never really wanted to learn data science.

205. Why “Accuracy” Fails for Uplift Models (and What to Use Instead)

When it comes to uplift modeling, traditional performance metrics commonly used for other machine learning tasks may fall short.

206. Top 6 Data Visualization Tools for 2022

In this blog post, you will discover the best data visualization tools to effectively analyze your datasets. Learn about the tools to create intuitive visualizations.

207. Foundation Models - A hidden revolution in enterprise Artificial Intelligence

An introductory article offering a first look at the broadening prospects of foundation models in the AI industry.

208. Querying Data With GraphQL & Ballerina

Take a look at the basics of GraphQL and how it is supported out-of-the-box with the Ballerina programming language.

209. 253 Stories To Learn About Data Analysis

Learn everything you need to know about Data Analysis via these 253 free HackerNoon stories.

210. How Big Data and Analytics are Shaping Traveler Behavior: Insights from a Digital Marketer

In this interview, Angelika Eremeeva discusses how data-driven insights and big data are revolutionizing travel marketing.

211. Highly Efficient and Secure Data Analysis Using Dask and AWS Best Practices

We generate an enormous amount of data, which can be mind-boggling. If we analyze this data, it can lead to valuable insights and competitive advantage.

212. Analyzing Montreal’s BIXI Ridership With Data And Visuals

Been to Montreal? Have you heard of the term bixi? Well, this article will educate you about bixi ridership and the factors that affect it.

213. COVID-19: We Need More Than Data, We Need Insights!

TL;DR We are managing the pandemic with only part of the data, which is not necessarily representative of reality. We must take a census of the number of positive and negative cases within a population. The officially reported positive cases contain a bias: they are cases that already manifest the disease in a more or less serious way. In the long term, aggressive testing (the South Korea model) is the only viable and sustainable strategy for managing coexistence between the virus and human beings until a vaccine is available.

214. 6 Tips to Get More Value Out of Your Microsoft Power BI Dashboard & Reports

By using Microsoft Power BI, you increase the efficiency of your company through its interactive insights and visual clues. Here are 6 tips for Power BI users.

215. 229 Stories To Learn About Data Analytics

Learn everything you need to know about Data Analytics via these 229 free HackerNoon stories.

216. Behavioral Data Collection: How Gaming Might Help Study ADHD

With the development of more games specifically designed for this purpose, we expect to see a significant increase in the use of gaming as a tool…

217. How To Turn Data Into Actionable Insights For Business Growth

In this guide, I'll share the most efficient techniques and tools to turn data into actionable insights that you can use to grow your business.

218. A Framework & Package for Missing Data To Speed Up Data Organization and Data Cleaning

Three weeks into my journey to become a data scientist and I’ve officially been baptized… by fire, that is! I chose to attend Flatiron’s Data Science 15-week bootcamp to transition out of finance. So far, the program has exceeded expectations (and my expectations were high).

219. Ethereum Merge: “15 Days Before and After” Data Analysis, Censorship in Ethereum Blockchain

In this article, I will analyze what actually happened, taking as a basis 15 days before and 15 days after the transition.

220. The Most Commonly Used SQL Queries by Data Scientists

SQL (Structured Query Language) is a programming tool or language that is widely used by data scientists and other professionals

221. 143 Stories To Learn About Data Visualization

Learn everything you need to know about Data Visualization via these 143 free HackerNoon stories.

222. A Single Speed Test is Fun — Hundreds of Them, May Actually be More Accurate

Releasing the first internal build of the NordVPN apps that included NordLynx – our brand new protocol built on the backbone of WireGuard® – was an exciting moment for the team. Everyone started posting their speed test results on Slack and discussing the variance. While most of the time NordLynx outperformed other protocols, there were some cases with slightly worse speed results.

223. The Noon Notification⁠—Monday 10 February, 2020

The best tech stories published on hackernoon.com in the last 48 hours. Sign up for the newsletter today.

224. From Novice to Data Pro in 90 Days: Avery Smith's Exclusive Method

Get Hired in Data Analytics Within 90 Days

225. Can AI and Computer Vision Replace Human Intuition?

Computer vision now lives with us with exceptional AI capabilities. Learn how AI and computer vision are playing a key role in outsmarting human beings.

226. 3 Mistakes I Made Learning Data Journalism (and How to Avoid Them)

Over the past decade, I’ve worked on and off as a journalist. It became clear to me early on that having some data skills might help me find interesting stories.

227. BigQuery and Attribution Models Can Reveal What Really Drives E-Commerce Success

Discover 7 powerful attribution models to analyze and optimize user journeys using BigQuery

228. Women in Tech: Azize Sultan Shares Her Inspiring Journey from Architecture to Tech Leadership

Azize Sultan shares her inspiring journey to tech leadership, tackling gender gaps, challenges, and offering advice for aspiring women in the tech industry.

229. Python Libraries For Data Science

An introduction to the top data science libraries. The Python programming language helps developers create standalone PC games, mobile apps, and similar enterprise applications. Python has in excess of 137,000 libraries, which help in many ways. In this data-centric world, most consumers demand relevant information during their buying process. Companies also need data scientists to gain deep insights by processing big data.

230. Staring into the Black Mirror with The Most Connected Man on Earth

All throughout his day, Chris is connected to numerous sensors that collect the data that make up his life.

231. No More Silent Analytics Bugs: All it Takes is One SDK and One Github Action

Avoid silent analytics bugs by using two Open Source tools. First, get free from vendor lock-in by replacing the vendor analytics SDKs with RudderStack SDK that

232. How Different Industries Put Data Analytics to Use

You must have heard about big data and the theory used behind it. However, are you aware of the top industries where data analytics is being used for changing the way we work in the actual world? Let's take a close look at the top big data industries and how they are getting reshaped by using data analytics. The main idea behind using big data is that it is a new method for gaining insight into the challenges faced by various companies each day. In earlier days it was not possible to collect and interpret a vast quantity of data because there was no technology available.

233. Fintech 2021: How Fintech Companies Use Big Data Effectively?

According to a study, 90% of the whole world’s data was created in the last two years. This sounds quite cool but what does the world do with all that data? How does one analyze it?

234. Integrating Manticore Search with Apache Superset

In this article, we’ll provide a step-by-step tutorial that will guide you through connecting Manticore to Apache Superset and adding a chart.

235. Are There Any Price Manipulation Patterns In Qatar 2022 Token?

Today, let’s dig deep into another significant indicator to learn more about the trading volume of QATAR 2022 TOKEN.

236. Create A Data Visualization Map Using Mapbox

In this article, we make a map with software called Mapbox in a few simple steps. This won't involve any coding at all!

237. Data Quality: Its Definitions And How to Improve It

Utilizing quality data is essential for business operations. This article explores data quality definitions and how to maintain it for everyday use.

238. 4 Ways Data Science Helps Streamline Business Operations

Data Science has changed the way organizations collect, analyze, and process different types of information.

239. Skilled Workers Sound the Alarm as Immigration Reform Sparks Record Petition Response in UK

Skilled workers across the UK are pushing back against proposed immigration reforms, as a petition opposing the changes hits 100,000 signatures in just two days

240. Moving Beyond Dashboards: Rethinking Analytics in the Era of Ad Hoc Requests

Let's talk about the Pareto law, the dashboard fallacy, and how to answer the hardest question in analytics

241. 188 Stories To Learn About Analytics

Learn everything you need to know about Analytics via these 188 free HackerNoon stories.

242. Auto-Synchronization of an Entire MySQL Database for Data Analysis

Flink-Doris-Connector 1.4.0 allows users to ingest a whole database containing thousands of tables into Apache Doris, a real-time analytic database, in one step

243. Turning Your Data Swamp into Gold: A Developer’s Guide to NLP on Legacy Logs

A practical NLP pipeline for cleaning legacy maintenance logs using normalization, TF-IDF, and cosine similarity to detect fraud and improve data quality.

244. Discover Funnel Bottlenecks: Step-by-Step Analysis with BigQuery

Learn how to use BigQuery for e-commerce funnel analysis. Track user transitions between steps like “add to cart” and “purchase,” and identify where to improve

245. Small Businesses use AI Tools to Increase Their Leads By 50%

AI can empower sales reps by monitoring different signals and predicting a specific lead's readiness to purchase. AI tools can reduce customer acquisition costs

246. Getting to Know Google Analytics 4: Four Smart Features You Don’t Know About

Let’s take a deeper look into Google Analytics 4 and explore some of its key features that you might not yet know about.

247. 4 Ways Data Quality Can Add Value To Your Retail Business

Convert leads into customers and boost your customer outreach by analyzing high-quality real-time data captured from data silos.

248. How I Extracted Meaningful Information from Inconsistent Data Using ChatGPT

Data analysis project using spaCy and regular expressions to extract specific strings from a data set.

249. Solving Noom's Data Analyst Interview Questions

Noom helps you lose weight. We help you get a job at Noom. In today’s article, we’ll show you one of Noom’s hard SQL interview questions.

250. What is the Significance of Time-Weighted Averages in Data Analysis

Learn how time-weighted averages are calculated, why they’re so powerful for data analysis, and how to use TimescaleDB hyperfunctions to calculate them faster.
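A minimal sketch of the calculation in plain Python, assuming each reading holds until the next one arrives (last observation carried forward). The `time_weighted_average` function and the sample readings are hypothetical; this does not use TimescaleDB's hyperfunctions.

```python
def time_weighted_average(samples):
    """Time-weighted average of (timestamp, value) pairs sorted by timestamp.

    Each value is weighted by how long it stays in effect, i.e. until the
    next sample. The final sample only marks the end of the interval.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to span an interval")
    total_time = samples[-1][0] - samples[0][0]
    if total_time <= 0:
        raise ValueError("timestamps must be increasing")
    weighted_sum = 0.0
    # Pair each sample with its successor to get the duration it covers.
    for (t0, v0), (t1, _) in zip(samples, samples[1:]):
        weighted_sum += v0 * (t1 - t0)
    return weighted_sum / total_time


# Hypothetical, irregularly spaced readings: (seconds, value).
readings = [(0, 10.0), (1, 20.0), (10, 20.0)]
twa = time_weighted_average(readings)
plain_mean = sum(v for _, v in readings) / len(readings)
```

Here the value 10.0 was in effect for only one of the ten seconds, so the time-weighted result sits closer to 20.0 than the plain mean of the three readings does.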

251. Understanding Nested IF vs. SWITCH in Power BI: A Comparative Analysis

Comparing nested IF statements and SWITCH function in Power BI for efficient conditional logic and data analysis.

252. The Need for Data Analytics to Flood-Proof Property Investment

Did you know that the total risk of floods isn't accounted for in urban planning in the US due to a denial of climate change?

253. Conversational Analytics: the Next Generation of Data Analysis and Business Intelligence

The article talks about how data analytics is evolving at workplaces from traditional querying, Excel, and dashboards to natural language conversations.

254. For Effective A/B/n Testing, Try Running Them Simultaneously

A/B testing is an indispensable method for measuring the real impact of features we develop and want to release.

255. How Smart Analytics Can Help Small Businesses Boost Sales

Technology has taken over the world, and now is the time for small businesses to realize that what they need is tech. Smart analytics makes everything easier.

256. The Pain Points of Scaling Data Science

When building a machine learning model, data scaling is the most significant element of data pre-processing. Scaling can make the difference between a poor machine learning model and a stronger one.

257. How To Set Up Jupyter Notebook on a Windows Server

In this guide, we will walk you through the process of setting up Jupyter Notebook on a Windows server to enable remote access.

258. How To Predict Water Pumps Failure in Tanzania using CatBoost Library

It is based on the competition data. An example of data analysis, insights from the data. The CatBoost library for the baseline model. High-score results.

259. Behavioral Analytics: The Foundation of Targeted Marketing and Predictive Analytics

Learn how to capitalize on your business standards and increase the conversion rate by approximately 85% by analyzing customer behaviors with data you collect.

260. How to Use Public Web Data for Talent Intelligence and Sourcing

Learn how public web data can boost your talent sourcing efforts in both quality and quantity.

261. 5 Steps to Build a Data-Driven Product Development Culture

Early stages of product development are one of the most exciting times to be a part of the company as a product manager. One of the most important factors that drive the growth and success of the product is how quickly it acquires a large and loyal base of customers. You want these customers to use your product as frequently as possible with minimum friction.

262. What Are the Key Differences Between Qualitative and Quantitative Data?

This article uncovers the key differences between qualitative and quantitative data with examples.

263. How To Solve the Problem With Key Metrics In a B2B Product

To learn how B2B companies solve the problem with key metrics in a product, I caught up with Yuri Brankovsky who has worked in multiple digital products.

264. An Essential Plan to Get Your SQL Knowledge Ready for Interviews

SQL is the cornerstone of a wide variety of data-intensive roles, and it is not going anywhere soon. Loads have been written about its usefulness already, so this post is focused on getting your skills from C to A+ for your interviews. No knowledge is assumed, and I feel comfortable promising that your level will be more than enough for what will be asked in interviews if you follow this game-plan.

265. Using Data Attribution Comparison Table in Google Analytics 4

Google Analytics 4 is set up for better data attribution

266. Bifurcation Analysis of the Keynesian Cross Model: Results

This study investigates the Keynesian cross model of a national economy with a focus on the relationship between government spending and economic equilibrium.

267. Hate Speech Detection in Algerian Dialect Using Deep Learning: Background

In this paper, we proposed a complete end-to-end natural language processing (NLP) approach for hate speech detection in the Algerian dialect.

268. Keep Sharing Context: How to Enable Better and Faster Product Decisions

“So, what do you think?” — says the Product Manager after a product strategy presentation to his team

269. Understanding the Differences between Data Science and Data Engineering

A brief description of the difference between Data Science and Data Engineering.

270. My Grandmother Was a Data Analyst; She Just Didn’t Know It

Long before dashboards and algorithms, our grandmothers were reading patterns, tracking prices, and making data-driven decisions.

271. How I Use Math to Solve Business Problems

Learn to tackle business problems using data analysis. Follow a hypothesis driven approach to find root causes and actionable insights.

272. Mastering Exposure Points for Accurate Mobile A/B Testing

Learn why exposure points can make or break your mobile A/B tests, common pitfalls to avoid, and practical tips to improve your app experimentation results.

273. Mastering the Operational Logistics: Solving Problems When Client Relationships Are at Risk

Logistics is a sector where mistakes usually cost thousands of dollars at best and client relationships at worst. Here's how to avoid it.

274. 5 Simple Tips to Become a Better Data Scientist

In 2023, it’s important for data scientists to stay on top of the latest trends & advancements in order to remain competitive in the market.

275. Little-known Linear Regression Assumptions

The model should conform to these assumptions to produce the best linear regression fit to the data.

276. Automated Data Catalogs will Help Manage Data in 2022

Data is increasingly playing a dominant role in business. Know how automating your data catalog can help with efficient data management in 2022.

277. For Entry-Level Data Engineers: How to Build a Simple but Solid Data Architecture

This article aims to provide a reference for non-tech companies who are seeking to empower their business with data analytics.

278. Introduction to a Career in Data Engineering

A valuable asset for anyone looking to break into the Data Engineering field is understanding the different types of data and the Data Pipeline.

279. Exploring Large-Scale Data Warehousing: Log Analytics Solutions and Best Practices

This article describes a large-scale data warehousing use case to provide a reference for data engineers who are looking for log analytic solutions.

280. The Anatomy of K-means Clustering

Let’s say you want to classify hundreds (or thousands) of documents based on their content and topics, or you wish to group together different images for some reason. Or what’s even more, let’s think you have that same data already classified but you want to challenge that labeling. You want to know if that data categorization makes sense or not, or can be improved.

281. How to Improve VC Deal Sourcing Using Public Web Data

Learn how public web data can help you improve your deal sourcing methods.

282. How to Clean and Verify Address Data 'Without Using Code'

Today, data verification has become one of the greatest assets of an organization.

283. What Kind of Skills Are Required to Become a Data Analyst?

Discover the essential skills required to become a successful data analyst, including technical tools, analytical abilities, and key competencies for thriving.

284. How to Turn Messy Healthcare Ops Data Into ML-Ready Features

Learn how to turn messy healthcare ops data into validated, explainable, and reproducible ML-ready features that hold up in production.

285. Top 5 Factors Behind Data Analytics Costs

A custom integrated data analytics solution would cost at least $150,000-200,000 to build and implement.

286. Watch Out for Deceitful Data

Nowadays, most assertions need to be backed with data; as such, it is not uncommon to encounter data that has been manipulated in some way to validate a story.

287. Top Data Analyst Skills in 2021

Enhance your knowledge and skills in the field of data analytics with the help of data science certification for a rewarding career as a data analyst.

288. New Power BI Features For More Streamlined Data Analysis

Here are the new features of Power BI (unveiled at the Microsoft Ignite 2021) that can be absolutely beneficial for business users.

289. Can Machine Learning Predict Loan Defaults?

Visualize Insights and Discover Driving Features in Lending Credit Risk Model for Loan Defaults

290. How AI Empower Sustainable Growth of the Organisations

The propagation of artificial intelligence (AI) is making a significant impact on society, changing the way we work, live, and communicate. AI today is allowing the world to diagnose diseases and develop clinical pathways. It is also being used to match individuals’ skill sets with job openings and create smart traffic that leads to the reduction of pollution. There are many examples of applying AI technologies in the sustainable growth of the planet and organisations.

291. 89 Stories To Learn About Big Data Analytics

Learn everything you need to know about Big Data Analytics via these 89 free HackerNoon stories.

292. Creating a Dependable Data Pipeline for Your Small Business

In this article, I will be showing you how to build a reliable data pipeline for your small business to improve your productivity and data security.

293. What Is Big Data? Understanding The Business Use of Big Data Analytics

Big data analytics can be applied to any business to boost revenue and conversions and to identify common mistakes.

294. Augmented Analytics & Data Storytelling: Covid Ups FP&A Demand

Businesses need agile tools to quickly identify and communicate actionable insights for more informed decision-making.

295. Mitigating Data Exfiltration: Four Ways to Detect and Respond to Unauthorized Data Transfers

Learn how to safeguard your data from unauthorized transfers with these 4 effective detection and response strategies.

296. 4 Data Analytics Certifications That Boost Your Career

The best data analytics certifications, which will provide you with the right kind of guidance to boost your big data analytics career and to get a great job.

297. How to Integrate AI Into Startup Operations for Enhanced Productivity

Learn how startups are boosting productivity by integrating AI into their operations, from automating tasks to enhancing decision-making.

298. The New Frontier of Price Optimization

What would be the price of our product or service? This question has bothered businesses forever. Several pricing models have been spawned as a result. But the concept of price optimization is a fairly new one. At least to businesses that are not in the hospitality or airline sector.  

299. What is a Citizen Data Scientist and How Do You Become One?

Data science has been democratized for the most part. AI is now mainstream! It's no longer the exclusive province of large companies with deep pockets.

300. Interpreting Big Data: Data Science vs Data Analytics

Data Science and Data Analytics are quite diverse but are related to the processing of Big data. The difference lies in the way they manipulate data.

301. A Guide to Growing your Average Order Value - A Price led Merchandising Segmentation Strategy

The goal of retail merchandising is to support a retail strategy that generates value for customers. The selection of the merchandise and the type of goods are key to retail strategies. According to author Michael Levy in Retailing Management, the decision to carry particular merchandise is tactical rather than strategic.

302. How IoT in Fintech is a Game Changer for Financial Markets

The financial sector is going through massive change. At the heart of this shift lies IoT in Fintech.

303. How to Build a Data Stack from Scratch

Overview of the modern data stack after interviewing 200+ data leaders. Decision Matrix for Benchmark (DW, ETL, Governance, Visualisation, Documentation, etc.)

304. Five Undervalued Data Points for Emerging Businesses

Apparently, data has become more ubiquitous than the stars in the sky. In fact, the amount of data produced daily via the Internet is set to top 44 zettabytes. As you might assume, that’s more data than you could possibly fathom or use.

305. Measuring Product Impact When A/B Testing Is Not Available

How to evaluate product releases without an A/B test. A trustworthy framework using causal inference, Synthetic Control, and rigorous data guardrails.

306. Analytics vs. Instinct in Football

Player ratings lean on stats, but savvy coaches trust gut - seeing ripple effects stats miss. Post-game numbers often rewrite pre-game predictions.

307. 7 Essential Things in Excel That You Definitely Need to Master

Learn essential features for efficient data analysis and boost your professional productivity in Excel.

308. Why Sales Teams Love Business-Oriented Chatbots

The average wait time on a live chat is roughly 35 seconds, and with every growing second you have an unanswered chat, you're likely to lose a lead.

309. If Growth Marketing Is So Terrible, Why Don't Statistics Show It?

Growth marketing is a data-driven approach that focuses on acquiring, retaining, and growing a customer base, though there are those who argue that it is flawed

310. How Product Analysts Can Use ChatGPT To Boost Efficiency

Tips on using ChatGPT across various data analysis tasks, such as for stronger communication, unit economics estimations, and SQL tasks.

311. Low-Code Development Helps Data Scientists Uncover Analytical Insights

Emerging low-code development platforms enable Data Science teams to derive analytical insights from Big Data quickly.

312. Kimball & Inmon vs. the Retail Store

Years back I had read a blog about database scalability where it simplifies definition of scalability with activities in a kitchen. I was quite surprised how successful the comparison was. Come to think about it, technology is and should be inspired by what’s happening around us. This thinking pushed me into thinking and linking technology with my everyday life.

313. How to Scrape Data Off Wikipedia: Three Ways (No Code and Code)

Get your hands on excellent manually annotated datasets with Google Sheets or Python

314. How Much Can You Make as a Data Scientist?

Wondering how much data scientists make? We're here to help you find out about salaries in Data Science and how they are influenced by various factors.

315. Privacy Protection and Web3 Analytics

Though more and more developers and product designers have joined the Web3.0 world in recent years, most of them overlook that they are still using centralized infrastructure (data analytics tools) to build apps and websites. Every minute, project builders are making themselves part of the reason for data breach events, as they have to collect user data intentionally or unintentionally for product improvement.

316. The Noonification: Newton’s Laws in Society: How Social Forces Move Physical Bodies (4/3/2024)

4/3/2024: Top 5 stories on the HackerNoon homepage!

317. How to Understand Your Data in Real-Time Using bytewax and ydata-profiling

A hands-on tutorial on how to combine the open-source streaming solution, bytewax, with ydata-profiling, to improve the quality of your streaming flows.

318. From SQL Analytics to Predictive Decision Systems: Operationalizing ML Models in Business Operation

SQL explains the past, but ML drives decisions. Learn how to operationalize machine learning with feature stores, real-time inference, and production monitoring

319. Visualization of Hypothesis on Meteorological data

In this blog, we are gonna perform the analysis on the Meteorological data, and prove the hypothesis based on visualization.

320. How to Get Started Harnessing the Power of AI: Aligning Business Needs with the Right Solutions

Large Language Models (LLMs) have emerged as a transformative force in business technology, helping address specific business needs.

321. Disinfection Gateways: Could it Help Businesses Reopen?

Reopening businesses after a public health crisis is a complex process that requires a consistent strategy to minimize all possible risks. Contrary to popular belief, it does not amount to disinfection and mask wearing alone; it also includes measures like monitoring and data analysis. The following article is an overview of disinfection tunnels and their implications for reopening businesses.

322. How AI Has Enhanced Sentiment Analysis Using Product Review Data

Customer feedback is great. But have you been able to turn that feedback into meaningful customer insights? A few years back, brands depended on surveys to gauge customers’ feelings about how their products were performing.  

323. 5 Most Important Tips Every Data Analyst Should Know

The 5 things every data analyst should know and why it is not Python, nor SQL

324. The Visual Framing of COVID-19 on Search Engines

To collect the data, we used a novel algorithmic auditing approach based on large-scale simulation of user browsing behavior via virtual agents

325. ClimateNLP: Analyzing Public Sentiment Towards Climate Change: Materials and Methods

The natural language processing approaches can be applied to the climate change domain as well for finding the causes and leveraging patterns.

326. Native Analytics On Elasticsearch With Knowi

327. How Data-Driven Investing Impacts Markets

Data-driven investing is reshaping the landscape of the financial markets, particularly in the realm of stock investments.

328. I Tracked My Happiness Every Day For A Year

I tracked my mental health each day throughout the year. I rated my happiness on a scale of 1–5, with “1” being a really bad day, “2” being a kind of bad day, “3” being a neutral day, “4” being a kind of good day, and “5” being a really good day. I want to preface this article by saying that I understand how complex and difficult it is to try and quantify mental health. My absurdly simple, completely subjective, and inherently biased rating system is by no means an attempt to accurately represent the complexities of the mental health spectrum.

329. What Do Nobel Prize and Your Business Have in Common? Natural Experiments

A natural experiment is a type of causal analysis that has been widely adopted by many organizations and research fields. How can you use them?

330. Geospatial Analysis of Movement Patterns for Mobility & Delivery

How do my users move in this city? Where do they go? What does the “flow” of this city look like? How does that change throughout the day?

331. Comprehensive Data Analysis with SQL and Data Visualization: Alibaba User’s Behavior Investigation

This user behavior report is based on users’ orders on the Alibaba platform between November 25th, 2017, and December 3rd, 2017…

332. Data Journalism 101: 'Stories are Just Data with a Soul'

Gone are the days when journalists simply had to find and report news.

333. Embracing the Shift: Future of Work in the Era of Automation

Embracing the changing face of work: understanding automation's influence and equipping yourself with skills for the future.

334. How to Aid Disease Research with a Biomedical Knowledge Graph

Building a biomedical knowledge graph using publicly available datasets to better aid disease research and biomedical data modelling.

335. Understanding the tech behind Snowflake’s IPO and what’s to come

By now you must have read quite a few articles about Snowflake’s absolutely mind-blowing and record-setting IPO. This article is not intended to speculate on whether the valuation makes sense or not, but rather to help you understand the technological concepts that make Snowflake so unique, and why it has proven to be so disruptive for the data space in general and the data warehousing space in particular.

336. Can Big Data Solutions Be More Accessible And Affordable?

Below you can find the article of my colleague and Big Data expert Boris Trofimov.

337. An Introduction to Data Automation for Business Efficiency

In today’s competitive business landscape, data automation has become necessary for business sustainability. Despite the necessity, it also comes with a few challenges (collecting data, cleaning it, and putting it together) to get meaningful insights.

338. Applying Criminology Theories to Data Management: "The Broken Window Theory" and "The Perfect Storm"

What can be done to prevent “Broken Windows” in the primary data source? How can we effectively fix existing “Broken Windows"?

339. What Is Modern Business Intelligence?

This article gives insight into some basic features and functionality that desirable modern BI software has and illustrates them with examples.

340. Stop Deleting Outliers—Here’s What You Should Do Instead

Learn 3 simple, effective methods to detect and handle outliers in your data. Improve analysis accuracy and make smarter decisions with clean datasets.

341. Is GPT Powerful Enough to Analyze the Emotions of Memes?: Methodology

Explore how ChatGPT analyzes meme emotions in this study on AI-driven sentiment analysis of social media content.

342. Hate Speech Detection in Algerian Dialect Using Deep Learning: Our Methodology

In this paper, we proposed a complete end-to-end natural language processing (NLP) approach for hate speech detection in the Algerian dialect.

343. How to Use Tableau Visualization to Make a Covid Risk Model

In this paper, I used data from two different data sources and merged them together in the Tableau layer to perform the data analysis.

344. COVID-19: How to Set a Strong Recovery Strategy for Your Non-Profit

Across the globe, businesses are shutting their doors, laying off employees, and hunkering down financially in hopes of reemerging when the current pandemic eases. Unfortunately, it isn’t going to be easy, and many won’t make it. Non-profits are finding themselves in the same position. 

345. Hate Speech Detection in Algerian Dialect Using Deep Learning: Related Work

In this paper, we proposed a complete end-to-end natural language processing (NLP) approach for hate speech detection in the Algerian dialect.

346. How To Invest More Intelligently

A decade and a half ago, when I first got interested in public markets investing, I started by making a small portfolio that I could play with. Just to see if I actually knew what I thought I knew, and to try and make sense of my beliefs about the market. And no surprise, I didn't know what I thought I knew.

347. Data Analysis Applied to Auto-Increment API fields

This article discusses the security risks of using auto-increment fields in API responses and methods to prevent data leaks and protect business metrics.

348. Using Data Analytics Effectively in Marketing

How to make your data work harder for you in marketing

349. Unleash the Power of Interactive Data: Python & Plotly

Discover the power of data visualization with Plotly in Python. Learn to transform raw data into interactive, insightful visuals and create dynamic dashboard

350. Top 5 Business App Categories during COVID-19

In the business world, we have to do research and analysis to learn what users and the market are looking for. Here is one of our online analyses of the apps that users search for. We ran the research on our test website, https://flutterappworld.com/, and found statistics that can help you build apps for the target markets with the most searches through Google. The statistics are based on Google website and Google Analytics data.

351. Open Source is the Only Way to Address the Long Tail of Integrations

Wouldn’t it be great to bring the time needed to build a new data integration connector down to 10 minutes? This would definitely help address the long tail of

352. 6 Tips for Working With Analysts and Data Engineers

What work does a data engineer actually do? Let me tell you one thing: it’s not what you think they should be doing, especially not the part where they are running around collecting data for you or building yet another one of those dashboards that will only be used for a few weeks.

353. Crypto Use Explodes, Data Will Help Investors Make Better Decisions

Investors need good data to make good decisions, and new AI platforms will provide deeper analysis

354. Using User Data After Google's Third-party Cookies Ban

Google announced that it would ban the usage of third-party cookies; it has made a lot of publishers afraid that they won't be able to utilize user data.

355. Google Shopping Market Analysis of Sports & Outdoors

A market analysis of Google Shopping for the Sports & Outdoors category. The insights consist of top retailers, brands, products, and price changes.

356. How to Scan Your Systems for Personal Data

Learn the many different ways you can scan for personal data across your organisation's customer-facing online systems in this practical, how-to guide.

357. Bifurcation Analysis of the Keynesian Cross Model: Abstract and Introduction

This study investigates the Keynesian cross model of a national economy with a focus on the relationship between government spending and economic equilibrium.

358. Big Data: 70 Amazing Free Data Sources You Should Know for 2020

Please click through to the original article: http://www.octoparse.es/blog/70-fuentes-de-datos-gratuitas-en-2020

359. Stop Torturing Your Data: How to Automate Rigor With AI

Why improvisation kills research, and how to use AI to enforce methodological discipline.

360. How Well Can Data Exploration and Analysis Help in Diabetes Management?

Dietary recall questionnaires provide a wealth of information that can be analysed to uncover patterns and correlations.

361. Tired of Dirty Data? It’s Time to Implement a Data Scrubbing Initiative

Raw data coming in from various sources is often inherently dirty data, rife with factual errors, typos and inaccuracies. Left unattended, this data becomes a nightmare. Imagine having to pull a report only to realize it has duplicated data – not to mention half of them don’t even have valid phone numbers or addresses. Your boss is not going to be happy.

362. Conversational AI is Changing the Way We Interact With Data

Large language models like ChatGPT are making it easier to manage data. Akkio has come up with an LLM-based tool to manage tabular data using conversational AI.

363. Does Your Work as a Data Analyst Matter?

We wonder: Is anyone actually reading this? Does this dashboard change anything?

364. Hate Speech Detection in Algerian Dialect Using Deep Learning: Conclusion, Acknowledgments

In this paper, we proposed a complete end-to-end natural language processing (NLP) approach for hate speech detection in the Algerian dialect.

365. Making Sense of Unbounded Data & Real-Time Processing Systems

A real-time processing architecture should have these logical components to address event ingestion & processing challenges, such as a stream processing system.

366. Top 3 Things You Forget When Building Your SaaS Product

While the number of product management roles in the US has grown by more than 30% in two years, according to LinkedIn, the responsibilities of the job are morphing.

367. 5 Real Ways to Start Implementing AI in your Ecommerce Stores

The implementation of AI in ecommerce should come as no surprise. Online businesses have always been quick to adopt new technologies, and this is how the industry thrives; enhancing the customer experience, discovering new markets, and driving further sales. And with the continued development of AI technology like chatbots, visual search, and personalized recommendations, the world of ecommerce is transforming again.

368. Analysis And Prediction on HR Data Set for Beginners

Are you a newbie when it comes to Data Analysis and Data modelling? If yes, then you are in the right place.

369. Using Dynamic Pricing to Internationally Scale an eCommerce Business

Be profitable from the first minute with dynamic pricing adapted to each market

370. Is GPT Powerful Enough to Analyze the Emotions of Memes?: References

Explore how ChatGPT analyzes meme emotions in this study on AI-driven sentiment analysis of social media content.

371. How to Improve Social Media Campaign Using Data Visualization

Learn what social media data visualization is and why it is important.

372. Hate Speech Detection in Algerian Dialect Using Deep Learning: Abstract & Intro

In this paper, we proposed a complete end-to-end natural language processing (NLP) approach for hate speech detection in the Algerian dialect.

373. Demystifying Data and Serialization Templates with JSON, YAML, and Jinja

Learn the differences between JSON, YAML, and Jinja, and how to choose the right tool for your project.

374. Using Data Science to Predict Effects of New UK Fishing Zonal Attachment Proposal

The UK government is pushing for a “zonal attachment” model, where quotas would be carved up relative to the abundance of fish in each country’s waters.

375. Data Engineering Hack: Using CDPs for Simplified Data Collection

From simplifying data collection to enabling data-driven feature development, Customer Data Platforms (CDPs) have far-reaching value for engineers.

376. Tire Manufacturers Are Rolling Out Consolidated Data Management

In the tire manufacturing industry, conventional tools are no longer enough for handling the complexities of product data and can seriously damage your brand

377. Detecting and Mitigating Fake Contact Data: A Case Study with Apple Ecosystem Signals

Enhance lead scoring with iPhone Lookup—check iMessage & FaceTime activity to ensure valid, reachable phone numbers.

378. Is GPT Powerful Enough to Analyze the Emotions of Memes?: Abstract & Intro

Explore how ChatGPT analyzes meme emotions in this study on AI-driven sentiment analysis of social media content.

379. Bifurcation Analysis of the Keynesian Cross Model: Method and G is constant

This study investigates the Keynesian cross model of a national economy with a focus on the relationship between government spending and economic equilibrium.

380. Lets Study the Seattle Airbnb Data

So, recently I started my Udacity Data Scientist Nanodegree. To be honest, the first project is about CRISP-DM, the CRoss-Industry Standard Process for Data Mining. Let's set that aside and start working on what we can learn from the dataset.

381. How To Segment Shopify Customer Base with Google Sheets and Google Data Studio

After defining what the RFM analysis is standing for, and how you can apply it to your Customer Base, I want to show you how to apply it on Shopify orders data.

382. Is GPT Powerful Enough to Analyze the Emotions of Memes?: Discussion

Explore how ChatGPT analyzes meme emotions in this study on AI-driven sentiment analysis of social media content.

383. Clean Up Your Data by Removing Duplicate Data Using these Tools

In this blog, we will look at what a data deduplication software is, the most crucial features and functionalities found in such a tool, and how it can help you

384. What Can You Do When a MAP Violation Occurs?

Here are some things that you can do when a MAP violation occurs when you least expect it.

385. 6 Data Analytics Growth Hacks for SMBs

Data analytics offers you amazing capabilities to grow your business. Leverage the power of these amazing data analytics hacks to reach your business goals.

386. ClimateNLP: Analyzing Public Sentiment Towards Climate Change: Conclusions and References

The natural language processing approaches can be applied to the climate change domain as well for finding the causes and leveraging patterns.

387. We Investigated L.A.’s Homelessness Scoring System: Here's How We Did It

This article describes our analyses’ data sources, methodologies, findings, and limitations.

388. How AI and Data Analytics Will Impact The Era of COVID-19

Artificial intelligence (AI) and data analytics are rapidly growing trends in the tech world. With increasing potential for innovation, it is paramount that we stay up to date with all the latest developments in this field. According to MarketsandMarkets, the worldwide artificial intelligence (AI) market will increase from USD 58.3 billion in 2021 to USD 309.6 billion by 2026, at a compound annual growth rate (CAGR) of 39.7 percent over the projected period. It seems that every company wants a piece of this growing pie. By 2022 it is expected that 90% of companies will be using some form of artificial intelligence for data analytics purposes.

389. Is Your Business Suffering from Big Data Burnout? 5 Ways to Democratize Data

With so much data available at your fingertips, if you fail to implement a strong system, your business is at risk of suffering from big data burnout.

390. Unboxing History: The Importance of Processing and Presenting Historical Archives

Many NGOs and archival projects excel in accumulating historical records but often neglect the equally important tasks of processing and presenting data.

391. Choosing an OLAP Engine for Financial Risk Management: What to Consider?

This post provides reference for what you should take into account when choosing an OLAP engine in a financial scenario.

392. Data Scientist Careers at Amazon: What You'll Earn, Learn, and Work On

Find out what it means to be a data scientist at Amazon! Their salaries, roles and required experience, types of data positions, and interview process.

393. Is GPT Powerful Enough to Analyze the Emotions of Memes?: Experiment Results

Explore how ChatGPT analyzes meme emotions in this study on AI-driven sentiment analysis of social media content.

394. Qualitative Emergence: The Paradox of Statistical AI in Language Comprehension - What to Know

Explore how AI language models create coherent content through statistical processes, contrasting AI's approach with human cognition and examining its potential

395. ClimateNLP: Analyzing Public Sentiment Towards Climate Change: Proposed Approach

The natural language processing approaches can be applied to the climate change domain as well for finding the causes and leveraging patterns.

396. Embedded data analytics and reporting tools that empowers Business analysts

Embedded data analytics and reporting tools that empowers Business analysts

397. ClimateNLP: Analyzing Public Sentiment Towards Climate Change: Abstract & Intro

The natural language processing approaches can be applied to the climate change domain as well for finding the causes and leveraging patterns.

398. The Math Behind TDA: A Primer on Persistent Homology

Explaining key components like Vietoris-Rips complexes, persistence diagrams, and Wasserstein distance, highlighting why TDA is a robust tool for analyzing

399. Can a Data Scientist Drown a City in 3 Feet of Water?

Yes, Let’s dive into the details.

400. The Three Basic Benefits of a Virtual Data Room

The popularity of online virtual data rooms has increased over the years. These are innovative software used for safe storage and sharing of files. As the world is modernizing, people are using advanced technology to carry out their daily tasks. As everything today is digital, it becomes more and more crucial to look for new methods to store files. Gone are the days when people used to pile up hard copies of all the files in the offices. Some people are still seen doing that which wastes half of their time. Imagine you have a business meeting in some time and you can’t find a specific file because there is a huge unorganized bundle of files in your office. With virtual data rooms, all your files are well organized. You do not have to get into a hassle of finding a certain file. With just one click, the file appears in front of you in no time.

401. Why You Need a Flawless Post Merger Integration Strategy

Over 80 percent of mergers fail due to poor planning and delays. Some of the biggest causes behind poor planning are a lack of strategy, unclear vision, inept mindset, invasive culture, and most importantly lack of data understanding.

402. Stop Guessing What Customers Want With This Analysis Technique

Voice of Customer analysis is powerful and can create important and long-lasting change in your business, but it is not a one-time solution to a problem.

403. A Brief Intro to 8 Ways AI Could Improve Patient Care

How much data does a hospital produce each day? How much information are they capable of storing, analyzing, and sharing with physicians and patients? 

404. Adopt the Automation Route to Scale Up Your Business

Machine Learning is advancing steadily, enabling computers to understand natural language patterns and think somewhat like humans. The advances in Artificial Intelligence (AI) are increasing the prospects of businesses to automate tasks. With automation, you can save time and bring in more productivity for your business.

405. Bifurcation Analysis of the Keynesian Cross Model: Conclusion and References

This study investigates the Keynesian cross model of a national economy with a focus on the relationship between government spending and economic equilibrium.

406. The Noonification: PayPal Abandoned Me; Crypto Saved Me (5/23/2024)

5/23/2024: Top 5 stories on the HackerNoon homepage!

407. ClimateNLP: Analyzing Public Sentiment Towards Climate Change: Related Studies

The natural language processing approaches can be applied to the climate change domain as well for finding the causes and leveraging patterns.

408. ClimateNLP: Analyzing Public Sentiment Towards Climate Change: Results and Discussions

The natural language processing approaches can be applied to the climate change domain as well for finding the causes and leveraging patterns.

409. The Unseen Battleground: An Architect’s Retro on Streaming 1 Billion Minutes of Live Sports

Streaming 1B minutes of live sports: hard choices, scars, and lessons in building real-time, petabyte-scale systems.

410. Can Your Organization's Data Ever Really Be Self-Service?

Self-serve systems are a big priority for data leaders, but what exactly does it mean? And is it more trouble than it's worth?

411. Hate Speech Detection in Algerian Dialect Using Deep Learning: Experiments and Results

In this paper, we proposed a complete end-to-end natural language processing (NLP) approach for hate speech detection in the Algerian dialect.

412. How a Customer 360 and Dgraph Cloud Can Help Improve User Retention

Successful corporations leverage a customer 360. Check out how Dgraph can provide top-notch analyses quickly and effectively.

413. Dear Marketer: Every Average Lies, You Must Go Deeper

“It’s a basic truth of the human condition that everybody lies. The only variable is about what…” “Truth begins in lies…”

414. Self-service Data Preparation Tools Can Optimize Big Data Efficiency for the IT Team

Self-service data preparation tools are designed for business users to process data without relying on IT, but that doesn’t mean IT users can't benefit too.

415. Practical Tips to Improve Customer Experience with Data

According to a report, almost 70% of companies compete on customer experience.

416. Big Data in Pharma: What It Is and How It’s Used

We look into the potential of big data in pharma and explore essential ways in which the big data technology changes the way drugs are developed.

417. Which Profession Does Society Celebrate the Most? A Data-Driven Dive into Madame Tussauds

Unveiling the most celebrated professions: Explore Madame Tussauds' wax figures and discover societal preferences. An intriguing data-driven analysis.

418. Effective Tools To Make A Great Relationships Analysis in Game of Thrones [Part 2]

In the last post, we showed the character relationship for the Game of Thrones by using NetworkX and Gephi. In this post, we will show you how to access data in Nebula Graph by using NetworkX.

419. How to Generate Sales from Inactive Customers and Boost E-commerce

What does marketing automation mean? Planned activities triggered by user-generated events. Simple and clear.

420. Is GPT Powerful Enough to Analyze the Emotions of Memes?: Related Work

Explore how ChatGPT analyzes meme emotions in this study on AI-driven sentiment analysis of social media content.

421. Displacement of Low-Income Minorities in San Francisco

San Francisco’s housing crisis perpetuates the poverty trap faced by low income minorities in the city.

422. Fırat Civaner Became Sad About People Who Lost Their Jobs, Health And Lives

Fırat Civaner from Turkey has been nominated for a 2020 #Noonie in the Future Heroes and Technology categories.

423. Opportunities And Challenges of Integrating Power Business Intelligence

Powerful, affordable, effective—these are just three of the terms used to describe Microsoft’s Power BI data visualization platform. Its status as a world-leading business intelligence tool supports all the positive hype to be found across the IT media world. 

424. 5 Most Common Data Quality Issues For Business

With the advent of data socialization and data democratization, many organizations organize, share and make information available to all employees in an efficient manner. While most organizations benefit from liberal use of such a source of information available to their employees, others struggle with the quality of the data they use.

425. Noonies Interview: Luke Calton On Product Management and Writing

Luke Calton is a Noonies nominee and a Product Manager at a start-up.

426. How Pandemic Testing Protocols Vary Across The USA

To determine how testing protocols for COVID-19 vary across the United States, we sent requests under public records laws to all 50 states, New York City, and Washington, D.C. The requests were sent to health departments the week of March 16 and were identical. The database below contains responses we have received, as well as publicly available guidance from some jurisdictions.

427. A Guide to Understanding Pandas Series Labeled and Unlabeled Data Structures

Learn how to create and manipulate pandas Series, versatile data structures for efficient data handling in Python. Explore labeled and unlabeled formats,

Thank you for checking out the 427 most read blog posts about Data Analysis on HackerNoon.

Visit the /Learn Repo to find the most read blog posts about any technology.

AI Governance Fails Because It Cannot Be Put into Practice

2026-04-30 03:45:08

AI governance rarely fails because organisations lack policies. It fails because those policies behave like ceremonial artefacts while delivery pipelines keep moving at production speed. Somewhere between a neatly written PDF and a deployed model, intent evaporates.

The result is familiar: teams improvise, exceptions multiply, and governance becomes a negotiation rather than a system. In high-stakes environments, especially healthcare and life sciences, that gap is not just inconvenient. It is an operational risk.

The idea behind Governance That Ships is deceptively simple: governance should behave like software. It should have inputs, outputs, enforcement points, and observable results. It should run continuously, not quarterly. Most importantly, it should produce evidence as a byproduct of doing the work, not as a separate ritual.

Governance becomes real only when it is embedded into the mechanics of delivery.

The operating model: Policy → Controls → Evidence → Metrics

At the core of this approach is a pipeline that feels almost mechanical:

  • Policy defines intent
  • Controls enforce behaviour
  • Evidence proves execution
  • Metrics validate outcomes

This is not a theoretical framework. It mirrors how mature security and compliance systems already operate. Controls are not suggestions, they are gates. Evidence is not documentation, it is exhaust. Metrics are not vanity dashboards, they are feedback loops.

The shift here is subtle but powerful. Governance stops being something teams “comply with” and becomes something the system does automatically.

If a control cannot produce evidence without manual effort, it is not a control. It is a hope.

Deciding how much governance is enough

Not every AI system deserves the same level of scrutiny. Treating them equally is how organisations either slow to a crawl or expose themselves unnecessarily.

A practical governance system introduces risk tiers that determine the intensity of controls:

| Tier | Description | Typical Controls |
|----|----|----|
| Minimal | Internal tools, low impact, no sensitive data | Basic registration, lightweight checks |
| Limited | User-facing, moderate risk, content or automation | Documentation, prompt review, security testing |
| High | Regulated or high-impact decisions | Formal risk assessment, strict change control, audit logging |
| Prohibited | Unacceptable use cases | Blocked at design and deployment |

This structure aligns naturally with regulatory thinking and risk management frameworks. It also gives engineering teams something they crave: clarity.

Instead of asking “What should we do?”, teams ask “Which tier is this, and what does that trigger?”

Good governance removes ambiguity. Great governance removes debate.
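The tier lookup can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the control identifiers are hypothetical names derived from the table, and the sketch assumes each tier carries only its own controls rather than inheriting those of lower tiers.

```python
from enum import Enum
from typing import Dict, List


class RiskTier(Enum):
    MINIMAL = "minimal"
    LIMITED = "limited"
    HIGH = "high"
    PROHIBITED = "prohibited"


# Controls per tier, paraphrased from the table above.
# The identifiers are hypothetical; a real system would map them to actual jobs.
REQUIRED_CONTROLS: Dict[RiskTier, List[str]] = {
    RiskTier.MINIMAL: ["basic_registration", "lightweight_checks"],
    RiskTier.LIMITED: ["documentation", "prompt_review", "security_testing"],
    RiskTier.HIGH: ["formal_risk_assessment", "strict_change_control", "audit_logging"],
}


def controls_for(tier: RiskTier) -> List[str]:
    """Answer "which tier is this, and what does that trigger?"."""
    if tier is RiskTier.PROHIBITED:
        # Prohibited use cases are blocked outright instead of being given controls.
        raise PermissionError("Prohibited use case: blocked at design and deployment")
    return REQUIRED_CONTROLS[tier]
```

Calling `controls_for(RiskTier.HIGH)` returns the three high-tier controls, while a prohibited use case raises before any work can start.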

Governance inside the pipeline

Policies written in documents are advisory. Policies encoded into pipelines are executable.

This is where policy-as-code enters the scene. The same way infrastructure is validated before deployment, AI systems can be gated by rules that check:

  • whether a use case is registered and classified
  • whether the required documentation exists
  • whether evaluation results meet thresholds
  • whether access to sensitive data follows least privilege

These checks run automatically during CI/CD. They do not wait for a committee meeting. They do not depend on memory or goodwill.

The pattern is already well understood in engineering ecosystems. Tools like Open Policy Agent demonstrate how rules can be versioned, reviewed, and enforced consistently. The safest system not only has the best policies but is also technically unable to break them.
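As a rough sketch, the four checks listed above could be expressed as a single gate function that a CI job calls. This is plain Python rather than Open Policy Agent's own policy language, and every name here (`ReleaseCandidate`, `REQUIRED_DOCS`, the thresholds and data scopes) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ReleaseCandidate:
    # Hypothetical metadata a pipeline might collect about an AI system.
    use_case_registered: bool
    risk_tier: Optional[str]
    docs_present: Set[str]
    eval_scores: Dict[str, float]
    data_scopes_requested: Set[str]


# Illustrative policy values; real ones would live in version control.
REQUIRED_DOCS = {"model_card", "data_documentation"}
EVAL_THRESHOLDS = {"accuracy": 0.90}
ALLOWED_DATA_SCOPES = {"public", "internal"}


def gate(rc: ReleaseCandidate) -> List[str]:
    """Return policy violations; an empty list means the release may proceed."""
    violations: List[str] = []
    if not rc.use_case_registered or rc.risk_tier is None:
        violations.append("use case is not registered and classified")
    missing = REQUIRED_DOCS - rc.docs_present
    if missing:
        violations.append(f"missing documentation: {sorted(missing)}")
    for metric, minimum in EVAL_THRESHOLDS.items():
        # A metric that was never reported counts as a failure.
        if rc.eval_scores.get(metric, float("-inf")) < minimum:
            violations.append(f"{metric} is below the threshold {minimum}")
    extra = rc.data_scopes_requested - ALLOWED_DATA_SCOPES
    if extra:
        violations.append(f"data access exceeds least privilege: {sorted(extra)}")
    return violations
```

A CI step would exit with a non-zero status whenever `gate(...)` returns a non-empty list, which is what turns the policy from advisory into executable.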

Turning principles into executable checks

In traditional software, quality is enforced through tests. AI governance should behave the same way.

Instead of abstract requirements, governance becomes a set of executable jobs:

  • evaluation pipelines that measure model behavior
  • security tests simulating prompt injection or data leakage
  • validation checks for output handling
  • thresholds that determine go or no-go decisions

This transforms governance into something tangible. A failing governance requirement looks exactly like a failing test. It blocks the release.

This approach also aligns with established practices in ML production readiness, where systems are evaluated continuously rather than assumed to be correct.

If governance cannot fail a build, it cannot protect production.
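To make the go or no-go idea concrete, here is a small sketch of a threshold check over evaluation results. The metric names and limits are invented for the example; the point is only that each requirement is either a floor or a ceiling, and that a missing result fails the build instead of passing silently.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds: ("min", x) means the value must be >= x,
# ("max", x) means the value must be <= x.
THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "task_accuracy": ("min", 0.85),
    "prompt_injection_success_rate": ("max", 0.01),
    "pii_leak_rate": ("max", 0.0),
}


def go_no_go(results: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Return (go, failures) for one set of evaluation results."""
    failures: List[str] = []
    for name, (kind, limit) in THRESHOLDS.items():
        if name not in results:
            # No evidence of passing is treated as a failure.
            failures.append(f"{name}: no result reported")
            continue
        value = results[name]
        ok = value >= limit if kind == "min" else value <= limit
        if not ok:
            failures.append(f"{name}: {value} violates {kind} {limit}")
    return (not failures, failures)
```

Wrapped in a test runner or a CI script, a `False` result looks exactly like a failing test and blocks the release.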

LLM-specific controls: where things get interesting

GenAI systems introduce risks that traditional governance models were not designed for, like prompt injection, output manipulation, and tool misuse. These are not edge cases, they are structural properties.

Effective governance must therefore include controls tailored to LLM behaviour:

  • strict separation of system instructions and user input
  • controlled tool access and allowlists
  • output validation before execution
  • safeguards against data exfiltration
  • safe defaults and graceful failure modes

These are not theoretical constructs. They directly map to known vulnerability classes documented in frameworks like OWASP for LLM applications. LLM governance is less about what the model knows and more about what the system allows it to do.
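Two of these controls, tool allowlists and output validation before execution, fit in a short sketch. It assumes the model is asked to emit tool calls as JSON of the form `{"tool": ..., "args": {...}}`; that format, the tool names, and the argument names are all hypothetical.

```python
import json
from typing import Any, Dict, Optional, Set, Tuple

# Hypothetical allowlist: tool name -> argument names the tool may receive.
ALLOWED_TOOLS: Dict[str, Set[str]] = {
    "search_docs": {"query"},
    "get_ticket": {"ticket_id"},
}


def validate_tool_call(raw_model_output: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (tool, args) if the call is permitted, otherwise None.

    Returning None is the safe default: the caller executes nothing.
    """
    try:
        call = json.loads(raw_model_output)
    except json.JSONDecodeError:
        return None  # Malformed output never reaches a tool.
    if not isinstance(call, dict):
        return None
    tool = call.get("tool")
    args = call.get("args")
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        return None  # Unknown tools are rejected, not guessed at.
    if not isinstance(args, dict) or set(args) - ALLOWED_TOOLS[tool]:
        return None  # Unexpected arguments are rejected as well.
    return tool, args
```

The design choice is that the model's output is treated as untrusted input: the system, not the model, decides which actions exist.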

Evidence as a product, not a byproduct

One of the most underappreciated aspects of governance is evidence. Auditors do not trust intent. They trust records.

In a system that ships governance, evidence is generated automatically:

  • model cards describing intended use and limitations
  • data documentation explaining provenance and constraints
  • evaluation reports showing performance and risks
  • logs capturing decisions, changes, and actions

These artefacts are not created for audits. They are created because the system requires them to function. This aligns with management system standards where organisations must demonstrate control through documented processes and records.

The strongest audit position is achieved when evidence already exists before anyone asks for it.
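One way to picture evidence as exhaust is a wrapper that records every control run as it happens. The sketch below is an assumption-laden illustration: `run_control`, the record fields, and the in-memory list standing in for an append-only audit store are all hypothetical, and the subject is assumed to be JSON-serialisable.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


def run_control(
    name: str,
    check: Callable[[Dict[str, Any]], bool],
    subject: Dict[str, Any],
    evidence_log: List[Dict[str, Any]],
) -> bool:
    """Run one control and append an evidence record as a side effect."""
    passed = bool(check(subject))
    # Hash the inspected metadata so the record can be tied to what was checked.
    digest = hashlib.sha256(
        json.dumps(subject, sort_keys=True).encode("utf-8")
    ).hexdigest()
    evidence_log.append(
        {
            "control": name,
            "passed": passed,
            "subject_sha256": digest,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
    )
    return passed
```

Because the record is written by the same call that enforces the control, nobody has to remember to document anything afterwards.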

Governance that accelerates, not slows

There is a persistent myth that governance and speed are opposites. In practice, poorly designed governance slows teams down. Well-designed governance removes friction.

By standardising controls, automating checks, and clarifying expectations, teams spend less time negotiating and more time building. Decisions become predictable. Releases become safer.

And perhaps most importantly, governance scales. It no longer depends on a handful of experts reviewing everything manually. It becomes part of the system’s DNA.

The real goal of governance is not control; it is momentum without chaos.

Final thought: make the right thing the default

The most elegant governance systems share a common trait: they do not force teams to behave correctly. They make correct behaviour the easiest path.

When policies are encoded into tools, when controls are invisible but effective, when evidence flows naturally, governance stops feeling like oversight and starts feeling like infrastructure. In that moment, governance stops being something you enforce; it becomes something you rule.
