We are an open and international community of 45,000+ contributing writers publishing stories and expertise for 4+ million curious and insightful monthly readers.
RSS preview of Blog of HackerNoon

How ShareChat Scaled their ML Feature Store 1000X without Scaling the Database

2026-02-03 19:00:03

How ShareChat engineers rebuilt a low-latency ML feature store on ScyllaDB after an initial scalability failure — and what they learned along the way

The demand for low-latency machine learning feature stores is higher than ever, but actually implementing one at scale remains a challenge. That became clear when ShareChat engineers Ivan Burmistrov and Andrei Manakov took the P99 CONF 23 stage to share how they built a low-latency ML feature store based on ScyllaDB.

This isn’t a tidy case study where adopting a new product saves the day. It’s a “lessons learned” story, a look at the value of relentless performance optimization – with some important engineering takeaways.

The original system implementation fell far short of the company’s scalability requirements. The ultimate goal was to support 1 billion features per second, but the system failed under a load of just 1 million. With some smart problem solving, though, the team eventually pulled it off. Let’s look at how their engineers pivoted from the initial failure to meet their lofty performance goal without scaling the underlying database.


ShareChat: India’s Leading Social Media Platform

To understand the scope of the challenge, it’s important to know a little about ShareChat, the leading social media platform in India. On the ShareChat app, users discover and consume content in more than 15 different languages, including videos, images, songs and more. ShareChat also hosts a TikTok-like short video platform (Moj) that encourages users to be creative with trending tags and contests.

Between the two applications, they serve a rapidly growing user base that already has over 325 million monthly active users. And their AI-based content recommendation engine is essential for driving user retention and engagement.


Feature Stores at ShareChat

This story focuses on the system behind ML feature stores for the short-form video app Moj. It offers fully personalized feeds to around 20 million daily active users and 100 million monthly active users. Feeds serve 8,000 requests per second, with an average of 2,000 content candidates being ranked on each request (for example, to find the 10 best items to recommend). “Features” are pretty much anything that can be extracted from the data.

Ivan Burmistrov, principal staff software engineer at ShareChat, explained:

We compute features for different ‘entities.’ Post is one entity, User is another and so on. From the computation perspective, they’re quite similar. However, the important difference is in the number of features we need to fetch for each type of entity. When a user requests a feed, we fetch user features for that single user. However, to rank all the posts, we need to fetch features for each candidate (post) being ranked, so the total load on the system generated by post features is much larger than the one generated by user features. This difference plays an important role in our story.


Why the Initial Feature Store Architecture Failed to Scale

At first, the primary focus was on building a real-time user feature store because, at that point, user features were most important. The team started to build the feature store with that goal in mind. But then priorities changed and post features became the focus too. This shift happened because the team started building an entirely new ranking system with two major differences versus its predecessor:

  • Near real-time post features were more important
  • The number of posts to rank increased from hundreds to thousands

Ivan explained: “When we went to test this new system, it failed miserably. At around 1 million features per second, the system became unresponsive, latencies went through the roof and so on.”

Ultimately, the problem stemmed from how the system architecture used pre-aggregated data buckets called tiles. For example, a tile can aggregate the number of likes for a post in a given minute or other time range. Summing tiles then makes it possible to compute metrics like the number of likes for multiple posts over the last two hours.

Here’s a high-level look at the system architecture. There are a few real-time topics with raw data (likes, clicks, etc.). A Flink job aggregates them into tiles and writes them to ScyllaDB. Then there’s a feature service that requests tiles from ScyllaDB, aggregates them and returns results to the feed service.
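To make the tile idea concrete, here is a minimal sketch of the read-side aggregation, assuming one-minute “like” tiles keyed by their start minute. It is illustrative only; the real feature service fetches tiles from ScyllaDB and merges many features at once.

```python
from datetime import datetime, timedelta

def likes_last_two_hours(minute_tiles: dict[datetime, int], now: datetime) -> int:
    """Sum pre-aggregated per-minute like counts over a two-hour window."""
    start = now - timedelta(hours=2)
    return sum(count for minute, count in minute_tiles.items() if start <= minute <= now)

# Example: three one-minute tiles for a post
now = datetime(2023, 10, 19, 12, 0)
tiles = {
    now - timedelta(minutes=1): 40,
    now - timedelta(minutes=2): 55,
    now - timedelta(hours=3): 10,   # too old, falls outside the window
}
print(likes_last_two_hours(tiles, now))  # 95
```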

The initial database schema and tiling configuration led to scalability problems. Originally, each entity had its own partition, with timestamp and feature name as ordered clustering columns. [Learn more in this NoSQL data modeling masterclass]. Tiles were computed for segments of one minute, 30 minutes and one day. Querying one hour, one day, seven days or 30 days required fetching around 70 tiles per feature on average.
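As a rough sketch, the original layout might be modeled like this. The CQL is illustrative only; the table and column names are assumptions, not ShareChat’s actual schema.

```python
# Illustrative CQL for the original layout: one partition per entity, one row per
# (timestamp, feature) pair. Names are assumptions, not ShareChat's actual schema.
ORIGINAL_TILE_TABLE = """
CREATE TABLE feature_store.tiles (
    entity_id text,        -- partition key: one partition per user or post
    ts        timestamp,   -- tile timestamp (clustering column)
    feature   text,        -- feature name (clustering column)
    value     double,
    PRIMARY KEY ((entity_id), ts, feature)
);
"""
print(ORIGINAL_TILE_TABLE)
```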

If you do the math, it becomes clear why it failed. The system needed to handle around 22 billion rows per second. However, the database capacity was only 10 million rows/sec.
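Here is a back-of-envelope sketch of that math. The 8,000 requests/sec, 2,000 candidates and ~70 tiles come from the article; the per-candidate feature count is an assumption chosen purely for illustration (the 100X reduction quoted below suggests the real feature multiplier differed).

```python
REQUESTS_PER_SEC = 8_000          # feed requests per second (stated above)
CANDIDATES_PER_REQUEST = 2_000    # posts ranked per request (stated above)
AVG_TILES_PER_FEATURE = 70        # tiles fetched per feature on average (stated above)
FEATURES_PER_CANDIDATE = 20       # assumption for illustration only; not stated in the talk

rows_per_sec = (REQUESTS_PER_SEC * CANDIDATES_PER_REQUEST
                * FEATURES_PER_CANDIDATE * AVG_TILES_PER_FEATURE)
print(f"{rows_per_sec / 1e9:.1f} billion rows/sec")  # ~22.4 billion, vs. ~10 million of capacity
```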

 

Early Feature Store Optimizations: Data Modeling and Tiling Changes

At that point, the team went on an optimization mission. The initial database schema was updated to store all feature rows together, serialized as protocol buffers for a given timestamp. Because the architecture was already using Apache Flink, the transition to the new tiling schema was fairly easy, thanks to Flink’s advanced capabilities for building data pipelines. With this optimization, the “Features” multiplier was removed from the calculation above, and the number of rows to fetch dropped by 100X: from around 22 billion to roughly 220 million rows/sec.
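Under the same assumptions as the earlier sketch, the revised layout would keep one row per entity and timestamp, with every feature for that tile packed into a single protobuf-encoded blob:

```python
# Illustrative CQL for the revised layout; names and types are assumptions.
REVISED_TILE_TABLE = """
CREATE TABLE feature_store.tiles_v2 (
    entity_id text,        -- partition key: one partition per user or post
    ts        timestamp,   -- tile timestamp (clustering column)
    features  blob,        -- all feature values for this tile, protobuf-serialized
    PRIMARY KEY ((entity_id), ts)
);
"""
print(REVISED_TILE_TABLE)
```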

The team also optimized the tiling configuration, adding five-minute, three-hour and five-day tiles alongside the original one-minute, 30-minute and one-day tiles. This reduced the average number of required tiles from 70 to 23, further cutting the load to around 73 million rows/sec.
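A simplified sketch shows why the extra tile sizes help: covering a lookback window greedily with the largest available tiles needs far fewer rows. This ignores alignment to real clock boundaries, which the production system has to handle.

```python
from datetime import timedelta

# Tile sizes after the optimization: 1 min, 5 min, 30 min, 3 h, 1 day, 5 days.
TILE_SIZES = [timedelta(days=5), timedelta(days=1), timedelta(hours=3),
              timedelta(minutes=30), timedelta(minutes=5), timedelta(minutes=1)]

def decompose(window: timedelta) -> list[timedelta]:
    """Greedily cover a lookback window with the fewest pre-aggregated tiles."""
    tiles, remaining = [], window
    for size in TILE_SIZES:
        while remaining >= size:
            tiles.append(size)
            remaining -= size
    return tiles

print(len(decompose(timedelta(days=7))))   # 3 tiles (5d + 1d + 1d) instead of 7 one-day tiles
print(len(decompose(timedelta(days=30))))  # 6 five-day tiles instead of 30 one-day tiles
```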


To handle more rows/sec on the database side, they changed the ScyllaDB compaction strategy from incremental to leveled. [Learn more about compaction strategies]. That option better suited their query patterns, keeping relevant rows together and reducing read I/O. The result: ScyllaDB’s capacity was effectively doubled.

The easiest way to accommodate the remaining load would have been to scale ScyllaDB 4x. However, more/larger clusters would increase costs and that simply wasn’t in their budget. So the team continued focusing on improving the scalability without scaling up the ScyllaDB cluster.


Improving Feature Store Cache Locality to Reduce Database Load

One potential way to reduce the load on ScyllaDB was to improve the local cache hit rate, so the team decided to research how this could be achieved. The obvious choice was to use a consistent hashing approach, a well-known technique to direct a request to a certain replica from the client based on some information about the request. Since the team was using NGINX Ingress in their Kubernetes setup, using NGINX’s capabilities for consistent hashing seemed like a natural choice. Per NGINX Ingress documentation, setting up consistent hashing would be as simple as adding three lines of code. What could go wrong?

Quite a bit, as it turned out. This simple configuration didn’t work. Specifically:


  • The client subset led to huge key remapping – up to 100% in the worst case. Because node keys can change position in the hash ring, the setup couldn’t handle real-life scenarios with autoscaling. [See the ingress implementation]

  • It was tricky to provide a hash value for a request because Ingress doesn’t support the most obvious solution: a gRPC header.

  • The latency suffered severe degradation, and it was unclear what was causing the tail latency.

To support a subset of the pods, the team modified their approach. They created a two-step hash function: first hashing an entity, then adding a random prefix. That distributed the entity across the desired number of pods. In theory, this approach could cause a collision when an entity is mapped to the same pod several times. However, the risk is low given the large number of replicas.
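A minimal sketch of the two-step idea follows. The hash function and key format are assumptions; the point is simply that each entity maps to a small, fixed number of consistent-hashing buckets instead of a single one.

```python
import hashlib
import random

ENTITY_SPREAD = 3  # assumed: how many pods a single entity's requests may land on

def routing_key(entity_id: str) -> str:
    """Hash the entity, then prepend a small random prefix so requests for the same
    entity are spread across ENTITY_SPREAD consistent-hashing buckets instead of one."""
    entity_hash = hashlib.sha1(entity_id.encode()).hexdigest()
    prefix = random.randrange(ENTITY_SPREAD)
    return f"{prefix}:{entity_hash}"

print(routing_key("post:12345"))  # e.g. "1:<sha1 of the entity id>"
```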


Ingress doesn’t support using a gRPC header as a variable, but the team found a workaround: using path rewriting and providing the required hash key in the path itself. The solution was admittedly a bit “hacky” … but it worked.

Unfortunately, pinpointing the cause of latency degradation would have required considerable time, as well as observability improvements. A different approach was needed to scale the feature store in time.

To meet the deadline, the team split the Feature service into 27 different services and manually split all entities between them on the client. It wasn’t the most elegant approach, but it was simple and practical – and it achieved great results. The cache hit rate improved to 95% and the ScyllaDB load was reduced to 18.4 million rows per second. With this design, ShareChat scaled its feature store to 1 billion features per second by March.
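For illustration only, the client-side split might be as plain as a hand-maintained mapping from entity to deployment; the entity and service names here are hypothetical.

```python
# Hypothetical hand-maintained mapping of entities to the 27 feature-service deployments.
SERVICE_BY_ENTITY = {
    "post_engagement": "feature-service-01",
    "post_embeddings": "feature-service-02",
    "user_profile":    "feature-service-03",
    # ... one entry per entity, spread across the 27 deployments
}

def service_for(entity: str) -> str:
    """Route a feature request to the deployment that owns this entity's cache."""
    return SERVICE_BY_ENTITY[entity]
```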


However, this “old school” deployment-splitting approach still wasn’t the ideal design. Maintaining 27 deployments was tedious and inefficient. Plus, the cache hit rate wasn’t stable, and scaling was limited by having to keep a high minimum pod count in every deployment. So even though this approach technically met their needs, the team continued their search for a better long-term solution.


The Next Phase of Optimizations: Consistent Hashing and Feature Service Tuning

Ready for yet another round of optimization, the team revisited the consistent hashing approach, this time using a sidecar (Envoy Proxy) deployed alongside the feature service. Envoy Proxy provided better observability, which helped identify the latency tail issue. The problem: different request patterns to the Feature service caused a huge load on the gRPC layer and cache, which led to extensive mutex contention.

The team then optimized the Feature service. They:

  • Forked the caching library (FastCache from VictoriaMetrics) and implemented batch writes and better eviction to reduce mutex contention by 100x.
  • Forked grpc-go and implemented a buffer pool shared across connections to avoid contention during high parallelism.
  • Used object pooling and tuned garbage collector (GC) parameters to reduce allocation rates and GC cycles (a minimal sketch of the pooling idea follows this list).
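ShareChat’s actual changes were forks of Go libraries (FastCache and grpc-go), but the pooling idea those forks lean on is generic. Here is a minimal, language-agnostic sketch in Python:

```python
import queue

class BufferPool:
    """Reuse fixed-size byte buffers instead of allocating a fresh one per request,
    which lowers allocation rates and the resulting GC pressure."""

    def __init__(self, pool_size: int, buf_len: int):
        self._buf_len = buf_len
        self._idle: queue.LifoQueue = queue.LifoQueue(maxsize=pool_size)
        for _ in range(pool_size):
            self._idle.put(bytearray(buf_len))

    def acquire(self) -> bytearray:
        try:
            return self._idle.get_nowait()      # reuse an idle buffer when available
        except queue.Empty:
            return bytearray(self._buf_len)     # fall back to a fresh allocation

    def release(self, buf: bytearray) -> None:
        try:
            self._idle.put_nowait(buf)          # return the buffer for reuse
        except queue.Full:
            pass                                # pool is full; let the GC collect it
```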

With Envoy Proxy handling 15% of traffic in their proof-of-concept, the results were promising: a 98% cache hit rate, which reduced the load on ScyllaDB to 7.4M rows/sec. They could even scale the feature store more: from 1 billion features/second to 3 billion features/second.


Lessons Learned from Scaling a High-Performance Feature Store

Here’s what this journey looked like from a timeline perspective:

To close, Andrei summed up the team’s top lessons learned from this project (so far):

  • Use proven technologies. Even as the ShareChat team drastically changed their system design, ScyllaDB, Apache Flink and VictoriaMetrics continued working well.
  • Each optimization is harder than the previous one – and has less impact.
  • Simple and practical solutions (such as splitting the feature store into 27 deployments) do indeed work.
  • The solution that delivers the best performance isn’t always user-friendly. For instance, their revised database schema yields good performance, but is difficult to maintain and understand. Ultimately, they wrote some tooling around it to make it simpler to work with.
  • Every system is unique. Sometimes you might need to fork a default library and adjust it for your specific system to get the best performance.


Why Sales IT Data Analysts Are Becoming Critical to Hospitality Revenue in 2026

2026-02-03 18:59:32

In the fast-evolving hospitality landscape of 2026, a new specialist is quietly driving outsized results: the Sales IT Data Analyst. This hybrid professional bridges sales strategy, robust IT infrastructure, and advanced data analytics to optimize revenue streams across hotels, resorts, restaurants, and event venues. Unlike traditional revenue managers or general data analysts, these experts specialize in integrating real-time sales data flows with enterprise IT systems such as property management systems (PMS), CRM platforms, and point-of-sale (POS) tools to deliver hyper-targeted insights that boost bookings, average daily rates (ADR), and guest lifetime value.

[Embedded media: Hospitality & Resorts Dashboard in Power BI – PK: An Excel Expert]

This role has become increasingly important amid recovering travel demand, intense competition from short-term rentals, and guest expectations for seamless, personalized experiences. By fusing sales acumen with IT proficiency and data rigor, these analysts turn raw transaction logs, occupancy sensors, and customer feedback into actionable revenue strategies that traditional teams often overlook.

Defining the Sales IT Data Analyst Role in Hospitality

Sales IT Data Analysts in hospitality dive deep into multi-source data ecosystems. They track sales pipelines from initial inquiries and website conversions to on-property upsells and post-stay loyalty redemptions. They collaborate closely with IT teams to ensure data integrity across fragmented systems: legacy PMS software, cloud-based revenue management tools, and emerging IoT sensors in rooms or dining areas.

Key responsibilities include:

  • Forecasting demand using historical sales patterns, competitor pricing, and external signals like local events or weather APIs.
  • Optimizing dynamic pricing and promotional campaigns by analyzing conversion rates and customer segmentation in real time.
  • Auditing IT data pipelines for accuracy, identifying bottlenecks in sales reporting, and recommending integrations (e.g., API connections between CRM and booking engines).
  • Measuring campaign ROI across channels (OTAs, direct bookings, and corporate accounts) while flagging anomalies like fraud in high-volume group sales.

Their work directly impacts core hospitality metrics such as RevPAR (revenue per available room), TRevPAR (total revenue per available room), and occupancy rates.
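For reference, the standard definitions of those metrics are simple enough to express in a few lines (the figures below are illustrative only):

```python
def hotel_kpis(room_revenue: float, total_revenue: float,
               rooms_sold: int, rooms_available: int) -> dict[str, float]:
    """Standard hospitality KPI definitions."""
    return {
        "occupancy": rooms_sold / rooms_available,
        "ADR": room_revenue / rooms_sold,            # average daily rate
        "RevPAR": room_revenue / rooms_available,    # equals ADR x occupancy
        "TRevPAR": total_revenue / rooms_available,  # includes F&B, spa, events, etc.
    }

# Illustrative night: 180 of 220 rooms sold, $45k room revenue, $60k total revenue
print(hotel_kpis(room_revenue=45_000, total_revenue=60_000,
                 rooms_sold=180, rooms_available=220))
```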

[Embedded media: Hotel Revenue Management Power BI Dashboard Templates – DataFlip.co]

Core Skills and Tools for Success

Success demands a versatile toolkit:

  • Technical proficiency: Advanced SQL for querying large datasets, Python/R for predictive modeling, and visualization platforms like Power BI, Tableau, or Looker Studio.
  • Hospitality domain knowledge: Deep understanding of KPIs such as RevPAR, ADR, RevPASH (revenue per available seat hour for F&B), and segmentation by guest type (leisure, business, group).
  • IT integration skills: Experience with APIs, ETL processes, and cloud platforms (AWS, Azure) to unify data from disparate sources like Oracle Opera PMS or Salesforce CRM.
  • Emerging capabilities: Basic machine learning for demand forecasting and familiarity with AI-driven personalization engines.

Analysts often leverage tools such as IDeaS or Duetto for revenue optimization, alongside custom scripts that pull live sales data into dashboards.

[Embedded media: Hospitality – Hotel Dashboard]

Unique to this role is the ability to translate technical outputs into sales-team language: crafting A/B test recommendations for email campaigns or identifying high-value corporate accounts ripe for upselling premium packages.

Real-World Impact: Driving Revenue Through Integrated Insights

Consider a mid-sized urban hotel chain in 2025-2026 recovering from seasonal dips. A Sales IT Data Analyst might discover through integrated analysis that business travelers booking via corporate portals show 40% higher ancillary spend (spa, dining) when offered AI-curated packages based on past stay data. By partnering with IT to implement real-time API triggers in the CRM, the team launches personalized offers at checkout, yielding a 12-15% uplift in TRevPAR for that segment—far exceeding industry averages from generic promotions.

Another scenario involves restaurants within resorts: Analysts merge POS sales data with reservation systems to predict peak dining hours influenced by hotel occupancy and local events. This informs dynamic menu pricing, staff scheduling via AI tools, and targeted upselling via table-side apps, reducing waste and increasing per-cover revenue.

Data analytics has enabled hotels to increase RevPAR by up to 10% in the first year of implementation in some cases, with advanced segmentation and predictive tools amplifying results further.

Future Trends Shaping the Role in 2026 and Beyond

Looking ahead, Sales IT Data Analysts will increasingly harness AI and predictive analytics for hyper-personalization, generating offers that anticipate guest needs before they arrive, such as room upgrades based on past preferences or local recommendations tied to real-time weather and event data.

Dynamic pricing will react not only to occupancy but also to competitor moves, sustainability scores (e.g., eco-conscious guests paying premiums), and even social sentiment from X or review sites. Integration with emerging technologies like voice-activated booking assistants and AR previews of rooms will create new data streams for analysts to mine.

Sustainability metrics tracking energy use tied to occupancy or waste from F&B sales will become revenue levers, appealing to ESG-focused corporate clients. Automation will free analysts for strategic work, such as scenario modeling for major events or crises.

[Embedded media: How to Improve Personalized AI-Driven Hotel Marketing]

Career Path and Getting Started

Entry often comes from backgrounds in business analytics, hospitality management, or IT support roles, bolstered by certifications like the HSMAI Revenue Management Certification or the Google Data Analytics Professional Certificate. Hands-on experience with hospitality-specific software (e.g., Maestro, Cloudbeds) and portfolio projects demonstrating sales data integrations stand out.

Aspiring professionals should build a hybrid skill set: take online courses in Python for data science, practice building hospitality dashboards on public datasets, and seek internships in revenue or sales ops at hotel groups. Salaries in 2026 are competitive, reflecting the role’s direct impact on bottom-line revenue.

Conclusion

The Sales IT Data Analyst represents the next evolution in hospitality operations where sales ambition meets IT reliability and data precision. As AI adoption accelerates and guests demand ever more tailored experiences, organizations that empower these hybrid experts will gain a decisive edge in revenue optimization and loyalty.

[Embedded media: Transforming a Hotel Lobby with a Dynamic Digital Canvas]

For hospitality leaders, investing in this role isn't just smart; it's essential for thriving in a data-driven, hyper-competitive 2026 landscape. Whether you're a hotel operator, aspiring analyst, or tech vendor, the message is clear: the future of sales in hospitality is inextricably linked to intelligent IT and analytics integration.

Essential Due Diligence Steps for U.S. Citizens Selecting Portugal Golden Visa Investment Funds

2026-02-03 18:18:53

The path to a European residency and a more diversified portfolio is attracting a growing wave of U.S. citizens to Portugal’s Golden Visa investment funds. For American investors, the opportunity seems compelling: residency rights in Portugal, visa-free access to the Schengen zone, and exposure to promising sectors within a stable EU economy. Yet, this journey also involves navigating unfamiliar legal, regulatory, and financial terrain.

Unlike domestic investments, entering Portugal’s fund market requires a strategic approach and rigorous due diligence. A methodical, multi-layered process will not only help investors avoid costly pitfalls but also maximize the long-term benefits that the Golden Visa fund route has to offer.


Understanding the Stakes

Golden Visa-eligible funds present a wide array of opportunities, from stabilized real estate portfolios to tech startups, private equity, and sustainable energy. For U.S. citizens, however, the challenge goes beyond selecting a profitable fund—it’s about ensuring compliance, tax efficiency, and alignment with American reporting standards.

In this environment, effective due diligence is more than a box-ticking exercise. It is your first—and best—line of defense in a foreign investment landscape.

Step 1: Clarify Your Investment Objectives

The first step in any due diligence process is introspection. Why are you seeking a Golden Visa fund? Is your top priority diversification, wealth preservation, or the pursuit of high returns? Do you see your investment as a family legacy, a means to global mobility, or a temporary move? The answers to these questions will guide your entire search and selection process.

Take the time to define your risk tolerance and preferred sectors. Some investors gravitate toward lower-volatility real estate or infrastructure, while others see Portugal’s tech ecosystem as a growth engine.

Step 2: Assess the Fund’s Track Record and Credibility

Not all Golden Visa funds are created equal. Begin by investigating the fund manager’s reputation, previous experience in Portugal, and relevant sector expertise. Look for verifiable data—historical returns, portfolio performance, and references from other investors.

A good starting point is to consult with top law firms specializing in Portugal’s Golden Visa process. Their expertise helps verify a fund’s legitimacy and assists with background checks on management teams. These professionals are accustomed to working with international clients and understand the nuances facing U.S. citizens.

Step 3: Dive Into the Fund’s Asset Allocation

Understanding what’s inside a fund is just as important as knowing who’s running it. Study the asset mix and geographic spread—does the fund invest exclusively in Portuguese assets or is there a broader EU focus? Is it heavily weighted toward commercial real estate, or does it take on more risk through venture capital?

Transparency is key. Legitimate funds will share detailed investment memoranda, offering insights into both current holdings and pipeline opportunities. If you encounter vague answers or reluctance to disclose, consider it a red flag.

Step 4: Examine Structure, Fees, and Terms

Scrutinize the fund’s structure and fee arrangements. Management fees, performance fees, and other costs can significantly impact your net returns over time. Compare these with industry benchmarks using independent fund comparison resources that provide side-by-side breakdowns of available funds, minimum investment requirements, and associated charges.

Don’t overlook redemption terms and exit strategies. Golden Visa-eligible funds often require a multi-year holding period—typically six years. Ask about liquidity options, early redemption penalties, and how the fund handles capital returns at maturity.

Step 5: Ensure Regulatory and Tax Compliance

For American investors, regulatory compliance doesn’t stop at Portuguese borders. Golden Visa funds must be regulated by Portugal’s CMVM, but you must also consider U.S. tax implications. The Passive Foreign Investment Company (PFIC) rules, FATCA reporting, and IRS foreign asset disclosures can create layers of complexity.

Some funds are specifically structured to facilitate U.S. compliance, offering the reporting needed for Mark-to-Market or QEF elections. Always consult with a cross-border tax specialist and confirm with the fund manager that their structure will not trigger adverse tax consequences.

Step 6: Prioritize Reporting and Transparency

Investor communication is critical, especially when investing abroad. Examine how frequently the fund provides audited accounts, regulatory filings, and performance updates. Robust reporting demonstrates not just transparency but operational maturity.

Utilize third-party research and analysis platforms for independent reviews and up-to-date market intelligence. These resources can help verify claims, compare peer funds, and offer additional context on risk factors or sector performance.

Step 7: Legal Support and Ongoing Compliance

Engage both Portuguese and U.S. legal counsel to review all documentation before committing. Legal professionals with experience in cross-border transactions ensure that your investment complies with both Golden Visa regulations and U.S. tax law.

Ongoing compliance is just as important as initial due diligence. Schedule annual reviews with your advisors to ensure you remain eligible for residency and in good standing with the IRS.

Step 8: Prepare for the Unexpected

Consider political, currency, and regulatory risks. What happens if Portuguese residency rules change? How will currency fluctuations affect returns? Does the fund have contingency plans for adverse market movements?

Strong due diligence includes stress-testing your investment against multiple scenarios. The best-managed funds address these risks up front and communicate their mitigation strategies to investors.


Final Thoughts

Selecting a Portugal Golden Visa investment fund as a U.S. citizen is not simply a matter of choosing the highest yield or the most popular option. It is a nuanced process requiring self-reflection, research, and expert advice. By following these due diligence steps—and utilizing the expertise of top law firms, reliable fund comparison tools, and independent research platforms—you can confidently chart a course toward a secure, compliant, and rewarding international investment.

 

:::tip This story was distributed as a release by Sanya Kapoor under HackerNoon’s Business Blogging Program.

:::


How FollowSpy Restores Visibility Into Instagram Following Behavior

2026-02-03 18:09:21

Instagram exposes activity but hides sequence. Following lists refresh, reorder, and quietly blur timing. People notice movement yet struggle to place it in time, which turns observation into interpretation. That gap matters because timing changes meaning. A follow from yesterday carries a different weight than one from months ago, but the interface rarely helps you tell the difference.

This is where tools focused on structure step in. Instead of adding speculation or interpretation, they reorganize what is already visible so behavior can be read calmly. FollowSpy approaches Instagram following behavior as a data presentation problem, not a behavioral one, and that framing changes how people respond.

\

Why Instagram Following Behavior Became Hard to Read

Instagram once showed follows in a straightforward order. That is no longer the case. Lists reshuffle, new additions do not reliably appear at the top, and two viewers can see different sequences. Refreshing does not restore confidence. It multiplies doubt.

The result is predictable. Users check more often, compare screenshots, and rely on memory to reconstruct a timeline. Over time, guessing replaces knowing. The platform surfaces activity but removes context, which invites overinterpretation in everyday situations.

When Missing Order Fuels Assumptions

Without sequence, people fill gaps emotionally. This shows up most clearly in personal relationships, but it also affects creators, competitors, and casual observers. The interface creates the conditions for confusion by separating visibility from order.


How FollowSpy Reintroduces Chronological Context

FollowSpy focuses on restoring order rather than expanding access. The emphasis is on making Instagram following activity readable again by placing actions back into time.

At the core, users can view Instagram following lists in chronological order. Once follows are aligned by time, context returns quickly. It becomes possible to spot newly followed accounts without endless scrolling or guesswork. Over longer periods, users can detect changes over time: who was added recently, and whether activity accelerates, pauses, or fades.

This shift removes a key frustration. There is no need to guess based on Instagram’s random order when timing is visible again. The list stops behaving like a puzzle and starts behaving like a timeline.

Clarity Without Interpretation

FollowSpy does not label behavior or suggest intent. It presents sequence and steps back. That restraint matters because it keeps analysis grounded. Users decide what matters once the order is clear.


Seeing Patterns Without Being Seen

Observation often changes behavior. On social platforms, even small signals can influence how people act. Direct checking can feel confrontational or intrusive, especially when emotions are involved.

FollowSpy supports discreet tracking without notifying the account. Observation remains passive. There are no alerts, views, or visible traces tied to monitoring. This distance preserves natural behavior and keeps conclusions cleaner.

Why Discretion Improves Accuracy

When people know they are being watched, activity shifts. Quiet observation avoids that distortion. It allows users to see what happens without shaping the outcome, which is critical for understanding patterns rather than provoking reactions.


Use Cases Where Visibility Changes Decisions

Once following behavior is structured, its applications expand. Users approach the data with different motivations, but the value of order remains consistent.

Relationship and Personal Contexts

FollowSpy is especially useful for relationship concerns because timing often answers the real question. Seeing when follows happened helps separate coincidence from change. In many cases, clarity lowers emotional intensity before conversations escalate.

Creators and Competitive Observation

Creators use chronological following data to notice shifts in interest, peer behavior, or collaboration patterns. Structured visibility supports analysis without turning observation into interaction.

Long Term Monitoring

Over time, ordered data reveals consistency or disruption. Users can return to the same account and understand whether behavior evolved or stayed stable, which reduces repeated checking.


Why Restored Visibility Calms More Than It Reveals

One overlooked effect of chronological clarity is psychological. When users can see sequence, urgency drops. They stop refreshing. They stop filling gaps with imagination. FollowSpy is built for clarity, not assumptions, and that design choice shapes how people think.

Instagram encourages fast interpretation by fragmenting context. A single follow can feel significant when seen alone. Once timing and repetition are visible, many actions lose weight. Others become clearer. In both cases, emotional noise fades.

This does not always lead to action. Often it leads to pause. Users understand what changed, or they confirm that nothing meaningful did. Either outcome reduces the mental loop that Instagram’s interface tends to create.


When Order Replaces Guesswork

The value of restored visibility is not constant discovery. It is stability. When people can view Instagram following lists in chronological order and see who was added recently, they regain control over interpretation.

FollowSpy addresses a specific weakness in how Instagram presents following behavior. By restoring sequence, it allows users to observe calmly and decide whether further attention is necessary. In a space designed to provoke reaction, the ability to see clearly without guessing often becomes the reason people stop looking altogether.


:::tip This story was distributed as a release by Sanya Kapoor under HackerNoon’s Business Blogging Program.

:::


Ignacio Brasca on Building Systems That Last

2026-02-03 18:00:07

Systems don’t fail because they are poorly written; they fail because they cannot keep up with the complex and ever-changing needs of a business.

When speaking with Ignacio Brasca, who is a staff software engineer with several years of experience building complex data systems, he emphasized that software behaves much like a biological system—and that the entropy involved in making it grow cannot be underestimated.

As organizations and systems evolve together, developers must plan for long-term change from the start. Ignacio often references Donella Meadows’ idea that “there are no separate systems; the world is a continuum,” arguing that systems only last when they are built not just to work, but to grow.

Determinism of Your Actions

Under the assumption that you don’t depend on services you cannot control, Brasca argues that determinism is one of the most important elements of an effective system.

“If you know what you will change and what that change will provoke,” he says, “you can build a system that lasts forever.”

This clarity is often what prevents unnecessary abstraction, premature flexibility, and over-engineering. For Brasca, determinism isn’t about rigidity; it’s about having a strong foundation that fosters growth without losing sight of what made your system work well in the first place.

Flexibility vs. Over-Engineering

Flexibility is another area that needs to be approached carefully. Ignacio believes that, in real businesses, flexibility becomes expensive when it’s pursued without a clear understanding of what actual needs will arise over time. Teams often try to design for every possible future scenario, only to discover later that they optimized for the wrong ones.

“The word over-engineered makes us think of unnecessary complexity,” he explains, “but sometimes it’s the opposite. You take the wrong approach to the initial constraints and end up with something that’s simple, robust, and still not functional.”

Brasca relies on a simple framework: “make it work, make it good, make it fast.” Proving that a system solves a real problem comes first. Only then does it make sense to invest in flexibility, performance, or optimization.

Ultimately, flexibility is only worth its cost when it aligns with how the system is expected to evolve. Over-engineering happens when teams design for imagined futures instead of observable change, adding complexity without improving clarity or control.

Maintainability in Practice

Solving for business constraints in theory is one thing; when theory meets practice, things can derail quickly. Ignacio made a pointed observation about this gap:

“Here’s the thing: going wrong is the only part of software engineering that actually makes sense. You never write a program assuming everything will go right. It’s the edge cases that matter.”

Today, with the help of AI, writing code has become relatively easy. The difficult part is maintaining a clean, solid, and understandable codebase while keeping track of all the changes across a horizontal architecture, so you can follow through when something goes wrong.

Getting that right, Ignacio argues, is what separates systems that survive from those that slowly collapse. Maintainability is about finding clarity under failure, and getting that right means getting everything else right.

Tooling, DX, and Surviving Change

Getting the code right is only the first step of the process. At scale, developer tooling and developer experience play a critical role in whether a system survives growth, change, and staff turnover. He notes that engineers tend to be incredibly passionate about their tools and craft; sometimes, trying to over-optimize everything can itself become a problem. However, finding the right balance and the right tools removes friction and protects the system from human error.

One decision that significantly reduced long-term friction for his teams was investing early in CI/CD pipelines. While often underestimated, strong guardrails create confidence. Automated checks, validation steps, and consistent deployment processes make it possible to move fast without breaking production.

In that sense, tooling becomes an extension of the system’s philosophy. It reinforces determinism, limits unnecessary complexity, and ensures that growth doesn’t come at the cost of reliability.

Conclusion: Building for Change, Not Permanence

After this conversation with Ignacio, I have come to realize that systems don’t last because they’re clever or perfect; they last because they’re designed with change in mind. Being intentional with decisions shapes how systems age. The belief that architectural decisions are permanent crumbles quickly when you realize that decisions will eventually be wrong and decay over time.

The goal, then, isn’t to predict the future perfectly, but to align systems as closely as possible with the problems they’re meant to solve today, while leaving room to adapt tomorrow. Building lasting software is not purely a technical challenge; it is also an organizational one. Systems need to reflect the values, discipline, and clarity of the teams they are built for.

And in an industry obsessed with speed, novelty, and scale, designing systems that can be understood, maintained, and evolved may be the most enduring competitive advantage of all.


:::tip This story was distributed as a release by Jon Stojan under HackerNoon’s Business Blogging Program.

:::


From Legacy Data Centers to Cloud-Native Platforms: Insights from Large-Scale Enterprise Migrations

2026-02-03 17:58:01

Across the financial technology sector, large institutions are facing a hard reality. Systems that were built decades ago for stability and control are now under pressure to deliver flexibility, cost efficiency, and faster change. Banks and market infrastructure firms continue to depend on legacy data centers to process sensitive trade data, where outages and errors are not an option. Yet rising operational costs and growing demand for agility have made cloud adoption less of a future goal and more of an immediate necessity.

This shift from legacy environments to cloud-native platforms is not a simple technical upgrade. It requires careful decisions around security, data handling, cost, and operational risk. One example of how this transition is unfolding can be seen in the work of Vineet Kumar, a technology professional involved in a large-scale migration within financial market infrastructure.

Vineet works in an environment where systems must remain accurate, secure, and available at all times. His role sits at the intersection of engineering and operational responsibility. Alongside hands-on project work, he built his cloud expertise through industry certifications, including AWS Certified Cloud Practitioner and AWS Certified Solutions Architect – Associate, grounding formal knowledge in real production systems.

A First-of-its-Kind Cloud Move

The most significant initiative in his recent work was the migration of CLSNet, a trade data service, from an on-premises data center to Amazon Web Services. Within CLS Group, CLSNet became the first major system to move to AWS, setting a precedent for other services considering a similar path. The project was not treated as a simple transfer of existing systems, but as an opportunity to reassess how each component would behave in a cloud-native setting.

As part of the migration, serverless components were also introduced where they made sense. For example, each incoming file was automatically tagged and scanned for viruses using a lightweight AWS Lambda function, adding an extra layer of security without increasing operational overhead.
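The article doesn’t describe the implementation, but an S3-triggered tagging step of this kind might look roughly like the sketch below. The event shape follows standard S3 notifications; the tag names and the scanning hook are assumptions.

```python
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Sketch of an S3-triggered Lambda that tags each incoming file before scanning."""
    for record in event["Records"]:                      # one record per uploaded object
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [{"Key": "scan-status", "Value": "pending"}]},
        )
        # A virus-scanning step would be invoked here; the article doesn't detail it.
```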

A key focus was FileGateway, a component responsible for receiving trade data in CSV format. In its original design, FileGateway relied on a traditional file system. Moving it directly to the cloud would have resulted in repeated scans of Amazon S3 buckets, driving up operational costs. Vineet helped design a new API-based approach that listed objects using a custom algorithm tailored to individual participants. This reduced unnecessary scans while maintaining performance and reliability. “Cloud migration forces you to understand how data is actually accessed,” he noted. “Small design choices can have a big cost impact.”
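The article doesn’t publish the algorithm, but the underlying cost idea (list only the keys that belong to one participant instead of repeatedly scanning whole buckets) can be sketched with a prefix-scoped listing. The key layout below is an assumption.

```python
import boto3

s3 = boto3.client("s3")

def list_participant_files(bucket: str, participant_id: str, date: str) -> list[str]:
    """List only the objects under a single participant's prefix for a given date,
    instead of repeatedly scanning the entire bucket."""
    prefix = f"incoming/{participant_id}/{date}/"        # assumed key layout
    keys: list[str] = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```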

Significant Results and Operational Gains

The results of the migration were clear. Internal assessments showed cost savings of around 35% after CLSNet moved to AWS. Operational maintenance efforts were reduced by roughly 65%, easing the long-term workload for support teams. Beyond the numbers, the platform became simpler to operate and better prepared for future change, a critical outcome for a service with strict availability requirements.

The path to these results was not without challenges. One of the most complex areas was the database layer. The existing DB2 replication setup did not support a straightforward lift-and-shift to the cloud, requiring deployment on EC2 instances and shifting database management responsibilities to the team. Messaging posed another hurdle, as several components relied on IBM MQ, which did not map neatly to native AWS services. Each integration required targeted redesign and careful validation. “There is rarely a perfect one-to-one replacement,” he said. “You have to work through dependencies one by one.”

Lessons for the Wider Industry

Throughout the migration, Vineet worked closely with both technical and operational stakeholders, highlighting a recurring lesson in large financial system transitions. Success often depends less on sweeping architectural change and more on attention to cost models, data access patterns, and system dependencies.

He also points to emerging practices shaping current migrations. Financial institutions are increasingly adopting open-source messaging platforms such as Apache ActiveMQ, which fit more naturally into cloud environments. Cost and performance tools like AWS Trusted Advisor are being used more actively, while data security services such as Amazon Macie provide better visibility into sensitive information stored in the cloud. “The tooling has matured,” he observed, “but it still requires discipline to use it well.”

Beyond the immediate project, Vineet’s work has produced internal design documents and migration frameworks that continue to guide later efforts within the organization. In regulated environments, such tested references often become essential for scaling cloud adoption safely.

The broader takeaway from the CLSNet migration is straightforward. Moving from legacy data centers to cloud-native platforms is not about speed or novelty. It is about careful execution, informed trade-offs, and respect for the complexity of existing systems. When approached thoughtfully, cloud migration can reduce costs and simplify operations. The discussed experience shows how steady, detail-focused engineering can help financial institutions make that transition with confidence.


:::tip This story was distributed as a release by Sanya Kapoor under HackerNoon’s Business Blogging Program.

:::
