2026-01-22 19:00:03
- 4 Identifying API Privacy-relevant Methods
- 5 Labels for Personal Data Processing
- 6 Process of Identifying Personal Data
- 7 Data-based Ranking of Privacy-relevant Methods
- 8 Application to Privacy Code Review
- Conclusion, Future Work, Acknowledgement and References
Our data-based ranking is designed to identify and prioritize privacy-relevant methods in Java and JavaScript applications. This ranking process comprises several stages, as depicted in Fig. 3, using the Java ranking as an example. By analyzing data from real-world applications, we aim to provide a practical guide for identifying the methods that are most relevant to privacy concerns.
To focus our data-based ranking on the most relevant libraries, we selected the top 25 libraries from NPM for JavaScript and Maven for Java, shown below in Table 2. Our selection criteria were based on the libraries’ relevance to personal data processing, as aligned with our set of labels for personal data processing activities. This selection was made through a systematic review of each library’s documentation, specifically targeting functionalities that are related to personal data processing.
We employed static analysis tools to identify method invocations and analyze data flows within the code. For Java, we used Soot [14] to construct call graphs and trace method invocations. For JavaScript, we used ESLint for its capabilities in Abstract Syntax Tree (AST) analysis. Our analysis matched these invocations to our list of native privacy-relevant methods, providing a view of how these methods are used in practice.
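As a rough illustration of the Java side of this step, the sketch below shows how a Soot-based pass could walk a whole-program call graph and report call sites whose targets appear in a list of privacy-relevant signatures. The classes directory, entry point, and the two example signatures are assumptions made for this sketch, not details taken from the paper.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Set;

import soot.PackManager;
import soot.Scene;
import soot.SootMethod;
import soot.jimple.toolkits.callgraph.CallGraph;
import soot.jimple.toolkits.callgraph.Edge;
import soot.options.Options;

public class PrivacyCallGraphSketch {
    // Illustrative subset; the full analysis would load the complete set of
    // native privacy-relevant method signatures described earlier.
    private static final Set<String> PRIVACY_RELEVANT = Set.of(
            "<java.net.URLConnection: java.io.OutputStream getOutputStream()>",
            "<org.slf4j.Logger: void info(java.lang.String)>");

    public static void main(String[] args) {
        // Assumed layout: compiled application classes in target/classes.
        Options.v().set_prepend_classpath(true);
        Options.v().set_allow_phantom_refs(true);
        Options.v().set_whole_program(true);
        Options.v().set_process_dir(Collections.singletonList("target/classes"));
        Options.v().set_main_class("com.example.Main"); // hypothetical entry point
        Options.v().setPhaseOption("cg.spark", "on");   // SPARK call-graph construction

        Scene.v().loadNecessaryClasses();
        PackManager.v().runPacks();

        // Walk every call edge and keep those whose target is privacy-relevant.
        CallGraph cg = Scene.v().getCallGraph();
        Iterator<Edge> edges = cg.iterator();
        while (edges.hasNext()) {
            Edge e = edges.next();
            SootMethod target = e.tgt();
            if (PRIVACY_RELEVANT.contains(target.getSignature())) {
                System.out.println(e.src().getSignature() + " -> " + target.getSignature());
            }
        }
    }
}
```

A comparable pass on the JavaScript side would traverse ESLint's AST for call expressions and match callee names against the same label set.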
To rank privacy-relevant methods, we selected 30 popular open-source GitHub projects with over 100 stars in Java and JavaScript. We focused on applications processing personal data rather than frameworks and libraries. The selection included 15 Java applications such as the e-commerce software Shopizer, and 15 JavaScript applications like the chat application RocketChat. We also included projects predominantly in Java/JavaScript that use other languages like TypeScript for some modules.
Our selection criteria were popularity (high star counts, indicating broader relevance), data sensitivity (applications processing personal or sensitive data, which are highly relevant for privacy reviews), diversity (applications from different domains and languages, demonstrating wide applicability), and public availability (open-source code, enabling reproducibility and transparency). The details of these selected projects are provided in Table 4.
To make the analysis efficient, we first identified the libraries imported by each application. For standard libraries, we assumed their presence in most applications. For API libraries, we examined import statements and configuration files to narrow down our focus to the top 50 pre-selected libraries, 25 each for Java and JavaScript.
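To make this narrowing step concrete, here is a minimal sketch that scans a Java source tree for import statements matching a handful of library package prefixes. The prefixes and the src/main/java layout are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LibraryUsageScanner {

    // Hypothetical subset of the 25 pre-selected Java library prefixes.
    private static final Set<String> LIBRARY_PREFIXES = Set.of(
            "org.apache.commons", "org.springframework", "org.slf4j", "com.auth0");

    public static void main(String[] args) throws IOException {
        Path sourceRoot = Path.of("src/main/java"); // assumed project layout
        try (Stream<Path> files = Files.walk(sourceRoot)) {
            // Collect every distinct imported package that matches a pre-selected prefix.
            List<String> matches = files
                    .filter(p -> p.toString().endsWith(".java"))
                    .flatMap(LibraryUsageScanner::imports)
                    .filter(imp -> LIBRARY_PREFIXES.stream().anyMatch(imp::startsWith))
                    .distinct()
                    .collect(Collectors.toList());
            matches.forEach(System.out::println);
        }
    }

    // Extract the imported package/class names from one source file.
    private static Stream<String> imports(Path file) {
        try {
            return Files.readAllLines(file).stream()
                    .filter(line -> line.startsWith("import "))
                    .map(line -> line.substring("import ".length()).replace(";", "").trim());
        } catch (IOException e) {
            return Stream.empty();
        }
    }
}
```

A similar pass over package.json dependencies and import/require statements would cover the NPM side.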
We employed Semgrep to monitor the flow of personal data into privacy-relevant methods invoked by application code. Utilizing Semgrep’s DeepSemgrep capability for cross-file analysis, we were able to comprehensively analyze data flows across entire applications, as opposed to only examining isolated code snippets. This provided a holistic perspective of how personal data propagates across different components. Using Semgrep’s taint analysis and the rules outlined in Section 6, we traced personal data flows to privacy-relevant methods.
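For intuition, the hypothetical Java snippet below shows the kind of source-to-sink flow such taint rules are designed to surface: a personal-data value read from a user-supplied form propagates through a local variable into a logging call, one of the privacy-relevant sinks discussed in this paper. The class, field, and method names are invented for illustration; the actual rule definitions are those outlined in Section 6.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RegistrationService {

    private static final Logger log = LoggerFactory.getLogger(RegistrationService.class);

    // Hypothetical endpoint handler used only to illustrate a tainted flow.
    public void register(UserForm form) {
        // Source: personal data entering the application.
        String email = form.getEmail();

        // Propagation: the tainted value is carried through a local variable.
        String auditMessage = "New registration for " + email;

        // Sink: a privacy-relevant method (logging) receives the personal data.
        // A taint rule with the form getter as source and Logger.info() as sink
        // would report this flow.
        log.info(auditMessage);
    }

    // Minimal form type so the example is self-contained.
    public static class UserForm {
        private final String email;

        public UserForm(String email) {
            this.email = email;
        }

        public String getEmail() {
            return email;
        }
    }
}
```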
To assess the practical relevance of our identified privacy-relevant methods, we introduce the usage-based metrics presented in Table 3. We ranked privacy-relevant methods by analyzing their usage in the 30 popular GitHub projects introduced above, finding an average of 358 application methods processing personal data per application. This varied by language and type: Java applications averaged 288 such methods, while JavaScript applications averaged 363. The higher average in JavaScript was likely due to its more diverse front-end processing, reflecting the complexity and multifaceted nature of these applications.
To better focus our approach, we calculated the proportion of application methods that both invoke a privacy-relevant method and carry a confirmed flow of personal data into that method, relative to the total number of methods in the application. This metric indicates the level of focus achieved in identifying privacy-relevant methods, allowing developers to narrow their efforts to a smaller, more relevant subset of the code. In essence, our approach aims to minimize the code sections that need scrutiny, saving both time and resources. For these proportions in the selected open-source Java and JavaScript/TypeScript applications, see Table 4.
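As a sketch of how this focus proportion can be computed per application, the snippet below divides the number of methods with confirmed personal data flows into privacy-relevant methods by the application's total method count. The figures in the example are illustrative, chosen only to reproduce the averages reported in this section, not measurements from any single project.

```java
public class FocusMetric {

    /**
     * Proportion of application methods that invoke a privacy-relevant method
     * and have a confirmed personal data flow into it, relative to all methods.
     */
    static double focusProportion(long confirmedPrivacyMethods, long totalMethods) {
        if (totalMethods == 0) {
            return 0.0;
        }
        return (double) confirmedPrivacyMethods / totalMethods;
    }

    public static void main(String[] args) {
        // Illustrative figures: 358 confirmed methods in a codebase of roughly
        // 8,500 methods yields about 4.2%, matching the average reported
        // across the 30 analyzed projects.
        double proportion = focusProportion(358, 8_500);
        System.out.printf("Focus proportion: %.1f%%%n", proportion * 100);
    }
}
```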

Our study reveals that, on average, only 4.2% of the total codebase is made up of methods that are privacy-relevant and involved in personal data processing. This result highlights the precision of our approach in pinpointing privacy-relevant methods in applications.
Usage Patterns of Privacy-Relevant Methods

In Java applications, we observed a more conservative use of privacy-relevant methods, particularly those from popular Maven libraries. Native Java methods, along with methods from Apache Commons and the Spring Framework, were frequently used for handling personal data. Libraries such as slf4j for logging and auth0 for authentication were also commonly used, indicating their importance in the flow and protection of personal data.
In contrast, JavaScript applications exhibited a diverse range of library usage. While lodash was commonly used, frameworks like Angular, React, and Vue.js played a significant role in personal data processing, particularly in front-end applications. Table 5 presents the top five packages in both Java and JavaScript that contain methods relevant to privacy concerns.

Categories of Privacy-relevant Methods

We categorized privacy-relevant methods into types to gain insights into their roles in personal data processing. Our analysis identified several Java classes and categories that are frequently involved in personal data processing. For example, common Java classes like org.slf4j.Logger and auth0.client.Auth0Client are often used in operations that handle personal data.
In terms of categories, Data Processing and Transformation, Network Communication, and Logging Methods were the most prevalent. These categories indicate the areas where privacy-relevant methods are most commonly used, suggesting that they are key to understanding how personal data is processed in codebases (Table 6). Looking instead at how often invocations carried confirmed personal data, Identity and Access Management, Data Encryption and Cryptography, and Data Storage and Database Management were the most involved in personal data flows, with involvement percentages of 92%, 78%, and 85%, respectively. Conversely, the most prevalent categories, Data Processing and Transformation, Network Communication, and Logging Methods, showed lower involvement, at 67%, 44%, and 28%. Table 7 lists Java classes that are frequently involved in personal data processing, serving as key indicators for identifying privacy-relevant methods in applications.
:::info Authors:
:::
:::info This paper is available on arxiv under CC BY-NC-SA 4.0 license.
:::
2026-01-22 17:45:02
Gowtham Reddy Kunduru is a lead software engineer with a successful career spanning healthcare, FinTech, and cloud architecture. His success wasn’t always assured; in fact, as he puts it, his “story began with setbacks.”
“In college, I was written off as someone who would never amount to much. Being detained twice was humiliating, but it became the turning point in my life. I realized that if I didn’t take control, no one else would. So, I rebuilt myself with discipline, consistency, and a refusal to quit,” Kunduru says.
That determination and dedication to course-correcting his path led him not only to graduate but also to build a career marked by solving problems others considered impossible.
Kunduru’s college years were difficult for him, and he admits to struggling with discipline and direction.
“Many people around me, including my own family, believed I would follow the same path as my father, who never found success,” Kunduru explains.
It was his discovery of software engineering that inspired him to change his life. He says he was intrigued by building something from “nothing but logic and determination.” Technology was his reset button, and he pushed it.

After graduating, Kunduru joined an agriculture-focused startup. As an associate software engineer, he developed impactful technical solutions for the company. He was awarded Employee of the Year and received the Innovation of the Year Award in 2013.
“That job became my true foundation. Because it was a startup, I had to do everything: front-end, backend API, database, deployment, and support. It forced me to grow rapidly and taught me the value of hard work and responsibility,” Kunduru says.
Kunduru would later work on high-impact projects for Innova Solutions. Again, his hard work and dedication led to his being promoted twice within four years. He became the principal software engineer, leading a team of 12 engineers.
After Kunduru moved to the United States in 2020 to work with leading healthcare clients, he was approached by a former client seeking to hire him for his tech leadership and delivery record. He led NLP (natural language processing) and OCR (optical character recognition) initiatives in large-scale healthcare projects. His work processed over 158 million health records to generate insights for entire patient cohorts, revealing patterns that informed care decisions across populations.
“I’ve been fortunate to build a career defined by curiosity, continuous learning, and solving complex engineering challenges,” Kunduru says.
Kunduru decided to shift his career into the FinTech industry in 2022 when he joined M&T Bank. His hard work, dedication to continuous learning, and results were quickly recognized, leading to his becoming a subject matter expert (SME) and tech lead within a year.
As the leader of a team of 6 engineers, he oversaw the creation of enterprise-grade microservices and the delivery of an Adobe ColdFusion migration from the 2016 to the 2023 release, which improved performance by 33% and reduced server load by 30%, enabling faster, more reliable service for 2.5+ million customers. His engineering leadership earned him second place in the M&T Cybersecurity Secure Coding Tournament and third place in the Secure Coding Championship, competing against engineers across the entire technology organization.

“I became the first engineer to successfully establish Kerberos authentication between on-prem Windows servers and Azure COLO at M&T Bank, a feat even Adobe told us was impossible,” Kunduru says.
Kunduru aspires to remain a technology leader, driving innovations that impact millions of people worldwide. He wants to mentor future engineers who come from “humble or challenging backgrounds” as he did and show them that success is a “decision, not a privilege.”
“What makes me stand out is not just the technical capability, it’s the resilience. I went from being labelled a failure to becoming someone recognized for solving problems others give up on. My journey shows that your background doesn’t limit your potential, your perseverance does,” Kunduru says.
:::tip This story was published under HackerNoon’s Business Blogging Program.
:::
2026-01-22 15:11:02
How are you, hacker?
🪐 Want to know what's trending right now?
The Techbeat by HackerNoon has got you covered with fresh content from our trending stories of the day! Set email preference here.
## The Long Now of the Web: Inside the Internet Archive’s Fight Against Forgetting
By @zbruceli [ 18 Min read ]
A deep dive into the Internet Archive's custom tech stack. Read More.
By @drechimyn [ 7 Min read ] Broken Object Level Authorization (BOLA) is eating the API economy from the inside out. Read More.
By @ivankuznetsov [ 9 Min read ] It’s far more efficient to run multiple Claude instances simultaneously, spin up git worktrees, and tackle several tasks at once. Read More.
By @dataops [ 4 Min read ] DataOps provides the blueprint, but automation makes it scalable. Learn how enforced CI/CD, observability, and governance turn theory into reality. Read More.
By @socialdiscoverygroup [ 19 Min read ] We taught Playwright to find the correct HAR entry even when query/body values change and prevented reusing entities with dynamic identifiers. Read More.
By @kilocode [ 6 Min read ] CodeRabbit alternative for 2026: Kilo's Code Reviews combines AI code review with coding agents, deploy tools, and 500+ models in one unified platform. Read More.
By @rahul-gupta [ 8 Min read ] As AI adoption grows, legacy data access controls fall short. Here’s why zero-trust data security is becoming essential for modern AI systems. Read More.
By @praisejamesx [ 6 Min read ] Stop relying on "vibes" and "hustle." History rewards those with better models, not better speeches. Read More.
By @proflead [ 4 Min read ] Ollama is an open-source platform for running and managing large-language-model (LLM) packages entirely on your local machine. Read More.
By @David [ 37 Min read ] History of AI Timeline tracing the road to the AI boom. Built with Claude, Gemini & ChatGPT as a part of the launch of HackerNoon.ai, covering 251 events. Read More.
By @mohansankaran [ 10 Min read ] Jetpack Compose memory leaks are usually reference leaks. Learn the top leak patterns, why they happen, and how to fix them. Read More.
By @mcsee [ 3 Min read ] Set your AI code assistant to read-only state before it touches your files. Read More.
By @ishanpandey [ 5 Min read ] BTCC reports $5.7B tokenized gold volume in 2025 with 809% Q4 growth, marking gold as crypto's dominant real-world asset. Read More.
By @linked_do [ 12 Min read ] As the AI bubble deflates, attention shifts from scale to structure. A long view on knowledge, graphs, ontologies, and futures worth living. Read More.
By @vinitabansal [ 12 Min read ] You’re a reactive leader if you spend most of your time reacting to the things in your environment. Read More.
By @nathanbsmith729 [ 4 Min read ] Final Week Upcoming… Read More.
By @sanya_kapoor [ 16 Min read ] A 60-day test of 10 Bitcoin mining companies reveals which hosting providers deliver the best uptime, electricity rates, and ROI in 2026. Read More.
By @newsbyte [ 7 Min read ] Meet Yuri Misnik, Chief Technology Officer at inDrive. Read More.
By @scottdclary [ 27 Min read ] Real transformation requires your brain to physically rewire itself. Read More.
By @zacamos [ 5 Min read ] Third-party risk is everywhere in 2026. Here's an overview of current risks and security best practices as we start the new year. Read More.
🧑‍💻 What happened in your world this week? It's been said that writing can help consolidate technical knowledge, establish credibility, and contribute to emerging community standards. Feeling stuck? We've got you covered ⬇️⬇️⬇️
ANSWER THESE GREATEST INTERVIEW QUESTIONS OF ALL TIME
We hope you enjoy this week's worth of free reading material. Feel free to forward this email to a nerdy friend who'll love you for it.
See you on Planet Internet!

With love,
The HackerNoon Team ✌️
2026-01-22 15:06:28
Boston, MA, USA, January 21st, 2026, CyberNewsWire/--Reflectiz today announced the release of its 2026 State of Web Exposure Research, revealing a sharp escalation in client‑side risk across global websites, driven primarily by third‑party applications, marketing tools, and unmanaged digital integrations.
According to the new analysis of 4,700 leading websites, 64% of third‑party applications now access sensitive data without legitimate business justification, up from 51% last year — a 25% year‑over‑year spike highlighting a widening governance gap.
The report also exposes a dramatic surge in malicious web activity across critical public‑sector infrastructure. Government websites saw malicious activity rise from 2% to 12.9%, while 1 in 7 Education websites now show active compromise, quadrupling year‑over‑year. Budget constraints and limited manpower were cited as primary obstacles by public‑sector security leaders.
The research identifies several widely used third‑party tools as top drivers of unjustified sensitive‑data exposure, including Google Tag Manager (8%), Shopify (5%), and Facebook Pixel (4%), which were frequently found to be over‑permissioned or deployed without adequate scoping.

“Organizations are granting sensitive‑data access by default rather than exception — and attackers are exploiting that gap,” said VP of Product at Reflectiz, Simon Arazi. “This year’s data shows that marketing teams continue to introduce the majority of third‑party risk, while IT lacks visibility into what’s actually running on the website.”
Key findings include:

- 64% of apps accessing sensitive data have no valid justification.
- 47% of applications running in payment frames (checkout environments) are unjustified.
- Compromised sites connect to 2.7× more external domains, load 2× more trackers, and use recently registered domains 3.8× more often than clean sites.
- Marketing and Digital departments account for 43% of all third‑party risk.

The report also introduces updated Security Leadership Benchmarks, highlighting the very small group of organizations meeting all eight criteria. Only one website — ticketweb.uk — achieved a perfect score across the framework.
The 2026 report includes:
The complete 43‑page analysis is available for download:
https://www.reflectiz.com/learning-hub/web-exposure-2026-research/
Reflectiz empowers organizations to secure their websites and digital assets against modern web threats. Its award-winning, agentless platform provides continuous visibility into all client-side activity, detecting and prioritizing security, privacy and compliance risks. Reflectiz is trusted by global enterprises across financial services, e-commerce, and healthcare to protect their data, users, and brand reputation.
Daniel Sharabi
VP Marketing
Reflectiz
:::tip This story was published as a press release by Cybernewswire under HackerNoon’s Business Blogging Program. Do Your Own Research before making any financial decision.
:::
2026-01-22 14:09:00
AI models aren’t actually too big. New research shows nearly 30% of their size is wasted due to outdated storage assumptions—and fixes it without losing accuracy.
2026-01-22 14:05:23
NVIDIA just dropped a production-ready stack where speech, retrieval, and safety models were actually designed to compose.