2025-01-31 05:41:01
At an all-hands meeting inside Meta on Thursday, Mark Zuckerberg did not address Meta’s $25 million settlement with Donald Trump, which will see the company pay $22 million toward the eventual establishment of the Trump Presidential Library. But Zuckerberg did say that he had to be increasingly careful about what he says internally at Meta because “everything I say leaks. And it sucks, right?”
Meta made changes to the question-and-answer section of the company’s all-hands meeting because of the leaks, Zuckerberg said, according to meeting audio obtained by 404 Media.
“I want to be able to be able to talk about stuff openly, but I am also trying to like, well, we’re trying to build stuff and create value in the world, not destroy value by talking about stuff that inevitably leaks,” he said. So rather than take direct questions, the company used a “poll” system, where questions asked beforehand were voted on so that “main themes” of questions were addressed.
“There are a bunch of things that I think are value-destroying for me to talk about, so I’m not going to talk about those. But I think it’ll be good. You all can give us feedback later,” he added. “Maybe it’s just the nature of running a company at scale, but it’s a little bit of a bummer.”
In the hour-long meeting, Zuckerberg repeated many things he has said publicly, such as the possibility of replacing software engineers with AI, his belief that open source AI will soon overtake closed-source AI, and his belief that the company can now work more easily with the Trump administration, which he has changed his platforms to align with.
He said “we now have an opportunity to have a productive relationship with the United States government, and we’re going to take that.” He also addressed changes to the company’s diversity, equity, and inclusion policies, but largely mirrored what he has already said publicly on podcasts like Joe Rogan’s. More of what Zuckerberg said about internal strife at the company was reported by Business Insider, which also obtained audio of the meeting.
Zuckerberg also spoke extensively about the rise of DeepSeek, which he said will not affect Meta as badly as it has affected the valuations of companies like OpenAI and Nvidia. This is because Meta does not sell access to its own open source large language model, Llama.
“You know, we can not only observe what they did, but we can read about it and implement it. So that'll benefit us,” he said. “We have, like, a model that's like, that's competitive with the best models out there, and we offer it for free. We're not charging $20 or $200 a month or whatever. It's just like right there, and it's free. But now I think that there might be an opportunity to do even more, right?”
Zuckerberg also said that the company does not “have control over what’s going to happen to TikTok,” one of its biggest competitors. “I'm pretty sure whatever happens. Whatever happens, regardless of what happens to TikTok, I'm very confident that Facebook and Instagram reels are gonna continue growing … we have a lot of competitors, but they’re like, they’re an important one.”
2025-01-31 03:36:15
Datasets aggregated on data.gov, the largest repository of U.S. government open data on the internet, are being deleted, according to the website’s own information. Since Donald Trump was inaugurated as president, more than 2,000 datasets have disappeared from the database.
As people in the data hoarding and archiving communities have pointed out, on January 21, there were 307,854 datasets on data.gov. As of Thursday, there are 305,564 datasets. Many of the deletions happened immediately after Trump was inaugurated, according to snapshots of the website saved on the Internet Archive’s Wayback Machine. Harvard University researcher Jack Cushman has been taking snapshots of data.gov’s datasets both before and after the inauguration, and has worked to create a full archive of the data.
Because data.gov is an aggregator that doesn’t always host the data itself, this doesn’t necessarily mean that the underlying data has been deleted, that it doesn’t exist elsewhere on federal government websites, or that it won’t be re-hosted elsewhere. Further research will be necessary to determine what has happened to any given dataset, or to see if it turns up elsewhere on a government website. For example, 404 Media found some datasets in Cushman’s analysis that are no longer accessible on data.gov but can still be found on individual agency websites; we also found some datasets that seem to still exist because their data.gov entries link to working websites, but that return a file-not-found error when we tried to download the files themselves.
Disproportionately, the datasets that are no longer accessible through the portal come from the Department of Energy, the National Oceanic and Atmospheric Administration, the Department of the Interior, NASA, and the Environmental Protection Agency. But determining what is actually gone and what has simply moved or is backed up elsewhere by the government is a manual task, and it's too early to say for sure what is gone and what may have been renamed or updated with a newer version.
This is because data.gov doesn’t always host the data that it is indexing. Sometimes the data is hosted directly on data.gov, but other times it links to an individual agency’s website, where the data is actually hosted. This means archiving and analyzing data.gov is not straightforward.
“Some of [the entries link to] actual data,” Cushman told 404 Media. “And some of them link to a landing page [where the data is hosted]. And the question is—when things are disappearing, is it the data it points to that is gone? Or is it just the index to it that’s gone?”
For example, “National Coral Reef Monitoring Program: Water Temperature Data from Subsurface Temperature Recorders (STRs) deployed at coral reef sites in the Hawaiian Archipelago from 2005 to 2019,” a NOAA dataset, can no longer be found on data.gov but can be found on one of NOAA’s websites by Googling the title.
“Stetson Flower Garden Banks Benthic_Covage Monitoring 1993-2018 - OBIS Event,” another NOAA dataset, can no longer be found on data.gov and also appears to have been deleted from the internet. “Three Dimensional Thermal Model of Newberry Volcano, Oregon,” a Department of Energy resource, is no longer available via the Department of Energy but can be found backed up on third-party websites.
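For readers who want to poke at this themselves, data.gov’s catalog can be queried programmatically. Here is a minimal sketch, assuming the standard CKAN package_search endpoint is still exposed at catalog.data.gov; the search term below is only an illustration, not a specific dataset from Cushman’s analysis. It reports the current dataset count and checks whether the files a given entry points to still resolve:

```python
# Sketch: query data.gov's CKAN catalog and spot-check one dataset's download links.
# Assumes the standard CKAN package_search endpoint; uses the `requests` library.
import requests

API = "https://catalog.data.gov/api/3/action/package_search"

# Total number of indexed datasets (the figure archivists have been tracking over time).
total = requests.get(API, params={"rows": 0}, timeout=30).json()["result"]["count"]
print(f"datasets currently indexed: {total}")

# Look up datasets by keyword (illustrative term) and test each linked resource.
resp = requests.get(API, params={"q": "coral reef monitoring", "rows": 1}, timeout=30).json()
for package in resp["result"]["results"]:
    print(package["title"])
    for resource in package.get("resources", []):
        url = resource.get("url")
        if not url:
            continue
        try:
            # A HEAD request is usually enough to see whether the indexed file still exists,
            # though some agency servers reject HEAD and would need a GET instead.
            status = requests.head(url, allow_redirects=True, timeout=30).status_code
        except requests.RequestException as exc:
            status = f"error: {exc}"
        print(f"  {status}  {url}")
```

A dead link here only means the index entry no longer resolves; as Cushman notes, the underlying data may still live on an agency website under a different URL.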
Determining what is gone, why it’s gone, and where it went seems like it would be straightforward, and it would seem like you could attribute all of it to malice on the part of an administration that has declared war on climate change and government equity efforts. But archivists who have been analyzing the deletions and archiving the datasets data.gov indexes say that while some of the deletions are surely malicious information scrubbing, some are likely routine artifacts of an administration change, and they are working to determine which is which. For example, in the days after Joe Biden was inaugurated, data.gov showed about 1,000 fewer datasets than it had the day before his inauguration, according to the Wayback Machine.
Because of the overall large number of datasets as well as the way that data.gov works, it is still too early to say what, specifically, has been deleted, though archivists and academics like Cushman are working on triaging the situation. It can reasonably be surmised that climate and environmental research and data, as well as research about marginalized communities and minorities, are among the datasets that have been purged. This is in part because the Trump administration deleted huge swaths of climate data during Trump’s first term, and because Trump issued an executive order asking all federal agencies to delete anything related to diversity, equity, and inclusion.
Data.gov serves as an aggregator of datasets and research across the entire government, meaning it isn’t a single database. This makes it slightly harder to archive than any individual database, according to Mark Phillips, a University of North Texas researcher who works on the End of Term Web Archive, a project that archives as much as possible from government websites before a new administration takes over.
“Some of this falls into the ‘We don’t know what we don’t know,’” Phillips told 404 Media. “It is very challenging to know exactly what, where, how often it changes, and what is new, gone, or going to move. Saving content from an aggregator like data.gov is a bit more challenging for the End of Term work because often the data is only identified and registered as a metadata record with data.gov but the actual data could live on another website, a state .gov, a university website, cloud provider like Amazon or Microsoft or any other location. This makes the crawling even more difficult.”
Phillips said that, for this round of archiving (which the team does every administration change), the project has been crawling government websites since January 2024, and that they have been doing “large-scale crawls with help from our partners at the Internet Archive, Common Crawl, and the University of North Texas. We’ve worked to collect 100s of terabytes of web content, which includes datasets from domains like data.gov.”
The Environmental Data & Governance Institute (EDGI) published a report in 2019 detailing “How the Trump administration has undermined federal web infrastructures for climate information,” which included not just deleting datasets but also, in some cases, not deleting datasets but deleting the links to them, changing descriptions of them, or making them much harder to find. For example, during Trump’s first term, the Department of Transportation’s information on climate change was deleted, republished in a different form elsewhere, then deleted again from that new place, the report found.
James Jacobs, a Stanford Libraries researcher who also works with a group called Free Government Information, told 404 Media in an email that data.gov “has always been kind of a government data junk drawer (I call it that lovingly ;-)). That is, it was a really great effort to get the vast federal apparatus to start to think about collecting and preserving data. But there are no specific regulations that tell agencies that they *have to* use data.gov. Some agencies use it heavily, some put up a few excel spreadsheets and called it a day.”
“I assume some of those datasets in data.gov have bad urls to old agency pages that no longer exist (it’s really problematic when an agency decides to redesign its site and its base domain changes and all the links to important information and data are broken),” Jacobs added. “Some of it is probably link rot and content drift and some of it is no doubt Trump admin policy driven (e.g. anything having to do with DEI).”
Harvard’s Cushman said that, because this is the internet, there are always things being added, breaking, changing, or vanishing, and that some of this happens on purpose and some of it happens by accident. So determining what is being purged, when there are so many data points, is not always trivial. “If you want to answer why any given thing is gone, it becomes an individual research question.” Cushman said he is working on compiling this information now and will publish it soon.
All of this is to say that even under the best circumstances, government datasets and research can get lost or deleted, and archiving it is not always easy. When an administration specifically makes a point of deleting research, this already fragile ecosystem is stressed even further. All of these suddenly disappeared datasets must be taken in with the context that we know the Trump administration has ordered agencies to delete and edit specific webpages, and 404 Media’s own reporting has shown targeted deletions of pages relating to diversity, equity, and inclusion as well as climate change.
In a post from this week on Free Government Information, Jacobs explained that “the government information crisis is bigger than you think.”
“There is a difference between the government changing a policy and the government erasing information, but the line between those two has blurred in the digital age,” Jacobs wrote. He explained that before the internet, government documents were printed and were archived by being distributed among many different libraries as part of the “Federal Depository Library Program.” The internet has made a lot of government information more accessible, but it has also made it a lot more fragile.
“In the print era, libraries did a good (but not perfect) job of preservation through inertia (ie collect and catalog a document, put it on a shelf, and leave it there until a patron wanted it),” Jacobs told 404 Media in an email. “In the digital era, that system of distribution/preservation/access has broken down because digital publications are no longer ‘distributed’ to libraries, and government entities a) publish a LOT more on the internet; but b) have no clear regulations or policies regarding preservation.”
It is absolutely true that the Trump administration is deleting government data and research and is making it harder to access. But determining what is gone, where it went, whether it’s been preserved somewhere, and why it was taken down is time-intensive work that is going to take a while.
“One thing that is clear to me about datasets coming down from data.gov is that when we rely on one place for collecting, hosting, and making available these datasets, we will always have an issue with data disappearing,” Phillips said. “Historically the federal government would distribute information to libraries across the country to provide greater access and also a safeguard against loss. That isn't done in the same way for this government data.”
2025-01-30 23:22:44
In 2015, a federal worker named Katherine Spivey gave colleagues a presentation about how to “write plainly,” so that the general public can more easily understand content on government websites. One of her pieces of advice, among many, was to “use pronouns” such as the word “you” to describe the reader rather than jargon like “beneficiary” or “purchaser.”
“There’s already a great barrier between citizens and the government,” Spivey said. “Remember, your reader is a person, not an entity … use pronouns to speak directly to your readers. It requires a lot less work and it requires a lot less words.”
Spivey’s presentation had nothing to do with gender identity, gender pronouns, diversity, equity, or inclusion. It was about the broad concept of “pronouns,” the part of speech we (a pronoun!) use constantly. And yet, after Donald Trump was inaugurated, the government webpage archiving a video of Spivey’s presentation was first edited to remove a timestamp link that went to the section of the video about “pronouns.” Later, the page archiving the video was deleted entirely (a copy of the video is still available on YouTube and on the Internet Archive).
The tweak is one of hundreds that have been revealed across the government via GitHub’s commit tracking, which shows version changes to code, websites, and other projects managed on the site. GitHub also reveals a widespread, scattershot effort not only to change government policies on DEI but also to wholesale nuke language that actually has nothing to do with it, and to retroactively change descriptions of past research and events to remove any reference to DEI. The GitHub pages reveal not only the imprecision with which these changes are being made but also a willingness to literally rewrite and delete history.
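As a rough sketch of how this kind of review works, git’s “pickaxe” search can list every commit in a cloned website repository that added or removed a given term. The repository URL and search term below are hypothetical placeholders, not a specific repository named in this story:

```python
# Sketch: clone a government website repository and list commits whose diffs
# added or removed a given term, using git's pickaxe (-S) search.
import subprocess

REPO_URL = "https://github.com/GSA/example-gov-site.git"  # hypothetical placeholder
TERM = "pronouns"  # illustrative search term

subprocess.run(["git", "clone", "--quiet", REPO_URL, "site"], check=True)

# `git log -S` finds commits that change the number of occurrences of TERM,
# i.e. the commits that introduced or deleted it.
log = subprocess.run(
    ["git", "-C", "site", "log", "-S", TERM, "--date=short",
     "--pretty=format:%h %ad %s"],
    check=True, capture_output=True, text=True,
)
print(log.stdout)
```

Running `git show` on any of the listed commits then reveals exactly which lines were removed and when.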
Many of the deletions catalogued on GitHub demonstrate the pettiness of the effort and the lengths to which the Trump administration is going to seek out and destroy anything that it could possibly conceive of as being related to DEI. They also show that the government has hundreds of employees and contractors who have been tasked with being the anti-DEI police across the entire government. Many of the changes are frivolous, but many of them are not, and represent the destruction of critical institutions, research, and public data.
There are far more alarming deletions than Spivey’s video, of course.
The Federal Committee on Statistical Methodology, an office of the government that determines how the federal government should carry out statistical research to, for example, determine if a federal program is working, has nuked its page about best practices for researching “sexual orientation, gender identity, and sex characteristics.” This page had years of research about how to best do basic government research about the American people for the Census, the National Institutes of Health, and other government agencies to “allow for better understanding of how sexual and gender minority populations [are faring] relative to the general or other population groups, including economic, housing, health, and other differences. These insights can lead to potential resources and interventions needed to better serve the community. These data meet critical needs to understand trends within larger population groups.”
Similarly, the National Institutes of Health deleted a page about the Sexual & Gender Minority Research Office, which has done critical research about the health and wellbeing of LGBTQ+ people.
It is impossible to catalog everything that has been deleted, tweaked, or scrubbed. But here are some more:
2025-01-30 04:52:21
A declassified World War II-era government guide to “simple sabotage” is currently one of the most popular open source books on the internet. The book, called “Simple Sabotage Field Manual,” was declassified in 2008 by the CIA and “describes ways to train normal people to be purposefully annoying telephone operators, dysfunctional train conductors, befuddling middle managers, blundering factory workers, unruly movie theater patrons, and so on. In other words, teaching people to do their jobs badly.”
Over the last week, the guide has surged to become the fifth-most-accessed book on Project Gutenberg, an open source repository of free and public domain ebooks. It is also the fifth-most-popular ebook on the site over the last 30 days, having been accessed nearly 60,000 times over the last month (just behind Romeo and Juliet).
“Sabotage varies from highly technical coup de main acts that require detailed planning and the use of specially-trained operatives, to innumerable simple acts which the ordinary individual citizen-saboteur can perform,” the guide begins. “Simple sabotage does not require specially prepared tools or equipment; it is executed by an ordinary citizen who may or may not act individually and without the necessity for active connection with an organized group; and it is carried out in such a way as to involve a minimum danger of injury, detection, and reprisal.”
The guide’s intro was written by William “Wild Bill” Donovan, who was the head of the Office of Strategic Services during World War II, the agency that later inspired the creation of the CIA. The motivating factor for writing the guide, according to a passage within it, is that citizen saboteurs were highly effective at resisting the Nazis during World War II, and the Office of Strategic Services wanted to detail other ways sabotage could be done: “Acts of simple sabotage are occurring throughout Europe. An effort should be made to add to their efficiency, lessen their detectability, and increase their number,” the guide states. “Widespread practice of simple sabotage will harass and demoralize enemy administrators and police,” it continues, adding that citizens often undertake acts of sabotage not for their own immediate personal gain, but to resist “particularly obnoxious decrees.”
Because it was written during active wartime, the book includes various suggestions for causing physical violence and destruction, such as starting fires, flooding warehouses, and breaking tools. But it also includes many suggestions for how to just generally be annoying within a bureaucracy or office setting. Simple sabotage ideas include insisting on doing everything through official “channels,” referring all matters to committees for “further study and consideration,” making “speeches” as frequently as possible and at great length, and haggling over the precise wording of communications, minutes, and resolutions.
The guide also suggests “general devices for lowering morale and creating confusion,” which include “Report imaginary spies or danger to the Gestapo or police,” “act stupid,” “Be as irritable and quarrelsome as possible without getting yourself into trouble,” “Stop all conversation when axis nationals or quislings enter a cafe,” and “Cry and sob hysterically at every occasion, especially when confronted by government clerks.”
It is impossible to say exactly why this book is going viral at this particular moment, or why it may feel particularly relevant to a workforce of millions of people who have suddenly been asked to agree to be “loyal” and work under the quasi-leadership of the world’s richest man, have been asked to take a buyout that may or may not exist, have had their jobs repeatedly denigrated and threatened, have suddenly been required to return to the office, have been prevented from spending money, have had to turn off critical functions that help people, and have been asked to destroy years’ worth of work and to rid their workplaces of DEI programs. Maybe it’s worth wondering why the most popular post in a subreddit for federal workers is titled “To my fellow Feds, especially veterans: we’re at war.”
2025-01-30 02:19:01
Dario Amodei, the CEO of the AI company Anthropic, has responded to the current hysteria in his industry and the financial markets around a new and surprisingly advanced Chinese AI model called DeepSeek by saying it proves the United States needs export controls on chips to China in order to ensure China doesn’t “take a commanding lead on the global stage, not just for AI but for everything.”
As I wrote earlier this week, Amodei believes that DeepSeek’s current advantages over American AI companies are overstated and temporary. The true cost of DeepSeek’s R1 is not entirely clear and is almost certainly much higher than DeepSeek’s paper claims, because the model builds on previous research published by American companies and on DeepSeek’s own previously released V3 model. Additionally, Amodei argues that American companies will be able to recreate the same efficiencies in their model training soon, if they haven’t already, and will then regain the lead when those efficiencies are paired with American companies’ much greater access to more and better chips. The US already has export controls on chips to China, and Amodei argues that DeepSeek shows they are “more existentially important than they were a week ago.”
At the same time, Amodei believes that “making AI that is smarter than almost all humans at almost all things will require millions of chips, tens of billions of dollars (at least), and is most likely to happen in 2026-2027.” Multiple American companies, Amodei says, will definitely have the money and chips this requires. The important question, and the reason the US needs export controls on chips, is whether China will be able to get millions of chips in order to do this as well.
In one of his footnotes, Amodei expands on this: “To be clear, the goal here is not to deny China or any other authoritarian country the immense benefits in science, medicine, quality of life, etc that come from very powerful AI systems,” he said. “Everyone should be able to benefit from AI. The goal is to prevent them from gaining military dominance.”
To state the obvious, it’s not just China that can direct “talent, capital, and focus to military applications of the technology.” OpenAI, arguably the leading AI company in the United States and the world, has already partnered with the American defense technology company Anduril to “deploy advanced artificial intelligence (AI) solutions for national security missions.” The US military is already purchasing OpenAI software for war, and companies like Amazon, Google, and Microsoft are always competing for US military contracts. AI could have a lot of uses, but for US companies, the military is definitely one of them. That’s not something only China is doing.
Overall, Amodei’s piece is pretty diplomatic. It doesn’t vilify DeepSeek or Chinese researchers, and it respects their contributions to computer science. It acknowledges that societies deserve the benefits of technology even if we disagree with their governments. But the ultimatum Amodei says we are facing is this: Do we want to live in a world in which an all-powerful, US-owned AI dominates the globe, or one in which an all-powerful, China-owned AI does?
If I had to choose, I guess I would choose the US AI dystopia over the Chinese AI dystopia. But those aren’t really the only choices available to us. Even if we just accept the assumption that AI systems will be as powerful as Amodei and other AI company CEOs tell us they will be, are we really unable to even imagine a world in which we choose not to weaponize and militarize them in ways that bring humanity to the brink? Would preventing our own homegrown AI companies from doing exactly that not be a good place to start?
2025-01-29 22:43:59
The narrative that OpenAI, Microsoft, and freshly minted White House “AI czar” David Sacks are now pushing to explain why DeepSeek was able to create a large language model that outpaces OpenAI’s while spending orders of magnitude less money and using older chips is that DeepSeek used OpenAI’s data unfairly and without compensation. Sound familiar?
Both Bloomberg and the Financial Times are reporting that Microsoft and OpenAI have been probing whether DeepSeek improperly trained the R1 model that is taking the AI world by storm on the outputs of OpenAI models.
Here is how the Bloomberg article begins: “Microsoft Corp. and OpenAI are investigating whether data output from OpenAI’s technology was obtained in an unauthorized manner by a group linked to Chinese artificial intelligence startup DeepSeek, according to people familiar with the matter.” The story goes on to say that “Such activity could violate OpenAI’s terms of service or could indicate the group acted to remove OpenAI’s restrictions on how much data they could obtain, the people said.”
The venture capitalist and new Trump administration member David Sacks, meanwhile, said that there is “substantial evidence” that DeepSeek “distilled the knowledge out of OpenAI’s models.”
“There’s a technique in AI called distillation, which you’re going to hear a lot about, and it’s when one model learns from another model, effectively what happens is that the student model asks the parent model a lot of questions, just like a human would learn, but AIs can do this asking millions of questions, and they can essentially mimic the reasoning process they learn from the parent model and they can kind of suck the knowledge of the parent model,” Sacks told Fox News. “There’s substantial evidence that what DeepSeek did here is they distilled the knowledge out of OpenAI’s models and I don’t think OpenAI is very happy about this.”
I will explain what this means in a moment, but first: Hahahahahahahahahahahahahahahaha hahahhahahahahahahahahahahaha. It is, as many have already pointed out, incredibly ironic that OpenAI, a company that has been obtaining large amounts of data from all of humankind largely in an “unauthorized manner,” and, in some cases, in violation of the terms of service of those it has been taking from, is now complaining about the very practices by which it has built its company.
The argument made by OpenAI, and by every artificial intelligence company that has been sued for surreptitiously and indiscriminately sucking up whatever data it can find on the internet, is not that they are not sucking up all of this data; it is that they are sucking up this data and are allowed to do so.
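For the technically curious, here is a minimal toy sketch of the “distillation” Sacks is describing, in which a small “student” network is trained to mimic the softened outputs of a larger “teacher.” The networks and data here are made up for illustration; this says nothing about how DeepSeek actually trained R1, only what the general technique looks like:

```python
# Sketch: classic knowledge distillation with toy networks, using PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in "teacher": a larger network assumed to be already trained.
teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
# Smaller "student" that learns to reproduce the teacher's answers.
student = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature softens the teacher's output distribution

for step in range(200):
    x = torch.randn(32, 16)  # stand-in for the "questions" sent to the teacher
    with torch.no_grad():
        teacher_logits = teacher(x)  # the teacher's answers become soft targets
    student_logits = student(x)
    # KL divergence between softened distributions: the standard distillation loss.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the large language model case, the “questions” are prompts sent to the teacher model’s API and the “answers” are its text outputs, which is the kind of use OpenAI’s terms of service restrict when the goal is training a competing model.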