Rss preview of Blog of Gwern Branwen

May 2021 Gwern.net Newsletter

2021-06-11 22:16:22

May 2021’s Gwern.net newsletter is now out; previous, April 2021 (archives). This is a collation of links and summary of major changes, overlapping with my Changelog; brought to you by my donors on Patreon.

Note: I will be in Denver 12–13 June 2021 for a conference.

1 Writings

Proposal: “Choose Your Own Adventure AI Dungeon”; “Decision Transformers: Preference Learning As Simple As Possible”

2 Links

2.1 AI

Matters Of Scale:

Hardware:
- “Podracer architectures for scalable Reinforcement Learning”, Hessel et al 2021 (highly-efficient TPU pod use: eg solving Pong in <1min at 43 million FPS on a TPUv3-2048); “Google details new TPUv4 AI accelerator chips” (2.7× TPUv3 chips; up to TPUv4-4096 pods, yielding >1 ExaFLOPS; public access later in 2021)x
- “ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning”, Rajbhandari et al 2021 (~1 trillion parameters per 16 GPUs/DGX-2-node, scaling to >512 GPUs ~40% efficiency)
- “GSPMD: General and Scalable Parallelization for ML Computation Graphs”, Xu et al 2021 (Google upgrade of GPipe/GShard arch to match MS DeepSpeed: “…50%–62% compute utilization on 128–2048 Cloud TPUv3 cores for models with up to one trillion parameters”)
- “DLRM: High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models”, Mudigere et al 2021 (ZionEX software/hardware platform for training extremely large embeddings—while embeddings aren’t ‘real’ parameters & things like DynamicEmbedding will never learn tricks like GPT-3 no matter how big, they present similar challenges); “RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance”, Gupta et al 2021
“From Motor Control to Team Play in Simulated Humanoid Football”, Liu et al 2021 (curriculum training of a single NN from raw humanoid control to coordinated team-wide soccer strategy; neat to compare with Hill et al 2020 in terms of agent abilities)
“Wav2vec-U: Unsupervised Speech Recognition”, Baevski et al 2021
“Anthropic” public-benefit-corp/startup launched (founded by the Amodeis; $124M investment for scaling “reliable and steerable AI systems”); “Cooperative AI Foundation” (CAIF) launched
“MLP-Mixer: An all-MLP Architecture for Vision”, Tolstikhin et al 2021 (another FC paper removing even more inductive biases—ponies are all you need: “Mixer improves more rapidly with data than ResNets, or even ViT, and the gap between large scale Mixer and ViT models shrinks until the performance is matched on the entire dataset…” The Bitter Lesson truly is the single bitterest lesson in ML, isn’t it? The more people tweet about how MLP-Mixer is overhyped because is −X% worse than the ultra-hand-optimized baseline or requires Y× more FLOPS, the more they demonstrate precisely why this sort of research is so important! And showing, incidentally, that Transformers are still under-researched if such a fundamental fact could have been missed for so long.)
“Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation”, Cheng et al 2021 (CLIP-like performance scaled down to n = 3m using soft labels generated by a Conceptual Captions-pretrained model)
“SR3: Image Super-Resolution via Iterative Refinement”, Saharia et al 2021; “Diffusion Models Beat GANs on Image Synthesis”, Dhariwal & Nichol 2021 (DDPM^1^ finally surpass BigGAN-deep on ImageNet 512px images at similar compute-cost, as expected from their good scaling); “Cascaded Diffusion Models for High Fidelity Image Generation”, Ho et al 2021
“Learning to summarize from human feedback”, Stiennon et al 2020
“Grokking: Generalization Beyond Overfitting On Small Algorithmic Data Sets”, Power et al 2021 (discussion; new scaling effect, ‘grokking’: sudden perfect generalization emerging many epochs after training-set overfitting on algorithmic tasks when training in flat shallow loss landscapes); “Knowledge distillation: A good teacher is patient and consistent”, Beyer et al 2021 (training much smaller models merely requires hundreds of thousands or millions of epochs)
“Scaling End-to-End Models for Large-Scale Multilingual ASR”, Li et al 2021
“The Shape of Learning Curves: a Review”, Viering & Loog 2021
“Reward is enough”, Silver et al 2021 (a DRL manifesto: reward losses enough at scale of compute/parameters/tasks to induce all important capabilities like memory/exploration/generalization/imitation/reasoning)
Scaling Down: lazy: a tool for running processes in idle time (how to train on a GPU without destroying your GUI’s usability! lazy pauses runs briefly while you interact with your desktop, letting you do months-long runs without going crazy or resorting to Colab etc. This enables hobbyists to go after previously-infeasible model sizes); EleutherAI releases a 6b-parameter GPT-3 model, GPT-J (are you still using GPT-2/GPT-Neo? upgrade!); “Aggregating Nested Transformers”, Zhang et al 2021/“Less is More: Pay Less Attention in Vision Transformers”, Pan et al 2021

“ByT5: Towards a token-free future with pre-trained byte-to-byte models”, Xue et al 2021 (character models—not just feasible but desirable; we’ll get our rhyming & pun-making language models yet!)
“Machine Learning Attacks Against the Asirra CAPTCHA”, Golle 2008 (a look back on a decade of CV progress: months of work for 80% cat vs dog with SVM ensembles in 2008; 5min in Fast.ai for 99% accuracy in 2018; for even more perspective, Cireşan 2012)

2.2 Genetics

Everything Is Heritable:

“Bi-ancestral depression GWAS in the Million Veteran Program and meta-analysis in >1.2 million individuals highlight new therapeutic directions”, Levey et al 2021
“The complete sequence of a human genome”, Nurk et al 2021 (media)
“Using DNA to predict intelligence”, von Stumm & Plomin 2021 (review)
“Long read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits”, Beyter et al 2021
“Rapid Sequencing–Based Diagnosis of Thiamine Metabolism Dysfunction Syndrome” (sequence everyone!)

Engineering:

“Sense codon reassignment enables viral resistance and encoded polymer synthesis”, Robertson et al 2021 (“ultra-safe cells”: synthesizing an entire E. coli genome with swapped codons for complete viral immunity)
“In vivo CRISPR base editing of PCSK9 durably lowers cholesterol in primates”, Musunuru et al 2021
Optogenetics: “Partial recovery of visual function in a blind patient after optogenetic therapy”, Sahel et al 2021 (media); “Wireless multilateral devices for optogenetic studies of individual and social behaviors”, Yang et al 2021 (media)
“Retron Library Recombineering (RLR): High-throughput functional variant screens via in vivo production of single-stranded DNA”, Schubert et al 2021
“First genetically modified Oxitec mosquitoes released in the United States”
“Genomic characterization of world’s longest selection experiment in mouse reveals the complexity of polygenic traits”, Palma-Vera et al 2021
“Surrogate broodstock to enhance biotechnology research and applications in aquaculture”, Jin et al 2021
“Utility of polygenic embryo screening for disease depends on the selection strategy”, Lencz et al 2021
“Limit on lab-grown human embryos dropped by stem-cell body: The International Society for Stem Cell Research relaxed the famous 14-day rule on culturing human embryos in its latest research guidelines”
“Useful Mutants, Bred With Radiation” (on atomic gardening)

2.3 Statistics/Meta-Science

“Correlated Failures” in HDDs/SSDs
“How a Publicity Blitz Created The Myth of Subliminal Advertising”, Rogers 1992 (the famous movie-theater/popcorn-sales experiment never happened)

2.4 Politics/Religion

“Clarifying the Structure and Nature of Left-Wing Authoritarianism (LWA)”, Costello et al 2021
“Book Review: The Decline and Fall of the Roman Empire” (excerpts)

2.5 Psychology/Biology

“A connectomic study of a petascale fragment of human cerebral cortex”, Shapson-Coe et al 2021 (“…This “digital tissue” is a ~660,000× scale up of an earlier saturated reconstruction from a small region of mouse cortex, published in 2015 (Kasthuri et al 2015). Although this scaleup was difficult, it was not hundreds of thousands of times more difficult and took about the same amount of time as the previous data set (~4 years)…The rapid improvements over the past few years…argues that analyzing volumes that are even 3 orders of magnitude larger, such as an exascale whole mouse brain connectome, will likely be in reach within a decade." See also “Accelerating progress in brain recording tech”.)
“Neuroimaging evidence for a network sampling theory of individual differences in human intelligence test performance”, Soreq et al 2021; “The neural basis of intelligence in fine-grained cortical topographies”, Feilong et al 2021; “Predicting intelligence from brain gray matter volume”, Hilger et al 2020 (towards the mechanistic reification of g: per P-FIT, it is global efficiency/total cognitive resources which can be spent on learning & orchestrating specialized capabilities); if we consider recent human brain imaging studies, cross-species comparisons, and deep learning as converging, I would offer as a speculation the following:

The Master Synthesis: intelligence is execution of small simplicity-weighted programs, best discovered by search over smooth loss landscapes like that of highly-overparameterized differentiable networks containing lottery-ticket subnetworks which are ensembled/averaged over, approaching Bayes-optimal reasoning in the limit (as nearest-neighbors-like high dimensional interpolation / memorization gives way to algorithmic generalization / interpolation on a more abstract level); this can be implemented by large numbers of similar neurons trained using any of the many approximations to backprop; human intelligence’s g is real but is the overall ‘pool’ of neural resources which derives from overall body integrity because the number of neurons, their density, their myelination, resistance to damage and infection etc, is causally downstream of all body and developmental systems, creating a huge mutational target; the brain regions specialize and differentiate, and their orchestration (or lack thereof) contributes to observed performance on tasks tapping into multiple specialized regions; as tasks rely on fewer regions or approach intrinsic ceiling, g ceases to be observable and task-specific influences matter most.
“MDMA-assisted therapy for severe PTSD: a randomized, double-blind, placebo-controlled phase 3 study”, Mitchell et al 2021 (d = 0.9 over therapy); “Effects of Psilocybin-Assisted Therapy on Major Depressive Disorder”, Davis et al 2021
“Why Animals Don’t Get Lost: Birds do it. Bees do it. Learning about the astounding navigational feats of wild creatures can teach us a lot about where we’re going” (on spectacular but still mysterious feats of animal navigation)
“In The Future Of Collecting, Is Anyone Having Fun?” (on Bobblehead collectors)
“Linking Brain Biology to Intellectual Endowment: A Review on the Associations of Human Intelligence With Neuroimaging Data”, Dizaji et al 2021
“The Best And The Rest: Revisiting The Norm Of Normality Of Individual Performance”, O’Boyle & Aguinis 2012 (performance is log-normal)
“A conserved strategy for inducing appendage regeneration”, Abrams et al 2021 (slight regrowth of damaged mouse limbs by drinking sugar+amino-acid-supplemented water)
“Know Your Amphetamines”, Scott Alexander
“Feeling Small: Exploring the Tactile Perception Limits [of Humans]”, Skedung et al 2013
“The Board Game of the Alpha Nerds: Before Risk, before Dungeons & Dragons, before Magic: The Gathering, there was Diplomacy” (WP; “I still don’t know whom I should have trusted, if anyone. All I know is that I felt stupid, stressed out, humiliated, and sad.”)

2.6 Technology

“I walk the (beta-stability) line: How counting neutrons explains nuclear waste”
“Making is Show Business now”, Alex Danco
“Shop Class as Soulcraft: The case for the manual trades”, Crawford 2006
“Spintronics: Build mechanical circuits”, Kickstarter (followup to Turing Tumble)

2.7 Economics

“RCTs to Scale: Comprehensive Evidence from 2 Nudge Units”, DellaVigna & Linos 2020 (nudge effects overestimated by 6.2× due to publication bias)
“No causal associations between childhood family income and subsequent psychiatric disorders, substance misuse and violent crime arrests: a nationwide Finnish study of >650,000 individuals and their siblings”, Sariaslan et al 2021; “Parental income and mental disorders in children and adolescents: prospective register-based study”, Kinge et al 2021
“Everything You Might Want to Know about Whaling”, Matt Lakeman
Exploding Nash Equilibrium For Trustless Trade

2.8 Fiction

“Love Is the Plan the Plan Is Death”, James Tiptree, Jr. (WP)

2.9 Miscellaneous

“The Strange Story of Dagobert, the Duck Tales Bandit: In the ’90s, a frustrated artist in Berlin went on a crime spree—building bombs, extorting high-end stores, and styling his persona after Scrooge McDuck. He soon became a German folk hero.” (WP; another reminder for Americans—odd as it may seem, Donald Duck is extremely popular overseas; see also the unknown-in-the-USA character John D. Rockerduck or beloved Scandinavian traditionFrom All of Us to All of You who 2020 airing set an all-time record of >4.5m viewers)
List of atmospheric optical phenomena (How many would you recognize from a distance or plane? How many have you even heard of?)
Baron Franz Nopcsa von Felső-Szilvás (noted geologist, paleontologist, anthropologist, homosexual, & skyjacker)
Krishnacore

What is a diffusion model like DDPM? To try to explain it as simply as possible without the math:

DDPM is a neural net which is trained to fix noise in an image: it takes a noisy image and ‘sharpens’ it to produce a new image. You train it by adding dirt to a normal image, and teaching it to turn the dirty version into the original. As it gets better, it learns what the images all tend to look like so it can ‘see through’ ever more noise, to turn smudged hints of the original image into its best guess. Once it’s done training, what happens if you give it a completely dirty photo, which is pure static noise? Well, it produces a slightly less dirty ‘photo’. And if you do it again? it’s a little cleaner still. Now, what if you do this many times? It has to get cleaner each time. The end result: the static noise goes in, and a face pops out! The DDPM has hallucinated a face out of the noise. One little blob of static here turned into a nose, and another blob turned into an ear, and it went from there.

April 2021 newsletter

2021-06-03 23:45:24

April 2021’s Gwern.net newsletter is now out; previous, March 2021 (archives). This is a collation of links and summary of major changes, overlapping with my Changelog; brought to you by my donors on Patreon.

1 Writings

Better Greek Variable Suggestions (use ϰ, ς, υ, ϖ, Υ, Ξ, ι, ϱ, ϑ, or Π instead)

2 Links

2.1 AI

“Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks”, Lee et al 2018; “Perceiver: General Perception with Iterative Attention”, Jaegle et al 2021 (skinny Transformers applied recurrently; given reinvention, one might ask “is attention, getting too much attention?”, especially given how many Transformer tweaks don’t pan out or have antecedents, indicating a gold rush? Probably not: if the marginal return on this research direction had fallen below that of competitors, we would see those neglected directions invade Transformer topics—while we continue to see the reverse, and many applications as yet untouched by all the new approaches, suggesting that we still don’t pay enough attention)
“Z-IL: Predictive Coding Can Do Exact Backpropagation on Any Neural Network”, Salvatori et al 2021 (scaling local learning rules to ImageNet AlexNet/Resnet & ALE DRL at similar compute cost)
“Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates”, Smith & Topin 2017 (the lingering mystery of super-convergence, saving 50–90% compute with LRs as high as 20 (!): what is it, why does it work only sometimes, is there any connection to grokking & can it work for large models like GPT-3 given the tunneling hypothesis?)
“Rip van Winkle’s Razor, a Simple New Estimate for Adaptive Data Analysis” (an unusual approach to estimating generalization—by quantifying the information-theoretic simplicity of all the powerful DL research discoveries since 2012, into ~1 kilobyte. And yet, what a kilobyte…)
“Ambigrammatic Figures”, Levin & Huang 2020 (making horrifying StyleGAN faces that can be rotated 180° by projection & then gradient-ascent towards an upside-down face)

Matters Of Scale:

Large Models:
- Congratulations to OpenAI on 1 year of GPT-3 & OA API. Has it really only been a year?—it has truly exceeded expectations.
- Naver announces 204b-parameter Korean-language NN, “HyperCLOVA” (KO; unknown arch although apparently dense, or training-compute or benchmark/loss performance; 650b token training dataset. Who knew Naver was even trying? “And we are here as on a darkling plain / Swept with confused alarms of struggle and flight, / Where ignorant armies clash by night.”)
- “PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation”, Zeng et al 2021 (Zh; Huawei’s GPT-3-200b prototype, trained on indigenous Chinese GPU+DL stack; a partial replication, due to incomplete training on ~43b tokens; the 13b-parameter model checkpoint has been released for download, and they are considering releasing the 200b-parameter model… Ding commentary)
- New 𝒪(100b)-parameter Transformer models announced at Google I/O ’2021: LaMDA (EN; chatbot), MUM (multimodal multilingual search/translation/Q&A)
- “PLUG” (Zh): a 27b parameter BERT-like Chinese language model, targeting 200b next (AliBaba followup to StructBERT/PALM)
- “CogView: Mastering Text-to-Image Generation via Transformers”, Ding et al 2021 (another Chinese DALL·E clone, post-M6: n = 30m text-image pairs, 4b-parameter GPT, models to be released)
- “VideoGPT: Video Generation using VQ-VAE and Transformers”, Yan et al 2021; “GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions”, Wu et al 2021 (DALL·E for video on Howto100M: VQ-VAE + sparse attention)
- “Efficient Large-Scale Language Model Training on GPU Clusters”, Narayanan et al 2021 (Nvidia ‘Megatron-LM’ software for scaling up to 3072 A100 GPUs; allows 1t-parameter models at 502 petaFLOP/s or 50% efficiency, cf TPU rival, GSPMD, and note Patterson et al 2021 estimates GPT-3 at ~3.5m V100 GPU-hours, so OA got ~20% efficiency?); “We expect to see multi-trillion-parameter models by next year, and 100 trillion+ parameter models by 2023” —Nvidia CEO Jensen Huang (subtitles)
- Mixture-Of-Experts:
  - BAAI’s “Wudao Wensu”: 1.75-trillion parameters & multimodal! (prologue)
  - “Exploring Sparse Expert Models and Beyond”, Yang et al 2021 (1t-parameter hierarchical Switch Transformer trained on 480 V100 GPUs)
MuZero:
- “MuZero Unplugged: Online and Offline Reinforcement Learning by Planning with a Learned Model”, Schrittwieser et al 2021 (Reanalyze+MuZero; smooth log-scaling of Ms. Pacman reward with sample size, 107–1010, showing that DRL for arcade games parallels board games)
- “Decision Transformer: Reinforcement Learning via Sequence Modeling”, Chen et al 2021
- “Sampled MuZero: Learning and Planning in Complex Action Spaces”, Hubert et al 2021 (MuZero for continuous domains: DM Control Suite/Real-World RL Suite); “Continuous Control for Searching and Planning with a Learned Model”, Yang et al 2020
- “Muesli: Combining Improvements in Policy Optimization”, Hessel et al 2020 (catching up with original MuZero)
- “Visualizing MuZero Models”, de Vries et al 2021 (reimplementing & introspecting a MuZero)
“Scaling Scaling Laws with Board Games”, Jones 2021 (AlphaZero/Hex: highly-optimized GPU implementation enables showing smooth scaling across 6 OOM of compute—2× FLOPS = 66% victory; amortization of training → runtime tree-search, where 10× training = 15× runtime)
“Scaling Laws for Language Transfer Learning”, Christina Kim (Hernandez et al 2021 followup: smooth scaling for En → De/Es/Zh)
“Carbon Emissions and Large Neural Network Training”, Patterson et al 2021 (“…choice of DNN/datacenter/processor can reduce the carbon footprint up to ~100–1000×. These large factors make retroactive estimates difficult.”)
“How to Train BERT with an Academic Budget”, Izsak et al 2021 (BERT in 8 GPU-days—R&D iteration allows finding efficiency; there’s nothing so expensive as demanding research be cheap.^1^)

2.2 Genetics

Everything Is Heritable:

“Precision exercise medicine: understanding exercise response variability”, Ross et al 2019 (“large individual differences in CRF response (range: −33% to +118%) have been observed across the 8 exercise training studies independent of exercise duration”—nothing in psychology, or medicine, makes sense except in the light of individual differences…)

Recent Evolution:

“Analysis of genomic DNA from medieval plague victims suggests long-term effect of Yersinia pestis on human immunity genes”, Immel et al 2021

Engineering:

“China officially bans CRISPR babies, human clones and animal-human hybrids”? (another blow to attempts to project fears & fantasies onto China)

2.3 Politics/Religion

Reflecting Sunlight: Recommendations for Solar Geoengineering Research and Research Governance, National Academies 2021 (media)
“Improving Public Sector Management at Scale? Experimental Evidence on School Governance India”, Muralidharan & Singh 2020
“Jay-Z’s 99 Problems, Verse 2: A Close Reading with 4th Amendment Guidance for Cops and Perps”, Mason 2012

2.4 Psychology/Biology

“Oxylipin biosynthesis reinforces cellular senescence and allows detection of senolysis”, Wiley et al 2021
“Inside the Secret Sting Operations to Expose Celebrity Psychics”
“If I fits I sits: A citizen science investigation into illusory contour susceptibility in domestic cats (Felis silvestris catus)”, Smith et al 2021
“Cetaceans, sex and sea serpents: an analysis of the Egede accounts of a ‘most dreadful monster’ seen off the coast of Greenland in 1734”, Paxton et al 2005 (is that a legendary cryptid in your pocket, or are you just happy to see me?)
“Building the perfect curse word: A psycholinguistic investigation of the form and meaning of taboo words”, Reilly et al 2020
Tarrare

2.5 Technology

“How Developers Choose Names”, Feitelson et al 2021 (“Another example concerned the function ‘arrangeFilesByName(files)’. When asked the return value…one suggested the number of files reordered”)
“Bringing GNU Emacs to Native Code”, Corallo et al 2020 (using libgccjit to make Emacs 2.3× to 42× faster; gccemacs has been merged into Emacs HEAD & will be available soon)
“Hosting SQLite databases on Github Pages (or any static file hoster)” (a revolution in static website technology: eg running a query need download only 54kb of a 670MB database; fulltext site search is just the beginning of the possibilities of this clever use of range requests)
“Fontemon: World’s first video game in a font!” (a Pokemon-like CYOA implemented as an OpenType font file; play in browser or text editor—still not quite Turing-complete but definitely the most impressive thing implemented in a font so far)
- Fontemon is by far the highlight of SIGBOVIK 2021; but also worth noting: “Back to Square One: Superhuman Performance in Chutes and Ladders Through Deep Neural Networks and Tree Search” · “Deep Deterministic Policy Gradient Boosted Decision Trees” · “Lowestcase and uppestcase letters: Advances in derp learning” · “openCHEAT: Computationally Helped Error bar Approximation Tool—Kick-starting Science 4.0” · “The Newcomb-Benford Law, Applied to Binary Data: An Empirical and Theoretic Analysis” · “Inverted Code Theory: Manipulating Program Entropy” (Tenet fans only—possibly inferior to Moravec 1991?) · “Build your own 8-bit busy beaver on a breadboard!”
Incidentally, it’s curious that while STEM fields have entire annual issues, journals, & conferences devoted to satire (SIGBOVIK; Arxiv April Fools papers like Garfinkel et al 2017; Special Topics; the BMJ Christmas issue; the Ig Nobel Prizes & BAHFest), after asking in several places, I have found no instances in the humanities. (I know of many entertaining papers, like Sinhababu 2008 on waifus, but no regular organized publication, with the possible exception of the annual “Latke-Hamantash Debate”.)

2.6 Economics

“The Kelly Criterion in Blackjack Sports Betting, and the Stock Market”, Thorp 2006
“The Performance Pay Nobel” (CEO pay as blackbox optimization problem)
“The Ocean’s Hot Dog: The Development of the Fish Stick”, Kelly 2008 (out of nostalgia, I bought some fish sticks for the first time in decades; better than I remembered, even if I had no tartar handy)

2.7 Philosophy

“The Aesthetics of Smelly Art”, Shiner & Kriskovets 2007; “The Odor Value Concept in the Formal Analysis of Olfactory Art”, Kraft 2019; “Perfumery as an art form”/notes, Qualia Computing 2020 (more: manufacturing: “The Scent of the Nile: Jean-Claude Ellena creates a new perfume”; human smell is better than you think: “Mechanisms of Scent-tracking in Humans”, Porter et al 2006 (video; see also “Poor Human Olfaction is a 19th Century Myth”, McGann 2017); olfactory white; Kōdō, which unexpectedly appears in Knuth. C. Thi Nguyen’s description of the more bizarre & avant-garde perfumes made me curious enough to nose around & order 39 LuckyScent samplers.)

2.8 Miscellaneous

Bog butter
Sarah Bernhardt (Lions. Lots of lions.)

Another thought, looking at ‘Employer Costs for Employee Compensation’ (PDF):
1. “Moore’s Law”: the cost of a transistor halves every ~19 months;
2. “Anti-Moore’s Law”: the cost of a synapse doubles every ~119 years.

March 2021 Gwern.net Newsletter

2021-04-06 23:31:01

March 2021’s Gwern.net newsletter is now out; previous, February 2021 (archives). This is a summary of the revision-history RSS feed, overlapping with my Changelog & /r/gwern; brought to you by my donors on Patreon.

1 Writings

Gwern.net: mobile “popins” are finally enabled! (example); new Wikipedia popups (this 7th implementation enables recursive WP popups)

2 Links

2.1 AI

“Multimodal Neurons in Artificial Neural Networks”, Goh et al 2021 (dissecting CLIP concepts, discovering typographical classification ‘attacks’^1^ and a Stroop effect! Is there anything CLIP can’t do?)
“Evolving Reinforcement Learning Algorithms”, Co-Reyes et al 2021 (evolving eg TD-learning)
“Waymo Simulated Driving Behavior in Reconstructed Fatal Crashes within an Autonomous Vehicle Operating Domain”, Scanlon et al 2021 (blog; hard negative mining—self-driving cars, being inhuman, can learn not just from their mistakes but humans’ mistakes too)
“Debugging Reinforcement Learning Systems Without The Agonizing Pain”, Andy L. Jones; “My Reinforcement Learning Learnings”, Clemens Winter

Matters Of Scale:

“SEER: Self-supervised Pretraining of Visual Features in the Wild”, Goyal et al 2021 (blog; near-SOTA by training 1b-param CNN on 1b unfiltered unlabeled Internet images—another reminder that unsupervised learning is really working!); “‘Learning From Videos’ to understand the world” (rapid FB expansion of self-supervised training to millions of photos/videos/hours-of-speech); “Contrasting Contrastive Self-Supervised Representation Learning Models”, Kotar et al 2021 (Supervised learning from ImageNet is now obsolete for transfer learning, and ImageNet just a contaminated validation set)
“Understanding Robustness of Transformers for Image Classification”, Bhojanapalli et al 2021 (Vision Transformers gain robustness faster than CNNs as dataset size increases)
“Artificial Intelligence Index Report 2021”: technical performance and cost (Ding questions whether this shows China catching up on AI at all, as we are incessantly told it is doing; one question to ask: ignoring fast-following, what, out of the thousands upon thousands of publications flooding out these days, are the last 3 major novel AI breakthroughs coming out of all pure-Chinese labs combined which could be plausibly equated in importance with, say, just OpenAI’s recent output of GPT-3/DALL·E/CLIP?)
OA GPT-3 API: >300 apps, >10k developers, >4.5b words per day
“A mathematical theory of semantic development in deep neural networks”, Saxe et al 2019 (are jumps in NN capabilities to be expected when scaling? see also Viering & Loog 2021’s discussion of phase transitions & averaging of exponentials giving power-laws)
“An early cell shape transition drives evolutionary expansion of the human forebrain”, Benito-Kwiecinski et al 2021 (media; a simple switch for the scaling up of the primate brain)
- “Crows possess higher intelligence long thought primarily human” (the remarkable, yet not extraordinary, crow/raven brain as scaled-up bird brain)

2.2 Genetics

Everything Is Heritable:

“GWAS in almost 195,000 individuals identifies 50 previously unidentified genetic loci for eye color”, Simcoe et al 2021
“Why Do Wealthy Parents Have Wealthy Children?”, Fagereng et al 2021 (I’m always impressed just how difficult it is for rich people to pass on wealth—“shirtsleeves to shirtsleeves in 3 generations” etc)

Evolution:

“Nothing in evolution makes sense except in the light of parasites”, Hickinbotham et al 2021

Engineering:

“The Demise and Potential Revival of the American Chestnut”

2.3 Statistics/Meta-Science

“Broad cross-national public support for accelerated COVID-19 vaccine trial designs”, Broockman et al 2021 (“we can’t do challenge trials with volunteers in February 2020 to save countless thousands of lives because ordinary people might think it unethical”—have you tried asking them, or was that irrelevant because it was just another noble lie?)
“This is the story of how I found what I believe to be scientific misconduct and what happened when I reported it”, Joe Hilgard
“The Revolution in Classic Tetris: How a younger generation used the Internet to master the falling blocks” (how achieving classic Tetris maximum-scores, first done in 2010, became routine thanks to YouTube & online competition for excellence)

2.4 Politics/Religion

“Magic, Explanations, and Evil: The Origins and Design of Witches and Sorcerers”, Singh 2021 (doubtless even cavemen were all “Og: sus.”)
“Self-blinding citizen science to explore psychedelic microdosing”, Szigeti et al 2021 (related to Kaertner et al 2021; a self-blinding study, similar to my old self-blinding protocols, confirms that microdosing is just placebo effect, as I said in 2012, and I’m reminded of DNB studies like Foroughi et al 2016)
The 2019–2020 vaping moral panic over adulterated black-market THC products (depressing to see how irresponsibly reported & alarmist this was, and how everyone attempted to frame nicotine for it2. Naturally, no one involved has apologized or admitted fault—after all, their intentions were good, “won’t someone think of the children”‽ The incompetence and/or dishonesty here emphasizes how 2020–2021 was business as usual, and the only unusual part is that reality happened so fast we saw some of the unseen.)
Mark Hofmann
Alexandra David-Néel (one of those 1800–1900s biographies)
John Harvey Kellogg

2.5 Psychology/Biology

“Can You Ever Be Too Smart for Your Own Good? Comparing Linear and Nonlinear Effects of Cognitive Ability on Life Outcomes”, Brown et al 2021
“The pandemic fallacy: Inaccuracy of social scientists’ and lay judgments about COVID-19’s societal consequences in America”, Hutcherson et al 2021 (highly-inaccurate even retrospectively, typically grossly overestimating)
“Training Working Memory for Two Years—No Evidence of Latent Transfer to Intelligence”, Watrin et al 2021 (fade-out of expectancy/placebo effects)
“Real-time dialogue between experimenters and dreamers during REM sleep”, Konkoly et al 2021
“Leroy’s elusive little people: A systematic review on lilliputian hallucinations”, Blom 2021 (Alice in Wonderland syndrome)
“A Group of Orca Outcasts Is Now Dominating an Entire Sea: ‘Transient’ killer whales that feast on seals and hunt in small packs are thriving while their widely beloved ‘Resident’ siblings are dying out” (I wonder how the third orca type, ‘offshore’, are doing?)
“Estimation of the total saliva volume produced per day in 5-year-old children”, Watanabe et al 1995

2.6 Technology

“The Aesthetic-Usability Effect”, Moran 2017 (“They Might Never Tell You It’s Broken” if it’s pretty enough; see also “The Third User”)
“Cameras and Lenses”, Bartosz Ciechanowski (explorable; followup to “Lights and Shadows”)
“Large Batch Simulation for Deep Reinforcement Learning”, Shacklett et al 2021 (your computer is faster than you think)
“The incredible boxes of Hock Wah Yeo” (unusual video game packaging design)
“Stone Walls That Stay Built: A master waller shares how to dry-lay stone walls that hold their ground for centuries”, Post 2017
Automated storage and retrieval system
Visual cryptography

2.7 Economics

“The Use and Misuse of Income Data and Extreme Poverty in the United States”, Meyer et al 2021 (measurement error in non-registry surveys of population extremes—not quite “lizardman” but similar problem)
“Is economics performative? Option theory and the construction of derivatives markets”, Mackenzie 2006 (the mechanics of how the Black-Scholes model changed markets: Black ran a service printing “paper” estimating optimal prices for all options which traders could consult & use with simple heuristics to try to arbitrage the market)
“Whitewood under Siege: On the front lines of the pallet wars” (the competition between the two ecosystems of shipping pallets: ‘whitewood’ & ‘blue pallet’)
Mautam

2.8 Philosophy

“Coping with mortality: responses of monkeys and great apes to collapsed, inanimate and dead conspecifics”, De Marco et al 2021
Braitenberg vehicle

2.9 Fiction

“Reply of the Zaporozhian Cossacks”

2.10 Miscellaneous

America’s top ace, Major Dick Bong

3 Film/TV

Live-action:

North by Northwest (Hitchcock 1959; for such a extremely respected movie, it felt oddly formless and like it was bouncing through genres as more of a comedic B-movie romp than a serious auteur’s effort—since James Bond started in 1953, with a TV adaptation in 1954, NbN comes off as almost a satire. I mean, really, monkeying around in Presidential noses!)

While interesting, these are ‘attacks’ only in the most generous interpretation possible (since it does know the difference), and the fact that CLIP can read text in images to note the semantic similarity, is to considerable credit. As the CLIP authors note, some queries benefit from ensembling, more context than a single word class name such as prefixing “A photograph of a”, and class names can be highly ambiguous: in ImageNet, the class name “crane” could refer to the bird or construction equipment; and the Oxford-IIIT Pet dataset labels one class “boxer”. (CLIP is still vulnerable to regular adversarial examples, of course.)↩
It couldn’t’ve been nicotine because people had been vaping for a decade and a half without widespread near-instantaneous lung-related fatalities! It had to be a new adulterant, and as soon as the first few black-market THC links surfaced, that meant the problem had to be THC-products-only because how would the same adulterant simultaneously get into the different supply chains? And yet, every article, health official, and activist did their paternalist best to suggest otherwise to pin the blame on regular vaping, no matter how many tests turned up clean, and it was the nicotine vaping products which got summarily banned…. One must assume many of those laws are still on the books, inasmuch as the shipping bans keep expanding.↩

February 2021 Gwern.net Newsletter

2021-03-13 23:18:44

February 2021’s Gwern.net newsletter is now out; previous, January 2021 (archives). This is a summary of the revision-history RSS feed, overlapping with my Changelog & /r/gwern; brought to you by my donors on Patreon.

1 Writings

Gwern.net: popups: can now be moved, stickied, and full-screened (another step towards our ambition of Windows-95-in-the-browser!)

2 Links

2.1 AI

“Controllable Neural Text Generation”, Lilian Weng; “Recent Advances in Language Model Fine-tuning”, Sebastian Ruder (review)
- “Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm”, Reynolds & McDonell 2021 (original 10-shot Fr → En translation can be beaten by the better 0-shot prompt: “French: XYZ / English:…”; this is “true of most worst-performing prompts…”); “Calibrate Before Use: Improving Few-Shot Performance of Language Models”, Zhao et al 2021 (huge boost from calibrating unstable prompts; both demonstrate, as always, that “sampling can prove the presence of knowledge but not the absence.”)
“TransGAN: Two Transformers Can Make One Strong GAN”, Jiang et al 2021 (Transformer-only GAN: attention is all you need)
“PACT: Proof Artifact Co-training for Theorem Proving with Language Models”, Han et al 2021 (GPT-f for Lean)
“Towards End-to-End In-Image Neural Machine Translation”, Mansimov et al 2020 (sure why not)
Brains:
- “Artificial Neural Nets Finally Yield Clues to How Brains Learn” (short overview of biologically-plausible backprop: feedback alignment, target propagation, predictive coding, & attentional feedback; also of recent interest, VS-ML; given their increasing success in training while respecting more biological constraints, the increasing power of backprop-trained ANNs and the neurological success of ANNs in predicting & imitating brain signals, it is increasingly clear that brains really do do backprop in some sense)
- “NSD: A massive 7-tesla fMRI dataset to bridge cognitive and computational neuroscience”, Jean et al 2021 (“…The availability of NSD thus opens the door to using brain activity to directly guide the optimization of deep neural networks.”)
- “Brain2Pix: Fully convolutional naturalistic video reconstruction from brain activity”, Le et al 2021 (reconstructing Dr. Who)
- “High-performance brain-to-text communication via imagined handwriting”, Willett et al 2020
- “Brain-computer interface for generating personally attractive images”, Spape et al 2021 (many ways to improve this…)

Matters Of Scale:

“Scaling Laws for Transfer”, Hernandez et al 2021 (“We find that pre-training effectively multiplies the fine-tuning dataset size”; a shot across the bow of anyone floating on a proprietary-dataset moat: large models can drop data requirements by orders of magnitude overnight, even surpassing you)
“ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision”, Jia et al 2021 (see also CC-12M; CLIP-like w/EfficientNet trained on 1.8 billion images on a TPUv3-1024—DM argues that fancier cross-modal Transformers are better, nevertheless, ‘TPUs go brrr’. Given DALL·E, CLIP, ALIGN, VDVAE, CW-VAE, AIPO et al, are GANs already dead, and just don’t realize it yet? Or at least soon to be relegated to only DRL-like uses as a final finetuning phase to sharpen up a self-supervised model?); “WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training”, Huo et al 2021
“DALL·E: Zero-Shot Text-to-Image Generation”, Ramesh et al 2021 (original blog); “M6: A Chinese Multimodal Pretrainer”, Lin et al 2021 (Chinese DALL·E: 1.9TB images/0.29TB text for 10b-parameter dense/100b-parameter MoE Transformer; shockingly fast Chinese replication of DALL·E/CLIP)
“Explaining Neural Scaling Laws”, Bahri et al 2021/“Learning Curve Theory”, Hutter 2021 (Rohin Shah commentary; more on the manifold hypothesis)

2.2 Genetics

Everything Is Heritable:

“Phenotypic covariance across the entire spectrum of relatedness for 86 billion pairs of individuals”, Kemper et al 2021
“Genetic variation, brain, and intelligence differences”, Deary et al 2021
“Pathfinder: A gamified measure to integrate general cognitive ability into the biological, medical and behavioural sciences”, Malanchini et al 2021 (not the focus, but the IQ PGS is a slight improvement over Allegrini et al 2018 due to less phenotype measurement error?)
“Polygenic burden has broader impact on health, cognition, and socioeconomic outcomes than most rare and high-risk copy number variants”, Saarentaus et al 2021
On candidate-genes & COMT

Recent Evolution:

Engineering:

First Black-Footed Ferret cloned

2.3 Statistics/Meta-Science

“Lessons from Gerolamo Cardano’s The Book of My Life” (progress studies; see also Newton’s anthropic argument, Bakewell & inventing progress, The Autobiography of Benvenuto Cellini)
“How Many Microcovids Would You Spend on a Burrito?” (on the microCOVID Project Calculator)
“On the enfeeblement of mathematical skills by ‘Modern Mathematics’ and by similar soft intellectual trash in schools and universities”, Hammersley 1968 (Knuth highlights as also amusing: “A Note on Piffles”, Smith 1967; “A rebuke of A. B. Smith’s paper, ‘A Note on Piffles’”, Farlow 1980)
“Artifact and Recording Concepts in EEG”, Tatum et al 2011 (on the EEG signals of Jell-O, or, the importance of negative controls)

2.4 Politics/Religion

“The Logic of Fashion Cycles”, Acerbi et al 2012; “Fashion and art cycles are driven by counter-dominance signals of elite competition: quantitative evidence from music styles”, Klimek et al 2019; “The hipster effect: When anti-conformists all look the same”, Touboul 2019; “Right Is The New Left”, Scott Alexander (see also Han et al 2010, Downs 1972/Gupta & Jenkins-Smith 2015, Lorenz-Spreen et al 2019/Candia et al 2019, Loury 1994)
“What can we learn from the lunar pandemic that never was?” (NASA’s lunar quarantine was a sham intended to mollify the public as they covered up repeated major failures & lab leaks both before & after—had there been any dangerous lunar organisms, they would have escaped easily)
MrBeast (the new aristocracy of prestige? Borrowed plumage, perhaps, but effective…)
“Russia’s new Lysenkoism”, Kolchinsky et al 2017

2.5 Psychology/Biology

Semaglutide: “Once-Weekly Semaglutide in Adults with Overweight or Obesity”, Wilding et al 2021; “Effect of Subcutaneous Semaglutide vs Placebo as an Adjunct to Intensive Behavioral Therapy on Body Weight in Adults With Overweight or Obesity: The STEP 3 Randomized Clinical Trial”, Wadden et al 2021

A longer-acting version of the insulin/appetite peptide liraglutide, semaglutide greatly reduces weight, fat, blood sugar, cholesterol etc, with an upcoming oral version; background: Kushner et al 2020, Aroda et al 2019, Nauck & Meier 2019, O’Neil et al 2018, Blundell et al 2017, Nauck et al 2016, Lau et al 2015.
“Lessons from the host defences of bats, a unique viral reservoir”, Irving et al 2021 (bat-borne viruses; previously, Trevor Klee)
“Beneficial & Detrimental Effects of Reactive Oxygen Species on Lifespan: A Comprehensive Review of Comparative & Experimental Studies”, Shields et al 2021 (antioxidants still aren’t the fountain of youth, and may be harmful; animal studies still frequently inconsistent)
“Positive expectations predict improved mental-health outcomes linked to psychedelic microdosing”, Kaertner et al 2021 (placebo)
“The Effects of Fluoride in Drinking Water”, Aggeborn & Öhman 2021
“Sleep & Sex: What Can Go Wrong? A Review of the Literature on Sleep Related Disorders and Abnormal Sexual Behaviors & Experiences”, Schenck et al 2007

2.6 Technology

New X-Prize: $100m in prizes for Carbon Removal
Wringing gauge blocks (“With their precisely-flat metal faces, gauge blocks can be stuck together non-magnetically via a process calling ‘wringing’, requiring substantial effort to separate. Scientists are still uncertain exactly how wringing works.”)
Armored train

2.7 Economics

“Why did renewables become so cheap so fast? And what can we do to use this global opportunity for green growth?”, Max Roser (specifically, why such an extreme experience curve?)
“IQ, trading behavior, and performance”, Grinblatt et al 2012; “Genetic Endowments and Wealth Inequality”, Barth et al 2020 (why, despite notorious setbacks, did Isaac Newton & LTCM’s founders die wealthy? Why, in general, are more intelligent people so much better investors? ‘The indifference of the indicator’: it’s not one thing, it’s everything—more intelligent people have lower discount rates, save more for longer & are less risk-averse, more accurately predict future growth or inflation, are more likely to participate in +EV opportunities like the stock market, to use low-fee rather than high-fee (and thus, underperforming) mutual funds, succumb less to biases like herding as they trade better & at better times, trade less, and harvest losses more efficiently when trading poorly.)

2.8 Philosophy

Are ethics experts more ethical? “The Behavior of Ethicists”, Schwitzgebel & Rust 2016 (most recently: “The moral behavior of ethics professors: A replication-extension in German-speaking countries”, Schönegger et al 2019; given moral licensing & activism, perhaps we should be surprised we don’t hear about more ethicists doing things like posting enemy lists or trying to dox reviewers. “Woe to you Pharisees!”)
“Meta-analysis on belief in free will manipulations”, Genschow et al 2021 (another noble lie turns out to be ignoble)
Gricean maxims of communication

2.9 Fiction

Bunnies & Burrows

2.10 Miscellaneous

“Caesar Lives”, Iggy Pop 1995 (on Gibbon)
Mad honey
Imperial Court System

Jan 2021 Gwern.net Newsletter

2021-02-05 04:23:01

January 2021’s Gwern.net newsletter is now out; previous, December 2020 (archives). This is a summary of the revision-history RSS feed, overlapping with my Changelog & /r/gwern; brought to you by my donors on Patreon.

1 Writings

“Danbooru2020: A Large-Scale Crowdsourced and Tagged Anime Illustration Dataset”
This Anime Does Not Exist.ai (TADNE) (discussion)
Gwern.net: +return-to-top floating button; popups: can now be disabled (use the ‘gear’ icon); final reimplementation (dynamic JS now; memoizing the recursive inlining, however clever & elegant, turns out to have painful edge-cases & still not be efficient enough—web browsers really don’t like loading hundreds of kilobytes of extra HTML)

2 Links

2.1 AI

Matters Of Scale:

Scaling up:
- “DALL·E: Creating Images from Text”, OpenAI (GPT-3-12.5b generating 1280 tokens → VQ-VAE pixels; generates illustration & photos); “CLIP (Contrastive Language-Image Pre-training): Connecting Text and Images”, OpenAI (Radford et al 2021: zero-shot image understanding via text description—useful for much more than just ranking DALL·E samples by quality)
  
  Further blessings of scale: simple contrastive training on n = 400m leads to remarkable generalization & combinatorial flexibility of image generation by DALL·E, and CLIP learns to reach image classification SOTA by zero-shot on many datasets, with more human-like errors & less degradation out of samples than rivals, while costing the same to train. OpenAI released their smallest CLIP model (the “ViT-B/32”-equivalent) and people are discovering it seems able to do just about anything without any further training—the paper notes that it does everything from “fine-grained object classification, geo-localization, action recognition in videos, and OCR”, but there’s so much more, and you can use it to generate image captions/descriptions, classify your anime images, pull a specific target image description by gradient ascent or out of another neural network such as an ImageNet BigGAN or TADNE StyleGAN2-ext (or, why not, synthesize images images embodying abstract concepts like emoji or words like “nightmare fuel” or “confusion”!), search your image datasets by embedding, find mislabeled images (eg by using “upside down” as the prompt)… One wonders, like GPT-3, how much better the largest CLIP (“L/14-336px”) is and how many ways of using it (or DALL·E) remain to be found? And why prediction losses work so well in one place, but then contrastive elsewhere?
  
  For perspective: there are newly-minted PhDs going on the job market who got excited about deep learning because of these new “resnet” things; undergrads who applied to grad school because BERT et al were blowing open NLP & extending neural supremacy to natural language would not yet have passed quals; and it has been only 1 academic semester since GPT-3 was announced. Or to put it quantitatively, for just sequence modeling: it has been 8,478 days since LSTM RNNs were published; 3,045 days since AlexNet’s ImageNet scores were released; 1,880 days since residual networks were published in a paper; 1,330 days since “Attention Is All You Need” hit Arxiv; 844 days since BERT’s paper was published; 718 days since GPT-2 was announced; 353 days since SimCLR, and 249 days since GPT-3 was; and 27 days since CLIP/DALL·E.^1^ Spring is coming. (Some still insist we need not worry about “overpopulation on Mars” for >18,264 more days…)
- “Meta Pseudo Labels”, Pham et al 2020 (90% on ImageNet by pretraining a meta-learning teacher using JFT-300M on a TPUv3-2048)
- “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity”, Fedus et al 2021 (1.57t-parameter GShard followup; the mixture-of-experts approach, while scaling stably, starts showing its limits)
Scaling down:
- “DeiT: Training data-efficient image transformers & distillation through attention”, Touvron et al 2020 (scaling Transformer classifiers down to ImageNet+1-GPU); “BoTNet: Bottleneck Transformers for Visual Recognition”, Srinivas et al 2021/“Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet”, Yuan et al 2021 (hybrids); “not-so-BigGAN: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution”, Han et al 2020/“VQGAN: Taming Transformers for High-Resolution Image Synthesis”, Esser et al 2020 (training >1024px Transformer GANs on just 2 GPUs)
  
  Transformer supremacy in image-related tasks continues, and GANs are becoming increasingly hybridized. Do pure-GANs have a future, now that VAEs and autoregressive models are making such inroads into both the highest-quality & lowest-compute sample generation? To take the GAN/DRL analogy seriously, perhaps they were they ultimately a dead end, akin to trying to learn everything from rewards, and an adversarial GAN loss ought to be only the cherry on the cake of a large unsupervised/semi-supervised generative model.
- “ZeRO-Offload: Democratizing Billion-Scale Model Training”, Ren et al 2021 (partial CPU training for 13b-parameter models on 1 V100 GPU, scaling to 128 GPUs)
- “Prefix-Tuning: Optimizing Continuous Prompts for Generation”, Li & Liang 2021 (could the PET & CLIP trick of averaging multiple embeddings to yield much better performance be reused for GPT-3 prompts to greatly improve prompting? The fact that the prefix-tuning, by directly optimizing the prompt embeddings, yields better performance than even single optimized text prompts, suggests so. The user could provide 3 or 4 similar prompts, and synthesize them into a single super-prompt to better program GPT-3…)
- “Scaling down Deep Learning”, Greydanus 2020 (cute: parametric simplified-MNIST for rapid iteration on tiny NNs: experiments in lottery-ticket & meta-learning of LRs/activations)
- “The neural network of the Stockfish chess engine” (very lightweight NN designed for incremental recomputation over changing board states)
“Transformers in Vision: A Survey”, Khan et al 2021
OpenAI departures: Dario Amodei, Sam McCandlish, Tom Brown, Tom Henighan, Chris Olah, Jack Clark, Ben Mann, Paul Christiano et al leave—most for an unspecified new entity (“the elves leave Middle Earth”?)

And the rest:

“2020 AI Alignment Literature Review and Charity Comparison”, Larks
“Grounded Language Learning Fast and Slow”, Hill et al 2020
“DeBERTa: Decoding-enhanced BERT with Disentangled Attention”, He et al 2020 (SuperGLUE falls)
“Solving Mixed Integer Programs Using Neural Networks”, Nair et al 2020
“Towards Fully Automated Manga Translation”, Hinami et al 2020
“UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers”, Hu et al 2021
“FERM: A Framework for Efficient Robotic Manipulation”, Zhan et al 2021 (contrastive semi-supervised learning + data augmentation for sample-efficiency)
“XMC-GAN: Cross-Modal Contrastive Learning for Text-to-Image Generation”, Zhang et al 2021

2.2 Genetics

Everything Is Heritable:

“Nurture might be nature: cautionary tales and proposed solutions”, Hart et al 2021
“A genetic perspective on the association between exercise and mental health in the era of genome-wide association studies”, de Geus 2020; “Evidence for shared genetics between physical activity, sedentary behaviour and adiposity-related traits”, Schnurr et al 2020
“Antidepressant Response in Major Depressive Disorder: A Genome-wide Association Study”, Pain et al 2020
“Genome wide analysis of gene dosage in 24,092 individuals shows that 10,000 genes modulate cognitive ability”, Huguet et al 2020 (yep, still polygenic)
“GWAS of three molecular traits highlights core genes and pathways alongside a highly polygenic background”, Sinnott-Armstrong et al 2021
“Genome-scale sequencing and analysis of human, wolf and bison DNA from 25,000 year-old sediment”, Gelabert et al 2021 (incredible this is possible)
“Disentangling sex differences in the shared genetic architecture of PTSD, traumatic experiences, and social support with body size and composition”, Carvalho et al 2021 (LCV)

Recent Evolution:

“African genetic diversity and adaptation inform a precision medicine agenda”, Pereira et al 2021; “The influence of evolutionary history on human health and disease”, Benton et al 2021; “Local adaptation and archaic introgression shape global diversity at human structural variant loci”, Yan et al 2021
“Genome scans of dog behavior implicate a gene network underlying psychopathology in mammals, including humans”, Zapata et al 2021
“Natural Selection in Contemporary Humans is Linked to Income and Substitution Effects”, Hugh-Jones & Abdellaoui 2021
“The diversity and function of sourdough starter microbiomes”, Landis et al 2021 (crowdsourced sourdough show little trace of geographic origins?)

Engineering:

“In vivo base editing rescues Hutchinson-Gilford progeria syndrome in mice”, Koblan et al 2021
“From Genotype to Phenotype: polygenic prediction of complex human traits”, Raben et al 2021

2.3 Statistics/Meta-Science/Math

“The Quantum Field Theory on Which the Everyday World Supervenes”, Carroll 2021 (“…we have reason to be confident that the laws of physics underlying the phenomena of everyday life are completely known” because all unknown particles/fields are constrained to being extremely rare/weak, eg by Adelberger et al 2009)
“How accurate are citations of frequently cited papers in biomedical literature?”, Pavlovic et al 2020 (includes original author’s evaluation of whether a citation of their work is correct)
“Energy-Efficient Algorithms”, Demaine et al 2016 (reversible computing asymptotics: constant-factor stacks/arrays, 𝒪(log n) time/energy AVL trees, 𝒪(n) space sorts, & various 𝒪(Vertex+Edge) time/space/energy graph searches)
“The Optimizer’s Curse: Skepticism and Postdecision Surprise in Decision Analysis”, Smith & Winkler 2006 (regression to the mean is everywhere; another example of why Bayes & decision theory are two great flavors that go great together)

2.4 Politics/Religion

“The Mechanisms of Cult Production: An Overview”, Xavier Marquez 2020 (see previously his blog roundup)
“When Prophecy Fails and Faith Persists: A Theoretical Overview”, Dawson 1999
“Why We Fight Over Fiction”, Robin Hanson
The All-Woman Supreme Court

2.5 Psychology/Biology

“Still Alive”, Scott Alexander (announcement of SSC return as Substack newsletter ‘Astral Codex Ten’ & launching a low-cost psychiatry clinic ‘Lorien Psychiatry’)
“The Temporal Dynamics of Opportunity Costs: A Normative Account of Cognitive Fatigue and Boredom”, Agrawal et al 2020
“A unified framework for association and prediction from vertex-wise grey-matter structure”, Couvy-Duchesne et al 2020 (more morphometricity)
Common phenomena: “Sounds from seeing silent motion: Who hears them, and what looks loudest?”, Fassnidge & Freeman 2018 (on ‘visual ear’; previously: Saenz & Koch 2008, Fassnidge et al 2017)
“Predicting Mental Health From Followed Accounts on Twitter”, Costelli et al 2021 (Registered Report: who you choose to follow says a lot about you—everything is correlated)
“No evidence for general intelligence in a fish”, Aellen et al 2021
Delirium tremens
“Microbiome connections with host metabolism and habitual diet from 1,098 deeply phenotyped individuals”, Asnicar et al 2021
“Universal DNA methylation age across mammalian tissues”, Lu et al 2021; “Whole-body senescent cell clearance alleviates age-related brain inflammation and cognitive impairment in mice”, Ogrodnik et al 2021
“BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data”, Kostas et al 2021 (towards brain imitation learning)
Parker-Hulme murder case; The Slender Man stabbing (paracosms?)
Correction: Programming competition skills do not inversely correlate with job performance after all

2.6 Technology

Natural nuclear fission reactors (Oklo)
“Baffles and Bastions: The Universal Features of Fortifications”, Keeley et al 2007
The Corrupted Blood incident
Footnote 36: “Redisturbed”: a unicase font experiment

2.7 Economics

“Businesses Aim to Pull Greenhouse Gases From the Air. It’s a Gamble”
"Does Advertising Actually Work?" (what could be more obvious than “advertising works”, and trivial to confirm with correlational data? Yet, the tedious saying “correlation ≠ causation” stubbornly insists on being true); “Digital Paywall Design: Implications for Content Demand and Subscriptions”, Aral & Dhillon 2020 (NYT nag-paywall caused −9.9% reading; in line with all the other results)
“Who Gains and Who Loses from Credit Card Payments? Theory and Calibrations”, Schuh et al 2010 (a compelling case for getting a rewards credit card if you’re a debit card user—why subsidize them so much?)
“Squeezing the bears: cornering risk and limits on arbitrage during the ‘British bicycle mania’, 1896–1898”, Quinn 2019

2.8 Fiction

“On Venus, Have We Got a Rabbi!”, William Tenn 2016
“St Martin’s Four Wishes”, Anonymous medieval poet (trans. Dubin 2013)

2.9 Miscellaneous

But it’ll still be too many days ’till we say we’re sorry.

December newsletter

2021-01-11 01:31:06

Please see the canonical version of the December 2020 newsletter on Gwern.net.

Gwern BranwenModify

Rss preview of Blog of Gwern Branwen

1 Writings

2 Links

2.1 AI

2.2 Genetics

2.3 Statistics/Meta-Science

2.4 Politics/Religion

2.5 Psychology/Biology

2.6 Technology

2.7 Economics

2.8 Fiction

2.9 Miscellaneous

1 Writings

2 Links

2.1 AI

2.2 Genetics

2.3 Politics/Religion

2.4 Psychology/Biology

2.5 Technology

2.6 Economics

2.7 Philosophy

2.8 Miscellaneous

1 Writings

2 Links

2.1 AI

2.2 Genetics

2.3 Statistics/Meta-Science

2.4 Politics/Religion

2.5 Psychology/Biology

2.6 Technology

2.7 Economics

2.8 Philosophy

2.9 Fiction

2.10 Miscellaneous

3 Film/TV

1 Writings

2 Links

2.1 AI

2.2 Genetics

2.3 Statistics/Meta-Science

2.4 Politics/Religion

2.5 Psychology/Biology

2.6 Technology

2.7 Economics

2.8 Philosophy

2.9 Fiction

2.10 Miscellaneous

1 Writings

2 Links

2.1 AI

2.2 Genetics

2.3 Statistics/Meta-Science/Math

2.4 Politics/Religion

2.5 Psychology/Biology

2.6 Technology

2.7 Economics

2.8 Fiction

2.9 Miscellaneous

Gwern Branwen Modify