The generative AI revolution has turned into a global race, with mixtures of models from private companies and open-source initiatives all competing to become the most popular and powerful. Many choose to promote their prowess by demonstrating their performance on common tests and their positions within regular rankings.

But the legitimacy of those rankings has been thrown into question, as new research published on Cornell University’s preprint server arXiv shows it’s possible to rig a model’s results with just a few hundred votes.

“When we talk about large language models, their performance on benchmarks is very important,” says study author Tianyu Pang, a researcher at Sea AI Lab, a Singapore-based research group. “It helps promote startups looking to tout the abilities of their models, which makes some startups motivated to get or manipulate the benchmark,” he says.

To test whether manipulation of the rankings was possible, Pang and his colleagues looked at Chatbot Arena, a crowdsourced AI benchmarking platform developed by researchers at the University of California, Berkeley and LMArena. On Chatbot Arena, users can state their preference for one chatbot’s output over another’s when the two are put through a battery of tests. The results of those votes feed into the wider rankings that the platform shares publicly, and which are often regarded as definitive.

But Pang and his colleagues identified that it’s possible to sway the ranking position of models with just a few hundred votes. “We just need to take hundreds of new votes to improve a single ranking position,” he says.

The technique is very simple. While Chatbot Arena keeps the identities of its models secret when they’re pitted against one another, Pang and his colleagues trained a classifier to identify, with a high level of accuracy, which model is being used based on its outputs. “Then we can utilize the rating system to more efficiently improve the model ranking with the least number of new votes,” he explains.

The vote-rigging experiment was not run on the live version of Chatbot Arena, so as not to poison the results of the real website, but instead on historical data from the ranking platform. Despite this, Pang says that it’d be possible to do so in real life with the proper version of Chatbot Arena.

The team behind the ranking platform did not respond to Fast Company’s request for comment. Pang says his last contact with Chatbot Arena came in September 2024 (before he conducted the experiment), when he flagged the potential technique to manipulate the results. According to Pang, the Chatbot Arena team responded by recommending the researchers sandbox test the principle in the historical data. Pang says that Chatbot Arena does have multiple anti-cheating mechanisms in place to avoid flooding voting, but that they don’t mitigate against his team’s technique.

“From the user side, for now, we cannot make sure the rankings are reliable,” says Pang. “It’s the responsibility of the Chatbot Arena team to implement some anti-cheating mechanism to make sure the benchmark is the real level.”
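For readers curious why knowing which model is behind an anonymous answer matters, the toy sketch below may help. It is not the researchers' code and does not reproduce their numbers: it assumes a plain online Elo update with a hypothetical K-factor of 4, whereas the article does not describe the rating computation Chatbot Arena actually uses. The function names and ratings are invented for illustration. It only shows the general idea that votes aimed at the model directly above a target move the ranking with fewer votes than votes spent against an unrelated model.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo-style probability that model A is preferred over model B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_winner: float, r_loser: float, k: float = 4.0):
    """Apply one vote: the winner gains exactly what the loser gives up.

    k is a hypothetical K-factor chosen for illustration only.
    """
    gain = k * (1.0 - expected_score(r_winner, r_loser))
    return r_winner + gain, r_loser - gain


def votes_needed(target, rival, opponent=None, k=4.0, max_votes=100_000):
    """Count rigged votes until `target` is rated above `rival`.

    opponent=None  -> every vote is `target` beating `rival` head to head
                      (possible only if the voter can tell which model is which).
    opponent=float -> every vote is `target` beating some third model instead.
    """
    votes = 0
    while target <= rival and votes < max_votes:
        if opponent is None:
            target, rival = elo_update(target, rival, k)
        else:
            target, opponent = elo_update(target, opponent, k)
        votes += 1
    return votes


if __name__ == "__main__":
    # Made-up ratings: the target sits 10 points below the model ranked above it.
    print("head-to-head votes:", votes_needed(1200.0, 1210.0))
    print("votes against a weaker third model:",
          votes_needed(1200.0, 1210.0, opponent=1000.0))
```

In this toy setup the head-to-head route needs fewer votes because each vote both raises the target and lowers the rival. The absolute counts are an artifact of the made-up ratings and K-factor and should not be compared with the "hundreds of votes" figure reported for the real leaderboard.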
Category:
E-Commerce
Over the past few years, the term “diversity, equity, and inclusion” has taken on an almost mythological resonance. Although it describes a set of recruitment tactics and employee resources aimed at creating a vibrant, respectful work culture, critics have tried to paint it as a mechanism for elevating unqualified people to prominent positions solely based on race or gender. In politics, DEI has become an all-purpose boogeyman, blamed for any number of tragedies in the United States.

The anti-DEI hostility has reached a fever pitch at the top of Donald Trump’s second term, with an onslaught of executive orders meant to surgically remove DEI policies from both government and the private sector, and supposedly forge a society that is color-blind and merit-based.

But a new book, The Science of Racism, demonstrates just how pervasive racism is in society. Author Keon West, a professor of social psychology at the University of London, didn’t set out to write his book, due out later this month, with the current circumstances in mind. His goal at the outset was just to provide anyone flailing in conversations about racism a set of objective facts about what’s actually happening on a macro level. (As opposed to relying solely on personal experience, podcast banter, or vibes.) It’s a statistics-packed tour through the rigorous world of scientific studies about racism in the workplace and beyond. It also happens to be a timely antidote to a set of beliefs that are on track to becoming conventional wisdom in the U.S.

“The recent executive orders banning DEI movements say they will create an America where everyone is treated with equal dignity and respect,” West says. “And they wouldn’t be able to say it if they had a population who knew that’s absolutely nonsense. It’s not even close to true.”

Researching racism

West can confidently make such statements because he’s spent years both conducting and digging through international scientific experiments that reveal how racism manifests in society. These experiments, he argues, boil down all the nuanced discussion and noise around race into a simple question: In a given situation in which a Black person and a white person are otherwise identical, would one of them receive detectably favorable treatment?

To answer that question, West lays out a wealth of randomized, controlled trials: experiments in which every detail is exactly the same, except for the race of the person at its center. The most common of these is “the CV test,” where researchers send out hundreds of résumés in two batches, identical save for a name that appears to indicate the applicant’s race, to determine which one gets more and better responses.

Like clockwork, a news story about the latest CV test will go viral every year or so, but as West points out, researchers have been conducting these tests since at least the 1950s. Rather than rely on findings from any one trial, he plumbs the results of dozens, including a 2017 meta-analysis of 28 studies, which found white applicants in America receiving, on average, 36% more callbacks than Black applicants with the same qualifications.

If such statistics seem surprising in their bluntness, perhaps it’s because they’re too often omitted from conversations about race in favor of more sensational points of contention.

“In America, there’s always been a vague tendency to ignore these studies, and that’s what I find interesting,” West says. “It’s not that people talk about them and refute them, they just don’t talk about them. And because of that, I wasn’t terribly surprised at how powerful and how swift the DEI backlash could be.”

The myth of color blindness

As a result of that backlash, whatever meager safeguards against racial bias U.S. offices have cultivated over the years are currently being dismantled. Instead of achieving Trump’s stated goal of becoming color-blind, ending DEI gives companies and managers permission to ignore white people receiving favorable treatment.

“Color blindness is incredibly attractive because it allows people to stop thinking about racism,” West says. “It localizes a problem internally: ‘If I don’t notice race, then it’s done.’ But of course, you do notice race. Everybody does.”

West’s statements are backed by reams of research. An entire chapter of The Science of Racism explores just how color-blind people actually tend to be, and the results do not bode well for a coming so-called meritocracy.

In a 2006 experiment, for instance, a group of white people were recruited to play a game similar to the board game Guess Who? Teams of two were positioned across from each other, each looking at an array of 32 faces in photos. The object of the game was to determine which face their partner had chosen, using as few questions as possible.

The results were rather telling. Whenever a participant was teamed with a fellow white partner, they mentioned race in one of their clues 94% of the time. When one was paired with a Black partner (a ringer who was in on the experiment), they mentioned race only 64% of the time.

As West notes, what experiments like this one reveal is the opposite of color blindness: an impulse in white people to create an illusion of color blindness in the presence of a Black person. Everyone is aware of race; some people just know when it’s advantageous to pretend not to be.

Now that DEI is firmly in the crosshairs, the way that workers, managers, and executives either notice race, or pretend not to notice it, is bound to change. Of course, nothing yet suggests that the ideas behind the controversial acronym have been snuffed out for good. As a sociologist who has studied behavioral patterns over time, West is confident that a similar movement will come along to replace DEI in due course.

“I think a reframing is inevitable,” he says. “The problem remains that we don’t live in a meritocracy. When people do the same work, they don’t get the same pay or the same rewards. And so whatever the name becomes, we’ll have to come up with another way of fighting [bias].”

“When we do, though,” he adds, “I hope we’re better at presenting the evidence for why we have to.”
Category:
E-Commerce
A nearly 15-year-old federal program designed to test and implement emerging technologies for reducing energy waste in buildings appears to have been canceled by the Trump Administration.

The Green Proving Ground program, launched in 2011 and run by the General Services Administration (GSA) and the Department of Energy (DOE), was created to evaluate new private-sector green building technologies by installing them within federal facilities. Multiple sources tell Fast Company that projects previously approved for participation in the program have just been canceled. The Green Proving Ground’s webpages have been deleted from the GSA’s website. GSA and DOE did not respond to multiple requests for comment.

Participants in the Green Proving Ground program, speaking on background, tell Fast Company that they were informed in late January that their projects have effectively been canceled, and contractors hired to evaluate the efficacy of these technologies have had their contracts terminated.

Companies selected to participate in the Green Proving Ground program over the years range from established building materials manufacturers to startups designing new technologies for reducing energy waste. In recent years, companies selected to have their products evaluated through the program have produced things like low-carbon concrete, bi-directional electric vehicle charging infrastructure, vacuum-insulated windows, and heat pumps that use captured carbon dioxide. Since its launch in 2011, more than 100 technologies have been measured and evaluated through the program. More than 20 are now being deployed within the GSA’s portfolio of green building retrofits.

Government’s big footprint

The GSA oversees 363 million square feet of real estate the federal government owns or leases in nearly 8,400 buildings nationwide. Previous GSA officials estimated that the technologies implemented through the Green Proving Ground program avoid 116,000 tons of CO2 emissions and save the government $28 million in energy costs annually.

The cancellation of the Green Proving Ground program could mean those savings disappear. It could also have a chilling effect on the development of new building materials and technologies.

“Green Proving Ground is really about U.S. technology. It’s helping U.S. companies advance their work and prove it and create markets,” says Liz Beardsley, senior policy counsel at the U.S. Green Building Council (USGBC), which oversees the LEED green building rating system. “Building and services are a significant export for the U.S. There may be a focus of green in this particular program, but it’s really more about technology and competitiveness.”

Though no official announcement has yet been made, the cancellation of the Green Proving Ground program aligns with the agenda being pursued by the GSA’s new acting administrator, Stephen Ehikian, who was appointed to the position by President Donald Trump. In an email obtained by Federal News Network, Ehikian outlined his priorities for the GSA, which include removing what he called extremist Green New Deal and ESG (environmental, social and governance) requirements from federal building construction, leasing and procurement, in order to prioritize economic efficiency over ideological mandates. The Green Proving Ground program appears to be one of the environmentally focused programs being removed.

Trump has also withdrawn the U.S. from the Paris climate agreement, and revoked some elements of the funding of the Inflation Reduction Act (IRA).

These moves halt energy efficiency efforts the federal government has been pursuing since the Obama Administration, and work that continued during the first Trump Administration. GSA has been retrofitting and redesigning its portfolio of buildings to be as energy efficient as possible. That includes replacing windows, installing heat pumps, and commissioning net-zero-energy building designs, among other efforts. The agency estimates it has saved $826 million in energy costs since 2008 through what are often simple building retrofits.

The construction conundrum

There’s a reason seemingly obscure building materials and technologies have gotten so much attention. The production of building materials and the construction of buildings add up to an estimated 11% of all global carbon emissions. Operating buildings accounts for an estimated 30% of U.S. greenhouse gas emissions. Improving the way buildings are built and run can have widespread impacts in combatting climate change.

Previous presidential administrations have embraced this approach, none more so than the Biden administration, during which federal green building retrofits got a major boost through funding allocated by the IRA, passed in 2022. The act included $3.4 billion for the GSA to use on efforts ranging from energy efficiency improvements to the development of more sustainable construction materials.

In 2023, the Biden Administration set aside $30 million specifically for the Green Proving Ground program, with a goal of turning federal buildings into testbeds for clean energy innovation. It’s funding that’s helped support the development and advancement of a variety of new technologies.

Marshall Cox is founder and CEO of Kelvin, maker of an insulated radiator cover. The company was one of 20 selected to participate in the Green Proving Ground program in 2023. Cox says the program’s cancellation is a blow to startups trying to innovate in the building technology sector.

“Getting a contract with GSA is one of the biggest things that could happen to a company, and that’s now stopped,” he says. Even the chance to have technology vetted through the program could be a make-or-break situation for a company.

Though Kelvin’s participation in the program had no funding attached, many other companies did receive financial support. “The other companies, by and large, are in this scenario potentially where they got a first milestone payment for their project and they bought probably millions of dollars’ worth of equipment and are deploying it. And now they’re not going to get that second check eventually,” Cox says. “That’s devastating for a company.”

The Trump administration’s cancellation of the Green Proving Ground program is an indication that the federal government’s energy efficiency gains could be coming to an end.

Robin Carnahan, who was GSA Administrator during the Biden Administration, told Fast Company in September that the kind of work being done through programs like the Green Proving Ground should be beyond the realm of politics. “These are smart investments. That’s the bottom line,” Carnahan said. “When things save money and they make economic sense, that’s not a political fight. That is just good stewardship of taxpayer money.”

With the new administration, that assertion seems to have been overwritten by more ideological concerns.

“It’s just a lost opportunity to prove new technologies,” says Ben Evans, federal legislative director for USGBC.
Category:
E-Commerce