Experts warn about the ‘crumbling infrastructure’ of federal government data


Article by Hansi Lo Wang: “The stability of the federal government’s system for producing statistics, which the U.S. relies on to understand its population and economy, is under threat because of budget concerns, officials and data users warn.

And that’s before any follow-through on the new Trump administration and Republican lawmakers’ pledges to slash government spending, which could further affect data production.

In recent months, budget shortfalls and the restrictions of short-term funding have led to the end of some datasets by the Bureau of Economic Analysis, known for its tracking of the gross domestic product, and to proposals by the Bureau of Labor Statistics to reduce the number of participants surveyed to produce the monthly jobs report. A “lack of multiyear funding” has also hurt efforts to modernize the software and other technology the BLS needs to put out its data properly, concluded a report by an expert panel tasked with examining multiple botched data releases last year.

Long-term funding questions are also dogging the Census Bureau, which carries out many of the federal government’s surveys and is preparing for the 2030 head count that’s set to be used to redistribute political representation and trillions in public funding across the country. Some census watchers are concerned budget issues may force the bureau to cancel some of its field tests for the upcoming tally, as it did with 2020 census tests for improving the counts in Spanish-speaking communities, rural areas and on Indigenous reservations.

While the statistical agencies have not been named specifically, some advocates are worried that calls to reduce the federal government’s workforce by President Trump and the new Republican-controlled Congress could put the integrity of the country’s data at greater risk…(More)”

Path to Public Innovation Playbook


Playbook by Bloomberg Center for Public Innovation: “…a practical, example-rich guide for city leaders at any stage of their innovation journey. Crucially, the playbook offers learnings from the past 10-plus years of government innovation that can help municipalities take existing efforts to the next level…

Innovation has always started with defining major challenges in cooperation with residents. But in recent years, cities have increasingly tried to go further by working to unite every local actor around transformational changes that will be felt for generations. What they’re finding is that by establishing a North Star for action—the playbook calls them Ambitious Impactful Missions (AIMs)—they’re achieving better outcomes. And the playbook shows them how to find that North Star.

“If you limit yourself to thinking about a single priority, that can lead to a focus on just the things right in front of you,” explains Amanda Daflos, executive director of the Bloomberg Center for Public Innovation and the former Chief Innovation Officer and director of the i-team in Los Angeles. In contrast, she says, a more ambitious, mission-style approach recognizes that “the whole city has to work on this.”

For instance, in Reykjavik, Iceland, local leaders are determined to improve outcomes for children. But rather than limiting the scope or scale of their efforts to one slice of that pursuit, they thought bigger, tapping a wide array of actors from the Department of Education to the Department of Welfare to pursue a vision called “A Better City for Children.” At its core, this effort is about delivering a massive array of new and improved services for kids and ensuring those services are not interrupted at any point in a young person’s life. Specific interventions range from at-home student counseling, to courses on improving communication within households, to strategy sessions for parents whose children have anxiety. 

More noteworthy than the individual solutions is that this ambitious effort has shown signs of activating the kind of broad coalition needed to make long-term change. In fact, the larger vision, started under then-Mayor Dagur Eggertsson, has been maintained by his successor, Mayor Einar Þorsteinsson, and has recently shown signs of expansion. The playbook provides mayors with a framework for developing their own blueprints for big change…(More)”.

Data Stewardship as Environmental Stewardship


Article by Stefaan Verhulst and Sara Marcucci: “Why responsible data stewardship could help address today’s pressing environmental challenges resulting from artificial intelligence and other data-related technologies…

Even as the world grows increasingly reliant on data and artificial intelligence, concern over the environmental impact of data-related activities is increasing. Solutions remain elusive. The rise of generative AI, which rests on a foundation of massive data sets and computational power, risks exacerbating the problem.

In the below, we propose that responsible data stewardship offers a potential pathway to reducing the environmental footprint of data activities. By promoting practices such as data reuse, minimizing digital waste, and optimizing storage efficiency, data stewardship can help mitigate environmental harm. Additionally, data stewardship supports broader environmental objectives by facilitating better decision-making through transparent, accessible, and shared data. In the below, we suggest that advancing data stewardship as a cornerstone of environmental responsibility could provide a compelling approach to addressing the dual challenges of advancing digital technologies while safeguarding the environment…(More)”

Data Governance Meets the EU AI Act


Article by Axel Schwanke: “…The EU AI Act emphasizes sustainable AI through robust data governance, promoting principles like data minimization, purpose limitation, and data quality to ensure responsible data collection and processing. It mandates measures such as data protection impact assessments and retention policies. Article 10 underscores the importance of effective data management in fostering ethical and sustainable AI development…This article states that high-risk AI systems must be developed using high-quality data sets for training, validation, and testing. These data sets should be managed properly, considering factors like data collection processes, data preparation, potential biases, and data gaps. The data sets should be relevant, representative, error-free, and complete as much as possible. They should also consider the specific context in which the AI system will be used. In some cases, providers may process special categories of personal data to detect and correct biases, but they must follow strict conditions to protect individuals’ rights and freedoms…

However, achieving compliance presents several significant challenges:

  • Ensuring Dataset Quality and Relevance: Organizations must establish robust data and AI platforms to prepare and manage datasets that are error-free, representative, and contextually relevant for their intended use cases. This requires rigorous data preparation and validation processes.
  • Bias and Contextual Sensitivity: Continuous monitoring for biases in data is critical. Organizations must implement corrective actions to address gaps while ensuring compliance with privacy regulations, especially when processing personal data to detect and reduce bias.
  • End-to-End Traceability: A comprehensive data governance framework is essential to track and document data flow from its origin to its final use in AI models. This ensures transparency, accountability, and compliance with regulatory requirements.
  • Evolving Data Requirements: Dynamic applications and changing schemas, particularly in industries like real estate, necessitate ongoing updates to data preparation processes to maintain relevance and accuracy.
  • Secure Data Processing: Compliance demands strict adherence to secure processing practices for personal data, ensuring privacy and security while enabling bias detection and mitigation.

Example: Real Estate Data
Immowelt’s real estate price map, awarded as the top performer in a 2022 test of real estate price maps, exemplifies the challenges of achieving high-quality datasets. The prepared data powers numerous services and applications, including data analysis, price predictions, personalization, recommendations, and market research…(More)”

Building Safer and Interoperable AI Systems


Essay by Vint Cerf: “While I am no expert on artificial intelligence (AI), I have some experience with the concept of agents. Thirty-five years ago, my colleague, Robert Kahn, and I explored the idea of knowledge robots (“knowbots” for short) in the context of digital libraries. In principle, a knowbot was a mobile piece of code that could move around the Internet, landing at servers, where they could execute tasks on behalf of users. The concept is mostly related to finding information and processing it on behalf of a user. We imagined that the knowbot code would land at a serving “knowbot hotel” where it would be given access to content and computing capability. The knowbots would be able to clone themselves to execute their objectives in parallel and would return to their origins bearing the results of their work. Modest prototypes were built in the pre-Web era.

In today’s world, artificially intelligent agents are now contemplated that can interact with each other and with information sources found on the Internet. For this to work, it’s my conjecture that a syntax and semantics will need to be developed and perhaps standardized to facilitate inter-agent interaction, agreements, and commitments for work to be performed, as well as a means for conveying results in reliable and unambiguous ways. A primary question for all such concepts starts with “What could possibly go wrong?”

In the context of AI applications and agents, work is underway to answer that question. I recently found one answer to that in the MLCommons AI Safety Working Group and its tool, AILuminate. My coarse sense of this is that AILuminate posts a large and widely varying collection of prompts—not unlike the notion of testing software by fuzzing—looking for inappropriate responses. Large language models (LLMs) can be tested and graded (that’s the hard part) on responses to a wide range of prompts. Some kind of overall safety metric might be established to compare one LLM to another. One might imagine query collections oriented toward exposing particular contextual weaknesses in LLMs. If these ideas prove useful, one could even imagine using them in testing services such as those at Underwriters Laboratory, now called UL Solutions. UL Solutions already offers software testing among its many other services.

LLMs as agents seem naturally attractive…(More)”.

Why Digital Public Goods, including AI, Should Depend on Open Data


Article by Cable Green: “Acknowledging that some data should not be shared (for moral, ethical and/or privacy reasons) and some cannot be shared (for legal or other reasons), Creative Commons (CC) thinks there is value in incentivizing the creation, sharing, and use of open data to advance knowledge production. As open communities continue to imagine, design, and build digital public goods and public infrastructure services for education, science, and culture, these goods and services – whenever possible and appropriate – should produce, share, and/or build upon open data.

Open Data and Digital Public Goods (DPGs)

CC is a member of the Digital Public Goods Alliance (DPGA) and CC’s legal tools have been recognized as digital public goods (DPGs). DPGs are “open-source software, open standards, open data, open AI systems, and open content collections that adhere to privacy and other applicable best practices, do no harm, and are of high relevance for attainment of the United Nations 2030 Sustainable Development Goals (SDGs).” If we want to solve the world’s greatest challenges, governments and other funders will need to invest in, develop, openly license, share, and use DPGs.

Open data is important to DPGs because data is a key driver of economic vitality with demonstrated potential to serve the public good. In the public sector, data informs policy making and public services delivery by helping to channel scarce resources to those most in need; providing the means to hold governments accountable and foster social innovation. In short, data has the potential to improve people’s lives. When data is closed or otherwise unavailable, the public does not accrue these benefits.

CC was recently part of a DPGA sub-committee working to preserve the integrity of open data as part of the DPG Standard. This important update to the DPG Standard was introduced to ensure only open datasets and content collections with open licenses are eligible for recognition as DPGs. This new requirement means open data sets and content collections must meet the following criteria to be recognised as a digital public good.

  1. Comprehensive Open Licensing:
    1. The entire data set/content collection must be under an acceptable open licence. Mixed-licensed collections will no longer be accepted.
  2. Accessible and Discoverable:
    1. All data sets and content collection DPGs must be openly licensed and easily accessible from a distinct, single location, such as a unique URL.
  3. Permitted Access Restrictions:
    1. Certain access restrictions – such as logins, registrations, API keys, and throttling – are permitted as long as they do not discriminate against users or restrict usage based on geography or any other factors…(More)”.

Leveraging Crowd Intelligence to Enhance Fairness and Accuracy in AI-powered Recruitment Decisions


Paper by Zhen-Song Chen and Zheng Ma: “Ensuring fair and accurate hiring outcomes is critical for both job seekers’ economic opportunities and organizational development. This study addresses the challenge of mitigating biases in AI-powered resume screening systems by leveraging crowd intelligence, thereby enhancing problem-solving efficiency and decision-making quality. We propose a novel counterfactual resume-annotation method based on a causal model to capture and correct biases from human resource (HR) representatives, providing robust ground truth data for supervised machine learning. The proposed model integrates multiple language embedding models and diverse HR-labeled data to train a cohort of resume-screening agents. By training 60 such agents with different models and data, we harness their crowd intelligence to optimize for three objectives: accuracy, fairness, and a balance of both. Furthermore, we develop a binary bias-detection model to visualize and analyze gender bias in both human and machine outputs. The results suggest that harnessing crowd intelligence using both accuracy and fairness objectives helps AI systems robustly output accurate and fair results. By contrast, a sole focus on accuracy may lead to severe fairness degradation, while, conversely, a sole focus on fairness leads to a relatively minor loss of accuracy. Our findings underscore the importance of balancing accuracy and fairness in AI-powered resume-screening systems to ensure equitable hiring outcomes and foster inclusive organizational development…(More)”

Mindmasters: The Data-Driven Science of Predicting and Changing Human Behavior


Book by Sandra Matz: “There are more pieces of digital data than there are stars in the universe. This data helps us monitor our planet, decipher our genetic code, and take a deep dive into our psychology.

As algorithms become increasingly adept at accessing the human mind, they also become more and more powerful at controlling it, enticing us to buy a certain product or vote for a certain political candidate. Some of us say this technological trend is no big deal. Others consider it one of the greatest threats to humanity. But what if the truth is more nuanced and mind-bending than that?

In Mindmasters, Columbia Business School professor Sandra Matz reveals in fascinating detail how big data offers insights into the most intimate aspects of our psyches and how these insights empower an external influence over the choices we make. This can be creepy, manipulative, and downright harmful, with scandals like that of British consulting firm Cambridge Analytica being merely the tip of the iceberg. Yet big data also holds enormous potential to help us live healthier, happier lives—for example, by improving our mental health, encouraging better financial decisions, or enabling us to break out of our echo chambers…(More)”.

The Attention Crisis Is Just a Distraction


Essay by Daniel Immerwahr: “…If every video is a starburst of expression, an extended TikTok session is fireworks in your face for hours. That can’t be healthy, can it? In 2010, the technology writer Nicholas Carr presciently raised this concern in “The Shallows: What the Internet Is Doing to Our Brains,” a Pulitzer Prize finalist. “What the Net seems to be doing,” Carr wrote, “is chipping away my capacity for concentration and contemplation.” He recounted his increased difficulty reading longer works. He wrote of a highly accomplished philosophy student—indeed, a Rhodes Scholar—who didn’t read books at all but gleaned what he could from Google. That student, Carr ominously asserted, “seems more the rule than the exception.”

Carr set off an avalanche. Much-read works about our ruined attention include Nir Eyal’s “Indistractable,” Johann Hari’s “Stolen Focus,” Cal Newport’s “Deep Work,” and Jenny Odell’s “How to Do Nothing.” Carr himself has a new book, “Superbloom,” about not only distraction but all the psychological harms of the Internet. We’ve suffered a “fragmentation of consciousness,” Carr writes, our world having been “rendered incomprehensible by information.”

Read one of these books and you’re unnerved. But read two more and the skeptical imp within you awakens. Haven’t critics freaked out about the brain-scrambling power of everything from pianofortes to brightly colored posters? Isn’t there, in fact, a long section in Plato’s Phaedrus in which Socrates argues that writing will wreck people’s memories?…(More)”.

What’s the Goal of the Goal?


Chapter by Dan Heath: “…Achieving clarity on the way forward is not an incremental victory. It is transformative. It can mean the difference between stuck and unstuck.

A group of federal government leaders experienced this transformation several years ago when they rethought the goal of a program that served people with disabilities, including veterans. Some context: Anyone with a “total permanent disability” can, by law, have their federal student loans discharged. But thousands of veterans didn’t take advantage of the program. This was a disappointment to many government leaders, whose goal was simple: Make it easy for veterans to apply for the benefits they deserve.

What was holding back participation in the program? To some extent it was knowledge: Many simply didn’t realize they were eligible for forgiveness. Others got derailed by the cumbersome application process.

The stakes were high: Some of these borrowers were actually in default—potentially having their social-security-disability payments garnished to make loan payments. The government was reaching into their pockets to claim money for loans that they shouldn’t have owed!

So what could be done? In 2016, a team at the Department of Education thought: Rather than make the borrowers responsible for discovering this benefit, let’s proactively tell them about it!

They hatched a plan that led them to compare the databases at several agencies, including the Department of Education and the Department of Veterans Affairs (VA). The Department of Education database could tell you: Who has student loans? The VA database could tell you: Which veterans are permanently disabled? Anyone who matched both databases was eligible for a loan discharge…(More)”
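
The matching step in the excerpt above amounts to intersecting two record sets on a shared identifier. The sketch below is only an illustration of that idea in Python: the record types, field names, and the `person_id` key are hypothetical, and the excerpt does not say how the agencies actually linked their records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Borrower:
    """Hypothetical row from a student-loan database."""
    person_id: str
    loan_balance: float


@dataclass(frozen=True)
class VeteranRecord:
    """Hypothetical row from a veterans database."""
    person_id: str
    permanently_disabled: bool


def eligible_for_discharge(borrowers, veterans):
    """Return borrowers who also appear as permanently disabled veterans."""
    # Collect the identifiers of permanently disabled veterans once,
    # so each borrower lookup is a constant-time set membership test.
    disabled_ids = {v.person_id for v in veterans if v.permanently_disabled}
    return [b for b in borrowers if b.person_id in disabled_ids]


# Made-up sample records, for illustration only.
borrowers = [
    Borrower("A1", 12000.0),
    Borrower("B2", 8500.0),
    Borrower("C3", 3000.0),
]
veterans = [
    VeteranRecord("A1", True),
    VeteranRecord("B2", False),
    VeteranRecord("D4", True),
]

matches = eligible_for_discharge(borrowers, veterans)
for borrower in matches:
    print(borrower.person_id, borrower.loan_balance)
```

In practice, records held by different agencies rarely share a clean common key, so a real match would likely need careful identity resolution and privacy safeguards that this sketch leaves out.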