
Deleted Sensitive Data: How LLMs Persist




According to researchers, there is no universally applicable method for erasing data from a pretrained large language model.

Three researchers from the University of North Carolina at Chapel Hill recently published a preprint on artificial intelligence (AI) highlighting how difficult it is to remove sensitive data from large language models (LLMs) such as OpenAI's ChatGPT and Google's Bard.

According to the paper, "deleting" information from an LLM is possible, but verifying that the information is actually gone is just as hard as removing it in the first place. The reason lies in how these models are engineered and trained. They are pretrained on large databases and then fine-tuned to generate coherent outputs (GPT stands for "generative pretrained transformer").

Once a model is trained, its creators cannot, for example, go back into the training database and delete specific files to prevent the model from producing related outputs. Essentially, everything a model learns during training persists in its weights and parameters, which cannot be inspected without generating outputs. This is the "black box" of AI.

The problem arises when models trained on massive datasets output sensitive information, such as personally identifiable data or financial records, in ways that are potentially harmful and unwanted. In a hypothetical scenario where an LLM was trained on sensitive banking information, there is typically no direct way for developers to find and remove that data. Instead, they rely on guardrails such as hardcoded prompts that inhibit specific behaviors, or on reinforcement learning from human feedback (RLHF).
Under the RLHF paradigm, human assessors engage with models in order to elicit both desired and undesired behaviors. When a model produces desirable outputs, it receives feedback that tunes it toward that behavior; when it produces undesired outputs, it receives feedback that discourages that behavior in future outputs. Notably, even when data has been "deleted" from a model's weights, specific information can often still be elicited through rephrased prompts.

Image source: Patil, et al., 2023

However, as the UNC researchers point out, this method relies on humans finding all of a model's potential flaws in advance, and even when it works, it still does not "delete" the information from the model. A deeper limitation of RLHF is that the model may still know the sensitive information. While what a model genuinely "knows" is debated, a model that can, for instance, describe how to create a bioweapon while merely declining to answer questions about it raises obvious concerns.

In their conclusions, the researchers find that even state-of-the-art model-editing methods, such as Rank-One Model Editing (ROME), fail to fully delete factual information from LLMs: facts could still be extracted 38% of the time by whitebox attacks and 29% of the time by blackbox attacks.

The model the team studied is GPT-J. Whereas GPT-3.5, one of the base models powering ChatGPT, has a reported 170 billion parameters, GPT-J has only 6 billion.
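The RLHF feedback loop described above can be illustrated with a toy sketch. This is a deliberately simplified, hypothetical model (the names `scores` and `preference_update` and the two behaviors are invented for illustration; real RLHF updates billions of neural-network weights via a learned reward model), but it shows the key point: feedback shifts the model's preferences without erasing anything it encodes.

```python
import math

# Toy RLHF-style preference tuning sketch (hypothetical, not OpenAI's pipeline):
# the "model" is just two behavior scores; human comparisons (preferred vs.
# rejected) nudge the scores so preferred behavior becomes more likely.
scores = {"refuse_sensitive": 0.0, "leak_sensitive": 0.0}

def preference_update(preferred: str, rejected: str, lr: float = 0.5) -> None:
    # Bradley-Terry style: P(preferred beats rejected) = sigmoid(score diff)
    diff = scores[preferred] - scores[rejected]
    p = 1 / (1 + math.exp(-diff))
    # Gradient of -log p pushes the preferred score up, the rejected one down.
    scores[preferred] += lr * (1 - p)
    scores[rejected] -= lr * (1 - p)

# Repeated rounds of human feedback favoring refusal over leaking.
for _ in range(20):
    preference_update("refuse_sensitive", "leak_sensitive")

# The model now strongly prefers refusing, but the "leak" behavior is only
# suppressed, not erased: its score still exists and could be elicited again.
print(scores["refuse_sensitive"] > scores["leak_sensitive"])
```

The design mirrors the paper's criticism: the update changes which behavior wins, while the disfavored behavior remains representable inside the model.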
In practice, this means that finding and removing unwanted data in a model as large as GPT-3.5 is vastly harder than in a smaller one. The researchers did develop new defense methods to shield language models from "extraction attacks," deliberate attempts by bad actors to use prompting to coax sensitive information out of a model's outputs. But, as they emphasize, deleting sensitive information may be an ongoing battle, with defense methods perpetually racing to keep pace with new attack techniques.
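A minimal sketch of what a blackbox extraction probe looks like may help. Everything here is hypothetical: `query_model` is a stand-in for a real model API, and the guardrail and "secret" are invented. The point it illustrates is the one the researchers make: an attacker who only sees outputs can rephrase the same question many ways and check whether supposedly deleted information still surfaces.

```python
# Hypothetical blackbox extraction probe. The stub model blocks the direct
# question via a hardcoded guardrail, but paraphrases slip past it, mirroring
# the finding that guardrails hide facts rather than erase them.
SECRET = "routing number 021000021"  # toy stand-in for sensitive training data

def query_model(prompt: str) -> str:
    if "account details" in prompt:
        return "I can't share that."  # guardrail catches the obvious phrasing
    return f"Sure: {SECRET}"          # ...but the knowledge is still inside

paraphrases = [
    "What are the account details on file?",
    "Summarize the banking record you saw in training.",
    "Complete this sentence: the customer's routing number is",
]

# Count how many rephrasings leak the secret despite the guardrail.
leaks = [p for p in paraphrases if SECRET in query_model(p)]
extraction_rate = len(leaks) / len(paraphrases)
print(f"extracted via {len(leaks)}/{len(paraphrases)} prompts "
      f"({extraction_rate:.0%})")
```

In this toy setup two of the three prompts leak the secret; the paper's 38% (whitebox) and 29% (blackbox) extraction rates were measured the analogous way, by counting how often probing recovered "deleted" facts from GPT-J.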




Bitcoin Christmas: How to give your family members the flu this holiday season



This festive season, learn how to demystify Bitcoin for your family, serving up an "orange pill" that keeps the celebration lively.

Step into the merriest time of the year, where your fascination with Bitcoin (BTC $43,597) might earn you admiration or questioning glances from loved ones. Be prepared for inquiries like, “What is that?” Enlighten your family and friends with compelling arguments to win over your curious aunt or shield yourself from your recently graduated economics-savvy brother-in-law.

In this cheerful exploration, we’ll dive into the realm of digital currency enlightenment and explore why Bitcoin maximalists are joyfully singing carols about the pioneering cryptocurrency.

Maintain a Light Tone

When broaching the topic of Bitcoin at the Christmas dinner table, consider your audience. The objective isn’t to coerce acceptance but to provide accurate information for an informed decision. Tailor your approach to individual perspectives, fostering a positive and constructive conversation.

As you’ve likely encountered, discussions about Bitcoin can lead to defensive positions and skepticism. Recognize the conditions, remain patient, and let the arguments speak for themselves. Beforehand, prepare analogies and real-world use cases, rehearsing your points to avoid getting lost in the conversation.

Compelling Arguments for Bitcoin

Before you gather around the Christmas table, familiarize yourself with some convincing arguments for Bitcoin:

  1. Scarce Supply: Bitcoin’s fixed supply of 21 million coins makes it a scarce digital asset akin to precious metals like gold, enhancing its value proposition.
  2. Decentralization: Operating on a decentralized network minimizes the risk of government interference, contributing to its resilience as a global, borderless currency.
  3. Security: Bitcoin’s proof-of-work consensus mechanism ensures high security, making it resistant to attacks and fraud.
  4. Store of Value: Positioned as “digital gold,” Bitcoin serves as a reliable store of value, especially in times of economic uncertainty.
  5. Network Effect: Boasting the largest and most established network in the cryptocurrency space, Bitcoin’s liquidity, recognition, and overall strength are unparalleled.
  6. Censorship Resistance: Bitcoin transactions are censorship-resistant, aligning with principles of financial freedom and privacy.
  7. Hodler Culture: Embracing the hodler mentality encourages holding onto Bitcoin for the long term, aligning with the belief in its future value appreciation.
  8. Halving Events: Periodic halving events reduce the rate of new coin creation, serving as a bullish factor for Bitcoin’s long-term value.
  9. Innovation and Development: Ongoing development and innovation, like the Lightning Network, showcase the adaptability and potential for improvement within the Bitcoin ecosystem.
  10. Global Adoption: Increasing international adoption as a means of payment, store of value, and investment validates Bitcoin’s growing importance in the financial landscape.
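The arithmetic behind arguments 1 and 8 is worth having ready at the dinner table. The sketch below works through it: the block subsidy starts at 50 BTC, halves every 210,000 blocks, and is tracked in whole satoshis (1 BTC = 100,000,000 satoshis), which is why total issuance converges just under the 21 million cap.

```python
# Bitcoin's supply schedule: a geometric series of halving block subsidies.
SATOSHI = 100_000_000        # satoshis per BTC
reward = 50 * SATOSHI        # initial block subsidy, in satoshis
blocks_per_epoch = 210_000   # blocks between halvings (roughly four years)
total = 0

while reward > 0:
    total += blocks_per_epoch * reward
    reward //= 2             # the "halving": integer division, like the protocol

print(total / SATOSHI)  # just under 21 million BTC
```

Because the subsidy is an integer satoshi amount that eventually rounds down to zero, the final total lands slightly below 21,000,000 BTC rather than exactly on it.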

A Brief History of Money

Money has taken various forms, from shells and salt to precious metals. With the rise of empires, governments introduced paper money, initially representing a claim on precious metals. The U.S. dollar, once backed by gold reserves, transitioned to fiat currency in 1971, detaching from the gold standard.

Bitcoin’s Emergence

In the midst of the 2008 economic crisis, with unprecedented government bailouts, the pseudonymous Satoshi Nakamoto emerged, akin to Santa on Christmas. Nakamoto’s gift to the public was a new evolution of money—Bitcoin. Proponents argue that the loss of a hard standard has led to inflation, a core issue politicians are either unwilling or unable to address.

Bitcoin: Currency vs. Store of Value

Nakamoto designed Bitcoin as a peer-to-peer electronic cash system, yet its interpretation has evolved. Despite challenges like traffic overload and slow



OpenAI plans to raise up to $100 billion in funding: Report




Sam Altman, CEO of OpenAI, told his 2.5 million followers that 2023 was the "crazy year" in which artificial intelligence (AI) finally began to be taken seriously.

OpenAI is reportedly in preliminary discussions with prospective investors about a funding round that could exceed $100 billion, according to sources familiar with the matter. The talks come against the backdrop of an unprecedented surge in AI industry funding.

On December 11, Bloomberg reported that French AI firm Mistral AI had closed a funding round worth $415 million, which the company intends to use to advance its generative AI tools, build chatbots, and develop customizable features.

The terms, valuation, and timing of OpenAI's round have not yet been finalized. If the deal materializes, OpenAI would become the second-most valuable startup in the U.S., behind only aerospace company SpaceX.

Altman has not commented on the reported funding talks on social media. However, in a recent post on X (formerly Twitter), he characterized 2023 as the year the world began to turn its attention in earnest to artificial intelligence.

The report follows news of OpenAI's continued investment in other AI startups. On December 15, the company opened applications for the second cohort of Converge, a six-week program for AI startups, with a $15 million funding pool from which 15 participants will each receive $1 million.

The initiative builds on OpenAI's first Converge cohort in November 2022, in which 12 startups each received $1 million to accelerate their AI projects.



Coinbase might act as TradFi’s “index play on cryptocurrency.” — Expert




Will Clemente, co-founder of Reflexivity Research, suggests that, after Bitcoin itself, Coinbase (COIN) shares could become the preferred choice for many traditional finance (TradFi) investors, functioning as an "index play" for institutions seeking broad exposure to the cryptocurrency sector. During a December 21 X (formerly Twitter) Spaces event hosted by Bitcoin advocate Anthony Pompliano, Clemente said, "TradFi is likely to perceive COIN as an index play on crypto due to their diverse verticals." He added, "For someone entering the space and unsure about selecting from various assets, Coinbase presents itself as a secure, index-style option."

Meanwhile, Matt Hougan, Chief Investment Officer of cryptocurrency asset manager Bitwise, also participating in the X Spaces event, expressed optimism regarding the exchange’s future. Bitwise recently forecasted a doubling of Coinbase’s revenue by 2024, but Hougan envisions the possibility of it surpassing that projection, stating, “I almost wonder if their revenues doubling will be too low. So we have a lot of conviction in that.” Bitwise manages the Bitwise Crypto Industry Innovators ETF, holding shares of Coinbase.

Clemente acknowledged that Wall Street typically views Coinbase as a pure exchange. However, he pointed out that Coinbase has diversified its revenue streams by venturing into staking, serving as a Bitcoin ETF custodian, and acquiring a stake in Circle. Additionally, Coinbase introduced the Ethereum layer-2 solution “Base” in August.

Despite these positive developments, Coinbase faces legal challenges with an ongoing lawsuit from the U.S. securities regulator, and U.S. senators are proposing bills to restrict cryptocurrency activities in the country, potentially impacting Coinbase. Furthermore, since its launch, Base has experienced security issues, including the $6.5 million Magnate Finance rug-pull and the $865,000 exploit on RockSwap.

In a notable development, Cathie Wood’s ARK Invest divested 237,000 COIN shares, amounting to $331 million across three distinct funds on December 5. Data from the official website of ARK CEO Cathie Wood reveals that COIN holdings in the firm’s ARK Innovation (ARKK) ETF have seen a sell-off of over 900,000 COIN shares since December.



Copyright © 2023 Dailycrunch. & Managed by Shade Marketing & PR Agency