🖖 EdgeLLama - An Open Standard for Decentralized AI
We, the GPU poor, have come up with a peer-to-peer network design for running Mistral7B and other models, one that will make AI use more free, both as in beer and as in speech. We believe in e/acc, and we want to make AI abundant. This is the moment in time when we start taking back control from the few powerful AI companies.

Right now, our AI use is a function of expensive monthly subscriptions and of rate and usage limits imposed by datacenter-cloud AI companies. This gives them the power to decide what we can prompt with and how much AI we even have access to. The immense power they wield also imposes an emotional burden on them, and they are now appealing to the government to impose stifling regulations (a concept called "regulatory capture"; see @bgurley's talk). Well, we, a bunch of AI and open-network aficionados, want to make their lives easier and take that power away from them.

Think BitTorrent in the early 2000s, when you could make your own computer available and effortlessly share files with others in an open network. The advent of that technology, which was used by over 100 million people running nodes on their home computers, imposed a forcing function on entertainment business models in general. Better user experiences emerged, providing unlimited access to top-tier content for insanely low fees. EdgeLLama, and the associated set of products leading to a seamless AI consumer cloud, will impose a similar forcing function of abundance on AI. You can go ahead and cancel your $20/month Pro plans, because this next wave of AI runs on your laptops, desktops, and smartphones, and it is good enough.

There is an active, growing community of true open AI advocates: releasing foundation models, making them easier to run on regular laptops, and curating datasets. We are supplementing that world with just the right peer-to-peer network design and just the right UX to go with it. In an earlier post, we shared how AI with community-curated datasets leads to better and more credible responses than some of the top closed AI companies. In this post, we outline an early draft of the network design and take a look at the EdgeLLama node software, which you can run on your device, serving inference and vectors to the community. You will simply earn good karma, and nothing more, but you will sleep better at night knowing that you served AI to some student in Mumbai or reasonable medical advice to someone in Manila.
• What is EdgeLLama?
It is both an open protocol and the name of the software you can install on your computer, which will enable you to serve AI to others, either in your chosen community (e.g., "Stanford" or "astrophysics") or in the wider world, and, on the flip side, to benefit from it yourself.
• How is this different from local desktop software already available?
They are local only. EdgeLLama is a peer-to-peer network. It is going to be necessary for hosting a world of millions of community-finetuned LLMs running across a shared vector pool (more on that in our next post).
• How do I run EdgeLLama on my computer?
It will be available sometime next week. Please join this TG group if you are interested in running a node: https://t.me/edgellama
• Any restrictions on which UIs and models it can work with?
No, we do not care - our focus here is on UX. We will provide an integration with Collama's web user interface, and also with some recent popular models which can run on your devices (such as Mistral7B, Llama2 and others) - but you should be able to run/modify EdgeLLama as you want. We seek to encourage a diverse set of clients which implement the EdgeLLama protocol.
• Will there be an API?
Yes. As a developer, you will be able to default to using the EdgeLLama consumer network, in coordination with your users, instead of paying only for GPUs in datacenters.
• Wait - so I can get better results than top AI companies and also get them for free?
Yes - that is the plan. Using Collama's trusted community dataset approach (think Wikipedia-style millions of finetuned community LLMs) leads to better results, and using EdgeLLama reduces the cost of running AI and provides more freedom.
• Does this use cryptography?
Yes, we use public key cryptography and other mechanisms for fundamental trusted operation of the system. These are defined in detail in the early draft of the paper below.
• Do you use advanced cryptography like ZKML or FHE?
We seek to support teams building ZKML sub-networks on EdgeLLama. However, we believe using them needs to be a function of user choice, given the trade-offs in cost and complexity involved. Beyond eventual fully trusted running of models, we anticipate that a few years from now, as advanced cryptographic concepts like FHE become able to scale, the ability to work securely on fully encrypted data will only grow and reach viability. Keep in mind, though, that even though ZK proofs were invented in 1984, they were not even used in Bitcoin at the start. Sometimes the most exciting technology has the longest arc, and what people want are practical solutions they can start using today.
• Is this decentralized?
Yes. Over time, a fully permissionless protocol will also become available for those who choose it.
• Does this use a blockchain?
No. While the network is decentralized and uses a mechanism to reasonably filter out malicious nodes, it does not use a blockchain.
• Will there be a blockchain protocol?
Yes, think Bitcoin, but for AI. We expect 99% of the EdgeLLama node providers here will simply run the non-crypto base software, while 1% of the node providers might choose to participate in an optional staking protocol to provide a higher quality of service for block rewards. That 1% will fund the infrastructure and continuous development of the entire network.
• What are you: an AI or a blockchain project?
We are building the world's largest decentralized AI network, one which uses 21st-century economics to incentivize the servicing of that network beyond just a feeling of social good. The advent of BitTorrent, of Bitcoin, and of the economic principles of Ethereum has taught us useful lessons. We view both blockchains and AI as simply data compression engines and replicated state machines, which need to run entirely on regular user devices. The peer-to-peer network of laptops, desktops, and smartphones is going to enable a new type of supercomputer, on which a variety of society-advancing technologies will run, some yet to be invented. What we seek to build here is an open movement. These are the earliest, chaotic, raw days, and perhaps the best time to start collaborating.
----------------------------------------------------------------------------------------------------------------------
EdgeLLama: An Open Standard For Distributed Inference In Federated Networks
Here is a look at an early snippet of the EdgeLLama paper. Phase 1 of this decentralized AI network uses community servers for coordination between nodes; phase 2 will provision a fully peer-to-peer, permissionless protocol.
1 - Introduction
EdgeLLama is an open standard designed to allow any supported machine to serve as an inference provider. It is a stateful application-layer communication protocol that offers HTTP and WebSocket APIs to securely transmit inference requests in JSON format across two primary layers: the CommunityServer-Edge Communication Layer and the CommunityServer-CommunityServer Communication Layer.
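To make the shape of that communication concrete, here is a minimal sketch of an EdgeLLama node opening a WebSocket to a Community Server and sending a JSON capability message. Everything here is an assumption for illustration: the endpoint URL, message types, and field names are not defined by this draft.

```python
# Minimal sketch of the CommunityServer-Edge layer over WebSocket, using the
# Python `websockets` library. Endpoint and JSON fields are illustrative only.
import asyncio
import json

import websockets  # pip install websockets

async def main() -> None:
    # Hypothetical Community Server endpoint, not a real deployment.
    async with websockets.connect("wss://community.example/edge") as ws:
        # Announce capabilities so the server can file us under a model topic.
        await ws.send(json.dumps({
            "type": "hello",
            "models": [{"id": "mistralai/Mistral-7B-v0.1", "cuda": False}],
        }))
        print(json.loads(await ws.recv()))  # e.g. a subscription acknowledgment

asyncio.run(main())
```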
2 - CommunityServer System Model
2.1 Community Cluster
The model presupposes an asynchronous distributed system in which nodes are interconnected through a network. The network may exhibit certain inefficiencies: it might fail to deliver messages, or it may delay, duplicate, or even rearrange them. To safeguard the integrity of message transmission, we employ cryptographic measures, including public-key signatures, message authentication codes, and digests generated by collision-resistant hash functions. It is of utmost importance to ascertain that a received message genuinely emanates from a specified node and has not been tampered with or fabricated by malevolent entities.

To achieve this on a Community Server, there are two potential implementations: stateful and stateless. Under the stateful approach, an identity session is sustained for each EdgeLLama node on the Community Server. This implies the presence of data that must be replicated among Community Servers, which in turn necessitates either state machine replication among Community Servers or an auxiliary communication protocol enabling Community Servers to mutually affirm the veracity of an authentication. Consider a scenario with multiple Community Servers: CS 1, CS 2, ..., CS n. Should a new node affiliate with a Community Server, and the identity of this node remain unknown to that server, the Community Server must cross-verify with its peers to determine whether any of the other Community Servers can vouch for the node's identity, based on the authentication payload transmitted by the EdgeLLama node.

Under the stateless approach, combining digital signatures with message digests allows nodes to sign the digest of a message so that other nodes can verify both the integrity of the message and its origin, without maintaining any session key that would require a state entry in the Community Server. This means that an EdgeLLama node is identified by its public key throughout the network and, through the cryptographic primitives mentioned above, can sign messages and prove its authenticity throughout the network.
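To make the stateless variant concrete, here is a minimal sketch of a node signing the SHA-256 digest of a message with Ed25519 and a Community Server verifying it with nothing but the node's public key. The message layout and field names are our own illustration, not the normative wire format.

```python
# Minimal sketch of stateless node identity: sign the SHA-256 digest of a
# message with Ed25519 and verify it using only the node's public key.
# Message layout is illustrative, not the EdgeLLama wire format.
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

# --- On the EdgeLLama node: identity is just a keypair ---
node_key = Ed25519PrivateKey.generate()
node_pub = node_key.public_key()

message = json.dumps({"type": "auth", "capabilities": ["mistral-7b"]}).encode()
digest = hashlib.sha256(message).digest()
signature = node_key.sign(digest)  # node signs the digest, not the raw message

# --- On the Community Server: verify without any session state ---
def verify(pub: Ed25519PublicKey, msg: bytes, sig: bytes) -> bool:
    """Check that `msg` was signed by the holder of `pub`."""
    try:
        pub.verify(sig, hashlib.sha256(msg).digest())
        return True
    except InvalidSignature:
        return False

assert verify(node_pub, message, signature)
```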
2.2 Model Registry
Upon successfully establishing a connection to a Community Server, an EdgeLLama node communicates its capabilities. This encompasses details such as the types of models it can execute (HuggingFace ID), the model's revision, quantization, CUDA support, and other pertinent characteristics. Given this information, the Community Server categorizes the EdgeLLama node, subscribing it to the PubSub topic corresponding to the model it operates.

A fortification check is initiated once 'n' successful inferences have been executed for an identical model. From then on, the Community Server focuses on maintaining the caliber of nodes associated with that topic. The Community Server prompts the EdgeLLama node with a number alpha of inference calls, chosen at random from the preceding 'n' successful inferences; as a reference, the Community Server retains the digests of the outputs of those inferences. Adhering to the protocol standard, the EdgeLLama node, given its subscription to the model's topic, executes the prescribed inference. Upon completion, the digest of this recent inference is compared with the digests stored on the Community Server. If there is a match, the EdgeLLama node is deemed to have successfully passed the fortification check.
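A rough sketch of that replay check follows, assuming deterministic inference and SHA-256 output digests. The class shape and method names (record_success, fortify, run_inference) are hypothetical, invented for illustration.

```python
# Illustrative sketch of the fortification check: replay `alpha` past
# inferences on a node and compare output digests against stored references.
# Names (CommunityServer.fortify, node.run_inference) are hypothetical.
import hashlib
import random

class CommunityServer:
    def __init__(self):
        # model_id -> list of (request, expected_digest) from past successes
        self.reference_digests: dict[str, list[tuple[dict, str]]] = {}

    def record_success(self, model_id: str, request: dict, output: str) -> None:
        digest = hashlib.sha256(output.encode()).hexdigest()
        self.reference_digests.setdefault(model_id, []).append((request, digest))

    def fortify(self, node, model_id: str, alpha: int = 3) -> bool:
        """Replay `alpha` random past inferences; pass only if all digests match."""
        history = self.reference_digests.get(model_id, [])
        for request, expected in random.sample(history, min(alpha, len(history))):
            output = node.run_inference(request)  # deterministic given seed
            if hashlib.sha256(output.encode()).hexdigest() != expected:
                return False  # mismatch: flag the node for scrutiny
        return True
```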
2.3 Inference Event
The process of facilitating an inference is not merely a direct function call but a more orchestrated communication between the Community Server and the various EdgeLLama nodes. An inbound inference request received by the Community Server consists of two main components:
• Prompt: a string that delineates the information or data upon which the inference is to be executed.
• Model: a ModelConfig object that encapsulates the details of the model to be employed for the inference, including specifics such as the model's identifier, version, etc.
Upon receipt of the inference request, the Community Server's immediate task is to identify the set of EdgeLLama nodes capable of fulfilling the request. This is achieved by:
• referencing the ModelConfig object within the request to identify the desired model;
• consulting its internal registry to retrieve the list of EdgeLLama nodes subscribed to the identified model topic.
With the list of appropriate EdgeLLama nodes at its disposal, the Community Server broadcasts the inference request. Before doing so, it generates a "seed" and a UUID for that specific inference request, appends them to the request, and disseminates it in parallel to a fanout of the identified EdgeLLama nodes. This ensures that multiple nodes have the opportunity to process the request, providing redundancy and increasing the chances of a prompt response. Each EdgeLLama node, upon receiving the broadcast request, initiates the necessary steps to run the inference based on the provided prompt and its local instance of the model specified in the ModelConfig. Given the deterministic execution of models in this environment, an identical outcome is expected when the same seed and model parameters are provided. The Community Server leverages this characteristic to assure the precision and uniformity of inferences.
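As an illustration of the dispatch step, here is a minimal sketch of generating the seed and UUID and fanning the request out to subscribed nodes. The JSON field names, registry shape, and `send` callable are assumptions, not the normative schema.

```python
# Illustrative fanout of an inference request: attach a seed and UUID,
# then broadcast to every node subscribed to the model's topic.
# Field names, registry shape, and the `send` callable are hypothetical.
import json
import random
import uuid

def broadcast_inference(request: dict, registry: dict, send) -> str:
    """Append seed/UUID to `request` and send it to the model's subscribers."""
    request_id = str(uuid.uuid4())
    request["id"] = request_id
    request["seed"] = random.getrandbits(64)  # same seed => same output on every node
    topic = request["model"]["id"]            # e.g. a HuggingFace model ID
    for node in registry.get(topic, []):      # fanout to subscribed nodes
        send(node, json.dumps(request))
    return request_id

# Example request shape (illustrative):
# {"prompt": "Explain BitTorrent in one paragraph.",
#  "model": {"id": "mistralai/Mistral-7B-v0.1", "revision": "main",
#            "quantization": "q4_0"}}
```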
2.4 Inference Aggregation
Inference aggregation is a cardinal stage in the Community Server's inference process, especially in contexts supporting distributed inference tasks spread across numerous EdgeLLama nodes. By juxtaposing responses from diverse nodes, the Community Server can isolate the most recurrent response, thereby amplifying the final outcome's accuracy. When certain EdgeLLama nodes encounter latency, discrepancies, or faults in their output, dispersing the inference task across multiple nodes furnishes the Community Server with redundancy and alternative response routes.

The aggregation methodology is a simple majority vote paired with feedback loops. Viewing an inference resolution digest as a vote, the Community Server protocol calculates a trustworthiness score for each unique response digest. Under standard conditions, a single distinct response digest is anticipated from the EdgeLLama nodes. The trust score is derived using a weighted voting approach: responses from particular EdgeLLama nodes that have been historically reliable, or that meet specific criteria, carry added significance. The preferred response is the one that accumulates the maximum aggregated weight.

The implementation of a feedback loop is critical. Should specific EdgeLLama nodes recurrently yield aberrant or erroneous responses, the Community Server can activate a feedback mechanism, which may lead to recalibration of the node's credibility score or flagging of the node for scrutiny. Subsequently, the Community Server compiles the final result, which could encompass:
• the aggregated inference outcome;
• accompanying metadata, such as the count of participating nodes, timestamps, signatures, and digests.
For diagnostic purposes or transparency, the distinct outputs from the EdgeLLama nodes, inclusive of their signatures, may also be appended, offering a comprehensive perspective on the aggregation phase...
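Here is a minimal sketch of the weighted digest vote with a simple feedback update, assuming per-node trust weights in [0, 1]; the reward and penalty constants are invented for illustration.

```python
# Illustrative weighted majority vote over response digests, with a simple
# feedback loop that adjusts per-node trust. Constants are invented.
from collections import defaultdict

def aggregate(responses: dict[str, str], trust: dict[str, float]) -> str:
    """responses: node_id -> output digest; trust: node_id -> weight in [0, 1]."""
    weight_by_digest: defaultdict[str, float] = defaultdict(float)
    for node_id, digest in responses.items():
        weight_by_digest[digest] += trust.get(node_id, 0.5)  # each digest is a vote
    winner = max(weight_by_digest, key=weight_by_digest.get)

    # Feedback loop: reward nodes that voted with the majority, penalize others.
    for node_id, digest in responses.items():
        delta = 0.05 if digest == winner else -0.10
        trust[node_id] = min(1.0, max(0.0, trust.get(node_id, 0.5) + delta))
    return winner

# Example: three nodes agree on one digest, one less-trusted node dissents.
trust = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.2}
print(aggregate({"a": "d1", "b": "d1", "c": "d1", "d": "d2"}, trust))  # -> "d1"
```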
----------------------------------------------------------------------------------------------------------------------
Demo
See the quick video below for a client implementation of EdgeLLama coming out next week. These are the nodes that would provide the foundation of a modern-day consumer-run cloud. Please join this TG group if you are interested in running a node: https://t.me/edgellama
If you want this vision to happen at scale, this is the moment in time to work together, and we just might have a chance.
Remember, AI is the new coinage, and we need to control the means of production ourselves.
It's time to make a dent in the universe.
https://reddit.com/link/17l0k3g/video/l4qn88iuymxb1/player