Wednesday, September 11, 2019

Unitimes AMA | Danger in Blockchain, Data Protection is Necessary


At 10:30 on September 12, Unitimes held its 40th online AMA on blockchain technologies and applications. We were glad to have Joanes Espanol, co-founder and CTO of Amberdata, share his insights on “Danger in Blockchain, Data Protection is Necessary”. The AMA consisted of two parts: Fixed Q&A and Free Q&A. Check out the details below!

Fixed Q&A

  1. Please introduce yourself and Amberdata

Hi everybody, my name is Joanes Espanol and I am co-founder and CTO of Amberdata. Prior to founding Amberdata, I worked on several large-scale ingestion pipelines, distributed systems and analytics platforms, with a focus on infrastructure automation and highly available systems. I am passionate about information retrieval and extracting meaning from data.

Amberdata is a blockchain and digital asset company which combines validated blockchain and market data from the top crypto exchanges into a unified platform and API, enabling customers to operate with confidence and build real-time data-powered applications. 

  2. What type of data does the API provide?

The advantage and uniqueness of Amberdata’s API is the combination of blockchain and pricing data together in one API call.

We provide a standardized way to access blockchain data (blocks, transactions, account information, etc.) across different blockchain models, such as UTXO-based (Bitcoin, Litecoin, Dash, Zcash...) and account-based (Ethereum...), with contextualized pricing data from the top crypto exchanges in one API call. If you want to build applications on top of different blockchains, you would otherwise have to learn the intricacies of each distributed ledger, run multiple nodes, aggregate the data, etc. Instead of spending all that time and money, you can start immediately with the APIs we provide.

What can you get access to? Accounts, account balances, blocks, contracts, internal messages, logs and events, pending transactions, security audits, source code, tokens, token balances, token transfers, token supplies (circulating & total supplies) and transactions, as well as prices, order books, trades, tickers and best bid and offer (BBO) data for about 2,000 different assets.

One important thing to note is that most of the APIs return validated data that anybody can verify for themselves. Blockchain is all about trust: operating in a hostile and trustless environment, maintaining consensus while continuously under attack, etc. We want to maintain that level of trust, so the API returns all the information you need to recalculate the Merkle proofs yourself, which lets you verify that the data has not been tampered with and is authentic.
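To make "recalculating a Merkle proof" concrete, here is a minimal Python sketch for a Bitcoin-style transaction tree (double SHA-256, with the last node duplicated when a level has an odd count). It is illustrative only: the proof format, a list of `(sibling_hash, sibling_is_left)` pairs, is a hypothetical one chosen for this sketch and not the format Amberdata's API returns; display byte-order conventions for txids are ignored; and Ethereum uses Merkle Patricia tries with Keccak-256 rather than this binary tree.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin-style double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Bitcoin-style Merkle root from leaf hashes."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last node with itself
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Collect the sibling path for leaves[index] (hypothetical proof format)."""
    proof = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # the other node in this pair
        proof.append((level[sibling], sibling < index))
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from a leaf and its sibling path, then compare."""
    node = leaf
    for sibling, sibling_is_left in proof:
        node = dsha256(sibling + node) if sibling_is_left else dsha256(node + sibling)
    return node == root


txids = [dsha256(f"tx{i}".encode()) for i in range(5)]
root = merkle_root(txids)
print(verify_proof(txids[3], merkle_proof(txids, 3), root))  # True
print(verify_proof(dsha256(b"forged"), merkle_proof(txids, 3), root))  # False
```

Only the leaf, its sibling path and the root are needed, so a client can check a single transaction without downloading the whole block.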

  3. Why is it important to combine blockchain and market data?

Cryptoeconomics plays a key role in the blockchain world. One simple way to explain this is to look at why peer-to-peer file sharing systems like BitTorrent failed. These file sharing protocols were an early form of decentralization, with each node contributing to and participating in this “global sharing computer”. The issue with these protocols is that they relied on the goodwill of each participant to (re-)share their files; without economic incentives, or punishment for not following the rules, they opened the door to bad behavior, which ultimately led to their demise.

The genius of Satoshi Nakamoto was to combine and improve upon existing decentralized protocols with game theory, arriving at a consensus protocol able to circumvent the Byzantine Generals' Problem. Participants now have incentives to follow the rules (they are financially rewarded for doing so, by mining for example, and penalized for misbehaving), which in turn results in a stable system. This was the first time cryptoeconomics was used in a working product, and it became the basis and norm for many of today's new systems.

Pricing data is needed as context for blockchain data: a lot of (ERC-20) tokens have been created on Ethereum, since it is very easy to clone an existing contract and configure it with a certain amount of initial tokens (most commonly in the millions or billions). Each token has a market value, determined by supply and demand as it trades on the exchanges. Price fluctuations affect adoption and usage, and therefore the overall transaction volume (and, to a certain extent, transaction throughput) on the blockchain.

Blockchain data is needed as context for market data: activity on the blockchain can have an impact on market data. For example, one can look at the incoming token transfers in the Ethereum transaction pool and see whether any big transfers for a specific token are pending, which could result in a significant price move on the other end. Being able to detect that kind of movement and act upon it is exactly the kind of signal traders are looking for. Another example can be found with token supplies: exchanges want to be notified as soon as possible when a token's circulating supply changes, as it affects their trading ability; in the worst case, they would need to halt trading if a token contract gets compromised.
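As a sketch of the transaction-pool example, the Python snippet below decodes the calldata of pending ERC-20 `transfer(address,uint256)` calls (function selector `0xa9059cbb`) and flags those above a threshold. It assumes pending transactions arrive as dicts with `hash`, `to` and `input` fields, as in standard Ethereum JSON-RPC transaction objects; the `large_transfers` helper and its parameters are hypothetical. It only sees direct `transfer` calls, not `transferFrom` or transfers triggered from inside other contracts.

```python
from typing import Iterable, Iterator, Optional

# First four bytes of keccak256("transfer(address,uint256)").
TRANSFER_SELECTOR = "a9059cbb"


def decode_transfer(input_data: str) -> Optional[tuple[str, int]]:
    """Decode ERC-20 transfer(address,uint256) calldata.

    Returns (recipient, raw_amount), or None if the calldata is not a plain
    transfer call (4-byte selector followed by two 32-byte words).
    """
    data = input_data[2:] if input_data.startswith("0x") else input_data
    if len(data) != 8 + 2 * 64 or data[:8].lower() != TRANSFER_SELECTOR:
        return None
    recipient = "0x" + data[32:72].lower()  # last 20 bytes of the first word
    raw_amount = int(data[72:], 16)  # second word: amount in the token's base units
    return recipient, raw_amount


def large_transfers(
    pending_txs: Iterable[dict], token_address: str, min_tokens: int, decimals: int = 18
) -> Iterator[tuple[str, str, int]]:
    """Yield (tx_hash, recipient, raw_amount) for pending transfers of one token
    moving at least `min_tokens` whole tokens. `decimals` defaults to 18, which is
    common but must be read from the token contract in practice."""
    min_raw = min_tokens * 10**decimals  # integer comparison avoids float rounding
    for tx in pending_txs:
        if (tx.get("to") or "").lower() != token_address.lower():
            continue  # not a call to the token contract we are watching
        decoded = decode_transfer(tx.get("input") or "")
        if decoded is not None and decoded[1] >= min_raw:
            yield tx["hash"], decoded[0], decoded[1]
```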

In conclusion, events on the blockchain can influence price, and market events also have an impact on blockchain data: the two are intimately intertwined, and putting them both in context leads to better insights and better decision making.

  4. All the data you provide is publicly available, what gives?

Very true: all this data is publicly available. That is one of the premises and fundamentals of blockchain models, where all the data is public and transparent across all the nodes of the network. The problem is that, even though it is publicly available, it is not quick, not easy and not cheap to access.

Not quick: blockchain data structures were designed and optimized for achieving consensus in a hostile and trustless environment and for internal state management, not for random access and search. Imagine you want to list all the transactions your wallet address has participated in. The only way to do that would be to replay all the transactions from the beginning of time (starting at the genesis block), looking at the to and from addresses and retaining only the ones matching your wallet. With over 500 million transactions as of today, retrieving that list would take an unacceptable amount of time for a customer-facing application.
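The cost difference can be sketched in a few lines of Python: the naive query rescans every transaction on each request, while an index built in one pass answers later lookups directly. The block and transaction dicts (`transactions`, `hash`, `from`, `to`) are simplified stand-ins assumed for illustration, and a production index would live in a database rather than in memory.

```python
from collections import defaultdict


def scan_for_address(blocks, address):
    """Naive approach: replay every transaction from genesis on each query."""
    address = address.lower()
    return [
        tx["hash"]
        for block in blocks
        for tx in block["transactions"]
        if address in ((tx.get("from") or "").lower(), (tx.get("to") or "").lower())
    ]


def build_address_index(blocks):
    """Indexed approach: replay the chain once, mapping each address to the
    hashes of the transactions it appears in as sender or recipient."""
    index = defaultdict(list)
    for block in blocks:
        for tx in block["transactions"]:
            # A set avoids listing a self-transfer twice; "to" is None for
            # contract creations, so falsy addresses are skipped.
            for addr in {tx.get("from"), tx.get("to")}:
                if addr:
                    index[addr.lower()].append(tx["hash"])
    return index
```

The scan costs time proportional to the whole chain on every query, while the index pays that cost once and then answers each lookup with a single dictionary access.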

Not easy: some very basic things one would expect when dealing with financial assets and instruments are actually very difficult to get at, especially for tokens. For example, the current Ether balance of a wallet is easy to retrieve in one call to a Geth or Parity client; however, looking at time series of these balances gets a little hairy, as these clients do not keep all historical state unless you are running a full archive node. Looking at token holdings and balances gets even more complicated, as most token transfers are part of the transient state and not kept on chain. Moreover, token transfers and balance changes over time are triggered by different mechanisms (especially with contract-to-contract function calls), and detecting these changes accurately is error-prone.
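One common way to build token balance time series is to replay decoded ERC-20 `Transfer(from, to, value)` events in chain order, as in the Python sketch below. It assumes the events have already been extracted and sorted by block and log index, and the `(block_number, from, to, value)` tuple shape is an assumption of the sketch. It only works for tokens that announce every balance change through `Transfer` events; balance changes made through other mechanisms would be missed, which is exactly why detecting them accurately is hard.

```python
from collections import defaultdict

ZERO_ADDRESS = "0x" + "0" * 40


def balance_history(transfers):
    """Replay decoded Transfer events and return, for each address, a list of
    (block_number, balance) points with at most one point per block.

    Mints come from the zero address and burns go to it; the zero address
    itself is not tracked.
    """
    balances = defaultdict(int)
    history = defaultdict(list)
    for block, sender, recipient, value in transfers:
        for addr, delta in ((sender, -value), (recipient, value)):
            if addr == ZERO_ADDRESS:
                continue
            balances[addr] += delta
            points = history[addr]
            if points and points[-1][0] == block:
                points[-1] = (block, balances[addr])  # later event in same block wins
            else:
                points.append((block, balances[addr]))
    return dict(history)
```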

Not cheap: as mentioned above, most of the historical data and time series metrics are only available via a full archive node, which at the time of writing requires about 3TB of disk space just to hold the blockchain state. And remember, this state is stored in a compressed format that is not easily accessible; converting it to a more searchable format requires much more space. Also, running your own full archive node requires constant care, maintenance and monitoring, which makes it very expensive, even prohibitive, to run.

  5. Who uses your API today and what do they do with it?

A wide variety of applications and projects use our API, across industries ranging from wallets and trust funds (DappRadar) to accounting and arbitrage firms (Moremath), analytics (Stratcoins) and compliance & security companies (Blue Swan). Amberdata’s API appeals to many different people because it is very complete and fast and provides additional data enrichment not available in other APIs; because of this, it fits nicely with our customers’ use cases:

· It can be used in the traditional REST way to augment your own processes or enrich your own data with hard-to-get pieces of information. For example, many of our users retrieve historical information (blocks and transactions) and relay it in their applications to their own customers, while others are more interested in financial data (account & token balances) and time series for portfolio management.

https://medium.com/amberdata/keep-it-dry-use-amberdatas-api-9cdb222a41ba

· Other projects need real-time, up-to-date data, for which we recommend our websockets: you can filter the data in real time to match your exact needs, rather than receiving the firehose of information and having to discard 99% of it.

· We have a few research projects tapping into our API as well.  For example, some of our customers want access to historical market data to backtest their trading strategies and fine-tune their own algorithms.

· Our API is also fully JSON-RPC compliant, meaning some people use it as a drop-in replacement for their own node, or as an alternative to Infura for example. Some customers use both Amberdata and Infura as their web3 providers, and get the benefit of additional enriched data when connecting to our API.

· And finally, we have also built an SDK on top of the API itself, so it is easier to integrate into your own application (https://www.npmjs.com/package/web3data-js).
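Because the API is JSON-RPC compliant, a standard Ethereum request body should work against it as against any other provider. The Python sketch below only builds a JSON-RPC 2.0 request (here for `eth_getBalance`) and parses a hex-encoded quantity result; the endpoint URL and any API-key header needed to actually send it are not given here and would have to come from the provider's documentation.

```python
import itertools
import json

# Request ids let a client match responses to requests.
_request_ids = itertools.count(1)


def rpc_request(method: str, params: list) -> str:
    """Build a JSON-RPC 2.0 request body, as accepted by Ethereum clients."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}
    )


def parse_quantity(response_body: str) -> int:
    """Parse a JSON-RPC response whose result is a hex-encoded quantity."""
    response = json.loads(response_body)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    return int(response["result"], 16)


# Usage: the address below is a placeholder.
body = rpc_request("eth_getBalance", ["0x" + "00" * 20, "latest"])
wei = parse_quantity('{"jsonrpc": "2.0", "id": 1, "result": "0xde0b6b3a7640000"}')
print(wei)  # 1000000000000000000 (1 ether in wei)
```

Switching between web3 providers then comes down to changing the URL the body is posted to.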

We also have several subscription plans to match your needs. The developer tier is free and gives you access to 90% of all the data. If you are not sure about your usage patterns yet, we recommend starting with the on-demand plan, while for heavy users the professional and enterprise plans are a better fit; see https://amberdata.io/pricing for more information.

All in all, we try really hard to make the API as easy to use as possible. We do the heavy lifting, so you don’t have to worry about all the minutiae and can focus on bringing value to your customers. We work very closely with our customers and continuously improve the API and add new features. If something is not supported or you want something that is not in the API, chances are we already have the data, so do not hesitate to ask us ;)

  6. Amberdata recently made some headlines for discovering a vulnerability in the Parity client. Can you tell us a bit more about it?

This is an interesting one. One of our internal processes flagged a contract, and more specifically its balanceOf(...) call: it was (and still is) taking more than 5 seconds to execute, while this call typically takes only a few milliseconds. While investigating further, we looked at the debug traces for that contract call and were pretty surprised when a combination of trace_call+vmTrace crashed our Parity node. And not just randomly: the same call exhibited the exact same behavior every time, and on different Parity nodes. It turns out that this contract is very poorly written, and its implementation of balanceOf(...) keeps looping over all the holders of the token, which eventually runs out of memory.
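The flagged contract's source is not shown here, so the Python model below is hypothetical. It contrasts a token that stores holder records in a list and sums over them in `balance_of` (cost grows with the number of holders, as described above) with the conventional ERC-20 layout of a single address-to-balance mapping, where the lookup takes constant time regardless of how many holders exist.

```python
class LoopingToken:
    """Hypothetical anti-pattern: balance_of walks every holder record."""

    def __init__(self):
        self.holders = []  # list of (address, amount) records

    def mint(self, addr, amount):
        self.holders.append((addr, amount))

    def balance_of(self, addr):
        # O(number of records): every call touches the whole list.
        return sum(amount for holder, amount in self.holders if holder == addr)


class MappingToken:
    """Conventional ERC-20 layout: one balance per address in a mapping."""

    def __init__(self):
        self.balances = {}

    def mint(self, addr, amount):
        self.balances[addr] = self.balances.get(addr, 0) + amount

    def balance_of(self, addr):
        # O(1): a single lookup, whatever the number of holders.
        return self.balances.get(addr, 0)
```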

Even though this is a pretty severe bug (any Parity node can be remotely shut down with just one small call to its API), in practice the number of nodes at risk is probably small, because only operators who have enabled public-facing RPC calls (and possibly tracing as well) are affected, and both are disabled by default. Kudos to the Parity team for releasing a fix less than 24 hours after the bug was reported!

  7. How do you access the data? How do I get started?

We sometimes get the question, “I do not know how to code, can I still use your data?”, and the answer is yes! We have built a few dashboards on our platform where you can visualize and monitor different metrics and get alerts: https://amberdata.io/dashboards/infrastructure.

A good starting point is to use our Postman collection, which is pretty complete and can give you a very good overview of all the capabilities: https://amberdata.io/docs/libraries and https://www.getpostman.com/collections/79afa5bafe91f0e676d6.

For more advanced users, the REST API is where you should start, but as I mentioned earlier, how to access the data depends on your use case: REST, websockets, JSON-RPC and the SDK are the most common ways of getting to it. We have a lot of tutorials and code examples available here: https://amberdata.io/docs.

Developers who want to access Amberdata’s blockchain and market data from within their own smart contracts can use the Chainlink oracle contract, which integrates directly with the API:

https://medium.com/amberdata/smart-contract-oracles-with-amberdata-io-358c2c422d8a

  8. Amberdata recently celebrated its second birthday. What is your proudest accomplishment? Is there any mistake or lesson you would like to share with us?

The blockchain and crypto market is one of the fastest-evolving and most innovative markets ever, and a very fast-paced environment. After two years with our heads down, it is sometimes easy to lose sight of the big picture. The journey has been long, but I am happy and proud to see it all come together: we started with blockchain data and monitoring/alerting, added search, validation and derived data (tokens, supplies, etc.) along the way, and finally market data to close the loop on cryptoeconomics. Seeing the community's overall engagement with our data is very gratifying: API usage climbing, more and more pertinent and relevant questions and suggestions on our support channels, other projects like Kadena sending us their own blockchain data so it can be included in Amberdata’s offering… all of this makes me want to do more :)

Free Q&A

---Who are your competitors? What makes you better?

There are a few data providers out there offering information similar to Amberdata's. For example, Etherscan has very complete blockchain data for Ethereum, and CoinMarketCap has asset rankings by market cap and some pricing information. We actually did a pretty thorough analysis of the different data providers and their pros and cons:

https://medium.com/amberdata/which-blockchain-data-api-is-right-for-you-3f3758efceb1

What makes Amberdata unique is threefold:

· Combination of blockchain and market data: other providers typically offer one or the other, but not both, and not integrated with each other. With Amberdata, in one API call I can get blockchain data and historically accurate pricing data at the same time. We have also standardized access across multiple blockchains, so you get one interface for all of them and do not have to worry about understanding each and every one.

· Validated & verifiable data: we work hard to preserve transparency and trust and are very open about how our metrics are calculated. For example, blockchain data comes with all the pieces needed to recompute the Merkle proofs, so the integrity of the data can be verified at any moment. Also, additional metrics like circulating supply are based on tangible and very concrete definitions, so anybody can follow and recalculate them if needed.

· Enriched data: we have spent a lot of time enriching our APIs with (historical) off-chain data like token names and symbols, mappings between token addresses and tradable market pairs, etc. At the same time, our APIs are very granular and provide a level of detail that only a few other providers offer, especially for market data (Level 2 order books across multiple exchanges, best bid and offer, etc.).

That's all for the 40th AMA. We would like to thank all the community members for their participation and cooperation! Thanks, Joanes!

