Why certify data
Today digital information travels all over the world at the speed of light. All systems that involve citizens’ private, social and working life are interconnected, and all information is easily searchable, replicable and shareable.
However, this technological marvel creates new challenges in verifying the reliability, authenticity and source of information, and in protecting intellectual property. It is extremely simple to appropriate the creations of others, modify them only in part and pass them off as your own, just as it is extremely easy to alter or forge information to pursue corporate, social or political objectives.
To protect ourselves from the appropriation and manipulation of information, we have long turned to notary services, an institution whose foundation in the West dates back to the early Middle Ages. These services allow citizens and organisations to deposit and certify certain information through a trusted third party, generally authorized by states and governments.
Through notarisation it is possible to demonstrate the validity of a contract, the possession of a certain physical or digital property, and that a certain piece of information has existed since a certain moment and has not been subsequently modified.
One of the most interesting use cases of notarial certification is the protection of intellectual property, i.e. the ability to demonstrate having conceived a work, a process, an idea, a technological solution or any other creation of the human mind.
A more sophisticated mechanism is auditing, which protects not only against alteration but also against the omission or inconsistency of information. Auditing is the validation of interconnected information performed by trusted third parties, and it is generally used in the financial sector.
The combined notarisation and auditing services provide a data certification service and guarantee the highest level of reliability of the data and the organizations connected to it.
Fighting counterfeiting with certification
Counterfeiting affects companies that produce or distribute goods of high quality or high added value, and can be fought effectively using blockchain technologies. The main strategy to combat counterfeiting is to verify the authenticity and uniqueness of an asset through a unique identifier that cannot be duplicated.
This can be combined with the tracing of the history of the product, its components, processes, maintenance and owners. For batches of limited quantity / high value product, it is possible to track the number of copies in circulation, allowing all participants of a supply chain, including merchants and buyers, to independently verify all these attributes through a dedicated app.
To make the verification of authenticity and tracing more effective, physical objects can be equipped with identifiers that cannot be duplicated, such as anti-tampering tags, which in more sophisticated cases may contain unique physical attributes that identify the object.
Alteration of products in supply chains
Many types of fraud occur when there is an information asymmetry between the buyer and the seller about the state of wear, maintenance and operation of a physical object (for example the amount of use and maintenance of a machine / vehicle, the operating hours of an industrial appliance, or maintenance work on a property). These frauds can be counteracted by maintaining on the blockchain a temporal sequence (or timeline) of readings of specific attributes of a certain item, which is impossible to alter in retrospect and which can add value to the item in the event of a sale. For example, if a monthly record of the mileage of a vehicle or of the hours of use of an industrial machine is kept, in case of sale the buyer can check the temporal consistency of these records and have greater confidence in the purchased product (and therefore be willing to pay a higher price for it).
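As a minimal sketch of the consistency check a buyer could run, the Python snippet below takes a timeline of readings and flags every point where a cumulative counter (mileage, operating hours) goes backwards. The `Reading` structure, the function name and the sample values are assumptions made for illustration; in a real system the readings would be retrieved from the blockchain rather than hard-coded.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Reading:
    """One timestamped reading of an attribute (hypothetical structure)."""
    taken_on: date
    value: float  # e.g. odometer kilometres or machine operating hours


def find_inconsistencies(readings: list[Reading]) -> list[tuple[Reading, Reading]]:
    """Return consecutive pairs where a cumulative counter decreases."""
    ordered = sorted(readings, key=lambda r: r.taken_on)
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if curr.value < prev.value
    ]


# Sample monthly mileage readings (invented values).
timeline = [
    Reading(date(2023, 1, 1), 42_000),
    Reading(date(2023, 2, 1), 43_150),
    Reading(date(2023, 3, 1), 41_900),  # lower than the month before
    Reading(date(2023, 4, 1), 44_020),
]

suspicious = find_inconsistencies(timeline)
```

Here `suspicious` holds the single pair in which the March reading is lower than the February one. A stricter check could also bound the plausible increase between two consecutive readings.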
For further information on the topic of blockchain and supply chain, we recommend the article Blockchain and Quality Assessment in Supply Chains.
Pains in data certification
The main problems with traditional notarisation and centralised certification of information are connected to:
- costs: the notarisation and certification of information through specialized companies is very expensive;
- automation: it is not always possible to include traditional notarisation and certification processes in an automatic flow; human intervention is often needed, which limits speed and flexibility;
- reliability: human certifiers can be corrupted and are subject to errors, and their archives can be destroyed or tampered with by external actors;
- heterogeneity: supply chains provide for interactions between parties that are often distributed globally, with incompatible regulations and systems.
These pains create the preconditions for the emergence of new solutions on a decentralised public infrastructure such as blockchain.
The certification of data on blockchain enables innovative players to use a public, neutral and unchangeable tool to demonstrate the reliability of their operations, using certification and auditing tools. Writing data on blockchain allows different actors to collaborate and contribute to the data of a physical or digital item, in a simple and effective way. In fact, thanks to the blockchain it is possible to simply write information on an indelible global ledger, distributed among thousands of independent nodes, which can be consulted freely and free of charge by customers, investors and partners.
The advantages of using the blockchain as a platform for the certification of information and, at a more advanced stage, also for economic exchanges in commercial operations are manifold:
- neutral and permissionless platform: there is no need to sign commercial agreements in order to access the services of the main public blockchains;
- accessible by all: there are no geographical or political limits to the use of these services;
- economical: the costs for the certification of a datum and the transaction fees are a fraction of the centralised equivalents;
- recognized economic and trust value: the reliability and financial value of the underlying tokens are recognized internationally without guarantee superstructures;
- programmable: all blockchain operations can be integrated with automatic systems.
Main Certification Use Cases
In this section we describe the most interesting thematic areas that many companies are exploring for the certification of data on blockchains.
- Intellectual Property: it is possible to certify the intellectual property associated with a digital or digitizable creation, and to track operations on it such as assignment, payment of rights and notification of the reproduction of a work.
- Anti counterfeiting / Anti manipulation: it is possible to safely identify a material item, for example by certifying some of its unique physical characteristics on the blockchain, and it is possible to use this information to detect and therefore discourage counterfeits and manipulations.
- Product / supply chain: it is possible to track all events relating to raw materials, construction, transport, sale and possession of an item until its disposal.
- Business Transparency: it is possible to track the financial status and the production of certain goods by a certain business.
- Self Sovereign Identity: allows (public and private) organizations to add information collaboratively by associating it with a self-defined digital identity, such as certifications, licenses and educational qualifications.
- Certified Economic Operations: it is possible to certify and automate procedures and contracts that provide for economic exchanges and remuneration between parties, supervised by Smart Contract.
- Public Administration Transparency: it is possible to track the financial activities, contracts and the use of funds of a public administration.
The main decentralised public blockchains, thanks to their characteristics of immutability, unlimited persistence over time, transparency, accessibility and programmability are the optimal solution for the notarisation and certification of information.
However, writing and verifying data on blockchain is complex for the average digital user: it is necessary to be familiar with information representation formats and cryptographic primitives, to know how to manage token wallets and private keys, to know how to interact with smart contracts, to know how to purchase tokens to pay the writing fees, and finally to make everything accessible to all interested parties.
That’s why many companies and startups are working on the development of high-level platforms that simplify blockchain interaction in data certification.
One of the solutions with greater innovation potential is Themis, a platform dedicated to the certification of linked and structured data on blockchain, designed to be general purpose. This platform provides, among other things, a tool for the certification of semi-structured textual data, through a user experience similar to that of a social microblogging service.
Any digital information can be converted into a single digital fingerprint. This fingerprint is a fixed-length alphanumeric code capable of summarizing arbitrarily sized data. It can only be generated from the content of the original information, the same content always generates the same fingerprint, and it does not allow external observers to reconstruct the original digital information, thus preserving privacy. However, given the original information, it is always possible to verify the correctness of the fingerprint. Technically, a digital fingerprint is obtained by applying a hash function.
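The properties listed above can be checked with a short Python sketch. It assumes SHA-256 as the hash function, which the article does not prescribe; `fingerprint` and `verify` are illustrative names.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected: str) -> bool:
    """Check that `data` produces the fingerprint `expected`."""
    return fingerprint(data) == expected


document = "Design notes for product P, revision 1".encode("utf-8")
code = fingerprint(document)

# The length is fixed whatever the input size, and the result is deterministic.
same_again = fingerprint(document)
large_input = fingerprint(document * 10_000)

# Any change to the content produces a different fingerprint.
tampered_ok = verify(document + b"!", code)
```

One caveat: if the original information is short or easy to guess, an observer can hash candidate values and compare the results, so low-entropy data is usually combined with a random salt before hashing.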
The immutable writing of fingerprints on blockchain is called timestamping.
Writing the digital fingerprint of a piece of data on blockchain, together with any metadata (such as the name of the author of the information, the underlying company, or a digital signature created with certificates issued by third parties), is sufficient to conclusively demonstrate that certain information, in the possession of a specific entity, existed at a specific time, without necessarily publishing the content of the information itself.
Because the data on the blockchain is accessible to the public, all verifications can be carried out independently by interested and authorized third parties, once they have obtained the original data if access to it is restricted.
The combination of the digital fingerprint of a piece of data and its timestamp on the blockchain demonstrates that the data existed at a given moment, while the possession of the data (if private) or the presence of a digital signature attached to the timestamp metadata conclusively demonstrates that this information was produced by a particular person or organization.
Finally, timestamping the public key of a person or organization, together with the related identity verification procedure (KYC), allows for non-repudiation of the signature itself.
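The record described in this section (a fingerprint plus metadata, without the content itself) could be assembled as in the following Python sketch. The field names, the JSON encoding and the helper names are assumptions for illustration; the actual write to a blockchain transaction and the digital signature over the payload are left out because they depend on the chain and the library used.

```python
import hashlib
import json


def build_record(data: bytes, author: str, organisation: str) -> bytes:
    """Build the payload to timestamp: a fingerprint plus metadata, never the data."""
    record = {
        "fingerprint": hashlib.sha256(data).hexdigest(),
        "author": author,
        "organisation": organisation,
    }
    # Sorted keys and fixed separators give a deterministic byte encoding.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def matches_record(data: bytes, payload: bytes) -> bool:
    """A verifier holding the original data checks it against the public record."""
    record = json.loads(payload)
    return record["fingerprint"] == hashlib.sha256(data).hexdigest()


document = b"Design notes for product P, revision 1"
payload = build_record(document, author="Alice", organisation="Example Srl")
```

Anyone can read `payload` once it is on chain, but only a party that holds `document` can show that it matches the recorded fingerprint.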
Notarisation and certification modes
The timestamping of data by entities identified with asymmetric keys is called notarisation on blockchain, or simply notarisation in the context of this article.
The notarisation of data by certified entities is called data certification.
The mind map below illustrates the concepts of Data Notarisation, Identity Certification and Data Certification, their main relationships and properties. In particular, the two Identity Certification methods (self or 3rd party) and the two data certification methods (self or 3rd party) with relative properties are distinguished.
Modalities for data timestamping
We previously introduced the concepts of timestamping and fingerprint. However, there are different strategies for certifying data on the blockchain, each with its own pros and cons. We summarise them below.
- Fingerprint writing: the fingerprint of a piece of raw data is written on the blockchain using the payload of a transaction.
- > Pros: economical, since the data written on the blockchain has a small size regardless of the size of the original data; compatible with personal and business privacy.
- > Cons: the data verifier is not autonomous.
- Aggregate fingerprints writing: multiple data fingerprints are aggregated together to build an overall fingerprint that allows you to further reduce the number and volume of data written on the blockchain.
- > Pros: cheap as it can aggregate arbitrary quantities of fingerprints without compromising demonstrability.
- > Cons: the data verifier is not autonomous.
- Raw data writing: a piece of raw data that you want to certify is written directly on the blockchain using the payload of a transaction.
- > Pros: the data can be verified independently of any validator and is available forever without the possibility of removal.
- > Cons: the data cannot be deleted in any way and can introduce irreversible privacy problems.
As we have seen, data on blockchains can be notarised as fingerprints or in plain (raw) format. The main advantage of writing plain data is that it guarantees autonomy to any verifier. Since the process of writing data on blockchain is very expensive, plain writing can only be applied to small and detailed data. Notarisation of data as a fingerprint is the preferred approach: in addition to keeping the writing costs under control and making them independent of the certified data volume, it allows you to implement different levels of privacy while still guaranteeing the provability of the data, in compliance with regulations such as the GDPR.
How to represent the data so that it is updatable
Representing the certified data on the blockchain in a format that can be human and machine interpretable opens very powerful automation scenarios, especially when it comes to coordinating operations on a product / supply chain.
The representation of the data determines the flexibility of its use. Using linked and structured data provides many benefits outlined below.
- Structured data: the data has a machine readable structure and can optionally be associated with a schema, which gives it unambiguous semantics. For example, data that represents a car could have a set of predefined characteristics, and all representations of cars could refer to that schema to make themselves unambiguously understandable to any human reader and any machine.
- Linked data: linking information together allows you to express relationships between data; these relationships can also represent the evolution of the same data over time.
The data on the blockchain cannot be modified, therefore the only way to represent an update is to introduce a new piece of data that contains a reference to an existing one, together with a protocol according to which observers agree that the new data represents a valid update of the previous data. This operation can be validated by a smart contract or by a protocol defined outside the blockchain used to trace the information, which is therefore not subject to the blockchain's current limitations of throughput and costs.
In the diagram below we see:
- An example of a structured item with attributes and values, which can optionally define a schema.
- An example of a link between Items.
- An example of subsequent updates of an Item that build a timeline.
This data model actually allows us to represent knowledge graphs, it is therefore generic enough to be able to represent arbitrarily complex relationships and scenarios.
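A minimal Python sketch of such a data model follows. Each item carries attributes, an optional schema, links to other items and a reference to the version it updates. The field names, the `example:` schema identifiers and the deliberately simple update rule are all assumptions for illustration, and the in-memory `store` dictionary stands in for data retrieved from a blockchain.

```python
from __future__ import annotations

import hashlib
import json
from typing import Optional


def item_id(item: dict) -> str:
    """Identify an item by the fingerprint of its canonical JSON encoding."""
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_item(attributes: dict, schema: Optional[str] = None,
              previous: Optional[str] = None, links: Optional[dict] = None) -> dict:
    """Create a structured item; `previous` points at the version being updated."""
    return {
        "schema": schema,
        "attributes": attributes,
        "previous": previous,
        "links": links or {},
    }


def is_valid_update(store: dict, new_item: dict) -> bool:
    """Toy update rule: the previous version must exist and share the same schema."""
    previous = store.get(new_item["previous"])
    return previous is not None and previous["schema"] == new_item["schema"]


def history(store: dict, latest_id: str) -> list[dict]:
    """Follow the `previous` references back to the first version, oldest first."""
    versions = []
    current = latest_id
    while current is not None:
        item = store[current]
        versions.append(item)
        current = item["previous"]
    return list(reversed(versions))


store: dict = {}

car_v1 = make_item({"model": "X", "mileage_km": 42000}, schema="example:Car")
store[item_id(car_v1)] = car_v1

car_v2 = make_item({"model": "X", "mileage_km": 43150}, schema="example:Car",
                   previous=item_id(car_v1))
if is_valid_update(store, car_v2):
    store[item_id(car_v2)] = car_v2

# A link between items: the owner refers to the latest version of the car.
owner = make_item({"name": "Alice"}, schema="example:Person",
                  links={"owns": item_id(car_v2)})
store[item_id(owner)] = owner

mileage_timeline = [v["attributes"]["mileage_km"] for v in history(store, item_id(car_v2))]
```

Because identifiers are derived from content, an item can only reference versions that already exist, so the updates form an append-only timeline.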
Other advantages of using structured data with unambiguous semantics come from the possibility of carrying out quantitative as well as qualitative consistency checks. One example is quantity consistency: a certain manufacturer declares that it uses a certain quantity of certified raw materials to make its product P. Given the quantity of products P certified by the manufacturer, does it actually appear that this manufacturer has purchased the necessary quantity of certified raw materials?
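The quantity consistency question can be expressed as a simple inequality: units certified multiplied by raw material per unit must not exceed the raw material purchased. The Python sketch below assumes a single raw material, a fixed amount per unit and integer quantities in grams to avoid rounding issues; the function names and figures are invented for illustration.

```python
from __future__ import annotations


def required_raw_material_g(units_certified: int, grams_per_unit: int) -> int:
    """Raw material needed to make the certified number of units of product P."""
    return units_certified * grams_per_unit


def is_quantity_consistent(units_certified: int, grams_per_unit: int,
                           certified_purchases_g: list[int]) -> bool:
    """True if the certified purchases cover the declared production."""
    needed = required_raw_material_g(units_certified, grams_per_unit)
    return needed <= sum(certified_purchases_g)


# 1,000 units at 250 g each need 250,000 g of certified raw material.
purchases = [120_000, 80_000]  # 200,000 g in certified purchases
plausible = is_quantity_consistent(1_000, 250, purchases)
plausible_after_new_purchase = is_quantity_consistent(1_000, 250, purchases + [60_000])
```

With these figures the first check fails, because 200,000 g cannot cover 250,000 g of declared use, while the second passes once a further 60,000 g purchase is certified.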
Platforms for data certification on blockchain
Certifying data on blockchains is an operation that requires a lot of skills. As illustrated above, it is necessary to define and be familiar with:
- data formats to represent the information in a way that is unambiguous and interpretable by all, or at least by stakeholders;
- protocols to allow all actors to contribute to this data, in order to distinguish lawful operations;
- management of wallets and tokens and the purchase of these, necessary to pay timestamp operations to the network.
Depending on the volume of data to be certified, a compromise must also be found in terms of frequency and writing volume compatible with the throughput of the underlying blockchain.
It is also possible to automate economic transactions based on operations on certified data, in order to make the financial processes behind the product / supply chain faster, cheaper and more reliable. Although this opportunity is still in an embryonic state, as it depends on the maturation of blockchains as B2B payment instruments, it is among the most promising, as it would significantly increase the level of trust in long and distributed product chains.
To achieve the certification functionalities described in this article, many players turn to off-the-shelf platforms. These platforms may have been built with different architectures, with different compromises in terms of reliability (how much we have to trust their operators), usability (how simple it is to use them) and scalability (how many operations they allow us to do). Let’s see the main categories of these platforms.
- Fully decentralised: all certification and validation operations are performed by on-chain code.
- > Pros: it is completely reliable.
- > Cons: has scalability and usability limits, in terms of cost and flexibility of the operations that can be implemented.
- Semi decentralised: some auxiliary functions such as data aggregation, indexing and some validation aspects are implemented centrally, the data certification functions remain on the blockchain.
- > Pros: has no limits of scalability and flexibility.
- > Cons: there is no complete reliability for all operations and all functional scenarios.
- Centralised: all features are implemented as a centralised service. There is no guarantee of reliability; however, scalability and usability have no technical limits.
In the information age it is very easy to acquire data but very complex to verify its reliability, and it is increasingly important to protect its ownership and authenticity.
Traditional information protection tools are obsolete and expensive, but today the blockchain enables a new certification paradigm, which is economical and programmable.
It is in fact possible to certify the temporal origin and the authoritativeness of your data thanks to timestamping and identity certification, using different approaches, some of which guarantee full respect for privacy and rules for the protection of data processing (such as GDPR).
However, using the blockchain requires specific skills on how to buy, hold and use tokens, on how to carry out transactions, on how to perform payload writing, on how to represent information.
Many companies are responding to this market opportunity by providing platforms with different degrees of specialization, which simplify the operations of writing and consulting information on blockchain.
We have examined the main features of these certification platforms in terms of data representation, validation, authority and privacy, and described a generic data model that can be used to create the most promising decentralised certification scenarios.
Finally, we believe that the development of certification solutions is only in its infancy and that this industry has the potential to establish itself as an inevitable complementary tool in the trend of digitization of large industrial and logistical processes.