
What is Post-Process De-Duplication?

Target de-duplication can be performed in one of two ways: as post-process de-duplication or as inline de-duplication.

Inline de-duplication removes duplicate data while it is in transit between the source and the destination (target), before it is written to the storage device. Post-process de-duplication, by contrast, removes duplicates at scheduled intervals after the data has already been transmitted by the source and written to the storage device. Depending on the deployment, the data is channelled through de-duplication hardware or software, both of which remain in sync with the storage disk. Whichever form of target de-duplication is used, incoming data is evaluated against the data already held on the storage disk so that duplicates can be identified and removed.
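The distinction can be sketched in a few lines of Python. This is a toy model, not any vendor's implementation: "blocks" stand in for transmitted data, and a dictionary keyed by content hash stands in for the storage disk. The function names are illustrative.

```python
import hashlib

def inline_dedupe(blocks, store):
    """Inline: duplicates are dropped *before* anything reaches storage."""
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:          # only previously unseen blocks are stored
            store[digest] = block

def post_process_dedupe(raw_storage):
    """Post-process: all data lands on disk first; a scheduled pass removes duplicates."""
    store = {}
    for block in raw_storage:            # scan of data already written to storage
        store.setdefault(hashlib.sha256(block).hexdigest(), block)
    return store
```

In both cases the matching is done by comparing content hashes against what the store already holds; the difference is purely *when* the comparison happens relative to the write.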

An enterprise that runs proprietary software will benefit enormously from post-process de-duplication, because the source software at the organisation's end rarely needs to be modified or redesigned to meet the needs of the de-duplication hardware or software. Compatibility is not a concern: the source system simply pushes the data into transmission. Nor is there any need to install de-duplication hardware or software at every terminal node. Because the de-duplication hardware or software sits at a central location, data from all nodes is automatically channelled through the de-dupe device on the network.

Lastly, removing the de-duplication load from the client's central processing unit (CPU) frees CPU power for more effective use of the enterprise computing system. This is where post-process de-duplication has the edge over source de-duplication. Target de-dupe is also undeniably quicker than source de-duplication: the data is pushed straight onto the network, and the de-dupe process runs at the storage end, where it can match data and remove duplicates with ease.

For all its advantages, post-process de-duplication is not without flaws. It is bandwidth intensive, so if the volume of data in an enterprise is increasing exponentially, target de-duplication will not be the best option. In addition, before scheduled post-process de-duplication can begin, large arrays of storage disks must be provisioned to hold the transmitted data while it awaits processing, which involves additional expense. This extra staging cost is another flaw of post-process de-duplication.
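The staging-capacity point is easiest to see with numbers. The figures below are purely illustrative assumptions (a 10 TB daily backup and a 5:1 reduction ratio), not measurements from any product:

```python
daily_backup_tb = 10   # assumed raw daily backup volume
dedupe_ratio = 5       # assumed 5:1 reduction from de-duplication

# Inline: data is reduced before it lands, so only the deduplicated
# footprint must be provisioned on the target.
inline_landing_tb = daily_backup_tb / dedupe_ratio        # 2.0 TB

# Post-process: the full raw stream must be staged on disk first,
# and is only reduced later by the scheduled de-duplication pass.
post_process_staging_tb = daily_backup_tb                 # 10 TB
```

Under these assumptions the post-process target needs five times the landing capacity of the inline target, even though both end up storing the same 2 TB of unique data.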

Source de-duplication, on the other hand, carries costs of its own: the proprietary software must be redesigned to accommodate the demands of the de-duplication devices and processes, and de-duplication hardware must be installed at every connecting node. These costs can make technologies based on target de-duplication the more cost-effective choice. However, if the cloud service provider partnering with the enterprise charges fees based on bandwidth usage, source de-duplication becomes more attractive.

Companies must therefore determine which kind of de-duplication process will work best for them. The factors enterprises need to consider before selecting a de-duplication process include the volume of data, the availability of bandwidth, the cost of bandwidth, and many others. Determining the best fit for an enterprise is no easy exercise.

Saving Space and Money with Data De-duplication

Like every disruptive technology, the cloud is hungrily absorbing and assimilating a number of minor innovations and utilities. Data de-duplication is one such innovation that has been successfully integrated with cloud technologies to deliver value.

Technology experts are quick to point out that data de-duplication is not really a technology; it is a methodology: a software-driven process that identifies and removes duplicates in a given data set. A single copy of the data is retained in the store, while all duplicates are removed and replaced with references to the retained copy. Every file that initially contained a copy of the data now contains a reference to the data item retained in the store. Whenever a file containing a de-duplicated data item is called for, an instance of the data is inserted in the right place and a fully functional file is generated for the user. This method of compressing data reduces the amount of disk space used for storage, and with it the cost of storage.
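The retain-one-copy-and-reference scheme described above can be sketched as a small content-addressed store. This is a simplified illustration with invented names (`DedupStore`, `put`, `get`) and an unrealistically small chunk size, not the method any particular backup product uses:

```python
import hashlib

class DedupStore:
    """Toy store: one retained copy per unique chunk; files hold references."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes (the single retained copy)
        self.files = {}    # filename -> list of digests (references)

    def put(self, name, data, chunk_size=4):
        digests = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # store only if unseen
            digests.append(digest)                  # file keeps a reference
        self.files[name] = digests

    def get(self, name):
        # Reassemble a fully functional file from the referenced chunks.
        return b"".join(self.chunks[d] for d in self.files[name])
```

Storing a file whose chunks repeat keeps only the unique chunks on disk, yet `get` reconstructs the original byte-for-byte, exactly as the paragraph above describes.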

The growing importance of de-duplication can be traced to the growing volumes of data being generated by businesses. As businesses continue to generate data, space becomes a major constraint and financial resources may have to be allocated for acquiring larger storage capacities. Consequently, any technology that allows them to “have the cake and eat it too” is welcome!

Data de-duplication can be “in-line” or “post-process”.

In-line data de-duplication de-duplicates data before it is sent to the storage server. This saves bandwidth and time-to-backup, as the amount of data transmitted over the Internet is reduced and only the “clean” data reaches the storage server. However, de-duplication at the client end of the system is itself a time-consuming and extremely resource-intensive process.

Post-process de-duplication removes duplicates from data that has already been uploaded to the storage server. There is no saving of time or bandwidth during transmission, but there is certainly a saving of processing time and client hardware resources at the point of transmission, since all de-duplication happens on the cloud vendor's server. Modern backup companies use a combination of the two methods for obvious advantages.

Backup Technology has integrated data de-duplication with its cloud backup and recovery solutions. The all-in-one suites for cloud computing and online backup automatically provide data de-duplication services to subscribing clients. The software automatically detects and deletes all duplicate data and creates the appropriate references during the backup process. This saves time and money and results in faster backup and recovery. The extensive versioning used in tandem adds to the strength of the software, as older versions of any backed-up file can be recovered, even if the file has been deleted from the source computer. For these and other similar reasons, we invite you to try our award-winning cloud backup, disaster recovery, and business continuity services, powered by Asigra. We are confident that you will be completely satisfied with what we have to offer!
