Target de-duplication can be performed in two ways: as inline de-duplication or as post-process de-duplication.
Inline de-duplication removes duplicates as the data travels between the source and the destination (target), before it is written to disk. Post-process de-duplication, by contrast, de-duplicates the data at scheduled intervals after it has been transmitted by the source and written to the storage device. Depending on the deployment, the data is channelled through either hardware or software, both of which remain in sync with the storage disk. Whichever form of target de-duplication is used, incoming data is evaluated against the data already on the storage disk so that duplicates can be identified and removed.
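The core of that evaluation step can be sketched with a simple hash index. The example below is a minimal illustration, not any vendor's implementation: it assumes data arrives as fixed chunks and uses a Python dictionary as a stand-in for the on-disk index, keeping only chunks whose SHA-256 digest has not been seen before.

```python
import hashlib

def dedupe_chunks(chunks, store):
    """Keep only chunks whose hash is not already in the store.

    `chunks` is an iterable of byte strings; `store` maps a chunk's
    SHA-256 digest to the chunk itself (a stand-in for the disk index).
    Returns the number of duplicate chunks that were skipped.
    """
    skipped = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in store:
            skipped += 1           # duplicate: a reference suffices
        else:
            store[digest] = chunk  # unique: write it to the store
    return skipped

store = {}
incoming = [b"alpha", b"beta", b"alpha", b"gamma", b"beta"]
print(dedupe_chunks(incoming, store))  # 2 duplicates skipped
print(len(store))                      # 3 unique chunks stored
```

In an inline deployment this check runs in the data path before anything reaches disk; in a post-process deployment the same comparison runs on a schedule against data that has already been written.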
Enterprises that run proprietary software stand to benefit enormously from post-process de-duplication, because the source software does not need to be modified or redesigned to meet the requirements of the de-duplication hardware or software. Compatibility is not a concern: the source system simply pushes the data onto the network. Nor does de-duplication hardware or software have to be installed at every terminal node. Because the de-dupe device sits at a central location on the network, data from all nodes is automatically channelled through it.
Lastly, removing the de-duplication load from the client's central processing unit (CPU) frees CPU cycles for other work, making more effective use of the enterprise computing system; this is where post-process de-duplication has the edge over source de-duplication. There is also no doubt that target de-dupe is quicker than source de-duplication: the data is simply pushed onto the network, and the de-dupe process operates at the storage end, where it can match data quickly and remove duplicates with ease.
For all its advantages, post-process de-duplication is not without flaws. It is bandwidth intensive, so if the amount of data in an enterprise is growing exponentially, target de-duplication will not be the best option. In addition, before scheduled post-process de-duplication can begin, large arrays of storage disk must be provisioned to hold the transmitted data; this extra capacity is an additional expense, and that cost is another of the flaws associated with post-process de-duplication.
On the source side, the need to redesign proprietary software to accommodate the demands of the de-duplication devices and process, the installation of de-duplication hardware at all the connecting nodes, and similar costs can make technologies based on target de-duplication the more cost-effective choice. On the other hand, if the cloud service provider the enterprise partners with charges fees based on bandwidth usage, source de-duplication may be the more attractive option.
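The bandwidth trade-off can be made concrete with a back-of-the-envelope estimate. The figures below are purely illustrative assumptions (a 10 TB backup, 60% of it duplicate data, $0.05 per GB of bandwidth), not vendor numbers: with source de-duplication only the unique data crosses the network, while with target (post-process) de-duplication everything is sent and duplicates are removed at the storage end.

```python
def transfer_cost(total_gb, duplicate_ratio, price_per_gb, dedupe_at_source):
    """Estimate the bandwidth bill for one backup run, in dollars.

    With source de-duplication, only unique data crosses the network;
    with target (post-process) de-duplication, everything is sent first
    and duplicates are removed at the storage end.
    """
    if dedupe_at_source:
        sent_gb = total_gb * (1 - duplicate_ratio)  # unique data only
    else:
        sent_gb = total_gb                           # everything is sent
    return round(sent_gb * price_per_gb, 2)

# Illustrative assumptions: 10 TB backup, 60% duplicated, $0.05/GB.
print(transfer_cost(10_000, 0.60, 0.05, dedupe_at_source=True))   # 200.0
print(transfer_cost(10_000, 0.60, 0.05, dedupe_at_source=False))  # 500.0
```

Under these assumed figures, bandwidth-based billing makes source de-duplication substantially cheaper per run, which is exactly why a bandwidth-metered cloud contract tilts the decision toward it.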
Therefore, companies must determine which kind of de-duplication process will work best for them. The factors enterprises need to weigh before selecting a de-duplication process include the volume of data, the availability of bandwidth, the cost of bandwidth, and many others. Indeed, determining the best fit for an enterprise is no easy exercise.