Tag Archives: De-duplication

Using File Versioning to Recover Lost Files

When it comes to making sure that your company’s data remains safe and secure, there is no better solution than cloud data backup. Thanks to this increasingly popular and important IT service, businesses of all sizes can now enjoy the peace of mind of knowing that their data is constantly being backed up and preserved for later retrieval whenever the need should arise.

However, simply backing up saved files is just the beginning of what cloud data storage has to offer. Many cloud backup service providers now grant enterprises the ability to go virtually “back in time” and recover previously saved versions of files that were long ago deleted or overwritten on their local systems. Through a process called “file versioning”, cloud storage providers commonly add yet another crucial layer of data protection, guarding their clients against incorrectly saved files.

With our fast-paced work environments and present-day fondness for digital multi-tasking, accidentally “saving over” a previous version of a file that one may still have needed is, unfortunately, far from a rare occurrence. In the not-so-distant past, accidentally “saving over” a file meant that you were stuck with the newer version, no matter what. By taking advantage of cloud backup and storage services with file versioning capability, however, businesses can now retrieve these lost files from their service provider’s servers with ease.

Through the use of advanced versioning methods, large amounts of previous file history are compressed so that disk space on the cloud backup server is conserved and backup and restore transmission times are minimised. Under this system, each file that is transferred to the cloud backup company’s servers is given a unique name and time stamp, with the first version of the file that is backed up being called (not surprisingly) the “primary file”. The cloud backup company’s software then automatically scans for identical copies of this file elsewhere in your company’s backup cache and, if any are found, replaces these duplicates with a “pointer” to the original, effectively allowing one stored file to exist in two or more places at the same time.
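
As a rough illustration of the idea (not BTL’s or Asigra’s actual implementation), the sketch below fingerprints each file with a content hash, stores the first copy as the “primary” and replaces any byte-for-byte duplicate with a pointer to it. The structures and names are assumptions chosen purely for clarity:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a content hash used to spot byte-for-byte duplicates."""
    return hashlib.sha256(data).hexdigest()

# backup_cache maps a fingerprint to the stored "primary" copy, while
# catalogue maps each backed-up path to either real data or a pointer.
backup_cache: dict[str, bytes] = {}
catalogue: dict[str, dict] = {}

def store(path: str, data: bytes) -> None:
    digest = fingerprint(data)
    if digest in backup_cache:
        # Identical content already exists somewhere in the cache:
        # keep only a pointer, so one copy "exists" in several places.
        catalogue[path] = {"pointer": digest}
    else:
        backup_cache[digest] = data
        catalogue[path] = {"primary": digest}

def restore(path: str) -> bytes:
    entry = catalogue[path]
    digest = entry.get("pointer") or entry["primary"]
    return backup_cache[digest]

# Two different paths, same bytes: the second becomes a pointer.
store("reports/q1.docx", b"quarterly figures")
store("archive/q1-copy.docx", b"quarterly figures")
assert restore("archive/q1-copy.docx") == b"quarterly figures"
```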

In addition, every time a new version of a file with the same name is saved to the cloud backup cache, it is compared to the original; if they differ in any way, the older version is retained and the newer one is also saved with its own version name and corresponding time stamp. Using special algorithms, this newer version contains only the information that has changed relative to the older version, while embedded digital pointers reference the older version for everything that has not changed. Although versioned files like these are technically split up to save storage space, when a previously saved version is retrieved by the end user the referenced sections are automatically replaced with the actual data from the original, so every restored file arrives intact.
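
A toy version of that delta-and-pointer scheme, again purely illustrative rather than any vendor’s real algorithm, might store each new version as just the blocks that changed and rebuild the full file on restore:

```python
BLOCK = 4  # tiny block size, purely for illustration

def split(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

versions: list[dict] = []  # each entry: {"changed": {index: bytes}, "length": int}

def save_version(data: bytes) -> None:
    blocks = split(data)
    if not versions:
        # The first version backed up is the "primary file": stored in full.
        versions.append({"changed": dict(enumerate(blocks)), "length": len(blocks)})
        return
    prev_blocks = split(restore_version(len(versions) - 1))
    changed = {i: b for i, b in enumerate(blocks)
               if i >= len(prev_blocks) or b != prev_blocks[i]}
    versions.append({"changed": changed, "length": len(blocks)})

def restore_version(index: int) -> bytes:
    blocks: dict[int, bytes] = {}
    # Walk from the primary version forward, applying each delta in turn.
    for v in versions[:index + 1]:
        blocks.update(v["changed"])
        length = v["length"]
    return b"".join(blocks[i] for i in range(length))

save_version(b"hello world!")
save_version(b"hello earth!")
assert restore_version(0) == b"hello world!"
assert restore_version(1) == b"hello earth!"
```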

Sound complicated? Well, one of the biggest advantages of leaving your backup and IT concerns to an online cloud storage service provider is that they will keep track of all these details so that you don’t have to, and we here at Backup Technology Limited (BTL) take pride in lifting this burden from the shoulders of all the enterprises whose data we make it our duty to protect. We’ve integrated a robust file versioning system into all of our data storage and recovery solutions, and our backup service, powered by Asigra, constantly monitors your company’s computers for changes made to files and automatically saves those changes for you. In addition, BTL uses its latest technology to incrementally back up files that are left open, with unsaved changes, for longer periods of time.

BTL believes that file versioning is here to stay. That is why BTL has assembled a cutting-edge suite of cloud computing tools in its cloud services. With top-of-the-line data protection practices such as bank-grade encryption, FIPS compliance, de-duplication, and advanced password security protocols, clients never have to worry about anyone other than authorised users on the account accessing their data.

By backing up your company’s data with BTL, you’ll never have to worry about incorrectly saving a file again. Please stop by our website today to download our free, no-obligation trial software and see what the BTL advantage is all about. Remember that file versioning is just one of the myriad advantages that BTL can bring to your SMB. So why not visit BTL’s website and take a test drive today? www.backup-technology.com/cloud-backup

What is Pre-Process De-Duplication?

Pre-process de-duplication is, in most cases, also known as source de-duplication. Source de-duplication de-duplicates data before it is transmitted to the device on which it will be stored. All data is channelled through the source de-dupe software or hardware before being transmitted to the storage device. The main objective of source de-duplication is to prevent duplicated data from being sent across the network to the target device. A connection is established with the designated storage device and the data is evaluated before the de-duplication process begins. Synchronisation with the target disk is maintained throughout the process, so that files which already match what is held on the target are removed at the source. The main advantage of this approach is that it saves bandwidth for the user.

To identify changed bytes, the source de-dupe software or hardware performs byte-level scans. To make recovery easy for the user, only the changed bytes are transferred to the destination or target device, and the indexes and files on the target are updated with pointers to the original data. The entire operation happens quickly, without compromising the accuracy or efficiency of the process, and source de-dupe is light on processing power when compared with post-process de-dupe. Source de-duplication is also able to categorise data in real time: policy-based device configurations can classify data at granular levels and filter data as it passes through the source de-dupe device. Files can be added or removed on the basis of group, domain, user, owner, age, path, file type or storage type, or even on the basis of RPO or retention periods.
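
To make the mechanics concrete, here is a minimal, hypothetical sketch of source-side de-duplication: the data is chunked and hashed at the source, each hash is checked against an index that mirrors what the target already holds, and only previously unseen chunks cross the network. The chunk size, the in-memory index and the upload placeholder are all assumptions for illustration, not any vendor’s actual design:

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # 64 KB chunks, an arbitrary choice for this sketch

# In a real product this index would be kept in sync with the target
# storage device; here it is just an in-memory set of chunk hashes.
already_stored: set[str] = set()

def upload(chunk: bytes) -> None:
    """Placeholder for transmitting a chunk across the network."""
    pass

def source_dedupe_and_send(data: bytes) -> tuple[int, int]:
    """Send only chunks the target has not seen; return (sent, skipped)."""
    sent = skipped = 0
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in already_stored:
            skipped += 1      # duplicate: only a reference needs to cross the wire
        else:
            upload(chunk)
            already_stored.add(digest)
            sent += 1
    return sent, skipped

payload = b"A" * CHUNK_SIZE * 3
print(source_dedupe_and_send(payload))   # (1, 2): identical chunks are sent once
print(source_dedupe_and_send(payload))   # (0, 3): nothing new crosses the network
```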

Having outlined the advantages of source de-dupe, there are also some disadvantages. While source de-duplication reduces the bandwidth needed to transmit data or files to the destination or target, it imposes a higher processing load on the clients, since the entire de-duplication process runs at the source. CPU consumption on your device can rise by roughly 25% to 50% during source de-duplication, which may not be favourable at all. Source-based de-dupe nodes may also need to be deployed at each connected location. This adds cost and will obviously be more expensive than target de-duplication, where all de-duplication is carried out on a single de-duplication device at a nodal point within the network.

Lastly, if the existing software does not support the de-duplication hardware or algorithms, the software may need to be redesigned. This is not a problem with target de-duplication, where the de-dupe hardware and software are isolated from the organisation’s own hardware and software, and no changes are needed at the source.

Effects of Bandwidth in Cloud Computing

The term “bandwidth” has been used in electrical engineering for years to mean “the difference between the upper and lower frequencies in a continuous set of frequencies, measured in hertz”. In the early 1990s, telcos started to use the term to describe the volume of data handled, defining it as “the transmission rate of data across a network”. Bandwidth in data transmission is measured in bits per second and represents the capacity of a network connection. Increasing that capacity improves performance, other factors such as latency being equal. Below, we discuss how bandwidth utilisation relates to the challenges associated with cloud computing.

Cloud computing providers usually calculate a customer’s required bandwidth by considering the quantity of bandwidth available as well as the mean bandwidth utilisation needed by the variety of applications in use. They also factor in transmission latencies to calculate the time required to upload both the initial backup and all subsequent backups. For that reason, Internet-based cloud backup service providers work hard to make the best use of the available Internet bandwidth, and do everything within their power to reduce the amount of data that flows through their pipes. There are many ways to achieve this: incremental backup technologies, link load balancing technologies, or binary patching that transmits and extracts only the changes within a file, all reduce or balance the amount of data transmitted. In addition, both de-duplication and file compression may be used to decrease the quantity of data transmitted over the network.
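
As a rough sketch of this kind of calculation (the backup size, link speed and reduction ratio below are illustrative assumptions, not measurements), the upload time for a backup can be estimated from the data volume, the available bandwidth and the degree to which de-duplication and compression shrink the payload:

```python
def upload_hours(data_gb: float, mbps: float, reduction_ratio: float = 1.0) -> float:
    """Rough upload time in hours for data_gb gigabytes over an mbps link,
    after compression/de-duplication has shrunk the data by reduction_ratio
    (e.g. 5.0 means a 5:1 reduction)."""
    bits_to_send = data_gb * 8 * 1000**3 / reduction_ratio
    seconds = bits_to_send / (mbps * 1000**2)
    return seconds / 3600

# Illustrative figures only: a 500 GB initial backup on a 100 Mbps uplink.
print(round(upload_hours(500, 100), 1))        # ~11.1 hours with no reduction
print(round(upload_hours(500, 100, 5.0), 1))   # ~2.2 hours at a 5:1 reduction
```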

Information Technology administrators are often advised to determine, as accurately as possible, the quantity of bandwidth the organisation will need for both data storage and transfer operations in the cloud, as well as the latency, expressed in milliseconds. To do so, they have to consider the number of users and systems that will be pushing data into the available network space for storage and other functions, at both peak and off-peak hours.

An online backup service built with self-service features and a user-friendly administrative interface can provide tools that allow customers to select what is backed up at any point in time. Built-in filters enable users to add or remove files and folders from their backup sets, and backup sets can be scheduled for upload to the Internet-based servers at different times. Massive data transfers can be scheduled for the organisation’s off-peak hours, shifting the bulk of the transfer to times when there is enough bandwidth available.
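
A minimal sketch of that off-peak scheduling idea might look like the following; the window hours and size threshold are assumptions chosen purely for illustration:

```python
from datetime import datetime, time

# Assumed off-peak window: 22:00 to 06:00 local time.
OFF_PEAK_START = time(22, 0)
OFF_PEAK_END = time(6, 0)

def in_off_peak(now: datetime) -> bool:
    t = now.time()
    return t >= OFF_PEAK_START or t < OFF_PEAK_END

def should_start_large_transfer(now: datetime, size_gb: float,
                                threshold_gb: float = 50) -> bool:
    """Defer transfers above the threshold until the off-peak window."""
    return size_gb <= threshold_gb or in_off_peak(now)

print(should_start_large_transfer(datetime(2013, 5, 1, 14, 30), 200))  # False: wait
print(should_start_large_transfer(datetime(2013, 5, 1, 23, 30), 200))  # True
```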

Nevertheless, it is important to know that bandwidth and latency are interconnected: the connection speed of the network is a function of both. Latency cannot be decreased drastically, but bandwidth can be increased at any time.
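
One way to picture this interconnection is the classic window-size/round-trip-time ceiling on throughput. The sketch below is purely illustrative (the link speed, window size and RTT figures are assumptions), but it shows why adding bandwidth alone cannot always compensate for high latency:

```python
def effective_throughput_mbps(link_mbps: float, window_kb: float, rtt_ms: float) -> float:
    """Throughput is limited by the smaller of the raw link speed and the
    window-size / round-trip-time ceiling (window_kb in kilobytes)."""
    window_limit_mbps = (window_kb * 8 / 1000) / (rtt_ms / 1000)
    return min(link_mbps, window_limit_mbps)

# A 1 Gbps link with a 64 KB window: latency dominates once RTT grows.
print(effective_throughput_mbps(1000, 64, 5))    # ~102.4 Mbps
print(effective_throughput_mbps(1000, 64, 50))   # ~10.24 Mbps
```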

High Traffic Environments: Backing up your Server in the Cloud

Server backup can be tricky. The backup cloud must support all kinds of operating systems and applications. It should be remembered that servers operate in high-traffic environments: they are always connected to the remote backup server in the cloud via the Internet and continuously service the data requirements of end users. The server platform can include all flavours of Windows, Linux, Unix, Mac OS X, AS/400 or Solaris, and all kinds of applications, such as Exchange/Outlook, Lotus Notes/Domino, GroupWise, SQL, MySQL, PostgreSQL, Oracle, IBM DB2, SharePoint and more.

Agentless server backup is preferable. Agentless software requires only a single instance of the agent to be installed on a network. All servers connected to the network can then be identified, and the agent configured to accept backup requests from them. The entire backup process can be managed and controlled from a single console.

IT administrators can schedule backups and decide on the granularity of each backup. For instance, the administrator can decide that Windows and Linux systems must be backed up continuously while other systems are backed up to a fixed schedule. A single backup can cover all the data on a server, a share, a volume or a directory; it may also be of a single file or registry. Backups can also be scheduled to start and stop within a specific backup window.
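
As a sketch of how such granularity might be expressed, consider the policy structure below; the field names and values are illustrative assumptions, not the schema of any real product:

```python
# Illustrative backup policies only; every field name here is an assumption.
backup_policies = [
    {"systems": ["windows", "linux"], "mode": "continuous", "scope": "volume"},
    {"systems": ["solaris", "as400"], "mode": "scheduled", "schedule": "daily 01:00",
     "scope": "directory", "paths": ["/export/home", "/var/db"]},
    {"systems": ["windows"], "mode": "scheduled", "schedule": "hourly",
     "scope": "file", "paths": ["C:\\config\\app.ini"],
     "window": {"start": "22:00", "stop": "06:00"}},  # backup window
]
```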

Multiple servers can be backed up in parallel. The multi-threaded backup application ensures that the software can process large volumes of data and support concurrent backup and restore requests. The only limiting factor is the amount of bandwidth available to your organisation.
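
A minimal sketch of that multi-threaded idea, with a placeholder job standing in for the real work of streaming a server’s data:

```python
from concurrent.futures import ThreadPoolExecutor

def backup_server(hostname: str) -> str:
    """Placeholder for backing up one server; a real job would stream its data."""
    # ... connect, snapshot, transmit ...
    return f"{hostname}: backed up"

servers = ["db01", "web01", "web02", "mail01"]

# Back up several servers in parallel; in practice the worker count would be
# tuned to the bandwidth the organisation has available.
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(backup_server, servers):
        print(result)
```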

The data being backed up to the server can be compressed or deduplicated using compression algorithms that have been tested and proven. The lossless algorithms can reduce the size of the raw data being transmitted from the server and common file elimination protocols can ensure that the same file is not transmitted twice. Versions of files may be stored with appropriate version numbers for ease of discovery. Retention policies can be specified to ensure that only relevant and active files continue to be stored in expensive storage media and other files are relegated to cheaper storage systems to save on costs.
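
To illustrate the combination of lossless compression and common file elimination described above (a simplified sketch, not the actual protocol used by any backup product):

```python
import hashlib
import zlib
from typing import Optional

seen_files: set[str] = set()   # hashes of files already transmitted

def prepare_for_transmission(data: bytes) -> Optional[bytes]:
    """Compress a file losslessly, and skip it entirely if an identical
    file has already been sent (common file elimination)."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen_files:
        return None                 # the same file is not transmitted twice
    seen_files.add(digest)
    compressed = zlib.compress(data, 6)
    assert zlib.decompress(compressed) == data   # lossless round trip
    return compressed

first = prepare_for_transmission(b"log line\n" * 1000)
second = prepare_for_transmission(b"log line\n" * 1000)
print(len(b"log line\n" * 1000), len(first), second)  # raw size, compressed size, None
```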

Backup Technology’s Backup for Servers is high-performance software designed to operate efficiently in high-traffic environments, with many of the features discussed above. For more information, please visit http://www.backup-technology.com

Establishing Successful Cloud Computing Services

One method of ensuring that parties to a contract are on the same page as regards expectations and their fulfilment is the drawing up of service level agreements (SLAs). These agreements clearly specify what the vendor is willing to deliver and what the customer can expect to receive with reference to a cloud services contract. SLAs form an important management tool and are often formally negotiated and have specific metrics to quantify delivery of agreed services.

Before discussing the “how to” of establishing a successful business relationship in the cloud, let us quickly review the “bare minimum offering” in the cloud:

1. Readily available computing resources are exposed as a service;
2. The economic model is generally a pay-as-you-go service;
3. May or may not process data into meaningful contexts;
4. Limited guarantees on scalability, reliability and high availability;
5. Security systems are designed to be reasonably hacker proof;
6. Supports environmental goals by reducing carbon footprints;
7. Provides monitoring tools and metrics for evaluating services.

A quick think-through of the offerings of shortlisted cloud vendors will establish the decision points for the relationship and the drafting of the SLA. The enterprise must have clarity on:

1. Whether the kind of service being offered by the vendor is the kind of service the enterprise needs;
2. Whether the definition of the “unit” measure of service is determined and can be monetised;
3. Whether the enterprise wants the service provider to process the data into meaningful contexts, using compression or de-duplication technologies, or wants the data to be stored “as is, where is”;
4. Whether the scalability, high availability and reliability can truly be obtained via the service. The enterprise must examine in some detail the technical claims being made by the service provider and their feasibility, and a quick piece of market research on the vendor’s reputation will also help in decision making;
5. Whether security guarantees are backed by industry best practices, third party certifications of cryptographic algorithms and user acceptance;
6. Whether green computing options are strictly enforced by the vendor; and
7. Whether the service monitoring tools provided will truly reflect the level of service being provided by the vendor.

We, at Backup Technology, believe in working with our customers in a trustful relationship. The Service Level Agreements (SLAs) we design are guaranteed to satisfy the most stringent monitoring requirements and reflect the kind of relationship we seek to establish with our customers.

Efficient, Stable De-duplicating Processes

Storage needs are ballooning. Data volumes will soon overwhelm organisations like a never-receding tsunami if nothing is done about it. There are very few choices. Organisations must:

1. Get more storage space;
2. Archive / take offline data that is no longer relevant; or
3. Compress the data stored.

While falling disk prices and innovative storage systems have made it possible to hire more disk space or archive data effectively, it is data compression that has received a lot of attention in recent years. Compression not only saves disk space, it also saves bandwidth required for transmission of data over the network. Data compression includes data de-duplication and is relevant both for data storage and data archival.

Disk based “de-duplication systems” compress data by removing duplicates of data across the data storage system. Some implementations compress data at a ratio of 20:1 (total data size / physical space used) or even higher. This may be done by reducing the footprint of the versioned data during incremental or differential backup.

Vendors use a variety of algorithms to de-duplicate data. Chunking algorithms break the data into chunks for de-duplication purposes; chunks can be defined by physical layer constraints, by sliding blocks or by single-instance storage algorithms. Client backup de-duplication systems use hash calculations to evaluate similarity between files, so that duplicates can be removed and replaced with references. Primary and secondary storage de-duplication designs also vary: primary storage de-duplication is directed towards performance optimisation, whereas secondary storage is more tolerant of performance degradation, so its de-duplication algorithms can be constructed with more leeway.
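
A tiny, hypothetical example of fixed-size chunking with hash-based duplicate detection, reporting the same total-size/physical-space ratio mentioned earlier; the chunk size and sample data are arbitrary:

```python
import hashlib
import os

def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Total data size divided by the space the unique chunks occupy,
    mirroring the ratio definition used above (total / physical)."""
    unique = {hashlib.sha256(data[i:i + chunk_size]).hexdigest()
              for i in range(0, len(data), chunk_size)}
    return len(data) / (len(unique) * chunk_size)

# Highly repetitive data de-duplicates very well; random-looking data does not.
print(round(dedupe_ratio(b"\x00" * 4096 * 100), 1))      # ~100.0
print(round(dedupe_ratio(os.urandom(4096 * 100)), 1))    # ~1.0
```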

Until recently, data de-duplication was only associated with secondary storage. The increasing importance of the cloud as a primary storage has created a lot of interest in de-duplication technologies for primary storage.

We, at Backup Technology, offer an agentless cloud backup service powered by Asigra. Our software de-duplicates data at source and transmits only the compressed, bandwidth-efficient incremental or differential blocks of the backup set over the network. Global, source-side de-duplication is also available for the hybrid versions of the cloud for local backups. The comprehensive solution is non-invasive and ideally suited to small and medium businesses as well as enterprises. Why don’t you try out our software before you decide to commit yourself?

Our Customers

  • ATOS
  • Age UK
  • Alliance Pharma
  • Liverpool Football Club
  • CSC
  • Centrica
  • Citizens Advice
  • City of London
  • Fujitsu
  • Government Offices
  • HCL
  • LK Bennett
  • Lambretta Clothing
  • Leicester City
  • Lloyds Register
  • Logica
  • Meadowvale
  • National Farmers Union
  • Network Rail
  • PKR
