I really just like that phrase and the action movie feeling of using it, like “Watch out! The pulse-width modulated time-domain multiplexer is targeting us!” Sorta like a PU-36 space modulator. It’s actually a recently-committed mechanism to improve write performance in Hammer, but my idea sounds more exciting.
I’ve posted about my own results with Hammer deduplication here before, but Siju George put together results from his workplace using actual files in production. He recovered 138G from a 1T disk, and recovered 20% of space from another disk. Not bad for something that’s nearly automatic, and completely free.
I moved to DragonFly 2.10 over the past few days and tried out deduplication to see what kind of results I would get. The procedure is outlined below; I’m using /home as the example just to reduce the amount of text pasted in. For reference, here’s /home before deduplication, from df:
/pfs/@@-1:00004 966000640 566434576 399566064 59% /home
Move my various Hammer pseudo-file systems to version 5, which supports deduplication:
# hammer version-upgrade /home 5
Issue a dedup-simulate command to see what it estimates the savings will be:
# hammer dedup-simulate /home
Dedup-simulate /home: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 4
Dedup-simulate /home succeeded
Simulated dedup ratio = 1.22
That ratio turned out to be pretty accurate for the actual deduplication. I didn’t time it, unfortunately. I don’t know if the time taken is proportional to the amount of deduplication or the total volume of data, though I suspect the latter.
# hammer dedup /home
Dedup /home: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 4
Dedup /home succeeded
Dedup ratio = 1.22
462 GB referenced
378 GB allocated
14 MB skipped
6869 CRC collisions
0 SHA collisions
0 bigblock underflows
The end result?
/pfs/@@-1:00004 966000640 505887504 460113136 52% /home
That data space is shared across all file systems on this 1TB disk, and capacity dropped from 59% to 52%: roughly 7% of the disk, or about 70GB, recovered. I was hoping for more, but I don’t have any obviously duplicated data (no local mail store, no on-disk backups), so perhaps this is normal. 70GB I didn’t have before is no bad thing, though.
Incidentally, I was able to upgrade my installed software from pkgsrc-2009Q4 to pkgsrc-2011Q1 entirely with pkg_radd -u <pkgname>. It was remarkably quick and painless, though pkgin may have been able to do it even faster, since it pulls from the same place.
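For anyone curious, the invocation is about as simple as it gets; ‘perl’ below is just a stand-in for whichever installed package you’re upgrading:

# pkg_radd -u perl

Repeat per installed package; pkg_radd fetches the matching binary package from the remote repository for you.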
If you follow this thread, it has some discussion on how to handle a multi-disk setup and Hammer. If a disk is going bad, you can try mirroring, though you have to be careful how your pseudo-file systems are set up.
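For a rough idea of what “careful” means here, a minimal mirroring sketch looks like the following; the /backup paths are invented for illustration, and the slave’s shared-uuid has to match what hammer pfs-status reports for the master:

# hammer pfs-status /home
# hammer pfs-slave /backup/pfs/home shared-uuid=<uuid-from-pfs-status>
# hammer mirror-copy /home /backup/pfs/home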
I mentioned it before, but Matthew Dillon’s updated his Hammer document, and posted about it. Read on, especially if you like extremely complex plans.
Edit: first link fixed, plus there’s a followup.
I didn’t think of this, but I needed it: if you have an older Hammer file system that can now perform deduplication because you upgraded to DragonFly 2.10, make sure to add dedup to that file system’s maintenance configuration, or else it won’t run automatically.
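Concretely, that means editing the config with hammer viconfig and adding a dedup line; the 1d/5m values below are, as I understand it, the defaults a freshly created version-5 file system gets (run daily, with a 5-minute time limit):

# hammer viconfig /home

…and in the editor, add a line like:

dedup 1d 5m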
Matthew Dillon’s been thinking about Hammer, and how to implement clustering well enough to work as a sort of RAID replacement. He’s written up a document describing his plans. Some highlights:
- writable history snapshots
- quotas and accounting
- live rebuilds of data from mirrors
- and the same history, mirroring, and snapshots as before.
It’s going to be a while before this “Hammer 2” becomes a finished product, though, so don’t count on it for the next release.
Tomas Bodzar asked about RAM usage with Hammer and deduplication, pointing at this example that shows ZFS requiring… I’m not sure. Lots? Anyway, Matthew Dillon noted that offline deduplication in Hammer would use available RAM/swap for CRCs on all files, but only a limited subset for ‘live’ dedup. For a real-world example, Venkatesh Srinivas described deduplicating about 600G down to 400G on a machine with only 256M of RAM. Yes, only 256M.
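If you want to experiment with the live variant, it’s a sysctl toggle, off by default as far as I know:

# sysctl vfs.hammer.live_dedup=1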
Thomas Nikolajsen has put together more information on Hammer, including formatting and the new deduplication features, conveniently located in the man pages and some other spots.
Enabling the vfs.hammer.double_buffer=1 sysctl will greatly improve Hammer performance when you’ve exceeded your memory cache (at a possible slight penalty when you have not) and also speed things up when using live deduplication.
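Turning it on is a one-liner, with an /etc/sysctl.conf entry to keep it across reboots:

# sysctl vfs.hammer.double_buffer=1
# echo 'vfs.hammer.double_buffer=1' >> /etc/sysctl.conf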
Update: Venkatesh Srinivas says:
“double_buffer makes sense when: 1) you want all CRCs to be checked on reads; 2) you’re running live dedup and care about dedup performance rather than, say, read-heavy performance; 3) you have swapcache but are often running into the vnode limit in what you can cache.”
So, not always useful.
Yeah, so those Phoronix benchmarks are crap, but Matthew Dillon went and implemented some things that would speed up Hammer write performance in any case. Read his summary for details.
The default Hammer version in DragonFly is now version 5, which is the one that includes deduplication. Enjoy, bleeding-edge users! Otherwise, wait for the next release.
Version 6 is there, but don’t upgrade to it yet; there aren’t significant user-visible changes, and the usual disclaimers for new versions apply.
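If you’re not sure where an existing file system stands, hammer will report its current and supported versions:

# hammer version /home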
A Phoronix test of DragonFly’s Hammer filesystem turned up, via Siju George. It’s not really a benchmark as much as it is a speed test, and it’s not a realistic comparison, but it’s interesting to see numbers.
They need a graph that shows how much historical data can be recovered by each file system, or how long fsck takes after a crash.
Update: Matthew Dillon points out the many ways these tests are wrong.
Ilya Dryomov’s work on deduplication for Hammer has been committed to the tree in an early test form. I guess I need to pay up as part of the code bounty. If you’re wondering how much space it will save, but don’t want to try non-production code yet, there’s a ‘hammer dedup-simulate’ command that will estimate the savings ratio.
This is great news – deduplication is so valuable it adds an extra zero onto the price of any storage device that can do it.
I haven’t covered this enough: thanks to Alex Hornung, it’s possible to create a HAMMER volume and have it encrypted. Matthias Schmidt has done just this, and has provided an rconfig(8) script to automate the process. (Or to crib from, if you prefer to do it by hand.)
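If you’d rather skim the shape of it than read the script, it amounts to layering newfs_hammer on top of a dm_target_crypt device; the device name and label here are made up for illustration:

# cryptsetup luksFormat /dev/ad0s1d
# cryptsetup luksOpen /dev/ad0s1d secured
# newfs_hammer -L SECURED /dev/mapper/secured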
A smaller set of links, but still the same volume of reading material.
- Samuel Greear linked to this lengthy writeup on how to have both the consistency of ACID and the scaling of NoSQL. Astute observers may notice the similarities between the plan described and the way HAMMER works.
- Joerg Sonnenberger pointed out to me, after my appearance on The BSD Show!, that MOSIX is an open-source single-system-image implementation, though it appears to be designed for specialized high-speed networks rather than the more general case DragonFly targets.
- This seems bizarre. (via)
Matthew Dillon posted a summary of recent bugfixes in HAMMER and kqueue, which means that if you’re running a bleeding-edge DragonFly built in the last few weeks, you should update.
He also mentions a “significant improvement in performance” in disk encryption. How significant? Over three times as fast.
Matthew Dillon reports that DragonFly now has a catastrophic recovery tool for HAMMER filesystems, with pertinent details.
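The tool is the recover directive in hammer(8): pointed at the damaged media with -f, it scans for salvageable B-Tree data and copies what it finds into a target directory. The device and directory below are only examples:

# hammer -f /dev/da0s1a recover /mnt/salvage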
Matthew Dillon has provided some details about recent kernel work, along with a release forecast.