This week started a bit off with a sore throat (my two highest-ever mileage weeks combined with the SIGCOMM deadline, or coincidence? You decide), but I took it easy for a few days and then flew to the Bay Area for an ISAT meeting. The meeting was interesting from my side of things (I learned things), but arguably the best treat was getting in 13-mile runs on lovely, hilly trails in the mornings. Run details from Garmin Connect.
Needless to say, I was happy, though I look weird in this photo - might have been a bit dehydrated or something.
And the view of the south bay from the hills was great:
The elevation was not the usual Pittsburgh run profile:
Date         Mile 1      Mile 2      Mile 3      Mile 4
             Pace  HR    Pace  HR    Pace  HR    Pace  HR
----------------------------------------------------------
9/16/2010    6:48  148   6:55  153   7:02  155   6:59  156
10/21/2010   6:42  149   6:41  152   6:42  152   6:48  153

In other words, a month ago I ran 4 miles at a 6:56 average pace with an average heart rate of 153 beats per minute; today, I ran 4 miles at a 6:43 average pace with an average heart rate of 151.5. In fairness, it was cold out today, but that's still a lovely improvement for only a month. The relation to the eating habits is that I've never before been able to sustain the mileage I'm running now. Wish I'd figured out earlier that eating better is good for running. :)
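For anyone who wants to check the averages above, here is a small sketch that recomputes them from the per-mile splits. The helper names (`pace_to_seconds`, `seconds_to_pace`, `summarize`) are made up for this example. Because every split covers the same distance (one mile), the plain mean of the paces is the overall average pace.

```python
def pace_to_seconds(pace: str) -> int:
    """Convert an 'M:SS' pace string into total seconds per mile."""
    minutes, seconds = pace.split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_to_pace(total_seconds: float) -> str:
    """Convert seconds per mile back into 'M:SS', rounded to the nearest second."""
    total = round(total_seconds)
    return f"{total // 60}:{total % 60:02d}"


def summarize(splits):
    """Return (average pace as 'M:SS', average heart rate) for (pace, hr) splits."""
    paces = [pace_to_seconds(pace) for pace, _ in splits]
    heart_rates = [hr for _, hr in splits]
    average_pace = seconds_to_pace(sum(paces) / len(paces))
    average_hr = sum(heart_rates) / len(heart_rates)
    return average_pace, average_hr


# Splits copied from the table: (pace per mile, heart rate).
september = [("6:48", 148), ("6:55", 153), ("7:02", 155), ("6:59", 156)]
october = [("6:42", 149), ("6:41", 152), ("6:42", 152), ("6:48", 153)]

print(summarize(september))  # ('6:56', 153.0)
print(summarize(october))    # ('6:43', 151.5)
```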
Take a picture of yourself, as you are right now, no changing clothes or makeup or especially photoshop, and post it.
Yeah, it's possible I still look like I'm 14. But the older and creakier I get---and the more suspiciously color-free hairs I notice, ahem---the less I mind that too much. :)
I was at the UCSD workshop on Non-Volatile Memories about three weeks ago, and had a surprisingly great time. I say "surprisingly" because I showed up at the reception the first night, realized I didn't know a single person there, and thought "uh-oh." That "uh-oh" turned into "ooh!" the next day -- I learned a surprising amount about the lower levels of contemporary nonvolatile memory technology and met some very cool folks.
Many of the slides from the talks are online (though, as in all things, the hallway conversations were both unrecorded and perhaps as useful or more so). But one of the stand-out talks isn't -- Al Borchers's talk about Google's experiences with Flash memory. I've jotted down some highlights from the talk that jumped out at me. Caveat: These are filtered through my own interests. A lot of what really stuck in my head echoes our own experiences with FAWN, and several points reinforced things I said in my talk the day before, so if something seems odd, it's probably my fault, and not Al's.
Al Borchers is in the platforms group developing system software for Google's server platforms, and has been working on high-performance storage devices. He has a Ph.D. in theoretical CS from Minnesota (1996) and has been hacking Unix and Linux device drivers and systems software in industry. Much of the talk he gave involved work with Naveen Upta, Tom Keane, and Kyle Nesbit.
Looking at HW devices and how SW could be modified to take advantage of Flash, if necessary: "It has been a rocky experience with flash. We've had difficulties with performance and reliability of devices, and figuring out where we can apply flash in a cost-effective way. Many applications... some obvious, some not. Without forcing apps to change too radically."
Application: BigTable [Work from Kyle Nesbit]
Options: Could use flash as a cache, put it on chunkservers, or on Bigtable servers. Looking at it most in the chunkservers.
1) Bigger caches almost always better. Going from a 16 GB to a 512 GB cache took them from 450 cache misses/second to 130 cache misses/second: a linear reduction in misses for an exponential increase in cache size.
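To make the "linear reduction for exponential increase" point concrete, here is a rough sketch of the arithmetic. Only the two endpoints (16 GB at 450 misses/sec, 512 GB at 130 misses/sec) come from the talk. The function names and the log-linear interpolation between those endpoints are my own assumptions, not something Al presented.

```python
import math

# The only two data points reported in the talk.
SMALL_GB, SMALL_MISSES = 16, 450
LARGE_GB, LARGE_MISSES = 512, 130


def misses_saved_per_doubling() -> float:
    """Misses/sec removed each time the cache size doubles (hypothetical helper)."""
    doublings = math.log2(LARGE_GB / SMALL_GB)  # 512 / 16 = 32x, i.e. 5 doublings
    return (SMALL_MISSES - LARGE_MISSES) / doublings


def estimated_misses(cache_gb: float) -> float:
    """Assumed log-linear interpolation between the two measured endpoints.

    This is an illustration of the trend, not measured data.
    """
    doublings = math.log2(cache_gb / SMALL_GB)
    return SMALL_MISSES - misses_saved_per_doubling() * doublings


print(misses_saved_per_doubling())  # about 64 misses/sec per doubling
print(estimated_misses(64))         # about 322 misses/sec under this assumption
```

Put another way: under this assumed model, each doubling of the cache buys roughly the same absolute reduction in misses, so each further reduction costs twice as much flash as the last one.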
2) CPU overhead
Al then gave some numbers on the relative performance of, e.g., PCI vs SAS vs SATA drives they'd measured. I didn't write all of these down well enough to be confident reposting them. The gist was that access over PCI incurred less CPU overhead than SAS and SATA. NUMA access - when you had to go through a different core - hurt just as much.
CPU use for async vs. multithreaded: At high bandwidth, the sync multithreaded model uses 2-3x the CPU of the async model. They didn't really see that with SATA because it was limited to 31 outstanding requests by NCQ.
3) Error rates
Q: Which drives did you use?
A: Doesn't matter. Can't say. But all of the devices suffer perf overhead.
Q: Can you comment on SLC vs. MLC reliability?
A: Initially, SLC reliability seemed a little bit better, but we haven't taken them to end of life and worn them out. For both, we saw a lot of early-life failures.
Q: Comment on PCI Express as an interface?
A: We like it better, it seems to perform better, lower overhead, ...
... more about the high overhead of going through the block layer to get to SSDs at high IOPS.
All in all, it was an excellent talk, and shows that Google has been taking a very serious look at Flash in their datacenters. We're seeing a lot of indicators that Flash is poised -- but not completely ready yet -- to start making huge inroads into the DC.
I just had to, with some embarrassment, leave the following explanation in closing out my service ticket with my (patient and wonderful) ISP, Speakeasy.
The context: My internet service has been bouncing up and down every hour and a half for the last day or so. Very frustrating.
"I believe I've diagnosed it, and owe you a thanks for putting up with a customer-caused problem. While on the most recent phone call, I accidentally bumped into the power cable for my NAT box, rebooting the machine, which solved the mystery:
A curtain was brushing against the power cord. Over the last few months, the constant bumping had worked the plug loose to the point where it was just barely in. When the furnace would turn on, about every hour and a half, the forced air would cause the curtain to move -- which would glitch the power to the NAT box. After about 20 minutes, the house would be warm again, and the furnace would shut off... only to repeat an hour and a half later.
Thanks for your patience and help with this one. I'm going to go bonk the owner of said NAT box on the head for not looking at the uptime on his machine before crying wolf. I believe you can close the ticket."