Home

Previous 20

Jul. 16th, 2008

Home-brewed cluster rack

Our FAWN (Fast Array of Wimpy Nodes) cluster is coming together. We've purchased 20 alix3c2 embedded boards, and have them netbooting and racked in our new home-brewed cluster rack:



The alix3c2s are larger than we think the nodes would be if we built them to purpose, but they're a great way to prototype the cluster. We've also found that you can power two alixes per power supply, and operating the power supplies at a higher load makes them run more efficiently, saving about half a watt per node.

(The goal of FAWN is to build a highly power-efficient cluster for doing key-value lookups, of the sort that Amazon and Facebook do when serving catalog requests -- fetch a picture of a book, an entry in someone's facebook page, etc.)

Jul. 11th, 2008

All vacation photos online

Erm - we took a lot of photos. Now online, courtesy of newly purchased space at Google:

And a new look for the blog, along with more prominent links to the RSS and Atom feeds.

Jul. 10th, 2008

Arvind Krishnamurthy talk: Incentives for P2P Systems

I hosted Arvind Krishnamurthy for an SDI seminar talk today. Some notes from his very fun talk, which is based on two of his recent research projects, BitTyrant and One-Hop Reputations for P2P [pdf link].

  • 20% of users use P2P systems. Enormously popular; 50-85% of Internet traffic.
  • Youtube: 1,000 TB of data per day. BitTorrent: 10,000 TB of data per day.
  • Prior P2P systems had very poor incentives: 1% of Gnutella users satisfied 50% of queries, and most Gnutella users didn't share anything.
Enter BitTorrent...
  • Tit-for-tat: Send data to the top k people who send you data. Split your upload capacity equally between these k peers.
  • Each round, pick one other peer to send data to "optimistically"
  • Trust is pairwise, which is robust: Peers trust only what they see directly.

Observation: This system results in considerable amounts of altruistic uploads by fast peers. If a peer has a lot of upload capacity, it splits all of its capacity among the top-k peers. If those peers are relatively slow, it may upload much more than it's able to download.

  • Low-capacity peers are altruistic: They're not fast enough to convince other people to upload to them, so they mostly make altruistic uploads.
  • Many high-capacity peers are matched with sets of lower-capacity peers, and so make altrustic uploads.
BitTyrant:
  1. Determine the exact (minimum) rate that you have to upload to a peer in order to convince it to upload back to you.
  2. Figure out return on investment: How much they upload to you vs. how much you have to upload to them.
  3. Upload to the peers that have the best return on investment.
Result: 25% of the downloads go at least twice as fast.
  • Also uses less upload bandwidth to achieve same or faster download rate.
  • If everyone uses BitTyrant: Overall performance is better, too.
  • But: If BitTyrant stops uploading at point of diminishing returns (doesn't upload more than it must), system-wide throughput is lower. This behavior is closer to block-based tit-for-tat (you give me one chunk, I give you one). BitTorrent by default is like progressive taxation: High-capacity people upload disproportionately more.

Overall BitTorrent system performance is poor:

  • 74% of swarms provide less than 50KB/sec.
  • 100-fold increase in upload contribution provides only a 2.7x improvement.
  • Problems: Lack of incentive to upload lots of data; lack of incentive to stick around after you finish downloading.

One-hop reputations can get most incentive bang for buck, in a way that is still scalable and decentralized:

  • Most popular 2,000 peers interact with 97% of all peers.
  • Use these popular peers as trust intermediates. Each peer remembers how much excess other peers sent to it. If A and B both send data to C, then each accrues a "credit" at C. Later, A can "spend" that credit at C in order to download from B, or vise-versa.
  • Robust to cheating intermediaries as long as some are honest. (handwavy, pointed to details in paper.)

Jul. 5th, 2008

Change of scenery


(Lava hitting the ocean at Volcano National Park)


Evening settling in on the Na Pali Coast, Kaua'i.

Jun. 27th, 2008

Rae Lakes loop 2008 - some initial photos

+ + = :)

Jun. 22nd, 2008

ACM awards ceremony; off on vacation

I just got back from the ACM awards ceremony. I was there because Bindu was one of the finalists for the student research competition for her work on Similarity-Enhanced Transfer. Awesomely, she received first place. Needless to say, I'm happy as a clam for her, and very glad that Michael and I had the chance to advise her on this research.

The awards ceremony was lots of things--very cool, a bit too long, and a nice reminder of how cool computer science can be. In that regard, it was really energizing; I left thinking "damn! gotta work harder!" A few that really jumped out:

  • Daphne Koller received a cool $150,000 with the Infosys Foundation award for her work on machine learning techniques.
  • Vern Paxson received the Grace Murray Hopper Award for his work in the late '90s that set the standards for Internet measurement; a lot of my own subsequent Internet measurement work was directly inspired by Vern's techniques and insistence on taking a careful and scientific approach to network measurement.
  • Randy Wang was recognized for starting the Digital Study Hall Project at MSR India, which is trying to revolutionize teaching materials in the developing world. Interestingly, it's been a great year for MSR India -- the gossip vine has it that Bill Thies, one of the hot job market candidates this year, turned down both Berkeley and Stanford to go work there, presumably to focus on his interests in creating technology for the developing world.(Note: This really is gossip; I haven't heard it from any party involved, so don't quote me!)

Finally, it was almost an embarassment of riches for Carnegie Mellon at the ceremony this year, which I've got to admit was really cool, if perhaps mildly awe-inspiring. It makes you remember that those really nice, unassuming folks I bump into every day in the hallway and in faculty meetings are also, if you'll forgive the phrasing, bomb-ass, famous researchers. Ed Clarke, of course, received the Turing Award; Randy Pausch received the outsanding educator award, Bindu's first place finish was mentioned above, two of the three dissertation award honorable mentions were to CMU people, and Avrim Blum, Randy Pausch, and Donald E. Thomas were all inducted as fellows.

And on that note, I leave for two weeks in the wilderness. A few days hiking the Rae Lakes Loop in the Sierra, and a week in Hawaii (hiking the Na Pali Coast on Kauai, then flying to the Big Island for adventures TBD).

May. 3rd, 2008

The joy of Ruby's Hpricot

Hpricot is a frighteningly useful HTML parser. USENIX requires HTML versions of papers at their conferences, so we're creating versions of Dan's Perspectives and Bindu's Dsync papers. Amar and Vijay started an automated framework based on the Hevea LaTeX-to-HTML translator a while ago, which we've now improved further.

(For some reason, automated TeX-to-HTML translators mostly suck, particularly if you want clean, XHTML-strict output. Hevea appears to be the best of the batch, but its output is still pretty awful.)

Enter Hpricot. Our cleanup.rb script uses it to automatically de-gunk most of the Hevea output. It's awesome for manipulating most any HTML, though. You can write things like:

doc = Hpricot(open("index.html"))
# Nuke credits and comments
doc.search("//meta[@name='GENERATOR']").remove
doc.search("//comment()").remove

and more complicated things, like removing extraneous "font" tags from the bibliography list:

doc.search("//dl[@class='refs']/dt/font").each { |f|
    f.parent.inner_html = f.inner_html
}

(That reads "for every font tag that appears inside a DT item that is inside a DL of class refs, replace the font tag and everything it contains with the stuff it contains.)

The downside is that since I can make Ruby about as functional as I want (minus the static typechecking. argh), and it has awesome libraries like Hpricot, it makes it a little harder to make myself do more serious work in Ocaml...

May. 1st, 2008

Petascale cosmology talk for systems researchers

Rupert Croft from CMU's physics department gave a talk to the SDI seminar today about petascale computational cosmology, targeted to computer scientists. Turns out they have some nifty problems.

Overview: Cosmology: The study of the universe as a whole: beginning, evolution, fate, contents

  • is it expanding, etc? A few #s.
  • contents: 10^80 atoms in observable universe (+other stuff)

Why does this matter? The only way we know that most of the universe is not made up of "normal stuff". Atoms = only 4% of matter/energy density of universe. Rest = dark matter + dark energy.

Facilities:

Data will be made public immediately when taken. Google Sky will be one interface to the data.

  • 6.5 petabytes of data per year
  • Bandwidth: Telescope to base - 10Gbit/sec. Base to archive: 2.5Gbits/sec.

Algorithms:

  • Stacking 10 years of images (one per three days) on top of each other to get higher precision images. Align, integrate, compensate for telescope motion, moving objects, etc.
  • Gravitational lensing: Produces small correlations in galaxy shapes. O(N^2) problem. N is very, very large.
  • Near-realtime transient event alerts - e.g., spotting supernovae between subsequent measurements (supernovae can remain bright for days or weeks, gamma-ray bursts last for just a few seconds, etc. Need gamma-ray telescope that can slew in a few seconds. (!) Custom satellites for this...)
  • Massive-scale simulations about the evolution of the universe: start from initial conditions, and then simulate physics from there. Gravity, gas dynamics, radiative processes, etc. One really nice gravity simulation that they ran for several weeks on Big Ben, a 2068-node Cray XT3 at the Pittsburgh Supercomputer Center

Challenges:

  • Load balancing
  • Some data structures mirrored by each thread.
  • Exploitation of as much parallelism as possible: scaling by running on faster procs and on more procs.
  • Data management
  • Visualization

Apr. 30th, 2008

Two new friends for my office


(Yes, they're squishables. I saw some at Rob and Tomas's place, and it was love at first... er, squish.)

Apr. 27th, 2008

Playing with Camels

Or, ocaml, as the case may be. A colleague at NSDI was assuring me that ocaml is actually a great language for systems programming and that I should look seriously at it for some datapository stuff. So as the equivalent of "hello world", I figured I'd play with it enough to understand how to print the lines of a file in reverse order. Keep in mind that I've never really used ml or ocaml, so an expert will probably cringe.

Ruby

puts IO.readlines("grades.txt").reverse

Ocaml

open ExtLib;;
List.iter (print_endline) (List.rev (input_list (open_in "grades.txt")))

Okay, that's not too bad. Ruby provides a bit more shortcutting (automatic conversion of lists to strings and the 'readlines' function), but they're basically the same, modulo the functional vs. object order of doing things. Oddly, the Ruby one is about twice as fast as the ocamlopt-compiled one, so I suspect I'm not doing things in The Right Way yet.

Apr. 16th, 2008

Wooo! Al Gore speaking at CMU's commencement

Full article. I may be in D.C. that weekend, but if I'm not, I'm going to break my habit of only attending the school of computer science commencement to go see him talk. How cool is that?

NSDI quick notes

I'm at one of my favorite conferences, NSDI (network systems design and implementation) for the next few days. So far, the best papers were announced:

Keynote talk: Ian Pratt

and Ian Pratt is starting his keynote looking back at the Xen virtual machine environment. Choice quote so far, talking about his newfound respect for people who produce real software:

I guess as an academic you think you're smarter than everybody else and you think that people in industry are doing the easy things, but it's not true - building stuff that works is really hard.

Pratt mentioned that in May 2008, HP and Dell will be shipping servers with Xen embedded in Flash on the machines. Cool.

Some nice words on the highly-publicized "just install a hypervisor underneath someone and you can 0wn them without their knowing it" attacks that some people speculated about last year: "[The] 'Hidden hypervisor' attack is a myth, but exploitation of an installed hypervisor is a real and dangerous threat" (just like your OS getting hacked is a real threat). He extolled the virtues of OCaml for systems programming - I (dga) found this interesting because I've been spending some time thinking about which more-functional languages to use for some upcoming projects. I find OCaml's syntax mildly ghastly, but I suppose it might be one of those things you get used to.

Questions to Pratt:

Ken Birman got up and asked about "the tension between adding features to Xen - and that concentrating all these resources on a single platform that it becomes a very attractive target [for hackets]. ... Is there a finite set of features that are 'enough', or are we already seeing features running away and the security lagging?" A: yadda yadda small interfaces, small code, ... K: But you told us about [huge list of complex cool features]; I'm seeing the features pulling. A: The good thing is that most of the features you listed are outside of the core hypervisor. [So if you break them, you probably just break into another VM.]

Gun Sirer: I'd like to remind you that virtualization is not the sole domain of hypervisors; every OS tries to provide virtualization. The difference is the interface - hardware, POSIX, etc. So what has made us unable to provide virtualization at the POSIX level, and why should it be possible if we go to hardware? A: [POSIX] is so broad and high-level and you've got a lot of stuff you need to do; lots of opportunity for bugs, temporal cross-talk, lots of shared resources. Now you can take an approach like the Nemesis OS did with POSIX as a library, but you end up with a very different OS design to what you have today.

Some Notes on Papers

Concrete evidence of in-flight malicious web page modification [project page]. The UW folks provide evidence of in-flight modification by advertising ISPs and by worms doing ARP poisoning, among others. The results are particularly interesting to me, because many of these could be solved by our Perspectives system that uses multiple "views" of a web site or SSH server to increase the chance that you get to the right place. (Caveat: Right now, we've only built Perspectives to work for SSL and SSH, not insecure web pages, but this paper suggests that there's benefit in going beyond just public key checking!)

Read more... )

Mar. 18th, 2008

Incredibly impressive Obama speech

Obama's March 18 speech on race is one of the most impressive speeches I've heard. Regardless of political leaning, it's worth listening to in its entirety. It's almost enough to make you hope.

Mar. 12th, 2008

No Pi update this year, methinks


Small image of a hoodie sweatshirt with Pi on it, written with the numbers of pi
Pi day is Friday (rhyme unintended). I think I may break with my tradition of Pi-day updates to the Pi Searcher; Friday is also a grant deadline and the mobicom submission deadline. Unless anyone has a neat, quick idea for it...




But happy upcoming Pi day to everyone!

Mar. 6th, 2008

Notes from Werner Vogels' talk at CMU

Amazon CTO Werner Vogels gave a fun talk at CMU today. Some snippets:


Vogels: How many of you are amazon customers?
All hands in the room go up.

Screen fades to screen shot from a fashion website; amazon industrial; amazon groceries

Vogels points out that you can subscribe to toilet paper on a monthly basis, and people are extremely passionate about it. Shows comments, "I like it because it's good for the environment, but one side of it is really rough."

More notes from the talk )

Mar. 4th, 2008

Stirling CPU cooler: Cool thinking

From Tweaktown: MSI is developing a Stirling engine-based cooling fan that uses the heat from the device it's cooling (in the photo below, the computer's northbridge chipset) to power the fan. The hotter the component, the faster the fan blows - and it doesn't take any energy to power the fan. It's not going to single-handedly reduce your computer's power consumption by 50W, but it's great thinking: using the computer's waste heat for cooling instead of using more power when things get hot.

(Image from Tweaktown.)

Feb. 18th, 2008

I wish Pittsburgh was in this list

Popular Science's 50 greenest cities in America list.

It's not actually that far fetched to imagine - Pittsburgh has come a long way since its fairly gruesomely polluted industrial past. There are people here who don't drive; the bus system is reasonable, if not up to Boston/NY/SF standards. The city has a surprising number of LEED-certified buildings (CMU is doing almost entirely LEED; our new computer science building is LEED silver (and maybe even gold, if we get lucky). The city's one of the top three for LEED buildings. On the flip side, too many people drive, the downtown "subway" is a joke, and we live in one of the major coal-producing regions of the U.S., so you can guess how clean our major sources of electricity are. Air quality is up considerably, but is still awful - the second worst in the U.S., second only to Los Angeles. (ouch).

Fingers crossed. That new mayor of ours seems to have finally fixed the recycling program (yay! they now take paper in my area), so we may well be moving in the right direction...

Feb. 14th, 2008

Optimal Caffeine Dosing

From Science Blogs: How to Optimally Use Caffeine.
  1. Consume in small, frequent amounts.
  2. Play to your cognitive strengths while wired.
  3. Play to caffeine's strengths.
  4. Know when to stop - and when to start again.
I find the bit about how to use caffeine particularly revealing. In particular: "caffeine only remediates some fatigue-related symptoms." It's not clear that it helps preserve higher-order mental functioning, though it does seem to help speed and recall. I still figure that a reasonable approach is to structure a crazy work day like the medical folks do: Tough stuff first thing, then increasingly rote stuff towards the evening.

Feb. 4th, 2008

CMU Turing awards += 1

In very cool news today, I just spotted on my department head's blog (okay, can I admit here that it's cool to have a department head who keeps a blog? I actually know what's going on these days) that a fellow faculty member at CMU, Ed Clarke, was selected for the ACM Turing Award.

(For the one non-computer-geek who reads this: The Turing award is roughly the equivalent of the nobel for computer scientists--there's no Nobel for CS. It comes with a pile of recognition and a comfortable pile of cash, about $250,000 this year. Prior Turing award winners include some of the only household names from computer science -- Rivest, Shamir, and Adleman, more familiarly known as the initials in the RSA public key algorithm, and Cerf & Kahn, the variously-named parental units of the Internet.)

What I think is particularly cool about the award is the reason it's being given: Ed's receiving the award for his work in formal verification. Basically, he designs ways to automatically check whether hardware and software designs will work properly. The problem is both cool and hard: It's easy to run lots of tests against something and say "well, it doesn't seem to have crashed so far", but how do you know that you've tested everything? Intel has a pretty amazing apparatus for testing their designs, but that didn't prevent the 1994 floating point division bug in the Pentium. Formal verification instead develops mechanisms to prove that a design has some particular property---such as that it always gets the right result when it adds two numbers together.

Formal verification has taken root in hardware designs, but only more recently has it started making (small) inroads into software. It's a tough problem, but as software becomes even more critical (is that possible?) to our lives, I think we'll see a slowly increasing uptake of verification for software. For those of you whose browsers crashed while reading this post, the bad news is that it won't be overnight.

Jan. 20th, 2008

Sleep good for learning, point 500...

Brain Connections Strengthen During Waking Hours, Weaken During Sleep (the "weak" connections enhance learning ability because they're flexible and can adjust in response to new inputs). (Of course, all this is a theory, but if it's correct, it's a good incentive to get enough sleep!)

Mmm, sleep.

Previous 20