Dan Dot Blog

Based on a true story

Modeling Reality

I had started this long post on data, open source development, and greater availability of public records, but it’s sounding like I usually do when trying to forcibly relate 10 different themes I have playing through my brain. I’m sure none of you want to hear my sophomoric, newspaperman-style pitch of a subject that, while terribly exciting to me, I can hardly claim any authority on. So instead I’m going to start with my experiences, and a challenge I’m having right now between a life of exploratory vs. confirmatory science.

My whole life I have really enjoyed structure. When I’m learning something new, I like to try to shake out the underlying concepts then stretch them to their breaking points in an attempt to better understand the deep structure that governs the thing I’m learning about. This approach requires a good combination of both deduction and empiricism. I start with a reasonable postulate handed down by an authority, draw a timid analogy to some perhaps related speck of information, then take one end of my new meme and run as fast as I can in one direction, testing its elasticity and predictive power until it snaps and I am whiplashed back towards my starting point, whereupon I pick up a new rubber idea and run out in a new direction. This approach relies on reasoning and flashes of insight to get anywhere. I love explaining things, even when I’m terribly unsure. Ask me anything and I’ll probably give you an answer, even if my knowledge on the subject is extraordinarily limited. Rather than demur, I treat such inquiries as invitations to ponder, to explore mental space, to simulate, make wild predictions, and arrive at some semblance of truth that is scarcely plausible.

This is what excited and frustrated me about psychology for so long. Whenever preparing talks or presentations, I would be slapped down for speculating about my results, for drawing deep inferences in a shallow pool of data. For me to be excited about my data, it had to make sense, even if the story that explained it was far-fetched or otherwise ill-supported. These games helped me to organize my knowledge in useful ways. Sure, I was gaining relatively little truth, but my new thoughts weren’t pure noise. Indeed, underneath the noise of the signal were kernels of truth, and indeed the noise itself told me something about my thinking. I found that this approach really helped me learn new material. Even tenuous connections to far-flung data give more attachment points, keeping the idea from floating out of my head like a hot air balloon. It seemed that connections were what mattered, not whether they made sense. That would all get sorted out in time.

Then I started playing with and thinking about computational statistics. As computational power increases, our time-saving, truth-seeking heuristics are obviated. The deductive forces that guide and ration our precious mental resources become more harmful than helpful. I had been reading a lot of science fiction about a computational singularity, and our old ways of reasoning seemed to make less and less sense. Random walks through understanding started to be a lot more appealing. All of the cleverness, innovation, and gestalt insight that went into optimizing those heuristics was made obsolete by the promise of being able to brute force anything.

In my computational statistics class, we learned about Monte Carlo integration. Wikipedia can probably do a better job of explaining it than me, but I’m going to try. If you want to skip my attempt at explanation, I’ll blockquote it so you can just jump over it.

The idea is really neat and beautifully inverts the historic relationship between statistics and mathematics (especially probability). In statistics, when you want to predict the behavior of a random variable, you magically arrive at some of its fundamental properties, namely its probability density function (pdf). From that, you simply find the integral of its pdf multiplied by the function of the random variable whose behavior you’re interested in [take f(x)=x for an easy example], and voila, you have the expectation of that function of a random variable. This all works great on a chalkboard, but in reality, integration is hard, even to approximate, when dealing with functions (typically from the pdf) that are at all unusual, and in practice, pdfs of even vanilla random variables, like a standard normal random variable, can be unfathomably complicated (see below).
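For reference, here is that setup written out, assuming the formula the paragraph points to is the standard normal density. The symbols X, f, p, and φ are my own notation, not taken from the original figure.

```latex
\mathbb{E}[f(X)] = \int_{-\infty}^{\infty} f(x)\, p(x)\, dx,
\qquad
\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}
```

With f(x) = x and p = φ, the integral comes out to 0, the mean of a standard normal.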

It’s enough to make you grimace. But why not turn that frown upside-down, and the whole process while you’re at it? Let’s say you start with a really hard integral. What if you could rewrite the integrand as a function of a random variable multiplied by the pdf of that random variable? Well, then the integral would be equal to the expectation of that function of the random variable, which you can’t really know unless your approach has resulted in a boring random variable (in which case somebody else has probably already done everything interesting there is to do with it). But you can take a guess at its expectation. Let’s take another simple example. Imagine you want to integrate something unpleasant that easily falls apart as f(x) times that pdf that I put above. All you’d need to do is get some observations of a standard normal variable (you can get close if you just ask everybody around you their height, then standardize the results), apply whatever function you wanted to each data point, then find the mean, which is not a terribly bad estimator of the expectation of that variable (and is typically the best “unbiased” estimator of the expectation of your random variable).
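Here is a minimal sketch of that recipe in Python, assuming simulated draws stand in for the height survey. The function name `monte_carlo_expectation`, the sample size, and the choice f(z) = z² are mine, picked because the exact answer (the variance of a standard normal, 1) is known.

```python
import random
import statistics

def monte_carlo_expectation(f, n_samples=100_000, seed=0):
    """Estimate E[f(Z)] for a standard normal Z by averaging f over simulated draws."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    draws = (rng.gauss(0.0, 1.0) for _ in range(n_samples))
    # The sample mean of f(z) is the Monte Carlo estimate of the integral of f(x) * pdf(x).
    return statistics.fmean(f(z) for z in draws)

# Integral of x^2 times the standard normal pdf; the exact value is 1.
estimate = monte_carlo_expectation(lambda z: z * z)
print(estimate)  # should land close to 1
```

More draws tighten the estimate; the error shrinks roughly with the square root of the number of samples.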

If you’re still reading and care what an unbiased estimator is, it’s simply an estimator whose own expectation is the thing you’re trying to estimate. Crazy, huh? Sometimes, the best estimator is a biased one. Imagine you have some fair dice. They’re normal dice, except I’ve written “one billion” on the side where the one should go. Imagine you only get two rolls, and you have to figure out the average outcome of the dice rolls. You could do your two rolls, take the average, and call it a day. But what if in your two rolls you just so happened to not turn up “one billion”? Your estimate is going to be way off! What if, on the other hand, you just decided, without even rolling, that your estimate will be a sixth of “one billion”? Doesn’t seem very empirical; indeed, it seems like you’re biased before you even started. But, except for on “The Price Is Right,” the second version of you is going to be closer pretty much every time.
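The dice argument can be checked with a quick simulation. This is a sketch under my own assumptions: the five ordinary faces show 2 through 6, "closer" means absolute distance from the true average, and the fixed guess is passed in as a parameter (set here to a sixth of a billion, a value I chose for illustration).

```python
import random

ONE_BILLION = 1_000_000_000
FACES = [ONE_BILLION, 2, 3, 4, 5, 6]   # a fair die with the 1 relabeled
TRUE_MEAN = sum(FACES) / len(FACES)    # (1e9 + 20) / 6

def mean_abs_error_two_rolls(n_trials=20_000, seed=0):
    """Average miss of the unbiased estimator: the mean of two rolls."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        estimate = (rng.choice(FACES) + rng.choice(FACES)) / 2
        total += abs(estimate - TRUE_MEAN)
    return total / n_trials

def abs_error_fixed_guess(guess):
    """Miss of a biased estimator that ignores the rolls entirely."""
    return abs(guess - TRUE_MEAN)

sample_mean_error = mean_abs_error_two_rolls()
fixed_guess_error = abs_error_fixed_guess(ONE_BILLION / 6)
print(sample_mean_error, fixed_guess_error)
```

How a fixed guess fares depends entirely on where it lands relative to the true average, so it is worth rerunning `abs_error_fixed_guess` with other values.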

Qualms about true randomness aside, it’s not that hard to generate observations of a random variable. You can do it in Excel. Sure, the actual algorithm that produces it may not be perfectly random, but it’s pretty damn close. In order to get your initially difficult integral into something manageable, you may have had to make your function something ridiculous to leave a lame-ass PDF. But since you’re already fudging things anyway, why not just fudge the generation of the random variable? Maybe the PDF corresponds to a random variable you can’t do a reasonable job of simulating, so instead, you just simulate a random variable you do particularly enjoy, then do adjustments for how close what you generated is to what is reasonable under the original PDF. If your random variable of choice is not a good approximation of the real random variable, most of your observations are not going to be worth much. But…………………if you can get so so so so many observations that it doesn’t matter, you don’t have to spend much time being clever with choosing a good random variable.
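The "simulate a variable you enjoy, then adjust" trick is usually called importance sampling. Below is a rough sketch, assuming a standard normal as the pdf you care about and a wider normal as the one you actually draw from. Both choices, and the function name `importance_sampling`, are mine for illustration.

```python
import random
from statistics import NormalDist, fmean

def importance_sampling(f, target, proposal, n_samples=100_000, seed=0):
    """Estimate E[f(X)] under `target` using draws from `proposal`."""
    rng = random.Random(seed)
    weighted = []
    for _ in range(n_samples):
        x = rng.gauss(proposal.mean, proposal.stdev)
        # The adjustment: how plausible x is under the target pdf
        # relative to the pdf it was actually drawn from.
        weight = target.pdf(x) / proposal.pdf(x)
        weighted.append(f(x) * weight)
    return fmean(weighted)

target = NormalDist(0.0, 1.0)     # the pdf left over from the integral
proposal = NormalDist(0.0, 2.0)   # the easier variable we simulate instead
estimate = importance_sampling(lambda x: x * x, target, proposal)
print(estimate)  # should land close to 1
```

A proposal that matches the target poorly leaves most weights near zero, which is the "not worth much" problem described above; piling on more draws is the brute-force way around it.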

Still with me? The moral of the above story is that the problems with Monte Carlo techniques that historically have been solved with cleverness can now be solved with brute force. If my computer is strong enough, I can bend math to its will. I can simulate anything. Let’s have a quick thought experiment. Give me the following:

  1. An immortal human. They can be mutable, but they need to be able to perform the below described task every 5 minutes for ever and ever.
  2. Infinite computing power
  3. A pen and paper

Every 5 minutes, the person pauses for a second, thinks up a 15 digit random number, and writes it down. This all gets fed into a computer. Let’s take a super inter-connected view of human cognition and assert that every fiber of your being affects everything you do. Ergo, the numbers you choose are a reflection of exactly who you are at that point in time. However, there is likely another person, who is not the same as you, who, at some given point in time, might generate that same number. So, it’s not exactly one-to-one, is it? However, if you keep giving little flashes into who you are in the form of these numbers, you are going to create an infinitely complex pattern that truly is one-to-one. There is only one YOU that would generate this extremely long string of 15 digit numbers. Let’s be super unambitious and super unclever and try to come up with the algorithm that you’re fundamentally using to generate numbers. Let’s try to write your brain in Visual Basic, taking the chimpanzees on a typewriter approach.

Let a computer randomly write code, run it, and see what it gets. If it matches what you have produced so far, it’s pretty close. Once it starts predicting what you’re going to do in the future, it’s even closer. Sure, it’s going to take it a lot of tries to get it right, but don’t forget that you gave me infinite computing power in number 2. Let’s get more meta and let it also develop algorithms to evaluate if it’s getting closer or farther from being right rather than just stumbling around in the darkness. Let’s step back even further and let it write algorithms that evaluate those algorithms, AD NAUSEAM!
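A toy version of the first step (guess a rule, run it, compare against what was written down) fits in a few lines. This is nowhere near the thought experiment: I'm assuming the "person" follows a hidden rule of one fixed form, x -> (a*x + c) mod 97, so the search covers 97 × 97 candidate rules rather than arbitrary programs. All names and constants here are hypothetical.

```python
import random

MODULUS = 97  # hypothetical: small enough that blind search finishes quickly

def run_rule(a, c, start, length):
    """Generate `length` numbers from the rule x -> (a*x + c) mod MODULUS."""
    values, x = [], start
    for _ in range(length):
        values.append(x)
        x = (a * x + c) % MODULUS
    return values

def blind_search(observed, seed=0):
    """Try candidate rules in random order until one reproduces `observed`."""
    rng = random.Random(seed)
    candidates = [(a, c) for a in range(MODULUS) for c in range(MODULUS)]
    rng.shuffle(candidates)  # no cleverness: just a random order of guesses
    for tries, (a, c) in enumerate(candidates, start=1):
        if run_rule(a, c, observed[0], len(observed)) == observed:
            return (a, c), tries
    return None, len(candidates)

hidden = run_rule(5, 3, start=1, length=12)    # the numbers being "written down"
rule, tries = blind_search(hidden[:6])         # the search only sees the first six
predicted = run_rule(*rule, start=1, length=12)
print(rule, tries, predicted == hidden)
```

Once the recovered rule matches the observed prefix, it also reproduces the six numbers the search never saw, which is the "predicting what you're going to do in the future" test in miniature.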

All of a sudden, it’s theoretically possible that it’s going to model not just your number generating algorithm, but you, and how is a perfect model of you any different than the real you? Let’s blow our minds even further and say that it can model the whole damn universe that led to your being created and agreeing to the stupid rules of this stupid thought experiment.


Now, while I may be able to find number 3, I’m not likely to come across 1 or 2 anytime soon, but it’s kind of creepy when pushed to the limits. All of a sudden, data is so much more important than any sort of clever insights we may make into it. I was initially terrified by the idea of this self-organizing computer, getting smarter at making itself smarter and running simulations of anything conceivable. I smirked to myself at presentations I attended while people tried to explain data using kitschy, home-brewed theories. Even perfectly reasonable ideas started to seem shaky. Why should people die when deprived of oxygen? That’s a handy notion, but there’s a far more complicated structure at play underneath, something whose structure is unfathomably complex and beyond our articulation. This reared its ugly head even more so in psychology, where we do factor analysis and then come up with cute names for scales based on how the items feel like they hang together. Sure, before we trust somebody to do this they have to spend years wading in the literature and learning what their predecessors have thought, but isn’t it all just alchemy as people simplify beautifully complex structure into feel-good aphorisms that can be explained in a few sentences?

I was really bummed about the capacity of the human brain. Our little notions of the world were handy for keeping us alive, but ultimately didn’t even begin to scratch the surface of reality. But then, while walking about glumly, trying to wrestle with this problem semantically and deductively (which is kind of ironic, I suppose), I came to some peace.

Our articulated knowledge is an attempt to express this more complex structure in some simplified rule, but our behavior doesn’t always follow our declarative knowledge. When you ask me to explain why something works the way it does, I’m going to give you the best estimator I can lazily produce, but it may be a pretty biased one. But ask me to bet money on what number is going to come up on the die, and suddenly I’m playing a more complex game.

This is something that goldfish can do but humans struggle with. If we’re flipping a coin, and I tell you I’ll give you a dollar every time you’re right, and take a dollar every time you’re wrong, even if I tell you that it is an unfair coin that is manufactured to come up heads 55% of the time and tails 45% of the time, you’re probably not going to adopt the best strategy, which would be to just trust me and call it heads every time. You may be able to explain to me the mathematical proof that argues for you behaving like that, but in your actions you refuse to believe that the system is that simple. You’re gathering additional data like the position of my hand, wind speed, rotational inertia, and trying to somehow build this much more complicated model of how the coin really behaves. Because nothing is really that simple. Coins don’t follow the rules that we set for them. We don’t do the “right” thing in every situation, because in reality, that is unknowable! Sure, our strategy is sub-optimal if the rules really are that simple, but they aren’t. Our brains are doing all the self-organization and meta-organization that I feared computers could do. We’ve developed all of these meta meta meta algorithms that govern how we develop new algorithms, with clever little heuristics and razors, nifty principles like parsimony, not because they’re true, but because they (probably) guide us towards building a better internal model of the universe.
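The arithmetic behind "call it heads every time" is short enough to write down. The sketch below compares that strategy with calling heads only 55% of the time (often called probability matching, a comparison I'm bringing in; the post doesn't name it). It assumes calls are made independently of the flips, and the function name is mine.

```python
def expected_winnings_per_flip(p_heads, p_call_heads):
    """Expected dollars per flip: +1 for a correct call, -1 for a wrong one."""
    # A call is correct when it is heads on a heads flip or tails on a tails flip.
    p_correct = p_heads * p_call_heads + (1 - p_heads) * (1 - p_call_heads)
    return p_correct * 1 + (1 - p_correct) * -1

always_heads = expected_winnings_per_flip(0.55, 1.0)   # trust the stated odds
matching = expected_winnings_per_flip(0.55, 0.55)      # mimic the coin's mix
print(always_heads, matching)  # about 0.10 versus about 0.01 dollars per flip
```

Under the stated rules, always calling heads earns about ten cents a flip, while mixing your calls earns about a penny.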

Because at the end of the day, that’s all we’re doing our whole lives. We’re taking in sensory data and trying to make sense of it, trying to create some sort of internal representation of the universe. I’m sure some law of thermodynamics or the uncertainty principle or some other vaguely invoked rule of physics would argue that something inside of a system can’t possibly make total sense of that system, but we can sure try. So I’ll keep telling my far-fetched stories about why things are the way they are but with the added wisdom that while I’m probably not right, that doesn’t mean the lies are without value.

Editor’s Note: This is really long and ungainly and I’m super impressed if you made it this far. After writing it I’m not even willing to reread it, at least not immediately, so rather than sit on this post like I usually do, I’m just going to truck it out, in all its ugliness, and pick at it and clean it up and springboard off of it in the future.


February 16, 2010 | Academic, Personal | 3 Comments

Twitter and Blogging

Twitter fits the way I think better than blogging. It lets me express my usually hyperventilating mind, each gasping thought shallow, rapid, and impermanent. It’s a real challenge to write a cohesive piece, but I think blogging may train me up to think in a different way than I naturally do.

The problem I have is that I’ll have an idea, start a blog entry about it, then either get distracted or have to go do something, but I carry on the conversation with myself that I started in the blog. I reach all kinds of resolution and gain insight, and by the time I go to write it up, I’m stuck with just the result rather than the journey, which is really the more interesting piece. I wish there was a way I could capture my subvocalizations so that I could actually log my thought process. I’m doing my best, but sometimes blogging feels like rewatching a movie with a tricky ending. I already know where it’s going, and it’s hard to take myself along for the ride again without tipping my hand as to the result.

Below is an excerpt of my “Buzz” conversation regarding my struggles to find a place for my thoughts to go. Does anybody else have a hard time figuring out what to say and where?

Feb 12 Daniel Kessler: Dammit new social media plan: Tumblr is just a big twitter, wordpress is a blog
Feb 12 Albert Yao: keep it simple, do everything within google!
Feb 12 Chris Love: Hey Daniel, do you really think Tumblr is just a big Twitter? I think the difference for Tumblr is that it’s a community formed on the basis of interests more than previous friendships and acquaintances (though as you know I’m now friends with Mills, Peter Santiago and a few other tumblers). What do you think?
Feb 12 Chris Love: But I don’t see any reason to continue with Facebook and Twitter though
Feb 12 Daniel Kessler: The real struggle I’m having right now is what to do with ideas. Chris, if you remember my writing style, my thinking style is quite similar. My thoughts are usually staccato bursts that don’t readily self-cogitate and unfurl. Twitter is excellent for this as it fits the way I already think.

In my continuing struggle to train myself to be a more meditative, fluid thinker, I’m trying to practice better blogging. So often I end up with 20 half-written drafts that were very interesting when started, but that I lose interest in over time.

For me it’s less about the community than it is about finding a repository for my thoughts. I think that you and your Tumblr-circle are all better trained thinkers than me, so you do an excellent job of having semi free-form, semi structured conversations. I’m still just trying to find a receptacle for my thoughts, and I want one storage device that fits what my brain spits out, and another that forces me to stretch and grow.

For me, WordPress or other blogging platforms are great for longer meditations and sharing, but don’t necessarily invite commentary since they can be intimidating and lengthy. Tumblr seems like a great place for reposting material that others have drafted, expanding on it, then kicking it out to the Tumblsphere for further criticism. I just need to figure out where it fits in my continuum of creative outlets.
Feb 12 Chris Love: Daniel, I think your entire post here belies your estimation of your own powers of thinking and expression. I too am trying to find a way to figure out what forms of web communication fit my time, moods, modes and methods best, but aiming at constantly morphing and moving targets seems to make this more a process of error than trial.

I think you’re right about Tumblr: it’s great as a sketchbook for one’s burgeoning thoughts and ideas, but it’s also a fantastic source for cogent bursts of information about political and cultural events. For example, Sea of Green’s live-blogging of the demonstrations in Iran is far more informative and useful than anything coming out of the mainstream press right now.

What do you think makes Twitter more useful (user-base aside) than Buzz? Man I’m confused these days.
Feb 12 Daniel Kessler: I appreciate your reassurances and take comfort that I’m not the only one struggling with that.

I’ve enjoyed Sea of Green’s stuff and will probably subscribe to it in Google Reader (thanks for so often reblogging, it’s kept me more up to date on Iran than I have been in a while).

The reason I’m still in Twitter is fairly simple and unexciting. I understand Twitter’s API well enough to piggy back on publicly available scripts, and there are enough Twitter “bots” listening to me that I can do all sorts of neat commands from a launcher app I run on my computers. If I have something I need to do, I can tweet a direct message to the “ToodleDo” bot which will make sure it gets added to my to do list.

For now, Buzz is a place where I’m happy to concatenate my stuff, and comment on it, but I’m not yet that interested in directly putting content here.

To be honest, Buzz is pushing me more towards wordpress because of its integration. If I could get content from Tumblr into buzz easily, I’d totally use that more. I just want a place that concatenates all of my activity so that it’s easy to see what I’ve been up to and thinking.

February 16, 2010 | Personal

Content Concatenation

One of the more exciting things, I think, to come out of social media lately is the ability to share and have seamlessly threaded discussions on a variety of issues. Facebook does this fairly well within its own domain, but with some limitations (that I think they’re working on fixing). Most significantly they’ve eliminated geographic constraints in having real time conversations. It’s sort of what I think chat rooms were always supposed to be, but by giving conversations context it cuts down on the cacophony. However, most of the discussions are, as in real life, sparked by the minutiae of our daily existence: they are commentaries on birthdays, new pets, photos of silly hats, etc. That’s not to say these experiences aren’t important, which is why I still spend a lot of time on Facebook, fulfilling these social needs. For now, Facebook does a good job of alerting us as to where these conversations are happening and keeping us in the loop, but it doesn’t generate an easily searchable (YET!) record of our conversations, which for the most part is fine; how often do we really need to recall one of these comments?

In real life, so many potentially good conversations begin with, “I was reading a piece in…” or “Did you see…” and unfortunately the answer is often no and the conversation enters a period of retelling rather than analysis. Facebook is starting to fill this gap, too, with “Share on Facebook” bookmarklets and the ever-improving News Feed, but it’s still a little clunky. I do most of my reading via RSS using Google Reader (which has both really hurt and really helped my productivity), and the recent ability to share and comment on things from there is really fantastic. However, it has nothing like the audience that Facebook does, and when I read neat stuff, I want to flag it as such and be able to have discussions on the common content with friends.

Unfortunately, right now only one other friend actually uses Google Reader, and I don’t know how many of my friends are motivated enough to snag my RSS feed (here it is if you are: link, RSS recommended), and even if they do, they need to be in Google Reader to participate in the discussion. Google Reader and Facebook were not playing nicely recently, so, in effect, sharing an item on Google Reader basically meant sharing it with Chris, which made me a bit more self-conscious when sharing articles.

I’ve seen people really have fun with Tumblr, which seems like a happy medium between micro-blogging and the kind of full-fledged stuff that I do here. It has NO native commenting system (though that can be brought in through Disqus), and it still does a seemingly poor job of identifying trends for posted items in the way that Google Reader does (if I “share” something there I can, I believe, see discussions happening on it from a multitude of such sharers). But Tumblr plays nice with Facebook, so whenever I saw something I liked in Google Reader, I snagged it, put it in Tumblr, and tried to say something to justify my having gone to the trouble of posting it.

It seems that suddenly Google Reader is playing nicely in a somewhat passive way with Facebook, which is fine with me. For now, if there’s something quick I want to bump up to Facebook, I’ll probably grab it in Tumblr. While I do think snagging it in Facebook would be fine, too, it’s nice to have a clean, outside source of my interesting quotes, and it’s better eye candy in the Facebook feed. For now, I think I’ll let Facebook handle commenting on stories like that.

I’ll continue to use Google Reader whenever I come across something neat while reading RSS, or if I happen upon a more full length/less-sexy NY Times article I like or something like that.

I hope that I’m not just creating uses for the various social media stuff that I’m doing. It’s just getting tricky with so many services and so many different users. Ultimately the goal is the same. I want to share interesting things I come across on the web while allowing discussion on them, while grabbing the same things from others. For now I hopefully have it.

The upside of all this is that I’ve realized just how much terrific media there is in the age we live in. To be honest, this particular post is so meta most people probably won’t make it here, to the end, because there’s much better stuff out there. It’s kind of neat to see a consumption mindset applied to ideas.

Editor’s Note: This post is part of a new push for me to, more or less, rush content out the door. I have way too many half-written posts and drafts sitting in my draft box, so I’m hoping that just getting stuff out will be fine even if it comes at the cost of polish, but I’m curious to hear your thoughts. Feel free to comment using whatever media you do to read this! 😛

August 12, 2009 | Personal | 2 Comments