My (minimal) criterion is that it has to be packaged in Debian.
A couple years ago I considered both MongoDB and CouchDB immature for that reason. The recent confusion of CouchDB/Couchbase etc. shows that was a reasonable view.
I'm not sure when they started, but 10gen now packages and distributes their own MongoDB Debian and Red Hat packages. It seems like the right move, so that they're not beholden to the update schedule of the distros.
Yes!!!
The development cycle of software like MongoDB, RabbitMQ, and so on is much faster than that of Debian or any other Linux distro. The Debian package is fine for dabbling or low-volume use, but for any serious app, you MUST go direct to the developers and use their latest stable release. That is what they support best on their mailing lists, and that is where most bugs are already fixed.
A lot of software development teams are releasing their own RPM and .DEB Linux binary packages for just that reason: to encourage people to use up-to-date packages instead of the stale OS distro packages.
In a way, it's rather like security updates. Who would refuse to install security updates because they're not part of the Ubuntu 10.04 LTS release? Almost nobody even thinks of doing that. So why would you use old, obsolete releases of mission-critical software?
> why would you use old obsolete releases of mission critical software?
"If it ain't broke, don't fix it"
Because it's mission critical, and you can't afford for it to break. Once you hit a certain complexity, upgrades almost always break something:
- APIs change.
- Undefined behavior changes.
- New bugs are introduced.
- A feature critical to your app starts performing worse.
- The above changes break something else you depend on (libraries, proxies, etc.).
Upgrading to a significantly changed version of a mission-critical app/library/language is a lot of work, and is sometimes impossible: many projects couldn't be reasonably ported to Python 3 if they wanted to; a lot of important libraries don't work on Python 3.
This is exactly why bug and security fixes are often backported into old versions. Python 2.5 is still receiving security patches. Apache 1.3 was maintained for years after 2.0 was considered stable.
Yes, especially with a database you can't just pull in new updates and expect them to work. It involves reading the release notes and doing a lot of testing. (much of it is automated)
Equally: APIs stay backwards compatible, behaviour gets defined, bugs are fixed, performance improves, and software becomes more compatible with other software.
I have a feeling that Javascript based NoSQL backed web development is going to be just fine in two years. Quite possibly the standard stack for a large percentage of projects.
I have the opposite feeling. I understand very well the reasons some people are pushed away from relational data, data normalization, and set-theory formalism. Some are a need for change. Some are fears of over-engineering. Some are because the trend is toward Twitter-like blobs. A big reason is that doing things right is a pain in the...
And then comes the only valid reason, from my POV: we sometimes have to denormalize and give up the power of relational data because we do not yet know how to conveniently store (read and write) huge amounts of relational data. Notice the "yet"? Well, I think it is a technical problem, and its solution may not be so far off.
One example of a solution showing its nose recently: Google's "Search plus Your World". It strikes me that any query I make to the closest Google server responds /instantly/, for any random word, with what amounts to a join over a possibly monstrous matrix of all the likes of all my circled users.
I don't know how they store this, or where and how they denormalize, but in any case it seems to me to be just "relational data as usual".
So in two years, if there is a "bigreldata" piece of software that lets a Postgres sit happily on 1000 TB of relational data with instant reads and writes, I would certainly rather use that, with a layer of Python glue on the server feeding a slim client, than a blob datastore with NoSQL handcuffs and a fat client with 20 libs of third-party JavaScript code.
I may be wrong, however, and would love more insights on this.
Google is not using a RDBMS to get Google Plus content on SERPs. What makes you think they are? It works just like the rest of Google on their leading edge kit: bigtable, GFS, etc. Amazon is able to personalize their site for each customer quite a bit more than Google is and relies on similarly horizontally scalable architecture.
Google talked a little about how personalized search works in a paper about BigTable, it's worth a review:
> Personalized Search stores each user's data in Bigtable. Each user has a unique userid and is assigned a row named by that userid. All user actions are stored in a table. A separate column family is reserved for each type of action (for example, there is a column family that stores all web queries). Each data element uses as its Bigtable timestamp the time at which the corresponding user action occurred. Personalized Search generates user profiles using a MapReduce over Bigtable. These user profiles are used to personalize live search results.
Regardless, even in your scenario with the perfect RDBMS, the future web stack wouldn't change much. You still have the same issues with blocking and different languages for client and server. As a developer myself, it doesn't matter at all to me if my call to a method is backed by a relational, document or key/value database. It's all an abstraction somewhere. It just needs to come back quickly and be easy to scale up.
The big change we're seeing is the client becoming primarily JS driven and the server more or less relegated to sending/receiving JSON. It's a much richer experience, but a pain when the toolsets on either end are completely different.
This is how I read it. He was saying it's alright to go with bleeding-edge technologies, even if they're a little shaky now. As long as they have a good dev community behind them, they'll get better.
Not to pick on him, because I think the overall received wisdom on new tech has shifted and morphed a great deal over the last few years, no doubt including Joel's...but I can't resist pointing out that, my how times have changed...
In other words, it's nice to see Joel greenlight something like this. I'd say it's kind of a sign of the times in terms of the industry's overall comfort level with what would be termed 'hip' technology, or 'new', or whatever moniker you want to attach to it.
"The Socket.io server currently has some problems with scaling up to more than 10K simultaneous client connections when using multiple processes and the Redis store, and the client has some issues that can cause it to open multiple connections to the same server, or not know that its connection has been severed."
I wonder if they ran into redis's hard-coded 10k connection upper-limit. As it turns out, their configuration for "unlimited" connections actually has a cap of 10k. I believe in master this is going away, but if you need more than 10k connections on redis <= 2.4, you need to manually patch the daemon, in case anyone else runs into this.
How vulnerable are all-JavaScript approaches to injection-type attacks? Do apps like Node and Mongo effectively prevent them, or is it still possible to shoot yourself in the foot? I've read some offhand comments along the lines of "as long as you're not a total moron you have nothing to worry about", but that sounds a lot like what was said about SQL injection and XSS before exploiting them went mainstream and it turned out everyone's apps were filled with them. Has anyone audited a real-world app built with a stack like this and come away with any experience to share?
You CAN do injections against MongoDB. An injection basically means "allowing user input to interfere with code", so for MongoDB, assuming the query is built as a string like '{name: ' + user_input + '}', then user_input, without sanitizing (sanitizing is simple here: just convert it to a string), could be something like: {'$where': ...}
Unlike SQL, which you have to build as a string, the natural approach in JavaScript (even for junior devs) is to use an object literal to build the query. And then you get escaping for free.
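To make that concrete, here's a small self-contained sketch (the field name and the attacker's payload are invented) contrasting a string-built query with an object-literal one:

    // Attacker-controlled input; the payload is a made-up example.
    var userInput = '"x", "$where": "while(true){}"';

    // Dangerous: concatenating user input into a query string and parsing it
    // lets extra keys such as $where sneak into the query object.
    var badQuery = JSON.parse('{"name": ' + userInput + '}');
    console.log(badQuery);  // { name: 'x', '$where': 'while(true){}' }

    // Safer: build the query as an object literal and force the input to a string;
    // the input can then only ever be a value, never an operator.
    var goodQuery = { name: String(userInput) };
    console.log(goodQuery); // { name: '"x", "$where": "while(true){}"' }

With the Node MongoDB driver you'd then pass goodQuery to collection.find(), and the injected $where never reaches the server.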
Also note stuff like https://news.ycombinator.com/item?id=3419693: node.js is not free of issues itself. (To be fair, lots of people got that wrong and it was patched quickly.)
What would be the difference in how you would handle this in a single-page JS app like Trello versus a traditional Java/Python/Ruby-based multi-page backend that mostly serves HTML pages?
It seems like at the end of the day you still need to validate and sanitize user input before doing anything with it.
I was inquiring more about the backend js use in node and mongo.
You might, for instance, find that simply sanitizing user input isn't enough when you're using the same interpreted language at multiple stages. If an attacker could cause the right front-end code to be executed on the app server, or back-end code to be executed in the database, they could potentially compromise a lot.
Our stack is Redis, MongoDB, Nginx, SCSS, HAML, Coffee, Rails and NodeJS. I'm extremely happy with these choices.
Recently a friend and I did a small weekend project: www.bubblefap.com (NSFW). The design and code are an homage to ugliness. We only used PHP-ActiveRecord and that's it, and I had so much fun!
I just hacked away! I was cowboy coding, hacking away, and I didn't need to think about frameworks and architecture and integration with fog and hacking Rack to support flash uploads. Oh, good times :-)
Node.js, Redis, and MongoDB? They've gone full web-scale!
In all seriousness, though, it looks like they are using Redis for exactly the right reasons, and the larger architecture is pretty much the definition of a sane forward-looking design.
I think this is the best write-up I've seen of a full Javascript/CoffeeScript stack.
I've always been hesitant to get too far away from my LAPP(ython) stack, but I'll almost certainly be hacking something together with these components to see how I like writing everything in CoffeeScript.
Actually, those have been great (and thank you for making them)! We had an outage during beta because somebody forgot to wrap an array comprehension assignment (CoffeeScript) and we didn't test right. Backbone has been really good, and while we had some complaints, I think many have been resolved in more recent versions - we're on something ancient.
Isn't that one of the major drawbacks of using a bleeding-edge tech stack? Albeit mitigated by choosing to use code that is easy to comprehend/maintain (like Backbone!)
What do you think you will do? E.g., with Backbone: merge up, keep the status quo and maintain your own branch, change libraries, or something else?
Agreed, it made some necessary, non-backward-compatible changes as a result of being such a young project. I think it is likely that we will upgrade to a newer version soon.
We worked around the bug (which only happened when we were doing REALLLY long queries when we disconnected a client) and moved to Linux, where the bug does not exist. Awesome sysadmin Tim GDB'd into Mongo and figured out that it was an issue with SpiderMonkey, fixed in later versions.
Question: They tout MongoDB's "generally fast" read speed. Any idea what this is in relation to? In "fast", are they suggesting that it is faster than a typical RDBMS? If so, does anyone know of any supporting benchmarks or breakdowns of how they accomplish this and to what extent?
Does anyone know of any open-source projects out there using a similar stack? I'm particularly interested in learning more about client-side view rendering with Backbone and Mustache... server side can be any of Node, Rails or Django.
I've been using CoffeeScript and Backbone.js a lot lately; it really has changed my approach to web apps (keep the server light, just transfer JSON back and forth). The approach that has worked best for me is to keep pages such as the landing and authentication pages as traditionally rendered pages and save the client-side rendering for the user-authenticated 'meaty' parts of the app.
If you want to look at a couple of really small examples, I have a couple of apps on GitHub that I used for learning Backbone.js/CoffeeScript. They both use the underscore.js template engine.
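For anyone wanting a feel for what that looks like, here's a rough sketch (the model, fields, and element are invented) of a Backbone model rendered client-side with an underscore.js template:

    // A made-up "Card" model whose data comes from the server as JSON only.
    var Card = Backbone.Model.extend({ urlRoot: '/api/cards' });

    var CardView = Backbone.View.extend({
      template: _.template('<h2><%= title %></h2><p><%= description %></p>'),
      initialize: function () {
        this.model.on('change', this.render, this);   // re-render on new JSON
      },
      render: function () {
        this.$el.html(this.template(this.model.toJSON()));
        return this;
      }
    });

    // The server only serves JSON for /api/cards/1; the browser does the rendering.
    var card = new Card({ id: 1 });
    var view = new CardView({ model: card, el: '#card' });
    card.fetch();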
I like the idea of doing authentication and OpenId in a traditional web stack, while doing the "app" portion in a javascript backbonejs stack.
That lets you use existing libraries (Devise, etc.) for the authentication and once that's out of the way, you can go have "fun" with coffeescript and mustache. :)
Funny story: I found this particular lib the same day Daniel wrote some code that was an exact drop-in replacement for its 'parallel' function. So he said, 'Yes, let's use that one, as I understand and approve of its approach.'
This is a great write-up. We have a very similar stack for a separate team-based project we're working on. The only difference is the front-end stack, and we have Tornado sitting on the backend to serve HTTP requests when needed. What's the benefit of Mustache/Backbone over just writing your own client-side templating, in a practical sense? Last time I used a KVO framework, there was a performance penalty that I did not like. Does Backbone have any performance issues? Or even Mustache? What's the trade-off of having these frameworks versus generating custom templating code?
This post raises a few questions for me... and perhaps someone more versed in these stacks can provide answers.
- Does the use of CoffeeScript alleviate the M.C. Escher-esque quality of callbacks within closures involved in working with JavaScript on both the server and client (and the data store)? I can totally see the appeal of CoffeeScript's syntax. Giving the programmers something different to look at (and learn) probably provides some cognitive benefit as well.
- Is there an acronym/name (ala LAMP) for the Node/MongoDB/Redis stack?
I am assuming that CoffeeScript should be able to add new syntax where the need arises, as the number of CoffeeScript users is probably still pretty low and able to adapt to change.
From what I can tell, CoffeeScript - and the JavaScript/Node world - would really benefit from something like F#'s workflow/computation-expression syntax, which takes a fairly straightforward, readable statement and behind the scenes de-sugars the hell out of it into a bunch of closures.
Deferreds are
* Mostly monadic (creating a deferred is "return", the "then" method is "fmap" and "bind").
* Can be implemented as a library, without a separate compilation step or having to patch the runtime.
* Avoids most of the CPS inversion-of-control madness. You can return and store promises, and you can also add callbacks after the fact, so code is much more flexible. (Writing sequential async loops is still annoying, though.)
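For what it's worth, here's a tiny sketch of that correspondence using the now-standard Promise API (the names are purely illustrative):

    // "return": lift a plain value into the async context.
    function unit(value) {
      return Promise.resolve(value);
    }

    // Each of these returns a promise, so chaining .then() acts like "bind".
    function readUser(id) {
      return unit({ id: id, name: 'user' + id });   // pretend async lookup
    }
    function readAvatar(user) {
      return unit('/avatars/' + user.id + '.png');  // pretend dependent lookup
    }

    // Sequencing without nested callbacks; each .then is one "bind" step.
    unit(42)
      .then(readUser)
      .then(readAvatar)
      .then(function (url) { console.log(url); });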
That's correct -- because we compile to "standard" JavaScript, we're at least somewhat comfortable introducing significant changes (even to the syntax) where desirable. Even if you never get around to updating older pieces of code, all of the compiled JS continues to be fully compatible and interoperable with the newer stuff.
After reading that ticket, I think the solution is going to have to involve more than one keyword. It's curious to see that the objection is to the hideous mass of js that "defer" compiles down into. I kind of thought that was the point of CoffeeScript.
- Daniel is going to write another post detailing this. The answer is that when you combine CoffeeScript and a good async library, you can indeed get rid of that stuff.
> when you combine CoffeeScript and a good async library, you can indeed get rid of that stuff
My experience has been that you can only do this in a few cases. It's not an issue that can be worked around elegantly in libraries; you need language support.
http://tamejs.org/
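For contrast with that language-level approach, the library-only style being debated looks roughly like this (a sketch using the `async` module as one example; the thread doesn't say which library is actually in use):

    var async = require('async');

    async.waterfall([
      function (cb) {
        cb(null, 42);                                     // pretend async step 1
      },
      function (userId, cb) {
        cb(null, { id: userId, name: 'user' + userId });  // step 2 depends on step 1
      }
    ], function (err, user) {
      if (err) throw err;
      console.log(user.name);  // a flat sequence instead of a callback pyramid
    });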
I just ran the letters modb-re-nojs-li through my handy word_solver and came up with some candidates: MoRN, NoRM, ModeRN, NiMRod, ReMiNd, MeRLiN, LiMNeR, NiMbLeR
"Client-side MVC" is kind of unfair to backbone.js. I haven't confirmed it myself, but I've seen people explaining how it's also suited to server-side rendering, which means it's not ruled out for competent authors doing progressive enhancement.
Yes, I built my version almost in tandem with Steve. Pusher hosts their js lib on github under the MIT license which made building these things supremely easy.
Today I'd consider going with a service, but first I'd see how https://github.com/LearnBoost/websocket.io looks. Most of our problems have been with abstractions in the socket.io client and scaling issues with server process chatter. The chatter shouldn't be necessary with websockets-only.
When I first saw Trello I "felt" that something cool was going on under the hood. So I did a little poking to see what JS technologies you guys were using. The piece I was most excited to learn about was backbone. Had never seen it before and was really impressed by the space it was filling and what it's capable of. Thanks for the write up.
I have read about various issues in using HAProxy and Socket.io (WebSockets mode). I am currently working on a project that's heading in that direction - does anyone have anything to share on that front?
Thanks. Looks like it's available only on the development version - I wonder if it's production ready...
Also - I took a look at the article - it does not mention a pure-Node approach, e.g. using something like node-http-proxy for load balancing. Any idea about such a setup? I need HTTPS as well.
A pure-Node approach is a lot simpler: you can just use Node's HTTPS (if you don't care about load balancing) and attach Socket.io to it. Just don't put your Socket.io server behind Nginx if you want WebSockets support.
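Roughly, that pure-Node setup looks like this (a sketch using the Socket.io 0.x API; the certificate paths and handlers are placeholders):

    var fs = require('fs');
    var https = require('https');

    var server = https.createServer({
      key: fs.readFileSync('server-key.pem'),    // placeholder paths
      cert: fs.readFileSync('server-cert.pem')
    }, function (req, res) {
      res.writeHead(200);
      res.end('ok');
    });

    // Attach Socket.io directly to the HTTPS server -- no Nginx in front.
    var io = require('socket.io').listen(server);
    io.sockets.on('connection', function (socket) {
      socket.emit('hello', { msg: 'connected over TLS' });
    });

    server.listen(443);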
With load balancing, I would recommend going with Stud/Stunnel and HAProxy. Terminating SSL with a specialized piece of software is nicer (it moves the SSL overhead onto a separate box), and using a separate load balancer allows for more flexible options, e.g. serving static assets from Nginx. There is nothing wrong with node-http-proxy; it's just that these two projects have been around longer and are better understood from an ops perspective.
You cannot use round robin load balancing, you need to have at least IP-based stickiness with Socket.io for now. Well, you can round robin if you use the Redis store but you'll run into inefficiencies with https://github.com/LearnBoost/socket.io/issues/686 . I'll do a bit more coverage on other SIO deployment-related issues once I get the chapter on Socket.io finished for my (free) book in a few weeks.
If I were starting now, I'd just ignore the Flash sockets transport since it makes the whole stack more complex due to not looking like HTTP to load balancers, and start with Engine.io / WebSocket.io to take advantage of their simplicity.
Love the stack, already working on something similar (mine involves nginx, though.) Given we haven't deployed, it's refreshing to see a similar diagram from those already out in front.
" sending changes to Models down to browser clients". _Down_ to the client? He draws the stack with the client on top and the storage at the bottom, and then says 'down to the client'? I have a collegue who also does this. I never figured out why. Can anyone explain possoble reasoning behind this wording?
Because the client downloads assets, and we are more used to thinking about what happens from the client's POV.
Alternatively, it could be for the same reason people usually say they're going "up" to a big city, regardless of the direction of travel. The server is the hub and all the clients are down from it. This possibly dates back to when towns/fortifications were usually located on hills.
Because the client DOWNloads stuff, presumably. To me it makes perfect sense, servers send stuff 'down' to clients, regardless of a stack figure where stuff is arranged otherwise. Those two are orthogonal.
Socket.io is a great library. When it comes to alternatives, I see Server-Sent Events [1] as a contender to WebSockets. I feel it's a simpler architecture that fits HTTP/REST better than WebSockets in the general case. I built a couple of small experiments/examples using CoffeeScript and Redis [2]; the middleware.coffee in particular is designed to be used together with REST, so that if you send "Accept: text/event-stream" you will get updates to the resource(s) at that URL.
On the plus side, it's supported by Opera Mini and iOS.
For IE and others (Android, I'm looking at you!), I have yet to try it and so can't vouch for it, but there exists a polyfill: https://github.com/Yaffle/EventSource
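In case it helps, here's a bare-bones sketch of the server side of that idea in plain Node (no CoffeeScript or Redis here; the port and payload are made up):

    var http = require('http');

    http.createServer(function (req, res) {
      // Only stream events to clients that ask for them via the Accept header.
      if ((req.headers.accept || '').indexOf('text/event-stream') === -1) {
        res.writeHead(406);
        return res.end();
      }
      res.writeHead(200, {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive'
      });
      // Push an update every second; a real version would publish on resource changes.
      var timer = setInterval(function () {
        res.write('data: ' + JSON.stringify({ now: Date.now() }) + '\n\n');
      }, 1000);
      req.on('close', function () { clearInterval(timer); });
    }).listen(8000);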
This is one of those remarks that is obvious to someone who knows what it means but mysterious to someone who doesn't. So what does it really mean?
- Don't use a technology that, no matter how good, is just too niche for widespread adoption?
- Don't use a technology that we'll have trouble finding programmers to support?
- Look for stuff that will take better advantage of impending improvements in hardware and communications?
- Avoid frameworks for which support may wane?
- Avoid technology that we may have to support ourselves?
- Avoid anything proprietary whose owner may disappear?
- Make sure that whatever you choose, it will work well on mobile technologies, even as native apps?
- Choose technologies abstract enough to minimize our costs but not too abstract to be inefficient?
- Any combination of the above?
- What considerations have I missed?