MU Advanced: Issues and Discussion
Author Topic: Using S3 for Files (blogs.dir), Avatars, etc.  (Read 2544 times)
ZappoMan
« on: April 05, 2008, 09:38:50 PM »

Has anyone played around with this idea?

I think I've seen some standalone WordPress plugins for doing this, but as you know, in the WPMU world (especially assuming your users are not technogeeks like us) you want to hide some of the options and make it work easily out of the box for them.

Has anyone played around with building a plugin to handle this?
Logged

Yep, that's me... riding my bike 204 miles in one day.
drmike
« Reply #1 on: April 06, 2008, 11:25:04 AM »

Not really.  Not sure of your setup, but most of my boxes now have two 250 GB hard drives each, and I only put 200-225 clients on a single box.  A typical box for us is only 20-30% used when it comes to hard drive space.  Bandwidth use isn't that bad either, although I am thinking about another gigabit line.  (A salesman offered me a decent deal on one last week that I may take up just so I won't have to worry about it later on.)

Also relying on a second, non-local source for "mission-critical" material is something that you try not to do in hosting.

I could see it as an offsite backup solution though.  Our backup servers sit in the bottom slots in the same racks as our servers, and that's not something you try to do either (i.e., if the datacenter goes down, you can't get to your backups either).
Logged

ZappoMan
« Reply #2 on: April 06, 2008, 11:55:23 AM »

How is what I'm proposing any different from using a CDN like Akamai?

I've already moved all my theme components to S3 and found it very reliable and very cheap...

There are several much larger domains than us that use S3 for pretty much their entire asset base.

I'm not trying to be argumentative. I'm just saying, if there is some logical reason not to offload the serving of static files, I'd like to hear it.


Logged

Yep, that's me... riding my bike 204 miles in one day.
trent
« Reply #3 on: April 06, 2008, 05:37:54 PM »

According to a post by Matt, WordPress.com has a "portion" of their files on Amazon S3, so they have it working that way.  I think they also keep the files on their own servers as a backup in case S3 goes down, but they do it.
Logged
Luke
« Reply #4 on: April 07, 2008, 07:31:48 PM »

That would be correct, Trent.
Logged

10 frames?
Heh, that's for Quakers.

Note: This message may be Canadian friendly.

"Pornographic monster on the floor"
ZappoMan
« Reply #5 on: April 07, 2008, 07:52:54 PM »

Ok, well, I know it's possible to do... I'm just curious if anyone here has seen a plugin already written to do it.
Logged

Yep, that's me... riding my bike 204 miles in one day.
drmike
« Reply #6 on: April 14, 2008, 10:26:38 AM »

Speaking of Amazon:

Link
Logged

ZappoMan
« Reply #7 on: April 14, 2008, 12:16:56 PM »

Yep... it happens. This was a classic case of an upstream network provider failure. It happens all the time, no matter who owns the server equipment.

I can't afford to run multiple geo-located servers connected to different network providers (can you?), but now with Availability Zones, I can launch multiple EC2 instances in different data centers connected to different backbone providers to help protect against this sort of failure.

In other interesting news... Amazon today announced a limited private beta of their persistent storage for EC2.




Logged

Yep, that's me... riding my bike 204 miles in one day.
drmike
« Reply #8 on: April 14, 2008, 02:57:43 PM »

(can you?)

Servers are here in Charlotte, 2nd and 3rd DNS servers are on Long Island and in Texas. Smiley

The point I was making was that using an offsite source for some material has been known to bite folks in the rump on occasion.
Logged

ZappoMan
« Reply #9 on: April 14, 2008, 03:14:51 PM »

(can you?)

Servers are here in Charlotte, 2nd and 3rd DNS servers are on Long Island and in Texas. Smiley

So, my point is, you have a single point of failure... Charlotte... a lot of good backup DNS does you if your connectivity to Charlotte goes down... Owning your boxes doesn't solve that problem. The only solution is to:

Run servers at multiple, independent geo and network locations.
Verify that these colo spots use different top level network providers.
Implement a redundancy scheme that allows your data to be "hot" in multiple locations.

It doesn't sound like you're doing that... based on your answers to my questions about DB backup/redundancy schemes on a different thread.

So... my point is... that what bites people is not "using an offsite source for some material"... what bites people is having a single point of failure. In the case of the EC2 incident you're describing, this exact failure could have occurred at the ISP hosting your servers in Charlotte, and you would have had the same experience.

The question is, do you trust the netops team at your ISP more or less than you trust the netops team at Amazon?

And since we all know that we shouldn't trust either of them... what techniques are you using to build an appropriate level of "bullet proofing" into your architecture for when the guys you trust to not screw up... do indeed screw up?
Logged

Yep, that's me... riding my bike 204 miles in one day.
trent
« Reply #10 on: May 10, 2008, 08:53:53 AM »

I know you already did this yourself, ZappoMan, but this is interesting stuff:

http://www.ringofblogs.com/2008/04/12/off-loading-wpmu-theme-files-to-amazon-s3/

This resulted in a pretty good cost savings for him in April:

http://www.ringofblogs.com/2008/05/08/first-month-data-using-s3-offloading-for-wpmu/

Trent
Logged
ZappoMan
« Reply #11 on: May 10, 2008, 05:49:44 PM »

That plugin is very similar to what I implemented, although I added versioning support, which allows me to set FAR FUTURE Expires headers so that the bulk of the theme components should only be downloaded once instead of at each session.

But since you set FAR FUTURE to be way out in the future, you need a mechanism for rolling out new versions (for bug fixes or whatever), so I use the version info from style.css to modify the URL on S3. That way I can make a fix, roll a new version to S3, and everyone gets the new CSS and components.
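
For anyone who wants to see the versioning idea spelled out, here is a minimal sketch. It is in Python purely for illustration (a real WPMU plugin would be PHP), and the bucket URL, the `themes/<theme>/<version>/` layout, and the function names are all assumptions, not what the plugin above actually does.

```python
import re

# Hypothetical bucket URL; substitute your own.
S3_BASE = "https://example-bucket.s3.amazonaws.com"

# Header to attach when uploading each object; one year is a common
# "far future" lifetime (365 * 86400 seconds).
FAR_FUTURE_HEADERS = {"Cache-Control": "public, max-age=31536000"}


def theme_version(style_css_text, default="0"):
    """Pull the 'Version:' value out of a theme's style.css header comment."""
    match = re.search(
        r"^\s*\*?\s*Version:\s*(\S+)",
        style_css_text,
        re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1) if match else default


def versioned_url(theme, version, path):
    """Put the version in the path so every release gets a brand-new URL."""
    return f"{S3_BASE}/themes/{theme}/{version}/{path.lstrip('/')}"


header = """/*
Theme Name: Example
Version: 1.4.2
*/"""

print(versioned_url("example", theme_version(header), "images/logo.png"))
```

With the sample header this prints `https://example-bucket.s3.amazonaws.com/themes/example/1.4.2/images/logo.png`. Bumping the version in style.css changes every URL, so browsers holding the old far-future copies simply never ask for them again.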
Logged

Yep, that's me... riding my bike 204 miles in one day.
drmike
« Reply #12 on: June 07, 2008, 12:26:25 PM »

I mean no offense when I say this, but I'm sitting here on a site that uses Amazon's S3 to hold images, and the browser icon is spinning and spinning trying to get the data.  Looks like Amazon is timing out.  That's what concerns me with using 3rd party hosts like that.
Logged

Luke
« Reply #13 on: June 07, 2008, 02:21:23 PM »

As well as latency, when it is working.

Ever watch Google Analytics hold up page loads for a while?

Logged

10 frames?
Heh, that's for Quakers.

Note: This message may be Canadian friendly.

"Pornographic monster on the floor"
ZappoMan
« Reply #14 on: June 08, 2008, 01:19:49 AM »

I suspect that there's something wrong with that particular site's implementation.

But it's clear you're not likely to be convinced that using a content distribution network like S3 is a good idea.

That's ok with me.
Logged

Yep, that's me... riding my bike 204 miles in one day.
drmike
« Reply #15 on: July 21, 2008, 08:00:49 AM »

Ever watch google analytics hold up page loads for a while?

Hell yes.  One of the reasons why I prefer running Urchin and running it locally.

Just for reference: http://en.forums.wordpress.com/topic.php?id=31951

By the way, I'm not saying that using Amazon S3 is a bad idea.  It's just that relying on a third party source for mission critical support is cause for concern.  At the very least, wp.com had a backup method to serve those files.  I would think that for most of the WPMU admins we've seen, backups are not at the top of their lists.
Logged

Andrea_R
« Reply #16 on: July 21, 2008, 08:29:51 AM »

Twitter uses Amazon S3 for serving avatars. It was down yesterday, so there were a million tweets of "hey where'd the pictures go?"
Logged

Am I the only one with a sig?
Ah well, might as well read my blog for lols.
ZappoMan
« Reply #17 on: July 21, 2008, 12:45:18 PM »

So, we serve our theme components off of S3... and I was in the process of moving to serve Avatars (and then eventually blog images) off of S3.

The good news is that my theme images are controlled by a plugin (that we wrote) that lets me, as an administrator, simply click a button to serve locally or from S3... so once I knew about the problem, I made the switch, and everything was fine.
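
Roughly, the switch boils down to one flag that picks the base URL. A minimal sketch in Python (the real plugin is PHP; the class name, both base URLs, and the flag name are made up for illustration):

```python
from dataclasses import dataclass


@dataclass
class AssetConfig:
    # Both base URLs are placeholders, not real locations.
    local_base: str = "/wp-content/themes"
    s3_base: str = "https://example-bucket.s3.amazonaws.com/themes"
    serve_from_s3: bool = True  # the admin "button"

    def url(self, path):
        """Build an asset URL against whichever origin is switched on."""
        base = self.s3_base if self.serve_from_s3 else self.local_base
        return f"{base}/{path.lstrip('/')}"


config = AssetConfig()
print(config.url("example/style.css"))   # served from S3

config.serve_from_s3 = False             # flip the switch during an outage
print(config.url("example/style.css"))   # served from the local box
```

This only works if every asset reference goes through `url()`; a hardcoded S3 link bypasses the switch entirely.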

I did have 1 hardcoded file that was referencing S3, which unfortunately was for our custom signup process... which is kinda funny, since the fact that signups took a nose dive, was the indicator to us that something was going wrong with the site.

As for avatars and blog images, reading this thread from wp.com gives me some good ideas for how to optimally handle image and avatar serving...

Clearly wp.com uses S3 as their primary image hosting, but they do some "local caching" to the tune of 70-80%...

I'm curious what that really means... what strategy do they employ to pull this off?

Some ideas I can think of are:

Strategy 1:

1) upload everything locally, then upload to S3
2) serve from S3, and have the ability to flip the switch to serve locally
3) implement some kind of a garbage collection/optimizer that removes local files for blogs that get little or no traffic.... (is this worth it?)

Strategy 2:
1) upload files via a system that mostly uploads to S3
2) serve files via a local php/wordpress plugin cache that fills from S3

Strategy 3: (pretty similar to 2)
1) upload files to S3
2) use Perlbal or some other reverse proxy to serve image files, have this reverse proxy keep a cache, and fill the cache from S3.
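
To make Strategy 2 a bit more concrete, here is a rough sketch of a local cache that fills from S3 on a miss. It is Python for illustration only, and `fetch_from_origin` is a hypothetical callable standing in for whatever S3 client you use; nothing here is taken from wp.com's actual setup.

```python
import os
import tempfile


class ReadThroughCache:
    """Serve files from a local directory, filling it from the origin on a miss."""

    def __init__(self, cache_dir, fetch_from_origin):
        self.cache_dir = cache_dir
        self.fetch_from_origin = fetch_from_origin  # callable: key -> bytes
        self.hits = 0
        self.misses = 0

    def _local_path(self, key):
        # Flatten the key into a single file name inside cache_dir.
        safe = key.strip("/").replace("/", "__")
        return os.path.join(self.cache_dir, safe)

    def get(self, key):
        path = self._local_path(key)
        if os.path.exists(path):
            self.hits += 1
            with open(path, "rb") as f:
                return f.read()

        # Miss: pull from the origin, then store a local copy.
        self.misses += 1
        data = self.fetch_from_origin(key)
        os.makedirs(self.cache_dir, exist_ok=True)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # rename so readers never see a half-written file
        return data


if __name__ == "__main__":
    fake_s3 = {"blog1/avatar.png": b"png-bytes"}  # stand-in for the bucket
    with tempfile.TemporaryDirectory() as cache_dir:
        cache = ReadThroughCache(cache_dir, fake_s3.__getitem__)
        cache.get("blog1/avatar.png")  # miss, fetched from the stand-in
        cache.get("blog1/avatar.png")  # hit, read from local disk
        print(cache.hits, cache.misses)
```

With more than one web server each box ends up with its own cache, so hit rates depend on how requests are balanced. The sketch also has no eviction, which is where the garbage collection question from Strategy 1 comes back in.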

============
So, here's my question:

If you're building a system that needs to handle 1 million+ page views a day, and you're clearly running on more than 1 web server, how do you architect this solution?
Logged

Yep, that's me... riding my bike 204 miles in one day.
drmike
« Reply #18 on: July 21, 2008, 01:08:01 PM »

Avatars and uploaded files: I think uploading to both sites but letting S3 actually serve the files would probably be best.  My "concern" in all of this is that going only with the offsite source and not having anything locally is just asking for trouble.

I don't see a point to having the cache.  You're going to have processing if you have them locally. (Unless they have some manner of serving the files without any major processing involved.)

As to theme files, I'm torn.  Granted they get called the most and would be the number 1 set of files I would want to offload but they're also the most important ones and I'd rather have them on the box.

Hmmm....

For our big site, all uploaded files are on the same box as the site.  We only moved the databases to another server.  We've been recently thinking about rearranging all of our mu installs although we haven't worked anything out.  I have 16 lesser servers sitting in a closet upstairs at home that I've been thinking about using as db servers and moving to a 3 servers up front - 16 in back setup.  Not sure what we'll offload though.

Sorry that I really can't give an answer.

I wonder what puts the most load on a box?  Uploaded files?  Theme files? (I would think those would be cached after the first view)  Javascripts? (Ditto on the caching) Maybe if we knew the answer to that, it would help.
Logged

Luke
« Reply #19 on: July 21, 2008, 01:44:14 PM »

Well, in terms of load, static files are always less.

The downside to MU is blogs.php, IMHO.
It requires a DB connection every time it is called, which to me is inefficient.
Grabbing the blog_name from the HTTP_HOST and then processing that would seem to be better, with no need at all to even fire up WP.
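
As a rough illustration of the HTTP_HOST idea (Python here, though the real thing would be PHP or a web server rewrite rule), assuming a subdomain install under a made-up base domain. Mapping the name to an upload directory is left out, since that depends on how the install lays out its files.

```python
import re


def blog_name_from_host(http_host, base_domain="example.com"):
    """Return the blog's subdomain label from a Host header, or None.

    base_domain is a placeholder for the site's own domain.
    """
    # Drop any port, normalise case, and strip a trailing dot.
    host = http_host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + base_domain
    if not host.endswith(suffix):
        return None  # main site or a foreign host: not a blog subdomain
    name = host[: -len(suffix)]
    # Accept a single label only: letters, digits and hyphens.
    if re.fullmatch(r"[a-z0-9-]+", name):
        return name
    return None


print(blog_name_from_host("Alice.example.com:8080"))
print(blog_name_from_host("example.com"))
```

The first call gives `alice` and the second gives `None`. Validating the label strictly matters here because the value comes straight from the request and would otherwise end up in a file path.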

In terms of WP, I read (Mark wrote it, I think) that they have a connection check routine, which checks if S3 is available, and if it isn't, they serve from the local storage. All done on the fly, no intervention required.
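
I don't know how their routine is actually written, but the general shape of such a check could look like the sketch below. It is Python for illustration; `check_remote` is a hypothetical probe you would supply (say, a short-timeout request for a known object), and the 30 second re-check interval is an arbitrary choice.

```python
import time


class OriginSelector:
    """Pick the S3 or local base URL, re-checking S3 at most once per TTL."""

    def __init__(self, check_remote, remote_base, local_base,
                 ttl_seconds=30.0, clock=time.monotonic):
        self.check_remote = check_remote  # callable returning True if S3 answers
        self.remote_base = remote_base
        self.local_base = local_base
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._checked_at = None
        self._remote_ok = False

    def base_url(self):
        now = self.clock()
        stale = (self._checked_at is None
                 or now - self._checked_at >= self.ttl_seconds)
        if stale:
            try:
                self._remote_ok = bool(self.check_remote())
            except Exception:
                # Any error from the probe counts as "S3 is down".
                self._remote_ok = False
            self._checked_at = now
        return self.remote_base if self._remote_ok else self.local_base


# Usage with a stub probe that always reports S3 as reachable.
selector = OriginSelector(lambda: True,
                          "https://example-bucket.s3.amazonaws.com",
                          "/wp-content/blogs.dir")
print(selector.base_url())
```

Caching the probe result keeps the check off the hot path, at the cost of serving broken S3 links for up to one TTL after an outage starts.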

IIRC, they save locally, then sync to S3, which is where their message about caching delays comes from in part.

I don't recall how often, how much, etc. and it may be save to S3 then sync locally, I don't know specifically.

For me, I'd rather store to something like a SAN and serve from there. Still off the main web servers, but without the headache of a remote service. If the connection to the SAN is down, most likely the connection to the whole site would be too. At that point, whether you serve local, SAN, or remote is of little consequence.

Logged

10 frames?
Heh, that's for Quakers.

Note: This message may be Canadian friendly.

"Pornographic monster on the floor"
ZappoMan
« Reply #20 on: July 21, 2008, 07:14:37 PM »

Interesting point about blogs.php...

Looking at that code it does look like it's doing a lot of heavy lifting that it doesn't really need to do. I guess most of it is to protect against malicious content, etc...

Kinda interesting, since it seems like a lot of this might get broken (or not fully supported) by plugins like the TanTanS3 plugin.

What's the best design pattern for doing this sort of thing? Have you ever replaced blogs.php? Is that the way to do it? Or do you just write plugins that handle the URL remapping (like TanTanS3 does)?
Logged

Yep, that's me... riding my bike 204 miles in one day.
drmike
« Reply #21 on: October 09, 2008, 09:39:12 PM »

The pixel audio plugin got a fix to use Amazon:

http://benperove.com/howto/boost-wordpress-audio-w-amazon-s3/
Logged

Tags: avatars  user files  amazon s3  aws  s3 
