Archive | IaaS – Infrastructure as a Service RSS feed for this section

Beyond the Network (The Future of the Network, Part 3)

23 May

Beyond the Network

This post the third in a series: To read them in order, start here.

Imagine for a moment that if instead of treating each server (or container, or vm) as if it is one or more network hops away, that the server was directly attached to the services and other servers that it needs to interact with most. This would increase efficiency and would solve quite a few consistency and consensus issues that exist today when dealing with modern infrastructure. If each network switch and router is already a specialized computer, why not further empower these devices to solve many of the problems we struggle with in infrastructure today? We had quite a few specialized network appliances such as load balancers, firewalls, and storage appliances that emerged in the late 90’s and early 00’s. These follow a path of mixing the network with servers providing some specialized function or functions.

The demands for the ever faster evolution of infrastructure capabilities and scale, and the fact that the network is treated as a dumb pipe or simply plumbing between nodes all contribute to the friction of trying to have a dynamic infrastructure. This dumb pipe approach forces all complexity into the software on server nodes, which makes things using distributed systems and solutions vastly more complex than necessary. Trying to create overlays and abstractions just increases complexity and impairs performance as a tradeoff for flexibility.

If you accept that switches and routers are computers that are purpose built for pushing packets, why not consider adding additional capabilities inside these computers? I know there will be naysayers that think this is a bad idea and there will be cases where some of these things may very well be a bad idea. However, there is definitely a class of problems where this would be a far more elegant solution than what we have now.  Simplifying out some complexity and vastly improving performance and scalability seem pretty attractive. A message bus/queue comes to mind as a great example that could start providing incredibly low latencies by being tied into the network device itself.

This approach would also leverage the power of persistent connections, if a server is attached to two other “direct attached switches/network devices” (for redundancy), if either side goes down the other knows. This is an over-simplification that I ask you to overlook for now as I realize there are exceptions/edge cases. Generally when a physical server goes down, the network interface(s) will go down, this would immediately alert any service running on them that this node is down. There could also be a socket or other specialized service running in the switch looking at heartbeat, packet flow, buffers, and communicates with a server side service, maybe via an optimized coordination protocol? This would work well for local clusters, messaging, and similar scenarios. Your cluster needs concensus? No problem if you are only dealing with the concensus service running the switch or router. Even doing cluster concensus across switches and routers would be superior.

Now think about vendors providing prewritten services available to run on switches and routers that can be connected together with serverless functions. This provides a much more flexible and powerful model of “wiring up” infrastructure. Have a function that needs to be more performant and adding additional instances won’t help?  Maybe it should be deployed on an FPGA, or perhaps a special ASIC is added to the switch. I realize that ASICs aren’t as portable/flexible, but perhaps there are common functions where it still makes sense to add ASICs (serialization/deserialization of a protocol for example). Perhaps these FPGAs and ASICs could even be admin insertable via cards or other modular interface wrapped around them? Or perhaps the switch itself is simply more programmable, but adds special function services running on general processors.

There is also an opportunity to enable more sophisticated systems that combine FPGAs, ASICs (TPUs anyone?), GPUs, and CPUs to handle a greater and broader number of problems. These systems would act and be closer to current big iron routers. However, they would be offer a much wider array of capabilities than what is available today. This might be an appropriate place to run a container running a critical piece of OSS infrastructure code for example.

Mock-Switch

[Above is a Mockup of a Switch with FPGAs, Pluggable ASICs, and CPUs]

I have only scratched the surface of this concept in this post, come back tomorrow to read part 4, where we will explore other possibilities.

Advertisements

The Future of the Network, Part 2

22 May

This post is the second in a series: To read them in order, start here.

There are three trends that I have seen happening over the past few years that are indirectly or directly influencing the future of networks; The first is the use of FPGAs and ASICs to solve problems around distributed applications and network loads(specifically I’m thinking of Bing/Azure and several AI projects — including one that chose to go down the ASIC path — Google. The other path that is intriguing is of PISA by Barefoot Networks, which seems on a positive trajectory.

This has been driven by the need for greater performance and better efficiencies in the data center at extreme scale. These efficiencies also translate into very large amounts of capex that ends up being saved. The use of custom hardware in conjunction with software, brings much better returns because of the purpose built nature (which brings increased efficiency and/or performance). This could even be seen as what would be the equivalent of VMware shipping VMware Workstation. This technology provided the foundation for the virtualization transformation of the last decade, all based on making more efficient use of server hardware sitting in datacenter already. Yes, virtualization brought many many other benefits, but the story that initially took hold was better utilization of hardware.

Another trend has been the explosion of infrastructure technologies and projects, of which most are OSS. These projects include a wide array of technological building blocks that have been contributed by many of those same large cloud companies mentioned earlier. However, in addition to these large cloud companies, many other large software companies and some fast growing small companies like Hashicorp, have also developed and released infrastructure technologies (Infrastructure as Code). Technologies such as:

There is a specific category that I would like to directly address:

Network gear has stagnated so much, that pure software infrastructure has taken over a large swath of functionality that should live in software on network equipment. This isn’t a knock against any of the projects above, they are all quite innovative and are answering an obvious unmet need. This is another key indicator that we need to change the nature of the network and how it integrates with what runs on it.

The final trend is that of microservices and serverless aka (lambda/azure functions/cloud functions). This is increasing in popularity very quickly by proponents (and with good reason). What is intriguing about the technology, is that you can write very small amounts of logic/code to interconnect the equivalent of lego blocks of services and infrastructure functionality. This has the same risk as microservices, in that there is also some complexity to be managed (especially if the implementation is not well thought-out upfront). However the benefits in speed of development, scalability and delivery outweigh the costs in most cases from what we have seen so far.

 Venn

So how does all this involve the future of networks? Read Part 3 of this series tomorrow to find out what the future of networks is pointing to.

Defying Data Gravity

2 Apr

How to Defy Data Gravity

Since I have changed companies, I have been incredibly busy as of late and my blog has had the appearance of neglect.  At a minimum I was trying to do a post or two per week.  The tempo will be changing soon to move closer to this….

As a first taste of what will be coming in a couple of weeks I thought I would talk a bit about something I have been thinking a great deal about.

Is it possible to defy Data Gravity?

First a quick review of Data Gravity:

Data Gravity is a theory around which data has mass.  As data (mass) accumulates, it begins to have gravity.  This Data Gravity pulls services and applications closer to the data.  This attraction (gravitational force) is caused by the need for services and applications to have higher bandwidth and/or lower latency access to the data.

Defying Data Gravity, how?

After considering how this might be possible, I believe that the following strategies/approaches could make it feasible to come close to Defying Data Gravity.

All of the bullets below could be leveraged to assist in defying Data Gravity, however they all have both pros and cons.  The strengths of some of the patterns and technologies can be weaknesses of others, which is why they are often combined in highly available and scalable solutions.

All of the patterns below provide an abstraction or transformation of some type to either the data or the network:

  • Load Balancing : Abstracts Clients from Services, Systems, and Networks from each other
  • CDNs : Abstract Data from it’s root source to Network Edges
  • Queueing (Messaging or otherwise) : Abstracts System and Network Latency
  • Proxying : Abstracts Systems from Services (and vice versa)
  • Caching : Abstracts Data Latency
  • Replication : Abstracts Single Source of Data (Multiplies the  Data i.e. Geo-Rep or Clustering)
  • Statelessness : Abstracts Logic from Data Dependencies
  • Sessionless : Abstracts the Client
  • Compression (Data/Indexing/MapReduce) : Abstracts (Reduces) the Data Size
  • Eventual Consistency : Abstracts Transactional Consistency (Reduces chances of running into Speed of Light problems i.e. Locking)

So to make this work, we have to fake the location and presence of the data to make our services and applications appear to have all of the data beneath them locally.  While this isn’t a perfect answer, it does give the ability to move less of the data around and still give reasonable performance.  Using the above patterns allows for the movement of an Application and potentially the services and data it relies on from one place to another – potentially having the effect of Defying Data Gravity.  It is important to realize that the stronger the gravitational pull and the Service Energy around the data, the less effective any of these methods will be.

Why is Defying Data Gravity so hard?

The speed of light is the answer.  You can only shuffle data around so quickly, even using the fastest networks, you are still bound by distance, bandwidth, and latency.  All of these are bound by time, which brings us back to the speed of light.  You can only transfer so much data across the distance of your network, so quickly (in a perfect world, the speed of light becomes the limitation).

The many methods explained here are simply a pathway to portability, but without standard services, platforms, and the like even with the patterns etc. it becomes impossible to move an Application, Service, or Workload outside of the boundaries of its present location.

A Final Note…

There are two ways to truly Defy Data Gravity (neither of which is very practical):

Store all of your Data Locally with each user and make them responsible for their Data

If you want to move, be willing to accept downtime (this could be minutes to months) and simply store off all of your data and ship it somewhere else.  This method would work now matter how large the data set as long as you don’t care about being down.

CLASH – CLoud Admin SHell

11 Jan

It has been several weeks since I have posted to this blog.  I would blame this on the holidays, but that would be inaccurate as it has been something far more insidious!

What is CLASH?
CLASH is a universal shell.  What is a universal shell?  First, by universal I mean that it is intended to run on all major desktop operating system platforms including:
Windows XP, Windows Vista, Windows 7

Mac OSX Leopard, Mac OSX Snow Leopard
Ubuntu Linux, and most other flavors of Linux
——————————–
Now the shell portion is a bit different.  Since this is a CLoud Admin SHell, Cloud is an important part of idea.  In this initial prototype I went through many experiments in learning the nuances of JRuby <follow on post link here>, but have been able to get a reasonable working version of VIJava (the interface I’m using to get JRuby to work with VMware vCenter/vSphere).

Great, so what can I do with this prototype?

-You can try it out on any of the platforms listed above!
-If you are on a Windows System, you can either edit the clash.bat file and put in your server’s IP or name along with a username and password to connect
-If you are on Mac or Linux, you can simply type ./clash –Server 127.0.0.1 –Username administrator –Password password to connect 

-The following are the working commands:
> Get-VM SomeVMName
Above gets the VM object and prints out the Guest Operating System type

> $result
This command prints the name of the VM object
> $result $
This displays the methods available to the VM object
> $result $.get
This shows only the methods with “get” in them
> $result $.set
This shows only the methods with “set” in them
> $result $.find something
This shows only them methods that contain “something” in them
> $result method
This will execute the method against the object (i.e. $result getName will display the name property of the VM object from Get-VM)
> disconnect
This will cleanly disconnect from the vCenter / vSphere server
> Start-VM SomeVMName
This command is flakey at the moment, this will be fixed when the next prototype is released in a few weeks
> Stop-VM SomeVMName
This is in the same state as Start-VM
> Get-VM SomeVMName > $result getName
This allows a limited form of piping in clash by using the > as an operator (you must have whitespace on both sides of the > symbol.

Where is this headed?
After many experiments, it will be broken out into a flexible system that allows many different options and cool capabilities against not only vCenter and vSphere, but most Cloud platforms as well.  Currently planned platforms include:
-Amazon EC2 and S3
-Rackspace
-(Looking for the next provider for this list)

Other interfaces to the shell (for both input and output) will include a Web interface, I’m looking for thoughts on other types of interfaces desired.

How will this work?
Below is my latest planned diagram for how I hope/think things should work:

Where can I get the Prototype?
You can get the code from GitHub here
You can download the entire package here

Installation Directions (AGAIN, this is a PROTOTYPE, it does NOT follow best practices)
1.) Make sure that you have Java 1.5 or Above Installed
Download Java from Here

2.) Install JRuby 1.5.6
Download JRuby from Here
3.) Follow the JRuby Setup Instructions Here
4.) Download the CLASH prototype/alpha-1 from GitHub (See Above Links)
Unzip/Tar it to c:\clash or /clash directory in ROOT
5.) Start clash by going to c:\clash\bin\ or /clash/bin
On Windows edit clash.bat to contain the correct IP Address, Username, and Password
then run clash.bat
On Mac and Linux type ./clash –Server 127.0.0.1 –Username administrator –Password password
Make sure to select the IP or Name of a valid vCenter / vSphere Server
6.) Play with clash
A Request:
Please supply feedback to me through comments to this post or communicate directly with me through Twitter – my handle is @mccrory
I’m looking for ideas/things that you would like to see clash do, better commands, capabilities, features, etc.

Cloud Escape Velocity – Switching Cloud Providers

18 Dec

The term Escape Velocity is the speed needed to “break free” from a gravitational field without further propulsion according to Wikipedia.org.  Data Gravity as explained in THIS previous post is what attracts and builds more Data, Applications, and Services on Clouds.  Data Gravity also is what creates a high level of Escape Velocity to move to another Cloud.

Some background on why this post is timely:

A few days ago Amazon announced a new AWS service for importing VMware disk images (VMDKs) into EC2.  VMware already offered a method for converting EC2 instances through their Converter tool into Workstation VMs and with a 2nd pass conversion into ESX VMs.  While all of this sounds wonderful and it does have value, it brings to light an entirely different issue.  Only Stateful / Fully encapsulated applications can be moved around in this way.

Examples of sources of Cloud Gravity (App, Service, and Data Gravity Combined) on your specific Application.

If someone selects a Cloud provider and writes an application leveraging anything more than a handful of VMs, Data Gravity will make it virtually impossible to move to a new/different Cloud provider.  Don’t believe it?

See the Diagrams Below:

Here is a diagram of an app that has a Low Escape Velocity because of Lower Cloud Gravity:

Cloud Escape Velocity with Low Gravity

Below is a diagram of an app that has a High Escape Velocity because of High Cloud Gravity:

Cloud Escape Velocity with High Gravity

Some potential dependencies include:

Database with a specific API

Web Worker which serves as a web interface and uses internal Authentication (Your user logins are here!)

Application code that uses the Database and Web Workers specific APIs and/or depends on Low Latency and High Throughput access to them.

Here are a few additional things to think about:

– The longer (more time) an Application stays in a specific Cloud the more difficult it is to move.  Why?  Data Gravity increases due to more Mass (data being stored).  Imagine accumulating 100’s of GBs of Data, how easy will it be to shuffle/transfer that much data around?

– The more provider APIs and Services that you depend on the harder it is to move.  Why?  Because there are only two paths that can be taken in a move.  The first is to find another provider that has the exact same set of APIs and Services (this will limit your choices).  The second is to change or rewrite your application to take advantage of the new Cloud provider’s APIs and/or Services.

– Different providers have different charges for the consumption of the same resources.  Your current provider gives free usage of queues for applications. The provider you are looking to go to charges after the first X number of messages on the queue.  Now what do you do?  You will either pay more when you move, rewrite your application to fit the new provider’s model, or pick another provider that has free queue usage.

– Different QoS guarantees from Cloud provider to Cloud provider.  Some Cloud providers offer SLAs with reimbursements for outages, others only offer best effort.  Some providers offer tiered Services, others only offer a single tier.  What happens if you want to move and you can’t get the minimum level of QoS that you need?

This is NOT an attempt to dissuade anyone from using Public Clouds (they are incredibly valuable and powerful), but I would like more people to go in eyes wide open.

Data Gravity – in the Clouds

7 Dec

Today Salesforce.com announced Database.com at Dreamforce.  I realized that many could be wondering why they decided to do this and more so, why now?

The answer is Data Gravity.

Consider Data as if it were a Planet or other object with sufficient mass.  As Data accumulates (builds mass) there is a greater likelihood that additional Services and Applications will be attracted to this data. This is the same effect Gravity has on objects around a planet.  As the mass or density increases, so does the strength of gravitational pull.  As things get closer to the mass, they accelerate toward the mass at an increasingly faster velocity.  Relating this analogy to Data is what is pictured below.

Data Gravity

Services and Applications can have their own Gravity, but Data is the most massive and dense, therefore it has the most gravity.  Data if large enough can be virtually impossible to move.
What accelerates Services and Applications to each other and to Data (the Gravity)?
Latency and Throughput, which act as the accelerators in continuing a stronger and stronger reliance or pull on each other.  This is the very reason that VMforce is so important to Salesforce’s long term strategy.  The diagram below shows the accelerant effect of Latency and Throughput, the assumption is that the closer you are (i.e. in the same facility) the higher the Throughput and lower the Latency to the Data and the more reliant those Applications and Services will become on Low Latency and High Throughput.
Note:  Latency and Throughput apply equally to both Applications and Services
How does this all relate back to Database.com?  If Salesforce.com can build a new Data Mass that is general purpose, but still close in locality to its other Data Masses and App/Service Properties, it will be able to grow its business and customer base that much more quickly.  It also enables VMforce to store data outside of the construct of ForceDB (Salesforce’s core database) enabling knew Adjacent Services with persistence.
The analogy holds with the comparison of your weight being different on one planet vs. another planet to that of services and applications (compute) having different weights depending on Data Gravity and what Data Mass(es) they are associated with.
Here is a 3D video depicting what I diagrammed at the beginning of the post in 2D.

 

More on Data Gravity soon (There is a formula in this somewhere)

Public Cloud Comparison and Calculator v2

5 Dec

After some time away from the Public Cloud Compute Comparison that I did a couple of months ago (which got X hits), I decided to update it based on feedback and new ideas.  What follows is a brief walkthrough with instructions on how to use the Calculator.

Before I go any further, a brief disclaimer:  I do not warrant the accuracy of this Comparison and Calculator, it may have errors and omissions (all unintentional if they exist).  I also take no responsibility if your bill turns out to be something very different than what the Calculator shows.  And finally, I’m employed by Dell, which has relationships with Microsoft, Joyent, and Amazon, and possibly the others and I’m just not aware.

First, let’s cover the updated Compute Comparison:

I’ve updated the Compute with Terremark as an additional provider, I would be interested in adding others if people are interested simply add a request by commenting at the bottom of this post.  Also added, is the ability to Calculate/Estimate your costs by using the add Quantities fields.  To use the Quantities, add the number of each type of Compute instance as you like and get a rough idea of what the cost will be.  Please note that there are assumptions, however for the most part I have annotated in the spreadsheet what those assumptions are.  Once you are happy with your compute instances, you can go down to the bottom and move to the new Cloud Storage Comparison and Calculator.

Cloud Compute Comparison and Calculator

Next, The Cloud Storage Comparison and Calculator:

It was quite an ordeal getting the tiered storage pricing models to work correctly as formulas, but it is all in there.  The Cloud Storage Comparison and Calculator attempts to cover the other side of the Cloud equation, by taking into account the following:

Monthly Persisten Storage Requirements

Data Transfer (In and Out of the Storage Cloud)

API Calls (In and Out bound requests)

Redundancy Costs

By entering some estimates across the top and choosing a quantity (this would usually be 1, which acts as a trigger to calculate Monthly Cost) you can easily get an idea of what your storage cost would be.

Cloud Storage Comparison and Calculator

And finally we have the Cost Summary Page.  This page combines the Total Monthly Cost from the Compute and Storage sheets into one place.

Cloud Comparison Summary Monthly Total Costs

To switch from the Cloud page to the Storage page and from the Storage page to the Summary page (or worksheet if you want to be precise), go to the bottom of the page an select (as shown below)

Tabs and Sheets

In the next post I will cover different results that came out as I used the Calculator.