Using ZFS with AWS for higher performance and reliability.

With cloud adoption going mainstream, huge amounts of sensitive data are being moved from in premise infrastructure to cloud platforms, to reduce the cost of hardware and maintenance and to achieve more reliability. Applications do benefit from the increased reliability and uptime of a well-designed cloud platform like AWS (Amazon is known for the reliability of its resources, with almost no downtime of its data centers). However, our engineers learned that this alone is not enough if we want to gain that last 0.1% or 0.2% and get closer to our goal of always-on availability.

Here we discuss our journey to achieve that magical 99.9% uptime figure for our Endurance-S solution, served off the AWS infrastructure.
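
To put these uptime figures in perspective, an availability percentage converts directly into a yearly downtime budget. The snippet below is a small sketch of that arithmetic; the function name is ours, for illustration only, and a 365-day year is assumed.

```python
def downtime_hours_per_year(availability_pct: float) -> float:
    """Allowed downtime in hours for a given availability over a 365-day year."""
    hours_per_year = 365 * 24  # 8760 hours, leap years ignored
    return hours_per_year * (1 - availability_pct / 100)


for pct in (99.7, 99.8, 99.9):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime allows about {hours:.1f} hours of downtime per year")
```

At 99.9%, the budget works out to roughly 8.76 hours a year, which is why each additional 0.1% matters.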

Among the many components of a collaboration infrastructure, here we discuss our attempt to move the storage infrastructure up a few notches, instead of using a disk as just a disk with basic RAID.

We defined the following requirements for our new storage platform:

  • A Logical Volume Manager to help us scale the storage on demand, with no downtime
  • Data Integrity to ensure reliability for the data
  • Compression to optimize storage and also achieve higher performance
  • Higher I/O performance to enable better response to end users
  • Protection against data corruption
  • Snapshots for quick online backups, which don’t load the servers
  • Quota Allocations, etc.

In our production environment for the collaboration infrastructure, we need to handle millions of small files distributed over thousands of folders (maildir). Our current infrastructure runs on an ext4 file system, and we had to decide whether ext4 could meet all of the above requirements or whether we would need another file system.

In this post, we share our observations from performance benchmarks comparing Linux + EXT4 on EBS with Linux + ZFS on EBS.

The Setup:

  • Hardware:
    • AWS EC2 Instance m3.xlarge (14 GB RAM and 4 vCPU Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz)
    • 3×250 GB General Purpose SSD (AWS specific) to test on.
  • Software:
    • CentOS 6.0
    • Kernel Version: 2.6.32-220.el6.x86_64
    • ZFS on Linux v0.6.3
  • Testing Tools used:
    • IOzone
    • Apache JMeter
  • ZFS has been configured as follows:
    • The ZFS pool is configured in mirror mode across 2×250 GB disks (Amazon-specific SSD).
    • An L2ARC of 40 GB has been added to the ZFS pool to improve read speed.
    • ZFS has been limited to a maximum of 8 GB and a minimum of 4 GB of system RAM (in a 16 GB system).
    • Checksum has been turned off, as we noticed faster performance with it disabled.
    • ZFS compression is set to LZJB, as it is faster than its counterparts.
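
As a rough illustration of the configuration listed above, the sketch below prints the kind of zpool/zfs commands and module options that would produce such a layout. The pool name `tank` and the device paths are hypothetical placeholders, not the ones from our setup. Mapping the RAM limits to the `zfs_arc_max`/`zfs_arc_min` module parameters is also an assumption about how such a limit is usually applied on ZFS on Linux.

```python
GIB = 1024 ** 3  # bytes in one GiB


def zfs_setup_commands(pool, mirror_devs, cache_dev):
    """Return shell commands for a mirrored pool with an L2ARC cache device."""
    return [
        f"zpool create {pool} mirror {' '.join(mirror_devs)}",  # 2-disk mirror
        f"zpool add {pool} cache {cache_dev}",                  # L2ARC device
        f"zfs set checksum=off {pool}",                         # as in the test setup
        f"zfs set compression=lzjb {pool}",                     # LZJB compression
    ]


def arc_limit_options(max_gib, min_gib):
    """Return a modprobe options line limiting the ARC size (values in bytes)."""
    return f"options zfs zfs_arc_max={max_gib * GIB} zfs_arc_min={min_gib * GIB}"


# Hypothetical pool name and device paths, for illustration only.
for cmd in zfs_setup_commands("tank", ["/dev/xvdf", "/dev/xvdg"], "/dev/xvdh"):
    print(cmd)
print(arc_limit_options(8, 4))  # typically placed in /etc/modprobe.d/zfs.conf
```

The script only prints the commands; it does not touch any disks, so it is safe to run for review before applying anything by hand.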

Testing IMAP with JMeter

Using Apache JMeter:
We made use of a pre-configured Mithi Connect Xf collaboration server on CentOS 6.0 to stress test POP and IMAP Protocols for different users.

Apache JMeter Configuration Screenshots:
Specify the number of threads (or users) connected concurrently

Under the Mail Reader Sampler,

  • Specify the protocol to be used (IMAP/POP). We will be testing IMAP as of now.
  • Insert Required details as marked in the below screenshots.
  • Number of messages to retrieve (per thread/user) is set to 10. For very aggressive testing, one can set it to All. However, this is not recommended for large mailboxes, as the server tends to hang.

The above setup performs as follows:
Every thread retrieves 10 messages, and all threads run simultaneously.

In the above screenshot, we can see the throughput = 279.06/minute. This means that the server can handle ~279 retrieval requests per minute. Average time is the time taken to fulfill one request.
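
For a quick sanity check of these numbers, the per-minute throughput can be converted to requests per second, and the total number of messages fetched in one run follows from the thread count. This is only an arithmetic sketch; the helper names are ours and are not part of JMeter.

```python
def per_minute_to_per_second(throughput_per_minute: float) -> float:
    """Convert a per-minute throughput figure to requests per second."""
    return throughput_per_minute / 60


def messages_per_run(threads: int, messages_per_thread: int = 10) -> int:
    """Total messages retrieved when every thread fetches the same number."""
    return threads * messages_per_thread


rate = per_minute_to_per_second(279.06)
print(f"{rate:.2f} requests/second")
print(f"{messages_per_run(40)} messages retrieved with 40 threads")
```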

Output :

The output is shown in CSV format. The fields of interest are elapsed and bytes. Here, elapsed is the time in milliseconds taken to fetch 101091 bytes of data from the server.
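
A results file like this can be summarized with a few lines of Python. The sketch below assumes a header row with `elapsed` and `bytes` columns; the exact set of columns depends on how JMeter is configured to save results. The sample rows are made up for illustration, and only the 101091-byte size is taken from the text above.

```python
import csv
import io

# Made-up sample rows in a JMeter-style CSV layout (not real test output).
SAMPLE = """timeStamp,elapsed,label,responseCode,success,bytes
1400000000000,210,Mail Reader Sampler,200,true,101091
1400000000300,190,Mail Reader Sampler,200,true,101091
1400000000650,230,Mail Reader Sampler,200,true,101091
"""


def summarize(csv_text):
    """Return sample count, average elapsed time (ms) and total bytes fetched."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    elapsed = [int(row["elapsed"]) for row in rows]
    return {
        "samples": len(rows),
        "avg_elapsed_ms": sum(elapsed) / len(elapsed),
        "total_bytes": sum(int(row["bytes"]) for row in rows),
    }


print(summarize(SAMPLE))
```

To use it on a real run, read the saved results file into a string and pass it to `summarize` instead of `SAMPLE`.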

To get different results, we varied the thread/user count shown in the first screenshot. The output is shown below in tabular form (for both the ZFS and EXT4 file systems):

Results :

P.S.: EXT4 was used as is, without any optimization or tuning.

Observation:

  • The EXT4 disk gets strained in tests with thread/user counts above 40.
  • ZFS can handle thread/user counts above 40, up to 60, without significant strain on imapd and/or the server.
  • Faster test results were obtained when testing against small mailboxes (range: 300-700 MB).
  • Slightly slower test results were obtained when testing against gigantic mailboxes (range: 6-10 GB).
  • No two tests were run for the same user.
  • Each test was conducted for a different user (large mailbox only).

Conclusion:

  • ZFS looks promising for delivering data during peak-hour usage without hampering other processes. On the other hand, tests on ext4 brought idle time down to ~50%.

Testing POP with JMeter:

Using JMeter to test the POP protocol is similar to testing IMAP. The only changes needed are setting the protocol name to pop3 and the port number to 110.

The same test environment was kept for pop3 testing too.

Results :

For POP protocol testing, we tested with Checksum both ON and OFF.

Observation:

  • The server handles ~60% more requests per minute when checksum is set to OFF.
  • The average time to complete one request is lower when checksum is set to OFF.
  • Between ZFS (Checksum OFF) and EXT4, ZFS shows slightly higher performance. However, the average response time for one request is almost the same on both filesystems.

Conclusion:

  • With almost the same average response time per request on both filesystems, ZFS has the higher capacity to handle requests, which means ZFS can handle loads better than ext4.

 

Unexpected behavior of MS Outlook 2013 IMAP client when the IMAP Server is migrated to a new machine

Recently during an outage we encountered on one of our cloud servers, we learnt something about the behavior of MS Outlook 2013 IMAP client, which is worth sharing via this post.

To verify our observation of this behavior during the outage phase, we conducted tests on MS Outlook 2013 and other IMAP clients in our labs, which are documented below with conclusions on how to handle such backend changes to minimise the impact on end users.

Scenario/Situation:

A mail server (bare metal or virtual) with Connect Xf is running the IMAP service. Users connect to the mail server over IMAP to view their mailboxes, using a variety of mail clients such as Thunderbird, MS Outlook, Baya and mobile clients. Typically, when a user connects for the first time, the IMAP client synchronises the mailbox from the server to the client, and on every subsequent connection it keeps the mailbox in sync. Read more about how IMAP works.

Depicting an IMAP server and IMAP clients

Now if, for any reason, the mail server is changed (rebuilt on another machine, or even on the same machine with the same IP address and DNS host names and with the same data restored), we have observed that some IMAP clients “misbehave”, as if they were a little confused by what happened back there. The unexpected behavior includes attempting to re-sync the entire mailbox, not syncing any mail from the mailbox after the server change, etc.

Depicting the rebuilt IMAP server and IMAP clients

Observed Behavior

We have documented the behavior as observed by us below and have also suggested remedial action to be taken on the client.

Lab conditions for the test:

  • During the rebuild procedure, the server was totally inaccessible to the clients. This means that if the clients were running, they would be attempting to connect to the IMAP server but would not be able to locate the server (via IP or DNS name).
  • Once the server is rebuilt and back online, the IMAP client, which has been continuously attempting to connect to the server, now finds it online. The behavior of the IMAP clients is observed and documented at this stage.
  • The email clients are configured as per steps given here.
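
One thing that may be worth checking when reproducing such a test is the UIDVALIDITY value of each mailbox before and after the rebuild. Per the IMAP specification, a client must discard its cached message UIDs when this value changes, which forces a full re-sync. This is offered as a possible check, not as a confirmed cause of the behavior described in this post. The sketch below uses Python's standard imaplib; the host, credentials and the sample response values are hypothetical.

```python
import imaplib
import re


def parse_uidvalidity(status_line: bytes) -> int:
    """Extract the UIDVALIDITY number from an IMAP STATUS response line."""
    match = re.search(rb"UIDVALIDITY (\d+)", status_line)
    if match is None:
        raise ValueError("no UIDVALIDITY in response")
    return int(match.group(1))


def fetch_uidvalidity(host, user, password, mailbox="INBOX"):
    """Ask a live server for a mailbox's UIDVALIDITY (needs real credentials)."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        _typ, data = conn.status(mailbox, "(UIDVALIDITY)")
        return parse_uidvalidity(data[0])


# Hypothetical responses recorded before and after a server rebuild.
before = parse_uidvalidity(b"INBOX (UIDVALIDITY 1400000000)")
after = parse_uidvalidity(b"INBOX (UIDVALIDITY 1412345678)")
print("UIDVALIDITY changed" if before != after else "UIDVALIDITY unchanged")
```

Calling `fetch_uidvalidity` once before the rebuild and once after it, and comparing the two numbers, shows whether clients were forced to treat the mailbox as new.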

Conclusion

Thunderbird on Windows exhibited this behavior, but it was not a serious issue and was easily resolved by simply restarting the Thunderbird client. However, we were not so lucky with MS Outlook 2013, which was the only client that got badly shaken up when the IMAP server was changed. It needed a full reconfiguration and a full re-sync of the entire mailbox, which can take a long time depending on the size of the mailbox and also puts additional load on the server during the process. We found that it is possible to avoid this if the MS Outlook 2013 client is configured (in advance) with the Root folder option set.

Mind you that this is not required for MS Outlook 2010.

References: IMAP issues affecting Outlook 2013 and Office 365 functionality

Making things better at the workplace through Baya’s Video Conferencing feature

The last release, Connect Xf 3.18, launched the new video calling and multi-party video conferencing facility (currently in Beta), available right within the browser. This allows a user to simply dial a colleague from the roster for an audio/video chat, and also to add multiple people to the call for an audio/video conference. To join the call, a person only needs access to a browser and a web camera (easily available on most devices today).

More details about the Audio video chat feature can be seen here.

We did a little study internally within Mithi and with a customer from the manufacturing industry, both of which are using this feature to improve their operations. This note captures some experiences of how the two organisations are leveraging Connect Xf’s new audio/video call and conferencing facility to increase productivity.

At Sharada Industries

More productive and cost effective Operational reviews.

Earlier, Sharada Industries managers and C-level people would have pre-scheduled travel itineraries to visit and conduct reviews of the various plants located across India. Working this way caused them either to travel too much for impromptu/unscheduled meetings or to postpone some important face to face reviews to the next planned trip, possibly leading to delayed decisions. By switching many of their review activities to Baya’s video conferencing facility, the Sharada management can now schedule any number of virtual meetings as required, allowing them to take quicker business decisions without having to wait for the next trip or spend time and money on an urgent trip.

Getting technical expertise from across plants together with a click

Typically in such industries, it is common to have experts in different areas of production spread across plants or concentrated in the HO. While the plants are staffed with operators to handle production on a set production system/assembly line, any major breakdown in the process may need expert intervention. In the earlier days, experts would travel “urgently” to consult on such breakdowns. Now Sharada uses video conferencing to get technical experts from various locations on a call and uses the web camera to demonstrate the problem situation. The experts can then quickly suggest fixes and guide the operators through them, so that the process resumes quickly and efficiently.

At Mithi

Reduced Travel, More Effective Meetings for the sales team

Mithi has a geographically distributed sales network, which earlier used to sync up over a conference call for weekly reviews. By shifting to the multi-party video conferencing facility, the sales team now meets “virtually” by simply signing into Baya at a scheduled time and connecting to the call from within the browser. Besides being able to see and hear each other, the system also allows them to exchange information over text chat and mail, making the video meeting more effective. The result is cost savings from reduced travel and more effective coordination of work. The unproductive time spent in waiting, driving or traveling can now be put to better use.

Work from home (or anywhere) without loss of productivity

Employees who are on leave or have commitments at home that keep them from travelling to the office can now log in from home and be fully in sync via Baya with email, calendar and chat. They can even attend meetings (face to face) via the video conferencing feature. For a larger group meeting, the team simply projects the display on a large screen and can involve multiple employees who are not physically present. This has given employees immense flexibility to cater to their home commitments, while still being able to engage with the team at work if required.

Greater flexibility with Virtual Face to Face meetings with customers

Most customers and partners would like a face to face meeting before making important buying / selling decisions, but are hard pressed for time. Mithi converts many such meetings to scheduled video meetings. This gives the team flexibility in organising and scheduling / rescheduling the meetings.
Baya allows the creator of a meeting to loop people into a video meeting by sending them a time-bound invitation via email. This allows the team to bring in all the people necessary to resolve / discuss an issue at hand at any time during the meeting.

Saving time and costs with Virtual Job interviews

The typical process at Mithi used to be to test a candidate using online testing mechanisms and then conduct multiple face to face interviews. For remote candidates, we would first have multiple interview rounds over the phone and then invite the candidate to our office for the final interview. By switching to video calls for all interviews, the HR team has an additional level of comfort, since they have now seen the candidate, and the candidate also gets to see the team (as they go through multiple interview rounds) without needing to travel at all. The system is more efficient, saves the cost of travel and allows the team to opt for a few more rounds if needed, at mutually convenient schedules, without having to deal with the logistics of arranging travel and stay.

So, how could you use the video conferencing feature in Baya to make things better at your workplace?

Why a Hybrid (Cloud+In Premise) solution is better than a fully In Premise solution for SMEs

Fully in premise solutions make maximum sense for enterprises only if they can afford to maintain a reliable private data center (resources, operations manpower, infrastructure such as power, cooling, spares, compute, storage and network, and OS/App licenses) to ensure maximum uptime on all fronts. This is true for any application, and more so for collaboration and communication apps, since they are the ones most frequently used to enable teamwork. Add to that, if the mail exchanger (MX) server is located at the in premise data center, the need for maintaining a high uptime becomes even more critical, to reduce the chances of mail delays and possible mail loss on the inbound path. Additional investment and planning are also required to set up a DR site and backup systems.

Considering all this and more, while we feel that for SMEs, using Cloud Based collaboration services is the default way forward, it’s possible that you are unable to shift to a 100% cloud hosting model for various reasons.

In such a situation, you may want to consider a hybrid solution that allows you the flexibility and privacy of maintaining an in premise setup and adds on the benefits of shifting a part of your critical workload to the cloud.

Mithi SkyConnect - Cruise

  1. 99.9% uptime: The Cruise solution (hybrid) hoists the Mail exchanger and mail cleaning work load onto the cloud. This adds a very high level of reliability to this critical function, which “should be always on” since mail is always flowing in from the external world.
  2. Mini DR site: Additionally, the cloud platform can store inbound mail in the respective mailboxes on the cloud itself for a defined period, so that users have the option to log in to the cloud account and access this new mail in case of any disaster or temporary loss of access to the in premise server.
  3. SecureMailFlow: To top it all, the Cruise solution will also provide you with SecureMailFlow (a mail cleaning service on the cloud) to detect and quarantine spam and virus infected mail. Considering that most of the mail traffic is junk, this will free up an immense amount of compute and network bandwidth on the in premise infrastructure (since only clean mail will make its way in).
  4. Optimized local mail flow: You maintain your in premise server, so that the mail traffic amongst users on the LAN/WAN stays local within the server, which reduces load on the Internet Bandwidth and also speeds up the mail flow amongst the local users. In most organisations, local mail flow accounts for the bulk of the mail traffic.

Yes, the Cruise solution may cost more initially, since:
- It’s a fully managed service that involves costs in terms of compute, storage and network infrastructure, manpower resources and a NOC for maintenance and monitoring.
- The solution comes bundled with SecureMailFlow, a mail cleaning service based on one of the top international security solutions.
In the long run, however, you will benefit from the following:
- Save on compute and bandwidth resources
- Save on the manpower required to manage this aspect of your infrastructure
- Get 99.9% uptime on your most critical collaboration function (inbound/outbound mail delivery)
- Get access to a mini DR site that can be used to retrieve inbound mail of the last ‘n’ days and also send mail out in case of an outage on the in premise site/server.
Our research and experience with customers have shown that, taking these into account, you get a better overall ROI by choosing a hybrid over a fully in premise solution.

Enabling Email Spoof prevention on Connect Xf: Impact and Client configurations

There are several good resources on our website and blog that describe what email spoofing is, how it impacts your business and what you can do to prevent it on Connect Xf.

It is strongly recommended that you enable the spoof check feature on your server to prevent internal spam attacks, which eventually lead to a lot of junk mail escaping your servers into the Internet. Once this impacts your IP reputation, the outbound IP addresses of your server are likely to get blacklisted in RBL sites worldwide, causing a major impact on all your users.

This article describes the full impact of the spoof check feature and a plan for enabling it on your Connect Xf server.

Step 1: What is the Expected and Correct client configuration if Spoof check is enabled?

Essentially, when a user sends mail using a mobile client (Android, iOS, etc.) or a desktop client (Outlook, Thunderbird, etc.), it is important that the following two configurations carry the same value:

  • The email id configured for the account.
  • The authentication email id configured for the account.

Only if these two email id values are the same, will the mail from this user be allowed to pass through the server.

The Reply To address can be the same as the email id, or different if the recipient’s reply must land somewhere else.
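
The rule above boils down to a simple comparison. The function below is only an illustrative sketch of that rule, not the actual Connect Xf implementation; the function name and the case-insensitive comparison are our assumptions.

```python
def passes_spoof_check(auth_id: str, account_id: str) -> bool:
    """Return True when the authentication id matches the account email id.

    Comparing case-insensitively is an assumption made for this sketch.
    """
    return auth_id.strip().lower() == account_id.strip().lower()


# Same id for both settings: mail is allowed through.
print(passes_spoof_check("james@acmecorp.com", "james@acmecorp.com"))
# Different ids: mail would be rejected once spoof check is enabled.
print(passes_spoof_check("james@acmecorp.com", "support@acmecorp.com"))
```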

Why do users set these two email ids differently, while configuring their email account?

Typically, most email servers will relay any type of mail once the user has authenticated. This means that once I have connected to a server and authenticated myself, the server becomes my servant and relays any mail for me (from anyone to anyone).

Let’s see some typical reasons why users specify different email ids for authentication and for their account.

Scenario 1:

I am James and I work in the support department of Acme corp.
I configure my MS Outlook to authenticate with my email id: james@acmecorp.com
However I want replies to my email to come to support@acmecorp.com
So I will configure the account email id as support@acmecorp.com

Scenario 2:

I am Mary and I work in the marketing department of Acme corp.
I want to shoot a mail campaign to about 1000 users but any replies to the campaign mail should come to marketing@acmecorp.com
So I configure my Thunderbird to authenticate with my email id: mary@acmecorp.com
And I will configure the account email id as marketing@acmecorp.com

Essentially, this configuration, where the authentication email id is different from the account email id, is always done so that the replies come to a different email id.

Alright, I will make the changes you suggested, but I still want to achieve the objectives in the above scenarios.

You can still achieve the objective of having replies come to a different email id by configuring the “Reply To” email id in your account. This will ensure that when the recipient replies, the reply will be sent to the email id specified in the “Reply to” box. An image for this is shown below.
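
At the message level, the “Reply To” setting simply adds a Reply-To header, while the From address stays the same as the authenticated id. The sketch below builds such a message with Python's standard email library to show the resulting headers; the recipient address, subject and body are hypothetical.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "james@acmecorp.com"        # same id that is used for authentication
msg["Reply-To"] = "support@acmecorp.com"  # replies land here instead
msg["To"] = "customer@example.com"        # hypothetical recipient
msg["Subject"] = "Your support request"   # hypothetical subject
msg.set_content("We are looking into it.")

# Print the full message, headers first, to see what the recipient's client receives.
print(msg.as_string())
```

Because From still matches the authenticated id, such a message satisfies the spoof check, and the recipient's reply goes to the Reply-To address.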

Check screen shots below of the WRONG configuration and the CORRECT configuration to be done on clients.

Android


iPhone


Thunderbird


MS Outlook

Step 2: Enabling spoof check on the server

Only after all the clients are configured as above should you move on to this step, where you enable spoof check for the “Default” SMTP address so that every connection from end users is checked for spoofing.

Command to enable spoof check for the default SMTP control:

/mithi/mcs/bin/setsmtpcontrols.sh default -spoofcheck 2

Click here to learn more about SMTP controls

How will the spoof check feature work?

Once the spoof check feature is enabled, clients configured with an invalid SMTP address are blocked from sending mail.
Below are the sample error messages received.

Android

iPhone

Outlook

Thunderbird