Tuesday, 19 January 2010

Exchange 2007 Public Folder Mail Routing

We had a report recently that mail from outside the Exchange organisation destined for Public Folders was being returned in the form of an NDR, but all other mail was flowing fine.

To explain the problem, here’s a little background about our Exchange 2007 topology. We have two Hub Transport servers that handle mail entering and leaving the organisation. Beneath them we have lots of Exchange deployments at physical sites with varying local configurations. To complicate things, we have firewalls sitting in front of these deployments, some stricter than others. As we add more Exchange deployments, getting these firewalls adjusted so that the new Hub Transport servers can communicate with the old ones can be a considerable task, and it usually ends with local administrators noticing queues forming at their sites.

I had all the information I needed to track the messages, so I started at the two Hub Transport servers that handle mail into and out of the system. The Public Folder the message was addressed to had only one replica. I discovered that the message was being sent to what seemed to be a completely random hub server, not to the site where the replica existed. The messages were queuing there because the complaining administrators hadn’t opened their firewalls as requested. Fine, I thought, get them to open the firewalls properly, but I wanted to figure out why the message was being sent to this strange server in the first place.

The answer lay in the following Microsoft TechNet Article - http://technet.microsoft.com/en-us/library/bb232041(EXCHG.80).aspx

The article explains how messages are routed for public folders. Our problems started because the two Hub Servers that receive mail from the internet didn’t have a copy of the Public Folder Hierarchy, so they didn’t know where to route the message. In this situation the categoriser looks at the values of msExchOwningPFTreeBL, a property of CN=Public Folders,CN=Folder Hierarchies,CN=First Administrative Group,CN=Administrative Groups,CN=Cymru,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=cymru,DC=nhs,DC=uk. All of the public folder stores should be listed in that property, and the Exchange 2007 SP1 or SP2 categoriser narrows them down in the following way…

1. Ranking by the age of the public folder database: By default, public folder databases that have an age threshold of less than two days are not considered unless the age of all public folder databases is less than the threshold or the age is unknown.

2. Proximity: The local server is preferred. If the local server does not contain a replica of the public folder database, a server in the same Active Directory site is preferred. If the local Active Directory site does not contain a replica of the public folder database, a server in a remote Active Directory site or routing group is selected as the preferred destination.

3. Cost: If more than one remote Active Directory site or routing group contains a replica of the public folder database, the server in the Active Directory site or routing group that has the least cost routing path from the local Active Directory site is selected as the preferred destination.
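To make the three rules easier to follow, here is a rough Python sketch of the selection order. It is only an illustration of my reading of the article, not how the categoriser is actually implemented. The names PFDatabase, pick_destination and site_costs are made up for the example, and I have assumed that when no database passes the age test, all of them are considered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

AGE_THRESHOLD_DAYS = 2  # the "two days" default quoted in rule 1


@dataclass
class PFDatabase:
    server: str                 # hypothetical server name
    site: str                   # AD site the server lives in
    age_days: Optional[float]   # None means the age is unknown


def pick_destination(databases: List[PFDatabase], local_server: str,
                     local_site: str,
                     site_costs: Dict[str, int]) -> Optional[PFDatabase]:
    """Pick a public folder database using age, then proximity, then cost."""
    # Rule 1: skip databases younger than the threshold. Assumption: if
    # nothing is old enough (or ages are unknown), consider every database.
    old_enough = [d for d in databases
                  if d.age_days is not None
                  and d.age_days >= AGE_THRESHOLD_DAYS]
    candidates = old_enough or list(databases)
    if not candidates:
        return None

    # Rule 2: prefer the local server, then any server in the local AD site.
    for d in candidates:
        if d.server == local_server:
            return d
    for d in candidates:
        if d.site == local_site:
            return d

    # Rule 3: otherwise take the remote site with the lowest routing cost.
    # On a tie, min() keeps the first candidate in the list.
    return min(candidates, key=lambda d: site_costs[d.site])
```

With two remote databases at equal cost the first one listed wins the tie, which fits the "completely random hub server" we saw. Lowering one site's cost makes that site the winner, and a freshly created local database is only chosen once it is past the two day threshold.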

In the long term, I’d want the messages routed directly from our two entry-point Hub Servers. In the short term, though, point 1 meant that simply creating a Public Folder Database on them to hold only the Hierarchy for routing purposes wouldn’t help straight away, because the new databases would not be considered for the first two days. I created the databases anyway.

Our AD site layout is fairly simple. It’s a snowflake design in which all of the AD sites with connections to our central site had the same cost. The quick way to resolve this was to drop the cost of the site we wanted these messages to be routed via. This solved the problem in the short term, until the mandatory two days had passed and the newly created PF Databases could route the messages themselves.

OR the local admin could have opened the firewalls properly, but that would have been too easy. :-)

Friday, 4 December 2009

Upgrading Exchange 2007 Clusters to SP2 – Workaround

I posted last month about a problem delegating installs of Exchange 2007 SP2. Delegated Admins will receive an error message stating the following…

You must be a member of the 'Exchange Organization Administrators' or 'Enterprise Administrators' group to continue.

I’ve been looking into the issue and have had a case open with Microsoft. It turns out that you only get this issue on a fully patched server. If you upgrade or install as a delegated admin on a fresh install of either Windows Server 2008 or 2003, you don’t see the problem with either Exchange SP1 or SP2. I haven’t had time to identify exactly which patch causes this yet, if I bother at all.

If you have patched your server though, MS came up with this workaround.

  1. Disable update checking for the BPA: in the registry, go to HKCU\Software\Microsoft\Exchange\ExBPA and create or modify a DWORD named “VersionCheckAlways”, setting it to ‘0’
  2. Copy the installation files to a local drive and replace Setup\ServerRoles\Common\en\ExBPA.PreReqs.xml with this Modified XML File

Once you’ve done this you can ignore all Pre-Requisite Checking for the install. I was strongly advised by Microsoft to make sure there are no other Pre-Requisite Failures by running an unmodified setup before making the changes above.

Microsoft have said that they’ll pass this to the product group for a fix.

Sunday, 29 November 2009

Upgrading Exchange 2007 Clusters to SP2 - Continued

Regarding my previous post on delegated installs and upgrades to SP2, see here - http://daiowen.blogspot.com/2009/11/upgrading-exchange-2007-clusters-to-sp2.html

Microsoft has informed us that this will be classed as a bug, and that they are working on discovering the cause before saying whether or not they will fix it.

Geographically Dispersed CCR Cluster

I recently had the opportunity to install a geographically dispersed CCR Exchange 2007 cluster.

Server 2008’s cluster features can now handle clusters on separate subnets, so the fact that the only data centres available were connected at Layer 3 wasn’t a problem. I didn’t need to stretch a VLAN across physical sites.

Configuring the networking for the cluster went slightly against the grain for me. Essentially the Private networking element has gone for these types of clusters, because all traffic, heartbeat included, has to go over the public network. That said, it was a simple process. I configured the networking using four NICs: three were teamed, and the fourth was on its own and set not to register in DNS, as I didn’t want client traffic coming over the single NIC.

When you set up the cluster you simply enter two IP addresses that the cluster can use. On failover, the one that’s not on the active node’s subnet stays offline. Sounds nice, doesn’t it? But wait.

Even though you don’t have to stretch a VLAN any more for this type of cluster, Exchange 2007 still requires cluster nodes to be in the same Active Directory site. This means that if you are planning for the disaster of losing a site, you’ll need two DCs in each physical site, all in the same AD site, so that each node will always have a DC if you lose one of the physical sites. You can’t use DC siteCoverage for this, as I discovered.

With the cluster set up, I installed a combined HUB/CAS server in each physical site. Exchange will load balance mail flow across the Hub Transport servers by itself, but what about CAS connectivity? The Autodiscover service will handle Outlook web services such as OAB and Out of Office, but what about Outlook Web Access? On a single subnet you’d use NLB to give users a single resilient point of entry to OWA. That’s no good on separate subnets unless you have a hardware load balancer, which I didn’t. So OWA failover became a manual process using CNAMEs in DNS. Not the nicest of solutions.

Another issue… You can’t put a Public Folder Database on a CCR unless it’s the only CCR in the Exchange Organisation. So Public Folders had to sit on the HUB/CAS servers, with content replication between the servers. But if one of those PF servers is lost, it’s a manual failover process to get PF access back. You need to change the Default Public Folder Database for each Mailbox Database in the CCR. But that’s the same for any Public Folder failure.

So now we have two parts of the failover that require manual intervention. Not nice. I was starting to dislike separating my cluster over different subnets.

Issue number 3… When cluster failover occurs, the cluster IP changes. This means that unless all your clients are in the same AD site as the cluster, the changed DNS record will take time to replicate to them. By default the TTL of cluster DNS names is 20 minutes, so in the worst case your clients could be waiting 15 minutes for AD replication plus 20 minutes for the DNS record to expire on their machines. 35 minutes is a long time, and not really acceptable. You can alleviate this by reducing the TTL of the record; I reduced mine to 3 minutes. Another change you can make is to enable change notification on the AD site links between the cluster’s AD site and the AD site or sites where the clients sit. Together, these bring the failover time down to about 3 minutes. We also made a change in group policy: we created a GPO that configured Outlook not to complain about connectivity issues for 4 minutes after disconnection from the Exchange Server.
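The arithmetic behind those figures is simple enough to put in a few lines of Python. This is just a back-of-the-envelope sketch using the numbers above. The function name is made up, and treating AD replication as roughly instant once change notification is enabled is my simplification.

```python
def worst_case_failover_minutes(ad_replication_delay: int, dns_ttl: int) -> int:
    """Worst case: wait for AD replication, then for the cached DNS record to expire."""
    return ad_replication_delay + dns_ttl


# Defaults: 15 minutes of AD replication plus a 20 minute TTL.
default_wait = worst_case_failover_minutes(15, 20)

# Tuned: change notification (assumed to be near 0 minutes) plus a 3 minute TTL.
tuned_wait = worst_case_failover_minutes(0, 3)

# Outlook was told by GPO to stay quiet for 4 minutes after disconnection.
OUTLOOK_GRACE_MINUTES = 4
clients_stay_quiet = tuned_wait < OUTLOOK_GRACE_MINUTES

print(default_wait, tuned_wait, clients_stay_quiet)
```

The point of the 4 minute Outlook setting is that it sits just above the tuned worst case, so most clients never see a connectivity warning.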

This configuration meant that during a failover the majority of clients would not notice a problem unless they were sending emails and saw them sitting in their Outbox.

So with the exception of OWA and Public Folders, the system was quite acceptable. Just after covering off all of the above problems, space became available in our main data centre. We could now stretch a VLAN between these sites. So I reconfigured the networking and put each node in the same subnet. And guess what, most of the problems above went away. With the exception of Public Folder failover, but I can’t get these people to use the SharePoint servers available in the organisation, so I’m afraid that they’ll just have to live with that  :-).

Tuesday, 17 November 2009

Duplicate legacyExchangeDN Properties

Had a case recently that wasn’t immediately obvious to resolve.

We had reports of a user that no one was able to e-mail due to duplicate addressing. At first look there were no duplicate addresses on the object. We were receiving the following NDR:

There is a problem with the recipient's e-mail system. More than one user has this e-mail address. The recipient's system administrator will have to fix this. Microsoft Exchange will not try to redeliver this message for you. Please provide the following diagnostic text to your system administrator and then try resending the message after the problem has been resolved.

IMCEAEX-_O=ORGNAME_OU=EXCHANGE+20ADMINISTRATIVE+20GROUP+20+28FYDIBOHF23SPDLT+29_CN=RECIPIENTS_CN=NAME+2ESURNAME@DOMAIN.SUFFIX
#550 5.1.4 RESOLVER.ADR.Ambiguous; ambiguous address ##
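As an aside, the IMCEAEX address in an NDR like this is an encoded X.500-style address: underscores stand for slashes and +XX is a hexadecimal character code. A small Python sketch for decoding it follows. The function name is made up for the example, and it assumes the address follows that usual encoding.

```python
import re


def imceaex_to_x500(address: str) -> str:
    """Decode an IMCEAEX-style address from an NDR into its X.500 form."""
    # Drop the @domain suffix and the IMCEAEX- prefix.
    local_part = address.split("@", 1)[0]
    prefix = "IMCEAEX-"
    if not local_part.upper().startswith(prefix):
        raise ValueError("not an IMCEAEX address")
    encoded = local_part[len(prefix):]

    # Underscores are path separators. Do this before decoding +XX so that
    # an encoded underscore (+5F) is not turned into a slash.
    with_slashes = encoded.replace("_", "/")

    # +20 is a space, +28 and +29 are brackets, +2E is a full stop, and so on.
    return re.sub(r"\+([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)),
                  with_slashes)


ndr_address = ("IMCEAEX-_O=ORGNAME_OU=EXCHANGE+20ADMINISTRATIVE+20GROUP"
               "+20+28FYDIBOHF23SPDLT+29_CN=RECIPIENTS_CN=NAME+2ESURNAME"
               "@DOMAIN.SUFFIX")
print(imceaex_to_x500(ndr_address))
```

Decoding the address gives you a value you can search for in AD, which is a quick way to see which objects it matches.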

Further investigations showed that there was a problem with the way that the user was shown in the Exchange Address Books. It seemed as though the object was being confused with another user with the same name.

Comparing the properties of the two users revealed that their legacyExchangeDN properties were the same. The result was that the users were being confused in the Address Lists and no one was able to e-mail either due to this duplication.
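If you have an export of your recipients, checking for this kind of clash is easy to script. Below is a minimal Python sketch, assuming you already have (sAMAccountName, legacyExchangeDN) pairs from whatever export tool you prefer. The function name and sample data are invented, and comparing the values case-insensitively is my assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_legacy_dns(
        recipients: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group accounts by legacyExchangeDN and return only the shared values."""
    by_dn = defaultdict(list)
    for sam_account_name, legacy_dn in recipients:
        # Assumption: treat the value as case-insensitive when comparing.
        by_dn[legacy_dn.lower()].append(sam_account_name)
    return {dn: names for dn, names in by_dn.items() if len(names) > 1}


# Invented sample data: two accounts that ended up with the same value.
sample = [
    ("jsmith1", "/o=EXCHORG/ou=Site/cn=Recipients/cn=john.smith"),
    ("jsmith2", "/o=EXCHORG/ou=Site/cn=Recipients/cn=John.Smith"),
    ("ajones", "/o=EXCHORG/ou=Site/cn=Recipients/cn=alice.jones"),
]
duplicates = find_duplicate_legacy_dns(sample)
for dn, names in duplicates.items():
    print(dn, names)
```

Anything it reports is a set of accounts that needs one of its values changed.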

The resolution was to change the container name that represents the user to another unique value. We changed ours to the user’s sAMAccountName value.

o=EXCHORG/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=firstname.surname

to

o=EXCHORG/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=sAMAccountName

The only problem with renaming this value is that it will break the ability to reply unless the sender’s Outlook cache is cleared.

As to how this happened, we believe it’s because we have multiple installations of the Quest Migration tools running against the same AD domain. They happened to be migrating two users with the same name and populated the property with the same value.

Friday, 13 November 2009

Upgrading Exchange 2007 Clusters to SP2

In the Exchange Organisation I look after at work, we have quite a few Exchange Clusters. We have SCR & SCC clusters across multiple sites, run by different subordinate administrators.

With the release of SP2 for Exchange 2007 we set about testing SP2 and getting it rolled out. Unfortunately, our test lab doesn’t include any clusters, something we’ll have to address now, but I digress.

We installed SP2 on the Exchange servers we manage ourselves without issue, again, no clusters.

When it came time for the local admins to install SP2, they hit a problem on their Exchange Clusters. They followed the steps described in this TechNet Article - http://technet.microsoft.com/en-us/library/bb676320.aspx but the attempts failed with the following error…

You must be a member of the 'Exchange Organization Administrators' or 'Enterprise Administrators' group to continue.

On inspection of the ExchangeSetup.log, the prerequisites check had failed with the following error.

[ERROR] The operation could not be performed because object '<server>' could not be found on domain controller '<domaincontroller>.<domain>'.

The install works fine with Exchange Organization Administrator permissions, but it’s not ideal to go around each cluster and do it ourselves. We have quite a few and don’t want the blame for any subsequent failures.

We logged a call with Microsoft over a week ago now, and have been troubleshooting with them. They can reproduce our problem in their labs. Until they come back with a fix, it looks like we’ll have to upgrade the clusters ourselves.

I’ll post an update as soon as / if Microsoft come back to us with a solution.

Monday, 19 October 2009

83-640 Exam “Windows Server 2008 Active Directory, Configuring”

I passed 83-640 “Windows Server 2008 Active Directory, Configuring” today. The exam is a replacement for 70-640 with exactly the same tested skills. This was my first Microsoft exam with a simulated testing environment, so I thought I’d write a bit about it.

The exam itself is split into three parts: two Virtual Labs followed by the more familiar multiple choice questions, of which there were thirty.

The VM labs were, in my case, identical, and most of the tasks are fairly simple to complete. I got caught out on a CA question that I couldn’t remember a command for. Things like adjusting the dataset of the Global Catalog, configuring site replication and bulk updating AD objects should all be fairly commonplace for most administrators.

The multiple choice questions posed no major difficulty. There was the odd one that I wasn’t familiar with, but some of the answers were obvious.

My main problem with the test was the speed of the Virtual Labs. The machines are somewhere on the internet, and proudly show the speed of the VM CPU as 7MHz with 1024MB of RAM on the desktop using BGInfo. Believe me, they are slow: mouse clicks take seconds to register and MMC consoles take minutes to open. Even though this was an annoyance, there is still plenty of time in the exam to complete the tasks.

The best part of the new VM testing method is that help is available as it would be on a normal install. While this isn’t always a help, on one VM Task it saved my bacon.

Generally though, having sat the exam, I’d say that if you’ve learned about the new features of 2008, Rights Management & Federation services, and updated your knowledge of PKI, you should be fine. Everything else is the same as the 2003 exam.

Next up is 70-642, a couple of weeks revising for that and I’ll let you know how it goes.