Access from different subnets

htt · October 14, 2020, 3:52pm

I have an odd issue here and I’m sort of going down the list to get to the root cause. Is there anything in UCS that will prevent connecting from different VLANs?

I have a handful of UCS servers in a “server” VLAN alongside various other servers. There are another 4-5 VLANs for various other purposes. We have 2 pfSense firewalls in an HA configuration to route between VLANs. All of this is running on a Proxmox cluster.

All of a sudden we cannot connect to some of our UCS servers on the server VLAN from other VLANs. It looks like the packets are being dropped. My first thought was to look at the firewall, but I don’t see the connections being blocked, and even from subnets where there are any:any rules the UCS servers are not accessible. I’ve tried both firewalls and I even migrated the firewalls to different hosts in the Proxmox cluster, but the results are the same. What’s very odd is that this only affects UCS servers, and not all of them. The issue starts with one server and eventually moves on to others. The servers are always accessible from the server VLAN. We got hit hard with storms last week and lost power, and after a full system shutdown all UCS servers but one were accessible again from different subnets. However, just a couple of days later we now have 2 UCS servers we cannot access.

I’m not convinced this is a UCS issue, but as I said, a couple of these subnets have any:any rules and every other server on the server VLAN is accessible from those subnets, so I have to ask. Is there anything in UCS that will prevent or promote connecting from different VLANs?

Kind regards

Kevo · October 14, 2020, 7:23pm

I’m assuming you’re talking about IP connectivity? Is the UCS server on a single VLAN? How are you routing between the VLANs?

If I couldn’t find a config reason for this type of problem, this is something where I would usually just run a few packet captures at well-chosen locations to see which device is dropping the packets or not processing them the way I wanted. I would probably start on the UCS box to check whether the packets are making it to the box with the issue. Then I would work my way backwards until I found the device that was dropping them.

htt · October 15, 2020, 12:11am

@Kevo - Yes, I’m speaking about the web interface and/or whatever service is running on the server. Currently I cannot access our file server or the Zammad server on any port from other subnets. It’s as if the servers are not there. All servers are on a single subnet; to be clear, it’s not a VLAN, the traffic isn’t tagged. The servers are on their own subnet, and I have VLANs for the other subnets. The firewalls are on the same subnet as the servers and have interfaces configured for the various VLANs. Firewall/routing rules allow traffic from the VLANs to the server network and the other way around. Router on a stick, I believe, is one of the terms.

I basically did as you’ve suggested and I’m out of ideas. I don’t know what I don’t know, so I’m checking to make sure there isn’t some sort of config in UCS that could limit connections from remote subnets. Like I said, I’m looking at UCS because they are the only servers I have that are experiencing this. If I ping or traceroute from the firewalls I get a timeout to the UCS servers when the source is anything but the server interface. This pretty much points at the firewalls, except 2 of these interfaces have any:any rules to the server network. The rules are working because I can hit all the other servers on the server network except these particular UCS servers. What’s also weird is how they were working and then suddenly stopped, one at a time over a few days. With any:any rules I would expect all or nothing if there were an issue with the rules. I wouldn’t expect it to work today but not tomorrow for a handful of servers. Besides, I can see in the firewall logs that it’s allowing the traffic to the affected servers on the expected ports.

I’ve rebooted the firewalls and the VMs several times with no luck. I’ve also created explicit rules and routes to the affected servers. After powering down the entire cluster during the power outage, all but one UCS server became accessible again from other VLANs. Unfortunately, after a couple of days, 2 servers are now inaccessible from other VLANs. At this point I’m reaching for anything that could possibly cause this.

Kevo · October 15, 2020, 12:41am

Yes, sounds like an odd problem. If you can ping from the server lan interface of the firewall when the problem is happening, then I would say you’re right. It must be an issue with the firewall setup. It might not necessarily be a rule. It could be a routing issue of some sort. Not sure what to suggest at this point other than basic troubleshooting stuff. If it were me, I’d probably start simplifying the setup until I got it working and then start adding things back.

I’m somewhat familiar with pfSense as I run OPNsense, but I’d be suspicious of the HA setup and would probably drop that, try to get everything working with just the one firewall first, and go from there.

htt · October 15, 2020, 2:23am

@Kevo - Thank you, thank you!! You gave me the nugget I needed to get to the problem. A combination of things you said led me to look closer at the IP assignment. Sometimes you just need to talk it out, I guess. Since I had already tried disabling HA on the firewalls and dialed everything back to a more basic setup (that’s how I ended up with any:any rules lol), I revisited the IP configuration of the servers. I was reminded of a previous issue I was having with UCS. I figured with static IP addresses it wouldn’t be a problem, but the issue is larger than I thought.

For some reason I’m having trouble with the network configurations on the UCS servers. I set it to static, it switches to DHCP, I create a reservation and it’s eventually ignored. For some reason these servers switched from static to DHCP. For some other reason the DHCP server didn’t hand out a default gateway (it does give one to other devices on the subnet, however). Without a gateway the reply packets weren’t making it back to my test machines. I just flat out missed it the first time I looked at the server and never looked back at the server once the issue sprang up on other servers. How the servers were able to access the internet to detect available updates, who knows…
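For anyone hitting the same symptoms, here is a rough Python sketch of the routing decision a host makes when it replies. It only illustrates why a missing default gateway breaks replies to other subnets while same-subnet traffic keeps working; the addresses and the `next_hop` helper are made up for the example and are not taken from my network or from UCS.

```python
import ipaddress

def next_hop(dest, interface, gateway=None):
    """Return how a host with the given interface would send a packet to dest.

    interface is an address with prefix length, e.g. "192.168.10.20/24".
    Returns "direct" for on-link destinations, "via <gateway>" when a
    default gateway is set, or None when there is no route at all.
    """
    iface = ipaddress.ip_interface(interface)
    dest_ip = ipaddress.ip_address(dest)
    if dest_ip in iface.network:
        return "direct"            # same subnet: no gateway needed
    if gateway is not None:
        return f"via {gateway}"    # off-link: hand it to the default gateway
    return None                    # off-link and no gateway: reply is never sent

# Hypothetical addresses, for illustration only.
server = "192.168.10.20/24"
print(next_hop("192.168.10.50", server))                          # client on the server subnet
print(next_hop("192.168.30.50", server))                          # client on another VLAN, no gateway
print(next_hop("192.168.30.50", server, gateway="192.168.10.1"))  # same client, gateway set
```

This matches what I saw: pings sourced from the firewall’s server interface are on-link and get answered, while anything sourced from another subnet gets no reply, even though the firewall logs show the inbound traffic being allowed.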

I just looked closer and found 3 other servers that were not accessible from other subnets, 2 of which were sharing an IP address. I’m not sure why DHCP has a mind of its own. I reconfigured them to static with the correct information and now they’re accessible from other subnets.
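A quick way to spot servers sharing an address is to group a host-to-IP list by IP. This is just a sketch; the hostnames and addresses below are placeholders, and how you collect the list (DHCP leases, ARP table, manual notes) is up to you.

```python
from collections import defaultdict

def find_duplicate_ips(assignments):
    """Return {ip: [hosts...]} for every IP assigned to more than one host."""
    by_ip = defaultdict(list)
    for host, ip in assignments.items():
        by_ip[ip].append(host)
    return {ip: sorted(hosts) for ip, hosts in by_ip.items() if len(hosts) > 1}

# Hypothetical inventory, for illustration only.
inventory = {
    "ucs-files": "192.168.10.21",
    "ucs-zammad": "192.168.10.22",
    "ucs-backup": "192.168.10.21",
}
print(find_duplicate_ips(inventory))
```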

My actual issue is that the IP configuration on the UCS servers keeps changing. I had an issue in my first month on UCS where the DC master IP address kept changing. I thought I had resolved this then, but apparently I did not… If there are any suggestions for this new problem, I’m all ears. For now I can access the servers from everywhere I’m supposed to.

Kevo · October 15, 2020, 1:13pm

I have not experienced any issues with the IP settings changing on my UCS server, which is actually running under Proxmox as well. I set it up during install with the IP I wanted, and haven’t messed with it since. I can imagine that changing it after install might cause some issues. I have read a few times that the IP and domain shouldn’t be changed after setup. I think that has to do with the directory services, as I know that used to be an issue with Mac OS X Server as well. However, I have never tried changing the IP or domain with UCS, so I’m not really sure what the side effects would be.