Wednesday, December 7, 2016

Lessons from running a Zimbra mail server

This should maybe be titled 'Lessons from running an SMTP mail server', as it's not so much Zimbra that's been the issue as fighting spammers.  I'm not doing anything fancy: I only have a single VM running the free edition of Zimbra with not many mailboxes.  The trouble is that any presence on TCP/25 on the Internet will have the same issues - getting your mail delivered, while not getting overwhelmed by spam being received.
With only a few mailboxes the former should be easy, except that out of the box my setup wasn't secure enough to prevent tons of spam email being relayed.  No, I wasn't an open relay, but I hadn't rejected unlisted senders because I wanted Zimbra to relay system emails from vCenter / NetApp etc., which meant someone was able to send a bunch of stuff 'from' my domain.  I turned that off:

zmprov mcf zimbraMtaSmtpdRejectUnlistedRecipient yes
zmprov mcf zimbraMtaSmtpdRejectUnlistedSender yes
zmmtactl restart
zmconfigdctl restart

Receiving email should be a breeze too, except once I'd tightened the spam scoring to reduce spam to a manageable trickle, false positives crept in too.

In /opt/zimbra/data/spamassassin/ there are several configuration files.  I found it useful to whitelist several domains from which I regularly receive mail by adding them to localrules.cf, though I found that they still got rejected sometimes.  I fixed that by changing the score for whitelisted domains in ../rules/50_scores.cf, though these files aren't supposed to be modified, so be warned that an upgrade may well overwrite the changes.

After making changes you need to restart Zimbra, or at least 'zmamavisdctl restart' (as Zimbra user)
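
As a rough sketch of the whitelisting described above, the entries can be generated from a list of domains rather than typed by hand.  whitelist_from is a standard SpamAssassin directive; the function name make_whitelist_rules and the example domains are made up for illustration, and which file to append to depends on your install.

```shell
# Print one SpamAssassin whitelist_from line per domain given as an argument.
# The function name and the domains used with it are illustrative assumptions.
make_whitelist_rules() {
    for domain in "$@"; do
        printf 'whitelist_from *@%s\n' "$domain"
    done
}
```

Something like 'make_whitelist_rules example.com example.org' prints the lines so they can be reviewed before being appended to the rules file and amavis restarted.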

Other things that have been useful to me have been around collecting data from the command line, using zmprov to grab stuff out of the DB that I can then look at or pipe to a file:

zmprov -l gaa   - lists all accounts 

zmprov gadl     - lists all distribution lists

zmprov gdl mailinglist@example.com | grep zimbraMailForwardingAddress: | awk '{print $2}'  - lists members of a distribution list

for i in $(zmprov -l gaa) ; do zmprov ga "$i" zimbraMailAlias ; done  - lists aliases
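
The alias loop above prints a block per account, which gets hard to read with more than a handful of mailboxes.  A small filter can flatten it to one line per alias.  This is only a sketch: it assumes each account's output starts with a '# name <account>' header line followed by zero or more 'zimbraMailAlias:' lines, and format_aliases is a name made up for the example.

```shell
# Reduce the combined output of the 'zmprov ga <account> zimbraMailAlias' loop
# to one "account<TAB>alias" line per alias.
# Assumes each account's output starts with a "# name <account>" header line.
format_aliases() {
    awk '
        /^# name /          { account = $3 }
        /^zimbraMailAlias:/ { print account "\t" $2 }
    '
}
```

Pipe the output of the loop into format_aliases and redirect it to a file; accounts without aliases simply produce no lines.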


Tuesday, October 11, 2016

HP Comware and Procurve switches

Oh the joy of finding a completely new command line syntax...

I've used HP switches a good deal in the past, all manner of Procurve models, mostly modular ones as they were an inexpensive rack solution when dual power supplies were a requirement.  These days top of rack switching is so normal that every vendor makes 1U datacenter switches with dual power, and when someone asked me for a recommendation I surfed a little bit and suggested the HP 5900 as a cost effective option for 48 gigabit ports and 12 ten gigabit ones.  Little did I know they'd ask for my help configuring them.  They are certainly very powerful, but it took me long enough to figure out how to enable SSH and basic layer 2 stuff.  More features are being added to the Comware OS every few months it seems, including hardware VXLAN VTEP by the looks of it, though when/whether they will get that certified/supported is anyone's guess.

On a Procurve I'd enable SSH with:
ip ssh
ip ssh filetransfer
no telnet-server

With the only caveat being that on the old switches I have in my home lab creating a self signed cert / RSA keys on the command line doesn't work, though it does in the GUI.

Back to the Comware based switch:
It expects you to have an enterprise RADIUS system to authenticate against and it took a lot of figuring out to create a self contained config.

system-view
public-key local create rsa
ssh server enable
sftp server enable
ssh user simon service-type all authentication-type password
user-interface vty 0 15
authentication-mode scheme
protocol inbound ssh

There's a free ebook available from HP that may be helpful too,
https://h30590.www3.hp.com/product/HP+Networking+and+Cisco+CLI+Reference+Guide+-+Version+2-PDF-8407
now updated for version 7 of Comware.

The usual necessities:

dns domain sjhwilkes.local
dns server 10.206.3.5
dns server 10.206.3.17
ntp-service enable
ntp-service unicast-server 10.206.3.1

It took me ages to stop typing show and use display instead, and likewise no becomes undo in order to remove lines from the config.

On my (ESXi) host facing ports I have:

port link-mode bridge
 port link-type trunk
 port trunk permit vlan 1 10 to 11 15 101 to 102 150 254

Which hardcodes them to be dot1q trunks with a selection of VLANs permitted and VLAN 1 native (though I don't actually use it for anything, force of habit as it was a security recommendation many moons (years) ago)
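
The 'to' ranges in the permit list are easy to misread when comparing against a port group list on the vSphere side.  Here is a small sketch of a helper that expands a Comware-style list into individual VLAN IDs; expand_vlans is a made-up name and it only understands plain IDs and 'X to Y' ranges.

```shell
# Expand a Comware-style VLAN list such as "1 10 to 11 15" into individual IDs.
expand_vlans() {
    # Word-split the single argument into positional parameters on purpose.
    set -- $1
    out=""
    while [ $# -gt 0 ]; do
        start=$1
        end=$1
        # "X to Y" is a range; consume the extra two words.
        if [ "${2:-}" = "to" ]; then
            end=$3
            shift 2
        fi
        shift
        i=$start
        while [ "$i" -le "$end" ]; do
            out="$out${out:+ }$i"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$out"
}
```

For the trunk above, expand_vlans "1 10 to 11 15 101 to 102 150 254" gives 1 10 11 15 101 102 150 254.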

I'm not doing LACP to my hosts; with the amount of messing I do with different versions of NSX and vSphere it's easier to stick with failover/manual.
I experimented with LACP to my old NetApp 2020, which looked like:

interface Bridge-Aggregation10
 description laxnas01
 port link-type trunk
 port trunk permit vlan 1 15
 link-aggregation mode dynamic
 lacp edge-port

Then on the constituent ports:
 port link-mode bridge
 description laxnas01-e0a
 port link-type trunk
 port trunk permit vlan 1 15
 port link-aggregation group 10

I'd still like to figure out if I can put the management interface into its own VRF and have some sort of back door into the rack - difficult without springing for another circuit of some kind though.

To silence log messages about non-H3C transceivers (which work anyway):
transceiver phony-alarm-disable

Monday, August 29, 2016

RSA SecurID Authentication Manager 8.2

To update the notes from the 8.1 post, I had a working setup with a primary and replica 8.1 AM server, and a web server for each.

Updating the Authentication Managers themselves was straightforward.  Edit the VMs to add a CD-ROM drive and mount the ISO of the 8.1 SP1 update - going from 8.1.0 directly to 8.2 is not supported.  Take a snapshot of the working 8.1 VM.  Enter the Service Console, and navigate to updates in the maintenance menu.  Then set the CD as the update source, do a scan, then select install on the resulting option.  This got both AM servers to 8.1.1 in fairly short order.  Delete the snapshots when complete.

Repeat to go from 8.1.1 to 8.2.

In theory the web servers are similar, in practice I tried to update them to 8.1.1 and somewhere along the line things went awry and the primary one went into status 'reinstall required' while the secondary just became disconnected altogether.

I uninstalled the RSA software from each of them and reinstalled complete with a new web tier package file from the Manager, and all was well.

Update

- All wasn't well, replication was broken.  I found RSA DOC 49528 with a fix for it:

SSH to the primary as rsaadmin,

cd /opt/rsa/am/utils
./rsautil manage-secrets -a get com.rsa.db.dba.password
com.rsa.db.dba.password: blah blah long password here
cd ../pgsql/bin
./psql -h localhost -p 7050 -d db -U rsa_dba
Password for user rsa_dba: blah blah long password here
psql.bin (9.4.1)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-SHA, bits: 256, compression: off)
Type "help" for help.

db=# select * from rsa_rep.IMS_INSTANCE_NODE;

(returns a table of your authentication manager instances)

db=# update RSA_REP.IMS_INSTANCE set deployed_state='out_of_sync' where is_primary='FALSE';
UPDATE 1
db=# 

Then you can go back into the Operations Console and select manual sync within replication reports and things are then fixed.

Thursday, August 11, 2016

Adventures in 10 gigabit Ethernet for a home lab

I wanted 10 gigabit to my home 3 node vSphere cluster.  Perhaps excessive, but even with 4 gigabit ports per host, vMotion and VSAN performance was less than I wanted.  My side plan is to retire my old NetApp 2020 in favor of all flash VSAN, as the NetApp, though reliable, is dog slow, being based on 7200 RPM 500GB SATA drives.

The best option I could find was an old HP 6400CL, which is 6 ports of CX4 plus a slot for an extra 2 ports, so 8 ports for circa $250.  My existing 3400CL 48 port gig switch took one of the same modules so now they have 20 gig between them.  The only spoiler was the immense cost of SFP+ to CX4 cables, over $300 for 6.  I found low profile Mellanox single port PCIe NICs for $10 each.

Foolishly I purchased the above but didn't get around to installing it for the best part of a year, and then lo and behold, it doesn't work.  I got link lights on the NIC end (and status in ESXi) but the switch didn't see link so no traffic passed.  Troubleshooting was going to be expensive - I could buy an Intel X520 NIC (my preferred choice but much more expensive than the Mellanox ones), new cables, or find another CX4 switch.  I might have been more inclined to go this route were my lab at home, but driving to a colo and paying for parking / losing half a day = not attractive.

I bought an H3C S5820X and 6 X SFP+ to SFP+ cables, which was much simpler.  The switch was $300 and the SFP+ cables $25 each on Amazon with Prime delivery; I could have got them for $15 had I been prepared to wait for them to come from Hong Kong.  Installed and working, almost fine.  Turns out one of my NICs is bad too!  Argh.  (Yes, I switched cables/switchports to be sure.)  I can address that another day, but at least my VSAN and vMotion traffic has 10 gigabit now.

I did try a CX4 - SFP+ connection between the new and old switches - no dice, which makes me think that, despite my finding cables, SFP+ ports do not ordinarily support the CX4 protocol at all and that path was a rat hole.  The 5820 has 14 X SFP+ ports and 4 X 10/100/1000, so I also have enough ports that were I to add a 4th host it wouldn't be a blocker (and would enable VSAN dedupe / erasure coding).


Postscript
I couldn't find another matching Mellanox NIC, so I bought 3 X Intel X520-DA2 cards complete with 2 SFP+ cables each on eBay.  I switched out the bad card and put one in each of the other hosts, so now all three have 30 Gigabits into the switch - bit excessive but whatever.  I like that the Mellanox can handle VSAN traffic and be left alone, while I regularly upgrade / mess about with NSX on the Intel NICs.

Thursday, August 4, 2016

Verify Cisco IOS against MD5 / SHA hash

I'm not sure if this is exactly a problem:

2911-2 uptime is 2 years, 42 weeks, 3 days, 21 hours, 7 minutes

but it seems sensible to update IOS once every few years (yes I am joking, an actual maintenance cycle would be six monthly or whenever there's a critical security patch) just for the many security fixes that will have been released.  Now as this box is a long way from me and I don't have the time or money to travel to it, I wanted to actually verify the bits I'd installed on the flash were good.

To verify that a Cisco IOS image is valid against its embedded SHA hash:

2911-2#verify flash0:/c2900-universalk9-mz.SPA.154-3.M4.bin
Starting image verification
Hash Computation:    100% Done!
Computed Hash   SHA2: 4363F1CFF3EF05BB32E48BB49C9E03B3
                      5D7C9D91F351C095E94E82267DCC5719
                      7C5D1CC1669184B20A37CF9DD710806B
                      7388298DB7DD5B18581330D3F388B77A
                     
Embedded Hash   SHA2: 4363F1CFF3EF05BB32E48BB49C9E03B3
                      5D7C9D91F351C095E94E82267DCC5719
                      7C5D1CC1669184B20A37CF9DD710806B
                      7388298DB7DD5B18581330D3F388B77A
                     
CCO Hash        MD5 : 9F652984B1DBB1146AF25DCD5F6F5020

Digital signature successfully verified in file flash0:/c2900-universalk9-mz.SPA.154-3.M4.bin


verify /md5 (flash0:/c2900-universalk9-mz.SPA.154-3.M4.bin) = 9f652984b1dbb1146af25dcd5f6f5020



In both cases reassuringly the same, which made me feel better about scheduling a reboot and not waiting up for it to happen at 2:00 AM.
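
When comparing the router's output against the hash published on the download page, the case and line wrapping usually differ (upper case and wrapped in the verify output above, lower case on one line from verify /md5), so eyeballing it is error prone.  A minimal sketch of a comparison helper to run on a workstation; the function names are my own invention.

```shell
# Strip all whitespace and lower-case a hex digest so that copies pasted from
# different sources can be compared directly.
normalize_hash() {
    printf '%s' "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]'
}

# Succeeds (exit status 0) only if the two digests are identical once normalized.
hashes_match() {
    [ "$(normalize_hash "$1")" = "$(normalize_hash "$2")" ]
}
```

For example, hashes_match "9F652984B1DBB1146AF25DCD5F6F5020" "9f652984b1dbb1146af25dcd5f6f5020" && echo OK prints OK for the two MD5 values shown above.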

Tuesday, February 16, 2016

Supermicro IPMI

I hope you have more luck with Supermicro IPMI interfaces than I have.  They're not as reliable for me as old school iLO or DRAC, and I guess there aren't many developers working on keeping the few models updated.  Not quite Enterprise grade is what I'm saying, I guess.  Like so many products, if they could just update it to support current Java it would be less painful.

Remote power management is vital though, as is remote console and media. I really, really don't want to drive (or fly) to a colo just to reboot a purple screen or do an O/S reinstall.

The first of my recent issues was unusual as anything I touch normally is running vSphere ESXi rather than Linux, in this case I was trying to recover IPMI access on a box running Ubuntu on the bare metal.
The card was in some sort of funky state, where it didn't respond on 443; on 22 I could see the SSH banner (with telnet), but couldn't log in as ADMIN / ADMIN with SSH.

Some Googling later I downloaded and installed the Supermicro IPMIcfg utility,
ipmicfg_1.23.0_general_20151106.zip, stuck the 64 bit binaries on the affected host, ran:

modprobe ipmi_si
modprobe ipmi_msghandler
modprobe ipmi_devintf

These load the kernel modules to enable the utility to talk to the hardware.  Then you're ready to run useful commands:
ipmicfg-linux.x86_64 -m   # Which lists the IPMI IP and MAC
ipmicfg-linux.x86_64 -user list   # Which gives a list of users and privilege levels
ipmicfg-linux.x86_64 -r   # Which performs a reboot of the card

It allowed me to confirm the IP address and username were correct at least, then do a reset of the card, after which access was restored.

The second issue was with my lab, somehow I've created users that don't work and deleted ADMIN.  I can login to the web interface as me and it shows 'simon / administrator' but all the options say 'You have no permission to view this section.' with the exception of a few read only ones.
I then managed to purple screen (I presume as I couldn't get to the remote KVM) one of my hosts so really wanted to issue a reset.
IPMIcfg is no use as I'm neither running Linux nor have a running O/S, so I had a long play with another tool, SMCIPMITool.  It is also downloaded as a static binary, only this time it is used against a remote target card.

SMCIPMITool 10.10.10.1 ADMIN "ADMIN" ipmi sensor   # for example reads the sensor status

I found I couldn't do anything useful with the command in this form, but when I tried the shell option,

SMCIPMITool 10.10.10.1 ADMIN "ADMIN" shell

I eventually found that though 'power cycle' 'power reset' etc. returned an error (possibly permissions due to my weird account) 'power off' and 'power on' did work and I got my server back without venturing into the LA traffic.

Upon even further experimentation I also discovered that I can create new user accounts, so one quick,
user add 4 testuser testpassword 4
later, and I have an account which works properly in the GUI**.

Getting the virtual KVM to work in modern Java on a modern O/S was a huge pain though - the applet is signed with a key that's both short and MD5, so even after whitelisting the IPs of my IPMIs I have to edit java.security to re-enable MD5 certs and permit 256 bit certs.

Friday, February 12, 2016

RSA SecurID Authentication Manager 8.1

RSA SecurID Authentication Manager (AM) is one of those bits of software it seems I have to install once every five years or so, during which time I've lost all memory of how I did it, and anyway the product has probably evolved enough that any knowledge would be out of date.
This time round OVA packaging of the appliance itself has simplified that bit of things, but the addition of a web tier for soft token distribution and user self-service added some complexity.

I don't think AM needs a lot of notes, but the complexity of licensing it and provisioning the tokens is exponentially greater than last time I did it, I'm guessing as a result of some well publicized breaches that have occurred.  Follow the docs and though tedious you end up with the required files and the application to decrypt the token seeds.

The tokens came on a CD - finding a way to read it took me a while, Celeron Linux mini system from the back of the garage pressed into service for that.  Then you use the codes printed on the CD to create a decryption file and password on the RSA site, then use the application to turn those plus the encrypted token seeds into something you can import into the app.

The AM web GUI is horribly unreliable for me, and I've tried Chrome, Mozilla, and IE, with IE being the least bad - though I still need to frequently mouse over a different tab in order to get menus to show up in the tab that I need - it took me a long time to realize this as first I thought it was a permissions issue, so I wasted time creating various different classes of administrator, logging in as them and finding still no luck on the menus.

The web tier install was complicated by RSA/EMC only supporting RHEL, which of course I don't have.  CentOS 6.5 seems to work fine but you have to change /etc/redhat-release to
'Red Hat Enterprise Linux Server release 6.5 (Santiago)'
so the RSA installer doesn't complain and exit.
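
Since the release file has to be changed only to get past the installer, it's worth keeping the original so it can be put back afterwards.  A minimal sketch, assuming a hypothetical helper name set_release_string; run it as root against /etc/redhat-release.

```shell
# Swap in the release string the installer expects, keeping a backup copy
# alongside as FILE.orig so the original can be restored after the install.
# Usage: set_release_string FILE STRING   (the helper name is hypothetical)
set_release_string() {
    file=$1
    cp -p "$file" "$file.orig" || return 1
    printf '%s\n' "$2" > "$file"
}
```

Copying the .orig file back over /etc/redhat-release once the web tier is installed restores the original contents.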

I had various permissions issues, I gave up and chown rsauser / chmod 777 all the install files and their directory - I deleted them all after the install anyhow so why mess about.

Usual Linux best practices apply: NTP is vital due to the tokens, install open-vm-tools, etc.  The only other thing that caught me out, despite my looking for it, was that iptables blocked 443 out of the box.  Adding a rule:
'-A INPUT -p tcp -m tcp --dport 443 -j ACCEPT'
 to /etc/sysconfig/iptables solved that.
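
One thing to watch is where the line goes: iptables evaluates rules in order, so an ACCEPT placed after a catch-all REJECT line never matches.  Here is a rough sketch of a check for that.  The function name is made up, and it only looks at the text of the saved rules file, not the live ruleset.

```shell
# Succeeds if FILE contains an ACCEPT rule for the given destination PORT
# before the first REJECT rule, i.e. in a position where it can take effect.
# Usage: accept_before_reject FILE PORT   (the function name is hypothetical)
accept_before_reject() {
    awk -v port="$2" '
        /-j REJECT/ { exit }
        $0 ~ ("--dport " port " .*-j ACCEPT") { found = 1; exit }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

For example, accept_before_reject /etc/sysconfig/iptables 443 && echo "443 is open" before restarting the iptables service.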

To recover the Super Admin account, run: 
./rsautil restore-admin -u [tempadmin_name] -p [password]
from /opt/rsa/am/utils as the console user elevated to root.