This lab has been sanitized for portfolio purposes. All sensitive information, including company names, client identifiers, IP addresses, and server hostnames, has been replaced with generic placeholders (Company X, xxx.xxx.com, etc.). The troubleshooting methodology and technical approach remain authentic.
📋 Lab Overview
This interactive lab simulates a real-world SSH connectivity issue reported by a customer. You'll walk through the complete troubleshooting process, from initial assessment to resolution, learning industry-standard diagnostic techniques along the way.
Ticket #78432: "Multiple users from Client ABC cannot establish SSH connections to remote VPN gateway for administrative tasks. Connection attempts time out after 30 seconds. Affecting 5 administrators across different locations."
- Customer Organization: Client ABC (Enterprise - 1,500 users)
- Affected Service: SSH access to VPN gateway (vpn-gw-abc.xxx.com)
- Error Message: "ssh: connect to host vpn-gw-abc.xxx.com port 22: Connection timed out"
- Impact: Administrators cannot manage VPN configurations, impacting deployment schedule
- Reported By: Client ABC (IT Manager) - clientabc@xxx-client.com
🔍 Investigation Steps
Objective
Gather critical information about the issue before diving into technical diagnostics.
Questions to Ask
- When did the issue start? (Timeline helps identify changes/deployments)
- Is it affecting all users or specific ones?
- Are users connecting from corporate network or remote locations?
- Has anything changed recently? (Firewall rules, network config, SSH server updates)
- Can users SSH to other servers successfully?
Initial Findings
Customer Response:
- Issue started approximately 2 hours ago (around 14:00 UTC)
- Affecting users from multiple locations (HQ and 3 branch offices)
- Users can access other internal servers via SSH
- No recent changes reported by customer's IT team
- VPN clients are connecting successfully (only SSH admin access affected)
Your Analysis
Based on these findings:
- ✅ Issue is specific to this VPN gateway (not client-side SSH problem)
- ✅ Started recently (suggests a change or failure occurred)
- ✅ Affects multiple locations (rules out local network issue)
- ⚠️ Need to check our infrastructure logs for changes around 14:00 UTC
Objective
Verify that the SSH service (sshd) is running on the target server.
Access the Server
First, access the VPN gateway through our management console (since direct SSH is failing):
# Connect via console access (alternative method)
ssh -i ~/.ssh/company_x_admin.pem admin@console.xxx.com
# Once connected to console, access the VPN gateway
console> connect vpn-gw-abc.xxx.com
Check Service Status
# Check if SSH service is running
sudo systemctl status sshd
# Alternative: Check if process is running
ps aux | grep sshd
# Check if SSH port is listening
sudo netstat -tuln | grep -E ':22\b'
Expected Output
● sshd.service - OpenSSH server daemon
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled)
   Active: active (running) since Wed 2025-02-19 14:05:32 UTC
 Main PID: 1234 (sshd)

tcp   0   0 0.0.0.0:22   0.0.0.0:*   LISTEN
Findings
Result: SSH service is running normally. The service restarted at 14:05 UTC (5 minutes after reported issue start). This is suspicious and warrants further investigation.
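To compare the service start time with the reported incident start without reading the status output by eye, the timestamp can be pulled from systemd and the gap computed. The following is a minimal sketch, assuming the `ActiveEnterTimestamp` property is printed in the usual `Day YYYY-MM-DD HH:MM:SS TZ` form and that both times fall on the same day; `minutes_after` is a hypothetical helper name, not part of any tool.

```shell
# minutes_after INCIDENT_HHMM TIMESTAMP_LINE
# Prints how many minutes after the incident start the unit became active.
# Assumes both times are on the same day and in the same time zone.
minutes_after() {
  printf '%s\n' "$2" | awk -v start="$1" '{
    split(start, s, ":")
    split($3, t, ":")   # field 3 of the line is HH:MM:SS
    print (t[1] * 60 + t[2]) - (s[1] * 60 + s[2])
  }'
}

# On the gateway the line would come from:
#   systemctl show sshd --property=ActiveEnterTimestamp
# Here a sample line carrying the time from the status output is used.
minutes_after "14:00" "ActiveEnterTimestamp=Wed 2025-02-19 14:05:32 UTC"
```

With the sample line this prints `5`, matching the 5-minute gap noted in the findings.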
Objective
Examine logs to identify what caused the SSH service restart and any connection failures.
Check Authentication Logs
# View recent SSH authentication attempts
sudo tail -n 100 /var/log/auth.log | grep sshd
# Look for errors around 14:00 UTC
sudo journalctl -u sshd --since "14:00" --until "14:10"
# Check for connection timeouts
sudo grep "Connection closed" /var/log/auth.log | tail -20
Critical Finding in Logs
Feb 19 14:00:15 vpn-gw-abc sshd[1234]: error: Bind to port 22 failed: Address already in use.
Feb 19 14:00:15 vpn-gw-abc sshd[1234]: fatal: Cannot bind any address.
Feb 19 14:05:30 vpn-gw-abc systemd[1]: sshd.service: Main process exited
Feb 19 14:05:32 vpn-gw-abc systemd[1]: sshd.service: Automatic restart scheduled
Check System Logs for Related Events
# Check what else happened at 14:00
sudo journalctl --since "14:00" --until "14:10" | grep -i error
# Check if configuration was changed
sudo grep "sshd_config" /var/log/syslog
Analysis
The SSH service crashed at 14:00 UTC because of a port binding conflict and auto-restarted at 14:05 UTC, which accounts for a 5-minute outage window. The next step is to find out what else was trying to use port 22.
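When the log window is larger than a few lines, it can help to filter it down to the daemon's own error and fatal messages before reading. This is a small sketch under the assumption that the log uses the syslog-style `sshd[PID]: level: message` layout shown in this step; `sshd_failures` is a hypothetical helper name.

```shell
# sshd_failures: read syslog-style lines on stdin and print only the
# sshd "error:" and "fatal:" entries.
sshd_failures() {
  grep -E 'sshd\[[0-9]+\]: (error|fatal):'
}

# Sample input: the four log lines shown in this step.
sshd_failures <<'EOF'
Feb 19 14:00:15 vpn-gw-abc sshd[1234]: error: Bind to port 22 failed: Address already in use.
Feb 19 14:00:15 vpn-gw-abc sshd[1234]: fatal: Cannot bind any address.
Feb 19 14:05:30 vpn-gw-abc systemd[1]: sshd.service: Main process exited
Feb 19 14:05:32 vpn-gw-abc systemd[1]: sshd.service: Automatic restart scheduled
EOF
```

On the gateway the same filter could be fed from `sudo journalctl -u sshd --since "14:00" --until "14:10"`; only the two `sshd[1234]` lines pass through, while the systemd lines are dropped.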
Objective
Verify network connectivity and firewall rules to ensure SSH traffic can reach the server.
Test Network Connectivity
# From your workstation, test basic connectivity
ping -c 4 vpn-gw-abc.xxx.com
# Test if port 22 is reachable
telnet vpn-gw-abc.xxx.com 22
# Or use nc (netcat)
nc -zv vpn-gw-abc.xxx.com 22
# Check route to server
traceroute vpn-gw-abc.xxx.com
Check Firewall Rules on Server
# Check iptables rules
sudo iptables -L -n -v | grep -w 'dpt:22'
# If using firewalld
sudo firewall-cmd --list-all
# Check if port 22 is allowed
sudo firewall-cmd --query-port=22/tcp
Check Security Groups (Cloud Provider)
Since this is hosted infrastructure, verify security group rules via API:
# Query security groups via API
curl -X GET "https://api.xxx.com/v2/infrastructure/security-groups?server=vpn-gw-abc" \
  -H "Authorization: Bearer xxxTOKENxxx" \
  -H "Content-Type: application/json"
Findings
Network Check Results:
- ✅ Server is reachable via ping (no packet loss)
- ✅ Port 22 is accessible from external networks
- ✅ Firewall rules correctly allow SSH traffic from authorized IPs
- ✅ Security group configuration unchanged
- ✅ No network-level issues detected
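The "no packet loss" result can also be checked in a script, for example when the same test is repeated from several locations. The sketch below assumes the summary line format printed by the common Linux iputils `ping`; the sample line is illustrative and was not captured from the real gateway, and `packet_loss` is a hypothetical helper name.

```shell
# packet_loss: read ping output on stdin and print the loss percentage
# (number only) taken from the summary line.
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Illustrative summary line in the iputils ping format:
echo "4 packets transmitted, 4 received, 0% packet loss, time 3005ms" | packet_loss
```

In practice the input would come from `ping -c 4 vpn-gw-abc.xxx.com | packet_loss`, and any value other than `0` would be worth a closer look at the route.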
Objective
Examine SSH daemon configuration and validate host keys/certificates.
Review SSH Configuration
# Check SSH daemon configuration
sudo grep -Ev '^(#|$)' /etc/ssh/sshd_config
# Verify configuration syntax
sudo sshd -t
# Check what port SSH is configured to use
sudo grep "^Port" /etc/ssh/sshd_config
Validate Host Keys
# List SSH host keys
ls -la /etc/ssh/ssh_host_*
# Check key permissions (should be 600 for private keys)
stat /etc/ssh/ssh_host_rsa_key
# Verify key fingerprints
ssh-keygen -lf /etc/ssh/ssh_host_rsa_key.pub
ssh-keygen -lf /etc/ssh/ssh_host_ed25519_key.pub
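The permission check on private host keys can be turned into a pass/fail test instead of reading `stat` output. This is a sketch that treats a key as acceptable when group and others have no permissions at all (which covers 600); it reads the mode string from `ls -ld`, and `check_key_mode` is a hypothetical helper name.

```shell
# check_key_mode FILE
# Prints OK when group and others have no permissions on FILE (for example
# mode 600 or 400), TOO OPEN otherwise, and MISSING if FILE does not exist.
check_key_mode() {
  [ -f "$1" ] || { echo "MISSING $1"; return 2; }
  mode=$(ls -ld "$1" | cut -c1-10)   # e.g. "-rw-------"
  case $mode in
    ????------) echo "OK $1 ($mode)" ;;
    *)          echo "TOO OPEN $1 ($mode)"; return 1 ;;
  esac
}

# Usage on the gateway (private keys are the files without the .pub suffix):
#   for key in /etc/ssh/ssh_host_*_key; do check_key_mode "$key"; done
```

The function only reports; fixing a key that is too open would still be a deliberate `chmod 600` by the operator.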
Critical Discovery
Port 22
Port 2222 ← DUPLICATE PORT DEFINITION!
# This was recently added by automation script
# Caused port binding conflict when service restarted
Root Cause Analysis
What happened:
- An automation script added "Port 2222" to sshd_config at 13:58 UTC (2 minutes before issue)
- Configuration reload was triggered at 14:00 UTC
- SSH daemon attempted to bind to both ports 22 and 2222
- Port binding conflict occurred (another process using port 2222)
- SSH service crashed and auto-restarted after 5 minutes
- On restart, the duplicate port line caused intermittent binding issues
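The first link in this chain, an extra `Port` line appearing in the configuration, is easy to detect mechanically. The sketch below lists every active `Port` value in a configuration read from standard input; it assumes the simple `Port N` form used in this lab, and `list_ports` is a hypothetical helper name.

```shell
# list_ports: read an sshd_config on stdin and print every active
# (uncommented) Port value, one per line. Keywords are matched
# case-insensitively.
list_ports() {
  awk 'tolower($1) == "port" && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Sample input: the two Port lines found in this step.
list_ports <<'EOF'
Port 22
Port 2222
EOF
```

On the gateway the input would be `sudo cat /etc/ssh/sshd_config | list_ports`; more than one line of output means sshd will try to bind more than one port.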
Verify the Conflict
# Check what's using port 2222
sudo lsof -i :2222
# Alternative command
sudo netstat -tulpn | grep 2222
COMMAND  PID   USER  FD  TYPE  DEVICE  SIZE/OFF  NODE  NAME
openvpn  5678  root  8u  IPv4  12345   0t0       TCP   *:2222 (LISTEN)
Explanation: OpenVPN management interface is already using port 2222, creating a conflict with the new SSH configuration.
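For a runbook or a script, the owner of the port can be extracted from the `lsof` output rather than read from the table. A minimal sketch, assuming the default `lsof -i` column layout shown above; `listener_of` is a hypothetical helper name.

```shell
# listener_of: read `lsof -i :PORT` output on stdin and print
# "COMMAND PID" for each socket in the LISTEN state.
listener_of() {
  awk 'NR > 1 && /\(LISTEN\)/ { print $1, $2 }'
}

# Sample input: the lsof output from this step.
listener_of <<'EOF'
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
openvpn 5678 root 8u IPv4 12345 0t0 TCP *:2222 (LISTEN)
EOF
```

With the sample input this prints `openvpn 5678`. On the gateway it would be used as `sudo lsof -i :2222 | listener_of`.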
Objective
Resolve the configuration conflict and verify SSH access is restored.
Solution Steps
1. Backup Current Configuration
# Always backup before making changes
sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.backup.$(date +%Y%m%d-%H%M%S)
2. Remove Duplicate Port Configuration
# Edit SSH configuration
sudo nano /etc/ssh/sshd_config
# Remove or comment out the duplicate port line:
# Port 2222 ← DELETE THIS LINE
# Or use sed for automated fix
sudo sed -i '/^Port 2222/d' /etc/ssh/sshd_config
3. Validate Configuration
# Test configuration syntax
sudo sshd -t
# Should return nothing if config is valid
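Steps 1 to 3 can be combined so that a failed validation rolls the file back before anything is reloaded. The function below is a sketch of that pattern, not the procedure used in the original incident: it assumes GNU `sed`, validates with `sshd -t -f FILE`, and uses a hypothetical `VALIDATE_CMD` variable so the logic can be tried on a copy of the file without a real sshd.

```shell
# safe_remove_port FILE PORT
# Back up FILE, delete its "Port PORT" line, validate the result, and
# restore the backup if validation fails.
# VALIDATE_CMD is a hypothetical override for dry runs; when it is unset
# the function runs "sshd -t -f FILE".
safe_remove_port() {
  file=$1
  port=$2
  backup="$file.backup.$(date +%Y%m%d-%H%M%S)"
  cp -p "$file" "$backup" || return 1
  # GNU sed: -i edits in place, -E enables extended regular expressions
  sed -i -E "/^[[:space:]]*Port[[:space:]]+$port[[:space:]]*\$/d" "$file"
  if ${VALIDATE_CMD:-sshd -t -f} "$file"; then
    echo "validated: Port $port removed, backup at $backup"
  else
    cp -p "$backup" "$file"
    echo "validation failed: restored $file from $backup" >&2
    return 1
  fi
}

# Usage on the gateway (as root):
#   safe_remove_port /etc/ssh/sshd_config 2222 && systemctl reload sshd
```

Chaining the reload with `&&` means the service is only touched when the edited file has passed validation.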
4. Restart SSH Service
# Reload SSH configuration (safer, doesn't disconnect)
sudo systemctl reload sshd
# Or restart if reload doesn't work
sudo systemctl restart sshd
# Verify service is running
sudo systemctl status sshd
5. Test SSH Connectivity
# From external system, test SSH connection
ssh -v admin@vpn-gw-abc.xxx.com
# Test from multiple sources to ensure it's working
ssh -i ~/.ssh/client_abc_key.pem admin@vpn-gw-abc.xxx.com "uptime"
OpenSSH_8.9p1, OpenSSL 3.0.2
debug1: Connecting to vpn-gw-abc.xxx.com [10.x.x.x] port 22.
debug1: Connection established.
debug1: Authentication succeeded (publickey).
15:23:45 up 5 days, 3:18, 2 users, load average: 0.15, 0.12, 0.08
Post-Resolution Actions
1. Document the Incident
Required Documentation:
- Update ticket #78432 with root cause and resolution
- Create incident report for post-mortem
- Add to knowledge base: "SSH Port Binding Conflicts"
- Update runbook with validation steps
2. Preventive Measures
# Add monitoring alert for SSH service failures
curl -X POST "https://api.xxx.com/v2/monitoring/alerts" \
  -H "Authorization: Bearer xxxTOKENxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "SSH Service Down - VPN Gateway",
    "condition": "sshd_status != running",
    "severity": "critical",
    "notify": ["oncall-l3@xxx.com"]
  }'
# Add configuration validation to automation script
# Before: sshd config modification
# After: sshd -t validation + port conflict check
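The port conflict check named in the comment above could look like the following sketch. It assumes the automation script can read `netstat -tuln` output before it writes a new `Port` line; `port_is_listening` is a hypothetical helper name, and the listener table is illustrative data modeled on the outputs in this lab.

```shell
# port_is_listening PORT
# Reads `netstat -tuln` output on stdin; succeeds (exit 0) if any local
# address (column 4) ends in :PORT.
port_is_listening() {
  awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Illustrative listener table: sshd on 22, the OpenVPN management
# interface on 2222.
listeners='tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:2222 0.0.0.0:* LISTEN'

for candidate in 2222 2200; do
  if printf '%s\n' "$listeners" | port_is_listening "$candidate"; then
    echo "port $candidate: in use, do not add to sshd_config"
  else
    echo "port $candidate: free"
  fi
done
```

In the automation script the table would come from `netstat -tuln` on the target host, and the change would be aborted whenever the check succeeds for the port about to be added.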
3. Customer Communication
Subject: [RESOLVED] Ticket #78432 - SSH Access Restored
Hi John,
I'm pleased to inform you that SSH access to vpn-gw-abc.xxx.com has been fully restored as of 15:25 UTC.
Root Cause: A configuration management script inadvertently added a duplicate port definition that conflicted with an existing service, causing SSH to fail during a routine reload.
Resolution: We corrected the configuration, restarted the SSH service, and verified connectivity from multiple locations.
Prevention: We've implemented additional validation checks in our automation scripts and added monitoring alerts to detect similar issues faster.
Total downtime: ~85 minutes (14:00 - 15:25 UTC)
Please verify that your team can now access the gateway. Let me know if you have any questions!
Best regards,
Gabriel Mazer
L2 Support Engineer
📖 Quick Commands Reference
| Command | Purpose | Common Options |
|---|---|---|
| systemctl status sshd | Check SSH service status | start, stop, restart, reload |
| sshd -t | Test SSH config syntax | -T (dump config), -d (debug mode) |
| netstat -tuln | Show listening ports | -p (show PIDs), -a (all connections) |
| lsof -i :PORT | Check what's using a port | -i (internet connections) |
| journalctl -u sshd | View SSH service logs | -f (follow), --since, --until |
| ssh -v | Verbose SSH connection | -vv, -vvv (more verbose) |
| nc -zv HOST PORT | Test port connectivity | -w (timeout) |
| iptables -L -n | List firewall rules | -v (verbose), -t nat (NAT table) |
🎯 Key Takeaways
- Run sshd -t before restarting the SSH service. Syntax errors can lock you out.
- Use lsof and netstat to identify port binding conflicts, a common issue when multiple services compete for the same ports.
🔗 Related Resources
- KB-1234: SSH Best Practices
- KB-5678: Firewall Troubleshooting
- KB-9012: Service Recovery Procedures
- Lab 2.2: VPN Gateway Management
- Lab 3.1: Advanced Log Analysis
- Lab 4.3: Network Diagnostics
- OpenSSH Manual (man sshd)
- RFC 4253: SSH Protocol
- Linux System Admin Guide
✅ Lab Completion Checklist
Before marking this lab as complete, ensure you can:
- ☐ Explain the SSH troubleshooting methodology
- ☐ Use systemctl commands to manage services
- ☐ Read and interpret authentication logs
- ☐ Identify port conflicts using lsof/netstat
- ☐ Edit and validate SSH configuration safely
- ☐ Write clear customer-facing resolution emails
- ☐ Document incidents for knowledge sharing