Windows/Linux Cloud System Admin
The Systems Administrator will provide operational and maintenance support for enterprise clinical data exchange applications and supporting infrastructure hosted in a cloud-based environment (VA Enterprise Cloud). This role focuses on day-to-day administration of Windows and Linux systems, ensuring reliability, security, and performance through proactive monitoring, patching, and incident response. The position provides Tier 1 and Tier 2 operational support across multiple production systems.
Key Responsibilities
Windows & Linux System Administration
- Administer, configure, and maintain Windows Server and Linux systems (Red Hat, Ubuntu, Amazon Linux) supporting enterprise applications.
- Perform routine system maintenance including OS patching, vulnerability remediation, software installations, and configuration updates.
- Manage system services, startup scripts, scheduled tasks, cron jobs, and configuration baselines.
- Support file systems, storage utilization, system performance tuning, and capacity management.
- Validate and maintain system backups, snapshots, and recovery procedures.
Monitoring and Incident Response
- Monitor system health, resource utilization, and application availability using enterprise monitoring and logging tools (e.g., CloudWatch, Splunk, Nagios, or equivalent).
- Review alerts and logs, analyze system behavior, and identify root causes of issues.
- Provide Tier 1 and Tier 2 operational support, including service restarts, system recovery actions, and connectivity troubleshooting.
- Participate in incident response activities and maintain operational runbooks and troubleshooting documentation.
Security and Compliance
- Apply system hardening, patching, and configuration standards aligned with CIS benchmarks and federal security requirements (RMF, FISMA, FedRAMP).
- Support Authority to Operate (ATO) activities by maintaining compliant system configurations, patch baselines, and asset inventories.
- Work with cybersecurity teams to remediate vulnerabilities identified through scanning tools such as Nessus or other Tenable products.
- Ensure least-privilege access and proper credential management on Windows and Linux systems.
Cloud Platform Support (Secondary Focus)
- Support systems hosted in AWS-based environments by maintaining compute instances, storage, and networking configurations as needed.
- Assist with system recovery, scaling activities, and environment maintenance using existing cloud tooling and processes.
- Collaborate with cloud engineers and developers to support application deployments and operational stability.
Documentation and Automation
- Maintain system documentation including operating procedures, configuration standards, and architecture diagrams.
- Develop and maintain scripts for administrative tasks using PowerShell, Bash, or Python.
- Contribute to continuous improvement of monitoring, alerting, and operational processes.
Qualifications
Required
- 3+ years of experience as a Systems Administrator supporting production Windows and Linux environments.
- Strong hands-on experience with Windows Server and Linux OS administration, patching, and troubleshooting.
- Experience responding to system alerts and incidents in a 24x7 production environment.
- Familiarity with system monitoring and logging tools (Splunk, CloudWatch, Nagios, Datadog, or similar).
- Working knowledge of scripting and automation (PowerShell, Bash, Python, Ansible).
- Strong problem-solving skills and ability to operate under time-sensitive conditions.
Preferred / Desired
- Experience supporting systems hosted in cloud environments (AWS, including AWS GovCloud, or Azure).
- Familiarity with VA or other federal enterprise IT environments.
- Understanding of federal security frameworks (RMF, FISMA, FedRAMP).
- Experience supporting ATO-authorized systems.
- Linux (RHCSA/RHCE) or Microsoft Windows Server certifications.
- Active Public Trust or ability to obtain VA suitability clearance.
Work Environment
- Remote support for production Windows and Linux systems hosted in cloud and VA environments.
- Participation in an on-call rotation for critical system support.
- Regular collaboration with developers, cybersecurity teams, QA, and help desk staff.