Observability Engineer

JLL
United States, Illinois, Chicago
200 East Randolph Street
Sep 26, 2024
JLL is looking for an Observability Engineer to assist in the support and administration of the Datadog monitoring platform. The focus of this role is ensuring the reliability, scalability, and efficiency of Datadog for monitoring and AIOps within the organization. The primary goal is to maximize the availability and performance of application, infrastructure, and network services while improving overall system stability. As part of a team, the engineer will interface with multiple internal and external teams. The right person will be a critical thinker, adaptable to change, and able to juggle multiple work priorities.

Responsibilities
- Establish monitoring solutions in Datadog to track the health and performance of systems, and set up alerts so that issues and anomalies are responded to promptly.
- Instrument new devices and applications in Datadog.
- Set up metrics, traces, and logs to monitor the health of JLL's infrastructure and applications.
- Develop and configure dashboards to observe the health of JLL's infrastructure and applications.
- Develop and maintain automation tools and frameworks, using Ansible Tower, to streamline operational tasks such as deployment, monitoring, and incident response.
- Investigate opportunities to minimize the impact of incidents and identify ways to prevent future occurrences.
- Analyze system performance, identify potential bottlenecks, and plan for future capacity needs to ensure scalability.
- Optimize system performance by analyzing and improving code, database queries, network configurations, and other system components.
- Collaborate with security teams to ensure systems are secure and compliant with relevant regulations and policies.
- Work closely with cross-functional teams, sharing knowledge, documenting procedures, and promoting best practices to improve system reliability across the organization.
- Configure AIOps capabilities to deliver noise reduction, event correlation, and enhanced root cause analysis.

Experience & Education
- Minimum 5 years of experience working as an Observability/Site Reliability Engineer supporting network, infrastructure, and applications.
- Must have previous experience supporting Datadog as a monitoring tool in the following areas:
  - Application Performance Monitoring
  - Synthetic API and Browser Monitoring
  - Network Device and Performance Monitoring
  - Infrastructure Monitoring
- Previous experience developing Ansible scripts to support automation.
- Previous experience supporting and administering an AIOps platform (e.g. Watchdog, Moogsoft, BigPanda).
- Previous experience working with cloud technologies, including Azure, AWS, and Kubernetes.

Technical Skills & Competencies
- Comfortable working in a fast-paced environment.
- Excellent verbal and written communication skills.
- Highly professional, with the ability to deliver solid work on tight schedules.
- Demonstrated ability to define new approaches to complex design problems.
- Experience working in Agile/Scrum delivery teams.
- Ability to communicate project status effectively to the management team on a regular basis.
- Excellent problem-solving skills, using both internal and external resources to get the job done.
- Self-starter and motivator, able to work both in a team and solo.
- Flexibility in working hours to accommodate meetings with US colleagues.
- Ability to be accessible online to the team (Microsoft Teams) during their shift, to attend key Sprint-related meetings, and to use video conferencing from time to time.