Twiga Senior Site Reliability Engineer Jobs in Kenya
About Twiga
Twiga is a B2B e-commerce company that builds fair and reliable markets for agricultural producers, food manufacturers and retailers based on transparency and efficiency. Our Mission is to build a closed ecosystem for African retail, anchored on affordable access to food and groceries across urban cities. Our Ambition is to leverage technology, the ubiquity of mobile phones, modern distribution and logistics to modernize African retail.
Senior Site Reliability Engineer Vacancy
The role holder will be responsible for leading the end-to-end design, development and deployment of engineering solutions to run scalable, distributed and fault-tolerant software systems for Twiga Foods. The role holder will lead the implementation of automated solutions to ensure the uptime, reliability and improvement of Twiga Foods’ systems in line with set service level objectives.
He/she will be required to provide leadership in determining software engineering needs from product/engineering requirements and collaborating across the organisation to clarify requirements and expected outcomes.
The role holder is also accountable for work assigned, ensuring that it is broken down into a plan with estimates, priorities and deliverables, adhering to the plan, and communicating when any adjustments to scope are needed to meet deadlines.
Additionally, he/she will contribute to the wellbeing of the Twiga technology ecosystem by tracking production systems’ capacity and performance, fixing issues and taking on-call responsibilities.
Key Responsibilities
Site Reliability
Collaborate with other cross-functional teams to design, develop, and deliver required software.
Develop, manage and support SRE tools and applications.
Lead/own and drive the development/implementation of SRE tools within the Product/Technical Requirements Document.
Develop or review technical specification documents within the SRE team and wider engineering team.
Lead the deployment, training, and rollout of major/minor SRE tools across various engineering/tech teams.
Deliver feature work consistently and on time whilst still tackling tech debt. Ensure that code meets agreed accuracy, testability, efficiency, and style guidelines. Deliver software systems that meet agreed SLOs for performance and reliability.
Produce a work breakdown structure with estimates, deadlines, and deliverables. Own features from technical specification and implementation right through to deployment into production.
Engage in improving the software development lifecycle, providing feedback on requirements, architecture, designs, and solutions.
Build resilience into systems so underlying failures are handled gracefully and do not impact end users.
Develop automated predictive analysis of future capacity needs and proactively work on efficiency and capacity planning to set clear requirements and reduce system resource usage.
Manage individual priorities, deadlines, and deliverables.
Defend and challenge technical decisions made through solution design and code review feedback
Finalise and own technical documentation for the developed features
On-Call Technical Support
Monitor application availability and performance, take steps to improve overall application performance and stability, and follow through with implementation
Participate in on-call technical support rotation, respond to all incidents, and lead minor/major incidents in collaboration with relevant engineering/product stakeholders.
Triage system issues and debug, track, and resolve them by analysing their sources and offering corrective measures, through end-to-end incident response and management.
Drive efficiencies through systems improvement and root cause analysis, resulting in improved service delivery, maturity, and scalability.
Analyze logs and telemetry data by writing monitoring and automation code.
Identify and automate repetitive, manual, and non-tactical work that impacts software development and deployment.
Innovation
Investigate site reliability technologies and their applicability to the Twiga ecosystem.
Identify significant projects that result in substantial improvements in reliability, cost savings and/or revenue.
Provide reports on findings, with recommendations and a viable plan of action.
Lead design reviews with peers and stakeholders to decide amongst available technologies
Evaluate and review existing systems, SRE processes, and tools.
Develop and lead implementation of a viable technical specification document in collaboration with members of the SRE or engineering teams.
Contribute to the definition of SLOs for services/applications.
In-Team Collaboration
Work with peers to build a stronger engineering team
Lead process improvements that boost productivity and quality of Twiga engineering
Regularly contribute improvements to existing documentation and codebase as per agreed standards.
Review code developed by others and provide feedback to ensure adherence to Twiga Engineering best practices.
Contribute regular knowledge shares through a variety of mediums including lunch and learn sessions.
Provide mentorship for SRE engineers and interns in the section.
Mentor/Coach/Train engineers on system design, reliability, monitoring, and availability concepts to help improve the overall system quality.
Develop and maintain relationships with various engineering teams and their members.
Acquire and maintain an understanding of multiple engineering teams' processes and tools.
Influence the engineering roadmap and work with engineering and/or product counterparts to improve the resiliency and reliability of Twiga systems.
Build deep domain knowledge and share that knowledge through recorded demos, technical presentations, discussions, and Incident Reviews.
Self-management
Model Twiga’s culture and way of working.
Deliver the performance objectives set for the team. Hold monthly 1-on-1 performance reviews with the line manager, and institute corrective action where performance falls below expectations.
Proactively manage own learning and development
Adhere to the annual leave plan agreed with the line manager
Adhere to people management policies
Compliance
Comply with all organization policies, procedures, and statutory guidelines. Minimize and mitigate risks to the organization and enforce zero-tolerance to non-compliance.
Close gaps/lapses identified as an outcome of audits; risk and/or any other compliance review; investigations; or other assessment mechanisms and take corrective/preventive actions within the agreed timelines.
Minimum Qualifications & Requirements
Degree in Engineering, Computer Science, Information Technology or a related discipline, or demonstrated equivalent skill/competence.
Minimum of 5 years of relevant experience in:
Observability and monitoring of infrastructure, applications, services, and networks
Troubleshooting issues across the entire stack (hardware, software, network etc.)
Writing infrastructure as code and automation scripts
Building and maintaining CI/CD pipelines
Building, running, and optimising containers with Docker or containerd
Setting up, running, and managing Virtual machines, Kubernetes clusters, Databases and Virtual Private Networks
Operating highly available and reliable infrastructure
At least 3 years’ experience working with relational databases (Postgres, MySQL or Microsoft SQL Server), non-relational databases, and in-memory data stores
At least 2 years' experience creating/managing SLIs/SLOs/Error Budgets.
Strong technical understanding of Android, front-end and backend development
Experience in designing, implementing and securing distributed systems
Strong experience with:
Analysing logs, metrics and traces.
Creating system reports and system alerts.
The use, maintenance and configuration of monitoring, observability and telemetry metrics and logging infrastructure (Prometheus, Grafana, ELK, or Sentry)
Understanding of Agile/Scrum development principles
Understanding of ITIL incident and problem management practices
Ability to work accurately and quickly to ensure key project milestones are achieved within set timelines, even when working under pressure.
A consistently positive attitude and approach to the role and team.
How to Apply
For more information and job application details, see: Twiga Senior Site Reliability Engineer Jobs in Kenya