Software Engineer (Large-scale crawling)
Main duties and responsibilities: What we'll challenge you with?
You will join a team of AI, machine learning, and big data graduates from the very top universities worldwide, who all share a deep passion for both engineering and science. We take a highly collaborative approach and expect team members to help out wherever they feel they can contribute, and to enjoy doing so, all the while contributing to the team's core mission. You will take ownership of a core part of our mission:
Own and continuously improve a modern, flexible web technology stack for crawling
Design and develop cutting-edge methods to scale up high-fidelity, full-browser-based crawling by automatically translating it into crawlers that operate on plain HTML
Develop effective strategies and services for scheduling and monitoring large-scale (>100k sources, >1M requests per day) crawling systems
Engage with the wider automated testing and crawling community and bring new trends and systems with the potential for real impact into the team
Continually define your role as we grow
What you'll get out of it?
You will be working on some of the most challenging problems in data acquisition, knowledge base construction, and data integration, using cutting-edge techniques.
You will be challenged to expand your skill set constantly and, together with your team members, dive deep into challenging, often cutting-edge problems to find the best solution.
A sense of ownership you won't easily find elsewhere
Great people on teams all over the world
Get in early as a senior member of a growing department
Be at the heart of the growing data science community in London
Enjoy a great work environment at Shack 15
Located in the heart of Shoreditch in London
Skills, qualifications and experience: Selection Criteria
Our ideal candidate has (essential criteria):
3+ years of experience (or equivalent) in developing crawlers, automated testing solutions for web technologies, or similar applications, as evidenced by previous employment, involvement in open-source projects, or published peer-reviewed work
Experience in running such technologies in a cloud environment, as evidenced by a substantial contribution to at least one production-level system
Experience in, and a passion for, finding solutions to problems that haven't been solved before
A passion for software quality and pride in their work
A belief in validation through software and rapid prototyping
Fluency in English (verbal and written) and a love of working, learning, and teaching others
Bonus points for (desirable criteria):
Experience with Selenium, WebDriver, as well as state-of-the-art headless browsers
Experience in large-scale web scraping technologies and/or automated wrapper generation
Strong background in Java, e.g., through contributions to an open-source project
Experience using continuous integration for crawlers or other web technology APIs to ensure continuous availability
Experience with big data systems
Experience in designing and developing REST APIs