This document is a quick-start guide to Visual Studio Code, covering the editor's installation, basic operations, version control, and PHP integration. Visual Studio Code is a cross-platform, open-source, and powerful code editor developed by Microsoft, with extensive customization options and plugin support. The guide helps users get productive with the tool quickly and improve their programming efficiency.
Ansible is an automation tool for deployment and configuration management, maintained by Red Hat and licensed under GPL 3.0. It uses YAML playbooks and modules to configure nodes over SSH, and differs from tools like Puppet and Chef in that any computer can act as the controller. The document provides a quick start guide, an overview of inventory and playbook structure, and details on roles and handlers for task management.
This article describes Max Lai's experience with unit testing using pytest, stressing that writing tests is essential for code correctness and maintainability. It walks through pytest installation, basic usage, and unit-testing best practices, including how to create test files and verify results, and discusses when to use mocks and fixtures. It closes with pytest plugins and code-coverage tools that help improve test quality.
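As a rough illustration of the fixture usage mentioned above, here is a minimal pytest sketch; the function under test and the fixture are hypothetical examples, not code from the article:

```python
# Minimal pytest example: a fixture providing shared test data.
import pytest

@pytest.fixture
def sample_numbers():
    # pytest injects this fixture into any test that names it as a parameter.
    return [1, 2, 3]

def add_all(numbers):
    # Hypothetical function under test.
    return sum(numbers)

def test_add_all(sample_numbers):
    assert add_all(sample_numbers) == 6
```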
This document shares best practices for building websites with Angular 4, including using Visual Studio Code with the Angular CLI, recommended extensions, and how to optimize Angular application performance. It covers change-detection strategies, modularizing the application, and lazy loading, with concrete code samples and commands. It also emphasizes following the Angular style guide and using tools such as source-map-explorer to improve development efficiency and code quality.
[PyCon US 2025] Scaling the Mountain: A Framework for Tackling Large-Scale Te... - Jimmy Lai
Managing tech debt in large legacy codebases isn't just a challenge; it's an ongoing battle that can drain developer productivity and morale. In this talk, I'll introduce a Python-powered Tech Debt Framework bar-raiser designed to help teams tackle even the most daunting tech debt problems with 100,000+ violations. This open-source framework empowers developers and engineering leaders by:
- Tracking Progress: Measure and visualize the state of tech debt and trends over time.
- Recognizing Contributions: Celebrate developer efforts and foster accountability with contribution leaderboards and automated shoutouts.
- Automating Fixes: Save countless hours with codemods that address repetitive debt patterns, allowing developers to focus on higher-priority work.
Through real-world case studies, I'll showcase how we:
- Reduced 70,000+ pyright-ignore annotations to boost type-checking coverage from 60% to 99.5%.
- Converted a monolithic sync codebase to async, addressing blocking IO issues and adopting asyncio effectively.
Attendees will gain actionable strategies for scaling Python automation, fostering team buy-in, and systematically reducing tech debt across massive codebases. Whether you’re dealing with type errors, legacy dependencies, or async transitions, this talk provides a roadmap for creating cleaner, more maintainable code at scale.
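The framework itself is not reproduced in this abstract, but its progress-tracking idea can be sketched as a small script that counts remaining suppression comments so the trend can be charted over time; the file pattern and the pyright-ignore marker are assumptions for illustration:

```python
# Sketch: count remaining "pyright: ignore" suppressions to track tech-debt burn-down.
import pathlib
import re

MARKER = re.compile(r"#\s*pyright:\s*ignore")  # assumed suppression pattern

def count_violations(root: str) -> int:
    total = 0
    for path in pathlib.Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        total += len(MARKER.findall(text))
    return total

if __name__ == "__main__":
    print(f"remaining violations: {count_violations('.')}")
```

Running this in CI once a day and recording the number is enough to plot the burn-down curve the talk describes.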
PyCon JP 2024: Streamlining Testing in a Large Python Codebase - Jimmy Lai
The document discusses strategies for streamlining testing in a growing Python codebase at Zip, which has doubled to 2.5 million lines of code with 100 developers. Key strategies include parallel execution of tests, caching dependencies to save time, and skipping unnecessary tests to improve efficiency. The insights aim to enhance quality assurance and developer experience through optimized continuous integration practices.
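A minimal sketch of the "skip unnecessary tests, run the rest in parallel" idea follows; the tests/test_<module>.py naming convention and the pytest-xdist flag are assumptions, not details from the slides:

```python
# Sketch: select only tests whose source module changed, then run them in parallel.
import pathlib
import subprocess

def changed_modules(base: str = "origin/main") -> set[str]:
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    ).stdout
    return {pathlib.Path(p).stem for p in out.splitlines() if p.endswith(".py")}

def impacted_tests(test_dir: str = "tests") -> list[str]:
    mods = changed_modules()
    return [
        str(p) for p in pathlib.Path(test_dir).glob("test_*.py")
        if p.stem.removeprefix("test_") in mods
    ]

if __name__ == "__main__":
    tests = impacted_tests()
    if tests:
        # "-n auto" requires the pytest-xdist plugin for parallel execution.
        subprocess.run(["pytest", "-n", "auto", *tests])
```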
EuroPython 2024 - Streamlining Testing in a Large Python Codebase - Jimmy Lai
The document outlines strategies for optimizing testing in a large Python codebase at Zip, which consists of 2.5 million lines of code and over 100 developers. It emphasizes the need for quality assurance and efficient testing practices using tools like pytest and continuous integration workflows to handle challenges such as increasing test execution time and coverage. Key strategies discussed include parallel execution, caching dependencies, skipping unnecessary tests, and modernizing runner infrastructure for faster and cost-effective testing.
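The dependency-caching strategy mentioned above can be sketched as deriving a CI cache key from the lock file, so installed packages are restored rather than reinstalled when dependencies have not changed; the lock file name is an assumption:

```python
# Sketch: compute a CI cache key from the dependency lock file.
import hashlib
import pathlib

def cache_key(lock_file: str = "poetry.lock") -> str:
    digest = hashlib.sha256(pathlib.Path(lock_file).read_bytes()).hexdigest()
    return f"py-deps-{digest[:16]}"

if __name__ == "__main__":
    print(cache_key())
```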
The document discusses the challenges and solutions for managing Python linters in large codebases, focusing on improving developer experience and scaling issues. It highlights various popular linters, their configurations, and practices for optimizing linting processes, including using caching, parallel execution, and custom auto-fixing tools. Additionally, it emphasizes the importance of tracking metrics and establishing consistent linting practices across multiple projects to enhance efficiency and reduce errors.
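As a rough sketch of the parallel-execution idea for linting, the snippet below shards files across worker processes and runs one linter process per shard; flake8 stands in for whatever CLI linter a project uses:

```python
# Sketch: lint files in parallel, one linter process per chunk of files.
import concurrent.futures
import pathlib
import subprocess

def lint(paths: list[str]) -> str:
    result = subprocess.run(["flake8", *paths], capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    files = [str(p) for p in pathlib.Path("src").rglob("*.py")]
    chunks = [c for c in (files[i::4] for i in range(4)) if c]  # four shards

    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as pool:
        for report in pool.map(lint, chunks):
            if report:
                print(report)
```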
EuroPython 2022 - Automated Refactoring Large Python Codebases - Jimmy Lai
Like many companies with multi-million-line Python codebases, Carta has struggled to adopt best practices like Black formatting and type annotation. The extra work needed to do the right thing competes with the almost overwhelming need for new development, and unclear code ownership and lack of insight into the size and scope of type problems add to the burden. We’ve greatly mitigated these problems by building an automated refactoring pipeline that applies Black formatting and backfills missing types via incremental Github pull requests. Our refactor applications use LibCST and MonkeyType to modify the Python syntax tree and use GitPython/PyGithub to create and manage pull requests. It divides changes into small, easily reviewed pull requests and assigns appropriate code owners to review them. After creating and merging more than 3,000 pull requests, we have fully converted our large codebase to Black format and have added type annotations to more than 50,000 functions. In this talk, you’ll learn to use LibCST to build automated refactoring tools that fix general Python code quality issues at scale and how to use GitPython/PyGithub to automate the code review process.
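The full pipeline is not shown here, but the core LibCST step can be sketched as a transformer that backfills a missing return annotation; in the real pipeline MonkeyType-inferred types would replace the placeholder "None" used below:

```python
# Sketch: a LibCST transformer that backfills a missing return annotation.
import libcst as cst

class BackfillReturnAnnotation(cst.CSTTransformer):
    def leave_FunctionDef(
        self, original_node: cst.FunctionDef, updated_node: cst.FunctionDef
    ) -> cst.FunctionDef:
        if updated_node.returns is None:
            # Placeholder annotation; a real tool would infer the actual type.
            return updated_node.with_changes(
                returns=cst.Annotation(annotation=cst.Name("None"))
            )
        return updated_node

source = "def greet(name):\n    print(f'hi {name}')\n"
print(cst.parse_module(source).visit(BackfillReturnAnnotation()).code)
```

The transformed output would then be split into small pull requests and routed to code owners, as described in the abstract.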
Annotate types in large codebase with automated refactoring - Jimmy Lai
The document discusses an automated refactoring process for enhancing type annotations in a large Python codebase at Carta, which consists of 1.8 million lines and 120,000 functions. It highlights tools like libcst for modifying Python code and mentions strategies such as using static and runtime analysis to improve type coverage. Additionally, it encourages continuous improvement through automated weekly updates and the engagement of developers.
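One way to measure the type coverage the abstract refers to is a small ast-based counter of fully annotated functions; this is a sketch of the metric, not the talk's actual tooling:

```python
# Sketch: estimate type-annotation coverage by counting fully annotated functions.
import ast
import pathlib

def is_annotated(fn) -> bool:
    args = fn.args.args + fn.args.kwonlyargs
    params_ok = all(
        a.annotation is not None for a in args if a.arg not in ("self", "cls")
    )
    return params_ok and fn.returns is not None

def coverage(root: str = "src") -> float:
    total = annotated = 0
    for path in pathlib.Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                total += 1
                annotated += is_annotated(node)
    return annotated / total if total else 1.0

if __name__ == "__main__":
    print(f"type coverage: {coverage():.1%}")
```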
The journey of asyncio adoption in Instagram - Jimmy Lai
The document discusses the adoption of asyncio at Instagram, detailing its challenges and the subsequent benefits like improved API performance and reduced CPU idle time. It outlines key features of asyncio, common myths, and migration strategies from synchronous to asynchronous coding practices. The implementation insights share methods to optimize performance and achieve better concurrency, ultimately leading to enhanced user experience.
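The shape of the sync-to-async migration can be sketched as below; aiohttp stands in for whatever HTTP client the original synchronous code used, and the URLs are placeholders:

```python
# Sketch: concurrent requests with asyncio instead of sequential blocking calls.
import asyncio
import aiohttp

async def fetch(session: aiohttp.ClientSession, url: str) -> int:
    async with session.get(url) as resp:
        await resp.read()
        return resp.status

async def main(urls: list[str]) -> None:
    async with aiohttp.ClientSession() as session:
        # Requests run concurrently instead of blocking one another.
        statuses = await asyncio.gather(*(fetch(session, u) for u in urls))
        print(statuses)

asyncio.run(main(["https://example.com"] * 3))
```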
Hung-Che Lai successfully completed the Data Analyst Nanodegree program from Udacity in 2016. The certificate verifies that he learned data analysis skills and how to discover insights from data, and was certified by Sebastian Thrun, CEO of Udacity, on October 19, 2016.
Distributed system coordination by zookeeper and introduction to kazoo python... - Jimmy Lai
The document provides an overview of Apache ZooKeeper, a coordination service for distributed systems, and introduces the Kazoo Python library for interacting with it. It details common tasks that ZooKeeper simplifies, such as leader election and configuration management, and explains concepts like znodes and session states. Additionally, practical examples demonstrate how to implement distributed coordination patterns like locks and master-worker relationships using ZooKeeper.
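The distributed-lock pattern mentioned above looks roughly like this with Kazoo; the ZooKeeper host, lock path, and worker identifier are assumptions:

```python
# Sketch: a distributed lock using Kazoo's Lock recipe.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

lock = zk.Lock("/locks/resource", identifier="worker-1")
with lock:  # blocks until this client holds the lock znode
    print("doing work while holding the lock")

zk.stop()
```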
The document outlines the construction and search capabilities of a searchable knowledge base, emphasizing the importance of efficient interfaces for data navigation. It discusses specific technologies such as Python, Solr, and DBpedia for data aggregation and search functionalities including string matching, synonym search, and geo search. It concludes with potential applications of such a knowledge base in personal assistants and question answering systems.
The document outlines a backend API solution for fast prototyping using Solr, detailing steps for adding a new core, defining the schema, feeding and updating data, and performing queries. Key instructions include creating a core, editing schema.xml, and using JSON format for data updates and queries. The next steps involve understanding Solr's mechanisms and integrating with Django Rest Framework for further development.
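The feed-and-query steps can be sketched against Solr's JSON HTTP API; the core name "movies" and the document fields are assumptions for illustration:

```python
# Sketch: add documents to a Solr core and query them over HTTP.
import requests

SOLR = "http://localhost:8983/solr/movies"

# Add (or update) documents; commit=true makes them searchable immediately.
docs = [{"id": "1", "title": "Example Movie", "year": 2013}]
requests.post(f"{SOLR}/update?commit=true", json=docs).raise_for_status()

# Query the core and read the hit count from the JSON response.
resp = requests.get(f"{SOLR}/select", params={"q": "title:example", "wt": "json"})
print(resp.json()["response"]["numFound"])
```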
[LDSP] Search Engine Back End API Solution for Fast Prototyping - Jimmy Lai
The document outlines a backend API solution for rapid prototyping of a search engine using Linux, Django, Solr, and Python. It includes installation instructions, data preprocessing steps, and details for creating a geo search API for a Taiwan movie dataset. Further steps for understanding Solr and Django REST framework functionalities are also suggested.
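A geo search like the one described can be expressed with Solr's geofilt filter; the core name, the spatial field "location_p", and the Taipei coordinates below are assumptions:

```python
# Sketch: Solr geo query returning documents within 5 km of a point.
import requests

SOLR = "http://localhost:8983/solr/movies"
params = {
    "q": "*:*",
    "fq": "{!geofilt}",        # filter by distance from a point
    "sfield": "location_p",    # spatial field holding "lat,lon" values
    "pt": "25.0330,121.5654",  # latitude,longitude of the search centre
    "d": "5",                  # radius in kilometres
    "sort": "geodist() asc",   # nearest results first
}
resp = requests.get(f"{SOLR}/select", params=params)
for doc in resp.json()["response"]["docs"]:
    print(doc.get("title"))
```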
This document provides an overview of text classification in Scikit-learn. It discusses setting up necessary packages in Ubuntu, loading and preprocessing text data from the 20 newsgroups dataset, extracting features from text using CountVectorizer and TfidfVectorizer, performing feature selection, training classification models, evaluating performance through cross-validation, and visualizing results. The goal is to classify newsgroup posts by topic using machine learning techniques in Scikit-learn.
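A condensed version of that flow, written against the current scikit-learn API and restricted to two categories for brevity, might look like this:

```python
# Sketch: TF-IDF features + Naive Bayes on a 20-newsgroups subset, with cross-validation.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
scores = cross_val_score(model, data.data, data.target, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```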
Big data analysis in python @ PyCon.tw 2013 - Jimmy Lai
The document outlines the use of Python for big data analysis, covering key tools and frameworks such as Scrapy for web scraping, MongoDB for database management, and Scikit-learn for machine learning. It discusses the advantages of Python including its readability and productivity, as well as practical examples for data collection, analysis, and visualization. The presentation also provides references and resources for further learning in big data applications with Python.
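The storage side of that pipeline can be sketched with pymongo; the database name, collection name, and documents below are made up for illustration:

```python
# Sketch: store scraped items in MongoDB and query them back.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
posts = client["crawl_demo"]["posts"]

posts.insert_many([
    {"title": "First post", "score": 42},
    {"title": "Second post", "score": 7},
])

# Query items with a score above 10, highest first.
for doc in posts.find({"score": {"$gt": 10}}).sort("score", -1):
    print(doc["title"], doc["score"])
```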
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ... - Jimmy Lai
The document discusses text classification in Python using libraries such as pandas, scikit-learn, and matplotlib. It covers the entire process from data collection and feature extraction to classification model training and performance evaluation, using IPython notebooks for fast prototyping. It also includes practical examples, demo code, and references for further learning.
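The pandas/matplotlib half of that workflow, summarising per-class scores in a DataFrame and plotting them, could look like this; the class names and numbers are made up:

```python
# Sketch: tabulate and plot per-class evaluation scores with pandas and matplotlib.
import pandas as pd
import matplotlib.pyplot as plt

scores = pd.DataFrame(
    {"precision": [0.91, 0.84, 0.88], "recall": [0.89, 0.80, 0.90]},
    index=["sci.space", "rec.autos", "comp.graphics"],
)
print(scores)

scores.plot(kind="bar", ylim=(0, 1), title="Per-class performance")
plt.tight_layout()
plt.savefig("scores.png")
```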
The document outlines software development practices in Python, focusing on runtime environments, source code management, unit testing, coding conventions, documentation, and automation. Key tools discussed include virtualenv, Mercurial for version control, and Sphinx for documentation. The document emphasizes the importance of test coverage, coding standards (PEP 8), and automation through Fabric to enhance development efficiency.
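In the spirit of the Fabric automation mentioned above, here is a small deploy task written against the modern Fabric 2 API (the talk likely used the older fabric.api style); the host and paths are assumptions:

```python
# Sketch: a remote deployment task with Fabric 2.
from fabric import Connection

def deploy(host: str = "deploy@example.com") -> None:
    with Connection(host) as c:
        with c.cd("/srv/app"):
            c.run("hg pull -u")          # update the Mercurial working copy
            c.run("pip install -r requirements.txt")
            c.run("pytest --maxfail=1")  # fail fast if tests break

if __name__ == "__main__":
    deploy()
```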
Fast data mining flow prototyping using IPython Notebook - Jimmy Lai
The document describes a workflow for fast data-mining prototyping using IPython Notebook, highlighting its interactive web IDE, flexible code execution, and data visualization capabilities. It walks through text classification on a newsgroup dataset, details the stages of the data-mining flow, and includes demo code for practical application. The goal is to classify articles by newsgroup, and results and observations from feature-extraction experiments are presented.
The document provides a comprehensive guide on using Sphinx for Python documentation, covering setup, document types, reStructuredText syntax, and autodoc features. It includes instructions for installing Sphinx, configuring Apache, and generating PDF documentation. Key topics include documenting system architecture, usage, classes, modules, and functions.
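The autodoc workflow described above starts from reStructuredText docstrings like the one below, assuming "sphinx.ext.autodoc" is listed in conf.py's extensions; the module and function are hypothetical:

```python
# Sketch: a reStructuredText docstring that Sphinx autodoc can render.
def tokenize(text: str, lowercase: bool = True) -> list[str]:
    """Split *text* into whitespace-separated tokens.

    :param text: the input string to split.
    :param lowercase: lowercase tokens before returning them.
    :returns: the list of tokens.
    """
    tokens = text.split()
    return [t.lower() for t in tokens] if lowercase else tokens

# The corresponding .rst page would then include:
#     .. automodule:: mymodule
#        :members:
```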
Apache Thrift - RPC service cross languages - Jimmy Lai
The document describes the Apache Thrift framework, which enables cross-language RPC services, supporting multiple programming languages such as C++, Java, and Python. It outlines the steps for defining data structures and service interfaces using IDL, generating language bindings, and writing server and client scripts for a machine learning prediction service. An example is provided, demonstrating the process of predicting outcomes using a pre-trained model in a server-client architecture.
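The client side of such a service might look like the sketch below; "predictor" and PredictionService stand for the package the Thrift compiler would generate from the service IDL, so only the transport and protocol plumbing is the real Thrift Python API:

```python
# Sketch: a Thrift client calling a prediction service.
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from predictor import PredictionService  # generated code (assumed name)

transport = TTransport.TBufferedTransport(TSocket.TSocket("localhost", 9090))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = PredictionService.Client(protocol)

transport.open()
try:
    # Assumed RPC defined in the IDL: predict(features) -> label.
    label = client.predict([0.1, 0.5, 0.4])
    print("predicted label:", label)
finally:
    transport.close()
```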
NetworkX - python graph analysis and visualization @ PyHug - Jimmy Lai
NetworkX is a Python package for analyzing and visualizing graphs and networks. It allows users to construct graphs from data, model network topology and examine properties like centrality and connectivity. The document provides instructions on installing NetworkX and links to tutorials, demonstrates analyzing a social network from a PTT bulletin board, and lists the top users by PageRank centrality.
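A minimal version of that PageRank ranking, on a tiny made-up reply graph rather than the PTT data, looks like this:

```python
# Sketch: build a small directed reply graph and rank users by PageRank.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("alice", "bob"),   # alice replied to bob
    ("carol", "bob"),
    ("dave", "alice"),
    ("bob", "alice"),
])

ranks = nx.pagerank(G)
for user, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{user}: {score:.3f}")
```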