PyCon 2018 Ukraine: Highlights

Sciforce
May 21, 2018

Since the dawn of the information era, Ukraine has had a strong community of IT enthusiasts who are interested not only in learning and pushing the boundaries of computer science, but also in sharing their knowledge with peers. The community first held local and then larger international conferences, which serve as platforms for networking and exchanging best practices and have shaped the image of Ukraine as a country with a strong school of IT specialists. At present, one of the biggest Ukrainian conferences is PyCon, which took place in Kharkiv on April 28–29. As a Kharkiv-based company, SciForce could not but send a group of our Python developers there. Below we share their highlights and the talks that inspired them most.

An Introduction to Time Series Forecasting with Python

Given by Andrii Gakhov, the talk was dedicated to time series, an important instrument for modeling, analyzing and predicting data collected over time. The underlying idea was to organize the models and approaches used for time series analysis. The speaker gave an overview of the basic theoretical concepts and presented different models, including the ARIMA family of statistical models: ARMA, ARIMA and SARIMA. Interestingly enough, a neural network with only a few hidden nodes in a single hidden layer can perform well enough compared to statistical models like ARIMA.

Other approaches mentioned were Prophet (the Facebook time series forecasting tool) and recurrent neural networks (RNNs), LSTM in particular. Another interesting concept was the seasonal artificial neural network (SANN).

We can summarize the take-home message as follows:

  • Always visualize the data;
  • Select the decomposition model that describes your data best;
  • Carefully select the prediction model.
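
To make the decomposition advice above more concrete, here is a minimal sketch of a classical additive decomposition (trend plus seasonal component) and a seasonal naive forecast, using only the standard library. The synthetic weekly series, the function names and the choice of an additive model are our own assumptions for illustration; they are not taken from the talk.

```python
from statistics import mean

def moving_average_trend(series, period):
    """Centered moving average (odd period only); None where the window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = mean(series[i - half:i + half + 1])
    return trend

def seasonal_component(series, trend, period):
    """Average detrended value per position in the cycle, shifted to average zero."""
    buckets = [[] for _ in range(period)]
    for i, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            buckets[i % period].append(value - level)
    raw = [mean(bucket) for bucket in buckets]
    offset = mean(raw)
    return [r - offset for r in raw]

def seasonal_naive_forecast(series, period, steps):
    """Baseline forecast: repeat the last observed cycle."""
    last_cycle = series[-period:]
    return [last_cycle[i % period] for i in range(steps)]

# Synthetic data (an assumption): linear trend plus a weekly pattern that sums to zero.
PERIOD = 7
PATTERN = [3, 1, 0, -1, -2, -2, 1]
series = [10 + 0.5 * i + PATTERN[i % PERIOD] for i in range(6 * PERIOD)]

trend = moving_average_trend(series, PERIOD)
seasonal = seasonal_component(series, trend, PERIOD)
forecast = seasonal_naive_forecast(series, PERIOD, steps=3)
```

On real data, a baseline like the seasonal naive forecast is a useful yardstick before moving on to ARIMA-family models, Prophet or neural networks.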

Verification of Concurrent and Distributed Systems

Another speaker, Nikolay Novik, gave a talk on how model checking tools can help ensure the correctness of algorithms used in concurrent and distributed systems.

The proposed approach was based on the TLA+ (Temporal Logic of Actions) formal specification language, which relies heavily on mathematical logic and set operators such as union and intersection. A limitation of model checking is that only a finite number of states can be taken into account, and for certain algorithms exploring them can end in an ‘out-of-memory’ state. The approach can be viewed as a sophisticated algorithm debugger that helps find bottlenecks, bugs and deadlocks in the logic of the formalized algorithms. For instance, Amazon and Microsoft are using TLA+ to validate their algorithms internally.
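
The sketch below is not TLA+ and not what the speaker showed; it is a toy explicit-state checker in Python that illustrates the underlying idea of exhaustively exploring every reachable state of a small concurrent system. The two-process, two-lock model and all names in it are hypothetical.

```python
from collections import deque

def next_states(state, lock_order):
    """Yield every state reachable from `state` by one step of one process."""
    pcs, owners = state  # program counters, and current owners of locks A and B
    for proc in (0, 1):
        held = dict(zip("AB", owners))
        pc = pcs[proc]
        if pc < 2:  # steps 0 and 1: acquire the next lock in this process's order
            lock = lock_order[proc][pc]
            if held[lock] is not None:
                continue  # lock is taken, so this process is blocked
            held[lock] = proc
        elif pc == 2:  # step 2: release both locks
            held = {name: None if owner == proc else owner
                    for name, owner in held.items()}
        else:
            continue  # step 3: the process has finished
        new_pcs = tuple(pc + 1 if p == proc else pcs[p] for p in (0, 1))
        yield new_pcs, (held["A"], held["B"])

def find_deadlocks(lock_order):
    """Breadth-first search over all reachable states; return the stuck ones."""
    initial = ((0, 0), (None, None))
    seen = {initial}
    queue = deque([initial])
    deadlocks = []
    while queue:
        state = queue.popleft()
        successors = list(next_states(state, lock_order))
        finished = all(pc == 3 for pc in state[0])
        if not successors and not finished:
            deadlocks.append(state)
        for successor in successors:
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return deadlocks

# Process 0 takes A then B, process 1 takes B then A: the classic deadlock.
opposite_order = find_deadlocks({0: ("A", "B"), 1: ("B", "A")})
# Both processes take A then B: no reachable state is stuck.
same_order = find_deadlocks({0: ("A", "B"), 1: ("A", "B")})
```

The `seen` set is also where the memory limitation mentioned above shows up: it grows with the number of reachable states.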

Goodbye cron, hello Airflow

This talk, given by Yuriy Senko, covered the core concepts of Airflow, a tool used to create, manage and schedule complex task workflows. Focusing on a typical Airflow use case, the design and control of complex data processing pipelines, the speaker shared his experience of replacing a legacy ETL process managed by the cron service with an Airflow-based one.

Because Airflow, like cron, can run any bash script as a task, the migration can be done in small iterations. As a result, the migration helped the company drastically improve the manageability and stability of the system.
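
The following sketch deliberately does not use the Airflow API. It only illustrates, with the standard library, the core idea of turning independent cron entries into shell tasks with explicit dependencies that run in a valid order. The task names and commands are hypothetical placeholders for legacy scripts, and it assumes a POSIX shell and Python 3.9+ (for `graphlib`).

```python
import subprocess
from graphlib import TopologicalSorter

# Hypothetical legacy cron jobs, kept as plain shell commands.
TASKS = {
    "extract": "echo extract",
    "transform": "echo transform",
    "load": "echo load",
}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

def run_pipeline(tasks, dependencies):
    """Run every shell command once, respecting the declared dependencies."""
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        # check=True stops the pipeline as soon as one task fails.
        result = subprocess.run(
            tasks[name], shell=True, capture_output=True, text=True, check=True
        )
        executed.append((name, result.stdout.strip()))
    return executed

executed = run_pipeline(TASKS, DEPENDENCIES)
```

A scheduler such as Airflow adds what this sketch lacks, such as scheduling, retries, logging and a UI, while the existing scripts can stay unchanged during the first iterations.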

mypy: static types in Python

In his talk about mypy, a static type checker for Python, Ivan Levkivskyi showed how to improve the readability, stability and maintainability of code. He explained how to use mypy efficiently and smoothly for semantic safety: it helps ensure that our code actually does what we want and that the annotations are consistent.

The presenter also showed the tool’s pitfalls, such as contravariance of callable types, invariance of mutable containers, forward references and import cycles, functions without annotations not being checked, using self-types in copy-like methods, type aliases vs. Type[…] and others.
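
Two of these pitfalls can be shown in a few lines. The example below is our own illustration rather than code from the talk, and all names in it are hypothetical: it contrasts a read-only `Sequence` parameter with an invariant `List` parameter, and uses a self-type with a string forward reference in a copy-like method. The code runs as plain Python; the comments describe what mypy is expected to report.

```python
import copy
from typing import List, Sequence, TypeVar

def average(values: Sequence[float]) -> float:
    """Read-only access: Sequence is covariant, so a List[int] is accepted."""
    return sum(values) / len(values)

def append_half(values: List[float]) -> None:
    """Mutating access: List is invariant, so mypy rejects a List[int] argument."""
    values.append(0.5)

ints: List[int] = [1, 2, 3]
mean_value = average(ints)  # accepted by mypy
# append_half(ints)         # mypy reports an incompatible argument type here;
#                             at runtime it would silently add a float to `ints`

TShape = TypeVar("TShape", bound="Shape")  # forward reference written as a string

class Shape:
    def __init__(self, name: str) -> None:
        self.name = name

    def clone(self: TShape) -> TShape:
        """Self-type: the result has the type of the object it was called on."""
        return copy.copy(self)

class Circle(Shape):
    radius: float = 1.0

circle_copy = Circle("c").clone()  # mypy infers Circle here, not Shape
```

Without the self-type, `clone` would have to be annotated as returning `Shape`, and callers would lose access to `Circle`-specific attributes such as `radius`.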

He also talked about future plans, including:

  • improving performance;
  • stabilizing the plugin API;
  • editor integration and automated refactoring; and
  • extending the type system with simple dependent types.

Talk about the differences between the three implementations — CPython, Grumpy, PyPy

The speaker, Itay Weiss, took another look at multithreading, multiprocessing and the GIL in CPython. The talk mainly concerned the differences between three Python implementations (CPython, Grumpy and PyPy), focusing on the use cases in which each of them can be used efficiently.

To summarize the speech, we can say that:

  • CPython is for GUI applications and network servers;
  • PyPy is for long-running processes that consist mostly of Python code (not for libraries like NumPy, which are C/C++ driven);
  • Grumpy (a Python-to-Go transcompiler that has no VM and uses the Go garbage collector and goroutines) is for situations with many threads that execute small tasks, and for cases when Go libraries are needed.
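
As a small illustration of why CPython remains a reasonable choice for network servers despite the GIL, the sketch below runs several simulated blocking requests in threads. It is our own example, not the speaker's: `fake_request` is a hypothetical stand-in for a network call, and it relies on the fact that `time.sleep`, like real socket I/O, releases the GIL while waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # seconds each simulated network call spends waiting

def fake_request(request_id):
    """Stand-in for a blocking network call; the GIL is released while sleeping."""
    time.sleep(DELAY)
    return request_id * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves the input order of the results.
    results = list(pool.map(fake_request, range(4)))
elapsed = time.perf_counter() - start

# The four waits overlap, so the total is close to one DELAY, not four.
print(f"results={results}, elapsed={elapsed:.2f}s")
```

The same pattern gives no speed-up for CPU-bound pure Python functions, because only one thread can execute Python bytecode at a time; that is where multiprocessing or an alternative implementation comes in.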

Load distribution in heterogeneous microservice environments

The talk given by Roman Prykhodchenko dealt with fair load distribution among Allegro (micro)services, whose number approaches 600 instances. It turns out that heterogeneous environments complicate proper load distribution, especially in microservice architectures. Because the heterogeneous nature of the environments where the services run prevents us from using a load balancer directly, the key idea was to introduce a tiny orchestrator service that collects CPU utilization data from all instances, performs calculations and passes the results to the load balancer to improve balancing.
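
The talk summary does not specify which calculations the orchestrator performs, so the following is only one plausible sketch: it turns per-instance CPU utilization into integer load balancer weights proportional to each instance's spare capacity. The function name, the weight scale and the instance names are all assumptions.

```python
def compute_weights(cpu_utilization, scale=100, min_weight=1):
    """Map {instance: CPU utilization in [0, 1]} to {instance: integer weight}.

    Weights are proportional to spare CPU capacity, so busier instances
    receive a smaller share of new requests.
    """
    headroom = {name: max(0.0, 1.0 - used) for name, used in cpu_utilization.items()}
    total = sum(headroom.values())
    if total == 0:
        # Every instance is saturated: fall back to an even split.
        return {name: min_weight for name in cpu_utilization}
    return {
        name: max(min_weight, round(scale * spare / total))
        for name, spare in headroom.items()
    }

# Hypothetical snapshot collected from three instances on different hardware.
weights = compute_weights({"small-vm": 0.8, "medium-vm": 0.6, "bare-metal": 0.2})
```

Keeping `min_weight` above zero means a busy instance still receives some traffic, so its utilization can be observed to drop again.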

Peer to peer file synchronization for your apps

The talk given by Paul Colomiets was about Ciruela, a tool implemented by the presenter. Its aim is to synchronize static-like data among servers. The solution rests on three pillars: UDP gossip within the cluster of instances, Merkle tree hashes and hard-linking.

The possible applications include:

  • nearly constant cached structures (e.g. the category tree of an online marketplace);
  • configuration;
  • feature flags;
  • translations;
  • game data.
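
To show why Merkle tree hashes suit this kind of synchronization, here is a minimal sketch that hashes an in-memory directory tree and finds changed files by descending only into subtrees whose hashes differ. It is not Ciruela's actual format or code: the tree representation (nested dicts with bytes as file contents), the hashing scheme and the sample data are assumptions for illustration.

```python
import hashlib

def merkle_hash(node):
    """Hash a file (bytes) or a directory (dict of name -> node)."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"file:" + node).hexdigest()
    digest = hashlib.sha256(b"dir:")
    for name in sorted(node):  # sorted, so the hash does not depend on dict order
        digest.update(name.encode() + b"\0" + merkle_hash(node[name]).encode() + b"\n")
    return digest.hexdigest()

def changed_paths(old, new, prefix=""):
    """List paths that differ, skipping every subtree whose hash is unchanged."""
    if merkle_hash(old) == merkle_hash(new):
        return []
    if isinstance(old, bytes) or isinstance(new, bytes):
        return [prefix or "/"]
    paths = []
    for name in sorted(set(old) | set(new)):
        child = f"{prefix}/{name}"
        if name not in old or name not in new:
            paths.append(child)  # file or directory was added or removed
        else:
            paths.extend(changed_paths(old[name], new[name], child))
    return paths

# Hypothetical data set: translations and feature flags.
old_tree = {"flags.json": b'{"beta": false}',
            "i18n": {"en.json": b'{"hi": "Hi"}', "uk.json": b'{"hi": "?"}'}}
new_tree = {"i18n": {"uk.json": b'{"hi": "Pryvit"}', "en.json": b'{"hi": "Hi"}'},
            "flags.json": b'{"beta": false}'}

diff = changed_paths(old_tree, new_tree)
```

For brevity the sketch recomputes hashes on every comparison; a real tool would cache them per directory. Unchanged files can then be hard-linked from the previous version instead of being copied.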

Binary data in Python with a bit of C spice on top

The talk by Taras Voinarovskyi mainly concerned binary data types and modules for working with binary data, for cases when a C++ client sends data in a binary format rather than JSON.

The talk described the following tools and modules for manipulating binary data:

  • struct, intended for parsing/packing simple built-in types;
  • Binary IO, intended for working with files;
  • memoryview, used to avoid redundant copies of data;
  • buffer management, for the best efficiency.
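
As a brief illustration of the first and third items, the sketch below packs fixed-size records into a preallocated buffer with `struct` and reads them back through a `memoryview` without copying. The record layout (a little-endian uint16 id, uint32 timestamp and float32 value) is a hypothetical wire format of our own, not one from the talk.

```python
import struct

# Hypothetical record: uint16 id, uint32 timestamp, float32 value.
# "<" means little-endian with no padding between fields.
RECORD = struct.Struct("<HIf")

def pack_records(records):
    """Write all records into one preallocated buffer."""
    buffer = bytearray(RECORD.size * len(records))
    for index, record in enumerate(records):
        RECORD.pack_into(buffer, index * RECORD.size, *record)
    return buffer

def iter_records(data):
    """Yield decoded records; memoryview avoids copying slices of `data`."""
    view = memoryview(data)
    for offset in range(0, len(view), RECORD.size):
        yield RECORD.unpack_from(view, offset)

payload = pack_records([(1, 1000, 1.5), (2, 2000, -0.25)])
decoded = list(iter_records(payload))
```

Precompiling the format with `struct.Struct` avoids re-parsing the format string for every record, which matters when decoding many small messages.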

From beta to a world-class SaaS

In this talk, Ulises Reyes, the VP of Engineering at DataRobot, showed how their infrastructure evolved over time, from a home project with five people to its present state as a production-grade SaaS with paying customers. The talk covered the zero-bugs culture and described the organizational changes made, the tooling used, and the mistakes and bugs that the project faced on the way to success.

Perhaps the most interesting and practical part was the migration from a simple continuous deployment scheme to continuous delivery with blue/green deployments.

The current state and future of asyncio

The speaker, Andrew Svetlov, gave a talk on the current state of asyncio-based development, best practices and known pitfalls, and shared the plans for future asyncio features and the main direction of its evolution.

It was interesting to learn that a great effort is being made to hide the internals of asyncio from the user (developer), including the internals of async/await, behind functions such as run, create_task, current_task, all_tasks and get_running_loop.

As one of the most recent developments, support for upgrading an existing connection to TLS has been added in the form of loop.start_tls(). In the future, the asyncio team is planning to reimplement TLS and to provide a TaskGroup entity.

Finally, the most general tip from the speaker was that sync and async code should not be mixed.
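
A minimal sketch of what this can look like in practice, assuming Python 3.7 or later: the high-level functions mentioned above (`asyncio.run`, `asyncio.create_task`, `asyncio.get_running_loop`) keep the event loop out of sight, and a blocking call is handed to an executor rather than being called directly inside a coroutine. This is our reading of the tip, not code from the talk; `blocking_io` and `fetch` are hypothetical.

```python
import asyncio
import time

def blocking_io(delay):
    """Ordinary synchronous code; calling it directly in a coroutine would stall the loop."""
    time.sleep(delay)
    return "sync result"

async def fetch(name, delay):
    """Cooperative coroutine: awaiting sleep lets other tasks run meanwhile."""
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    loop = asyncio.get_running_loop()
    # Schedule both coroutines so they run concurrently.
    tasks = [asyncio.create_task(fetch(name, 0.05)) for name in ("a", "b")]
    # Run the blocking function in the default thread pool, off the event loop.
    sync_result = await loop.run_in_executor(None, blocking_io, 0.05)
    return [await task for task in tasks] + [sync_result]

results = asyncio.run(main())
```

Calling `blocking_io(0.05)` directly inside `main` would still return the right value, but no other task could make progress while it slept.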

Ukraine-based IT company specialized in development of software solutions based on science-driven information technologies #AI #ML #IoT #NLP #Healthcare #DevOps