Code audit results for app.sharpay.io and plan for future code refactoring (from 2019)

Hi Sharpayers!

As you know, we pay a lot of attention to development. We know that the product comes first. Without a product there will be nothing. On the Internet, a product is a code. The better and faster the product code, the better the product works. Last year we did a great audit of our code. And then we realized that the time had come for the first big improvements. It’s time to “refactoring”.

After the audit, we did a lot of refactoring. But so far we have not talked about this. Today we publish the last year’s big audit of our system (longread, 7 pages). Today, just read what our product really is and how we approach the development process. We did the audit ourselves. The full text of this audit is published on our blog.

If you have any thoughts, ideas, suggestions after reading our audit, please feel free to contact our Admin (https://t.me/sharpay_admin).

In Sharing We Trust!
Sharpay.io Team

Our Sharpay project has a fairly long history. The first prototypes of our application were created in around 2013, years before cryptocurrencies started going into the mainstream. From 2017 to now, our team has been working hard on the alpha version of Sharpay application. During this period the project’s codebase evolved rapidly as we implemented many features to satisfy the needs of our growing audience. Normally such a fast growth pace comes with the high price tag and, as many other startups, we accumulated noticeable “technical debt”… So at this point, if we wish to maintain current development speed in the future, we should invest a fair amount of time into the process called “refactoring”.

Motivation behind the code refactoring

In the text below, the phrases like “difficult to understand”, “difficult to read code”, “you need to remember about X” are often found… It is not because we are lazy and want everything to be easy and simple. Actually “difficult to understand” means that an engineer will spend several hours instead of 10-15 minutes to understand how particular code works. “Code is difficult to read” means that it is very easy to miss some important point and introduce a new bug, which may be discovered in a month or two, and during this time it will make users happy. “X must be remembered” is just expenditure of energy, that is always limited, even with the strongest motivation to work. If less energy remains, the result will be worse in quality or work will take much more time.

So all these “difficult”s ultimately translate into losses. It’s like a water leak. Even if a stream is thin, a large amount of water can be lost in a month. Only the difference is that with the code we can’t set up a bucket under the “trickle” and, accordingly, it’s hard to calculate how much such “leakage” costs.

Another dangerous aspect of dirty code is its influence on developer’s motivation. It’s one thing if you create a high-quality product at work and very different if you try to make candies from nothing. Completely different sensations… Dirty code kills the initiative. Trying to do something new becomes not only much more difficult but there is a non-illusory risk of breaking something in the other part of the system.

Technology

Composer (package manager)

Provides the ability to quickly install and integrate third-party PHP libraries into an application. Eliminates the desire to reinvent the wheel. Allows relatively painlessly updating dependencies, eliminating bugs and vulnerabilities.

Recommendation: must have. Time estimation – 1-2 weeks of work.

The path is the following – we to revise project’s dependencies, compile the list and find the packages we need on packagist.org. We will have to upgrade some of the dependencies, which most likely will take most of the time. Also we should replace the current file loader with one provided by Composer and most likely the entire code should be split into namespaces.

> It is convenient and useful, we need to implement it. We will do it along with the transition to the Framework.

Modern PHP Framework (Laravel or other)

Now we have a strange situation. For some tasks we use the outdated version of the current framework (database access, sending emails, caching, etc.) and for others we have self written several “bicycles” (routing, logging, loading resources, updating the database schema) or use various external libraries. Our project is young, but because of the above, the sensations from working with it are almost the same as from working with well-aged legacy code. There is practically no documentation and we have constantly deeg into the code in order to understand how the “bicycles” work. Plenty of time is spent on supporting “bicycles” and creating new ones, instead of just taking a ready, well-tested and documented module (a good example of this, the OAuth server). Definitely there will be a drama when on-boarding new developers to the project. The new person will have to spend weeks, maybe even months, learning all these “tricks”.

Recommendation: we need to do research and discussion. Time estimation – N/A.

> We need to consider 2-3 alternatives, make a comparative table of the features we need. Then we can take one more or less isolated piece of the application and transfer it to the selected framework in order to evaluate the labor costs for the transition of the whole project. At this stage, it makes sense to spend 2-3 weeks and after that, having at least vague estimates in hands, make the final decision about the transition.

> We can do it, but we should very carefully approach the choice of new framework, taking into account all the features required for our project. Also it is possible that excessive flexibility of selected framework will lead to system’s speed degradation (which is why we have our own module loader and routing). Performance definitely needs to be tested to make the right decision.

Modern JS Framework (Vue.js or other)

Now the application’s user interface is almost completely rendered on the server side and only when necessary “galvanized” with the help of jQuery library. This approach works well for simple pages, but rather lousy for more complex things like our site editor. Plus, we still have to draw the UI on the client from time to time and so we get a mix of different approaches, which complicates both the solution of current tasks and the maintenance of existing code.

Recommendation: we will implement it gradually. Time estimation – N / A.

We now use Vue.js library in 2-3 places, so it makes sense to continue using it. It is necessary to make a list of pages where the SPA approach will bring the most value. Then, based on this list, create several tasks. As they are resolved, it will be possible to estimate how much time it takes to port the entire interface and how appropriate it is. Perhaps as a result of this approach, we will have some kind of hybrid, where some of the pages will be rendered on the server, and some on the client. This work is quite possible to combine with the improvement of the user interface.

> There is no sense to refuse completely from “ordinary pages” and jQuery. But on complex interfaces, the transition to Vue will be justified.

Architecture

Service layer

We use the MVC pattern (Model-View-Controller), but it looks like we are already “outgrowing” it. Controllers are getting too congested. We believe that it’s already necessary to create a separate service layer located between controller and model layers and transfer to it a significant part of application logic, for example, working social networks and sharing.

Recommendation: should be done after the introduction of composer and namespaces. Time estimation – see below.

We already have task #481 – this is the refactoring of the code responsible for working with social networks. Let’s start with this task, it is the most problematic area. It will take us one to two weeks. Then having experience of such refactoring, we will look through the code of controllers, models and classes in the Library/Sharpay directory to compile a list of places where improvements are needed. Mainly this will be smaller tasks, lasting from several hours to several days to complete.

SQL outside the model layer

In some places, we access the database from controllers and views, thus mixing various levels of abstraction. This makes the application’s code harder to understand and modify.

Recommendation: we need to clean it. Time estimation – 3-5 days of work.

Project’s file structure

The structure of the project at the moment looks quite strange. Part of the application code is intermixed with third-party libraries. There is a “project in the project”, which is dependent on the application code, but has its own structure. All this complicates the navigation code.

Recommendation: we will do it after implementing Composer. Time estimation – 1-2 days of work.

> We will change it with the transition to the PHP Framework.

PHP

Namespaces

Namespaces were introduced in PHP 5.3 released 10 years ago and now they are the standard means of structuring code in almost all modern programming languages. We do not use them. Instead, there is a Hungarian notation for naming classes, which makes it difficult to read and modify the code. Also, because of the long and clumsy class names, it is often easier to push the solution of a problem into one class, instead of creating several interacting classes. This makes code more difficult to maintain because duplication gradually creeps in (a good example is our social network classes). Plus, it’s hard to connect new libraries, as our file loader does not support namespaces, but unfortunately this is not a significant reason for the authors of third-party code to stop using this PHP language feature.

Recommendation: do it simultaneously with the transition to Composer. Separate time is not necessary.

> We can now fix our loader and smoothly switch to the namespace in the libraries Sharpay_ *. The structure of the application itself will also be dictated by the framework used.

Type hints

Practically for all functions it is necessary to specify the types of parameters and return values. This will increase the readability of the code and help to avoid some types of errors and thus saves the time spent for debugging.

Recommendation: do it in a smooth transition. Separate time is not necessary.

> Take the rule to add type hints in all files that are included in the commit. The most frequently changed code will get “typed” quickly, the rest is much less critical.

> We can now do it in the background, as we work with tasks, update the methods in the classes used.

Code style

Standard for code design

The absence of standard code style makes code more difficult to read, and that is important because reading code takes an average of 9/10 of working time. We suggest adopting PSR-2 as a standard and reformat all the code according to it using a suitable tool.

Recommendation: we must do it. Time estimation – 1 day of work.

> Since we use phpstorm we can customize the style of the code and save it as a template for our development.

Naming classes, functions, variables …

It is necessary to develop a common naming scheme in the project. For example, if a function is called isSomethingExists(), then it should return a boolean value.

Recommendation: implement it gradually. Time estimation – see below.

It is necessary somewhere in the project to make a document that extends PSR-2, which contains a dictionary of nouns and verbs used to name the objects of the subject area and the actions performed with them. General principles are taken from Chapter 2 of the Clean Code by Robert Martin. The initial version of the dictionary and rules can be “probrinshit” in the chat, to allocate a day for this, without interrupting from other tasks.

> Implement as we work with the code. The general rule – change the function, correct its name and all the names in it, if necessary.

Sizes of functions

We have a lot of places where the functions are prohibitively large. For example, the function Sharpay_Telegram::hookMessage () on February 21 was 780 lines long. Now there are already 1022 lines… The problem here is that it’s very difficult to make any changes in such unstructured code to anyone other than its author. A lot of time is required, just to understand where and what is happening and how to modify it correctly, so as not to break the remaining code.

Recommendation: we must do it. Time estimation – by tasks separately.

> We need to simplify individual tasks and break them into subfunctions.

Nesting Level

Code with a nesting level of more than three is quite difficult to read. Making changes in it is even more difficult. In our code there are enough places where the level of nesting is 5 or higher, it is very desirable to restructure them.

Recommendation: eliminate it within the scope of tasks on refactoring large functions.

Heterogeneous return values

Each function must return a value of a specific type. It should not be that in one case the function returns an array, and in the other a number. Such code is very difficult to understand and fragile when changed. This is also adjacent to the return of the error code instead of the result. It looks pretty harmless, but often leads to an increase in the depth of nesting code. In this case, it is better to use exceptions.

Recommendation: eliminate it within the scope of tasks on refactoring large functions.

> We should clean up such cases when we find it.

Database

Give field names in tables to one schema.

Now we have in some tables the user ID is named, for example, sites.s_user, and in other users_social_share.uss_u_id. Such confusion interferes with writing and debugging SQL queries, developers have to either remember where and how the fields are named or climb to the database schema.

Recommendation: do it in conjunction with the next task. Time estimation – 1 day of work.

> There are only 2 or 3 such places, we have to fix it.

Shape table names

Use name only in singular, no plurals. Now the tables, whose names consist of one word, are in plural form, but we have a problem with tables with compound names. For example, users_access and users_task, but users_logs and users_comments. It also interferes when working with SQL queries.

Recommendation: must do. Time estimation – 1 day of work.

Get rid of Hungarian notation in table field names

When a table name consists of one word, everything is fine. Two words are acceptable. But as the system grows, these freaks utpc_ut_id, utc_ut_id appear. Plus there is confusion. What is the “ut” in the above names? users_task? users_transaction?

Recommendation: do it in the process of transition to the new framework, ORM most likely will have its own convention on the naming of tables.

> To avoid confusion in field naming (ut_), it is better to select for the tables of one group the synonyms users_task -> users_job. Long names like utpc_ * are more difficult to read, but in the context of the structure of the base and the logical grouping of tables it looks right. One should strive to use shorter titles (1-2-3 words maximum).

> We do not agree with the comment about short table names. The name must unambiguously and clearly describe the entity and its length does not matter, if we need 5 words, we should use them. A naming scheme that imposes restrictions on the length of a name is flawed.

It is necessary to audit the entire database schema.

Ensure that all integrity constraints are correctly set. Field data types, primary keys, secondary keys, unique indexes.

Recommendation: must do. Time estimation – 1 day of work.

> Audit of the database is necessary. For a start, we can make a superficial (checking the main keys of the tables). And more detailed on the results of logging requests.

We need to create a mechanism for logging all executed SQL queries

It should be possible, if necessary, to view all SQL requests with parameters, results and execution time that any HTTP request generates. This is required in order to be able to easily find the bottlenecks and understand which particular query needs to be optimized.

Recommendation: we will take care of this when moving to the new framework.