just droidy things

Sunday, June 21, 2015

GitHub, please let us select the type of merge for pull requests

In the book Git in Practice, the author describes two Git workflows, which boil down to merge vs. rebase.

Rebasing rewrites Git history. If we have already pushed to the remote before rebasing, we have to force push to overwrite the remote history. When we force push, two bad things could happen. First, if the remote branch is a collaborative branch to which other people can push, we risk overwriting other people's commits if our local branch doesn't contain them. Second, we could accidentally force push to the wrong remote branch (maybe because of auto-complete). This could be more severe, since our local history could be far behind the remote history of that branch.

If we use merge instead, we run into a different problem. After we finish the work on our branch, two things have to happen. First, we have to make sure our branch contains all the commits on the branch we want to merge into (say, Master), some of which could conflict with our changes. Instead of rebasing onto Master, we merge Master into our branch and resolve those conflicts. This creates a conflict resolution commit. Second, once we've created a pull request on GitHub and kind people have commented on it, we address those comments. Without rebasing, we have to commit again, which creates a comment-fixing commit on the remote. Now our branch is ready to be merged into Master. But if we just merge, Master will end up with two commits that are irrelevant if we treat commit messages as a change log.

Git itself provides a solution to this problem: "git merge --squash". Let's say we are at the point where we have resolved the conflicts and fixed the PR comments, with two corresponding commits on our branch (both local and remote). Now we can check out Master and do this:

"git merge ourBranch --squash"

This command leaves our branch's history alone and squashes all of our branch's commits into one modification on Master. The change is staged but not committed yet. Git also prepares a commit message that contains all the commit messages of our branch, and when we run "git commit" we can edit it to be more precise.

After committing the squash merge, we can push Master's new commit to the remote, and there's no force push involved in the whole process. No history rewriting at all. Master's history contains only clean, meaningful commits that are relevant from the change log point of view.

However, if you like to merge pull requests from GitHub's website, you are out of luck. The merge button on a GitHub pull request only supports "git merge --no-ff". Even if you rebase to make sure your branch's history is nice and clean, merging from GitHub's PR page will still generate a merge commit on Master that is not very informative, since it's a non-fast-forward merge. Purists will run git merge locally and push to the remote Master. Why not save us some Ctrl+Tabs and give us the option when merging a PR on GitHub?

Friday, March 13, 2015

Codility Lessons 4 Sorting - NumberOfDiscIntersections Simple Solution

Essentially, we treat the input as a set of line segments with different start and end values. If we sort the segments by their start values and iterate from the one with the smallest start value, then the number of segments a given segment overlaps can be calculated by counting the segments that start before the end of the given segment. Afterwards, we have to remove this segment's start value from the start values array; otherwise, other segments overlapping the one we already counted would count the same pair again. When we remove a value, duplicate start values don't matter. I first tried to implement this using a Line class and two ArrayLists, one for start values and one for lines, which turned out to be quite a lot of code.

But for this question, there is another property of the input we can use. Notice that for the input array A, the end value of line A[j] cannot be smaller than the start value of line A[i] if i<j. So we can just sort the start values array and iterate through the input array A. For any A[i] and A[j] where i<j, if the end value of A[i] is greater than or equal to the start value of A[j], they must overlap (segments that merely touch count as overlapping in this task).

Also, be careful with integer overflow.
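
The idea above can be sketched in Java roughly as follows. This is a minimal sketch, not the original solution. It assumes the usual task setup: disc i is centered at i with radius A[i], so its segment runs from i - A[i] to i + A[i], and the result should be -1 once the count exceeds 10,000,000. The class and method names are made up for illustration. Start and end values are kept as long, because i + A[i] can overflow an int.

```java
import java.util.Arrays;

class DiscIntersections {
    // Assumed limit from the task statement: report -1 above this many pairs.
    static final long LIMIT = 10_000_000L;

    static int count(int[] a) {
        int n = a.length;
        long[] starts = new long[n];
        for (int i = 0; i < n; i++) {
            starts[i] = (long) i - a[i]; // long arithmetic avoids int overflow
        }
        Arrays.sort(starts);

        long total = 0;
        for (int i = 0; i < n; i++) {
            long end = (long) i + a[i];
            // Number of segments (of any index) that start at or before this end.
            int reached = upperBound(starts, end);
            // Segments 0..i always start at or before this end, so subtracting
            // i + 1 leaves only the segments j > i that overlap segment i.
            total += reached - (i + 1);
            if (total > LIMIT) {
                return -1;
            }
        }
        return (int) total;
    }

    // Index of the first element greater than key, i.e. the count of elements <= key.
    static int upperBound(long[] sorted, long key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Because each pair is only counted from its smaller index, nothing has to be removed from the sorted start values array.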

Wednesday, March 4, 2015

A Simple Multi-Tenancy Implementation with Hibernate and Spring

In our implementation of the multi-tenant application, each tenant's data is kept in a physically separate database instance. For each database instance, a connection pool is set up when the application starts. For any HTTP request that is not authenticated, a default connection pool is used, pointing to the database where the user login info is stored. After login, the tenant identifier is stored in the Spring Authentication object, so that the corresponding connection pool for that user can be identified. For Spring Security login, we use Hibernate, an entity manager, and UserDetailsService for user authentication.

The User class implements the UserDetails interface, so that after the user is authenticated we can retrieve this user instance from the security context.

For Spring Security, we have to implement UserDetailsService.

Then we have to set up the persistence configuration.

Notice that we don't have to configure a DataSource bean. Instead, we add additional configuration properties for Hibernate. We have to create both MyTenantIdentifierResolver and MyMultiTenantConnectionProvider.
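
As a rough illustration of those additional properties, the sketch below builds them with plain java.util.Properties. The property keys are the ones I believe Hibernate 4.x uses for multi-tenancy, so check them against the Hibernate version in use. The com.example package and the MultiTenantSettings class are assumptions made up for this example.

```java
import java.util.Properties;

class MultiTenantSettings {
    // Builds the extra Hibernate properties for database-per-tenant mode.
    // Keys as recalled from Hibernate 4.x; the class names are hypothetical.
    static Properties hibernateProperties() {
        Properties props = new Properties();
        // One physically separate database per tenant.
        props.setProperty("hibernate.multiTenancy", "DATABASE");
        // Produces the tenant identifier for the current request.
        props.setProperty("hibernate.tenant_identifier_resolver",
                "com.example.MyTenantIdentifierResolver");
        // Maps a tenant identifier to that tenant's connection pool.
        props.setProperty("hibernate.multi_tenant_connection_provider",
                "com.example.MyMultiTenantConnectionProvider");
        return props;
    }
}
```

These properties would then be handed to the entity manager factory in place of a single DataSource.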

First, let's look at MyTenantIdentifierResolver, which essentially outputs a string that the MyMultiTenantConnectionProvider class uses to choose the right connection pool.

Inside MyMultiTenantConnectionProvider, we first initialize a map containing the connection pools for all tenants. For the overridden method selectConnectionProvider(String s), the input string is provided by MyTenantIdentifierResolver, and the output is a ConnectionProvider object containing a DataSource for the specific tenant.
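
The interplay between the two classes can be sketched in plain Java, without Hibernate or Spring on the classpath. Everything here is a simplified stand-in: TenantRouting, Pool, and the JDBC URLs are hypothetical, a ThreadLocal plays the role of the Spring Security context, and the two methods only mirror what the resolver and the connection provider do in the real setup.

```java
import java.util.HashMap;
import java.util.Map;

class TenantRouting {
    // Tenant used for unauthenticated requests (the login database).
    static final String DEFAULT_TENANT = "default";

    // Stand-in for a real connection pool / DataSource.
    static final class Pool {
        final String jdbcUrl;
        Pool(String jdbcUrl) { this.jdbcUrl = jdbcUrl; }
    }

    // Stand-in for the tenant identifier kept in the Authentication object.
    private static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

    // One pool per tenant, filled once at application start.
    private final Map<String, Pool> pools = new HashMap<>();

    void register(String tenantId, Pool pool) {
        pools.put(tenantId, pool);
    }

    static void setCurrentTenant(String tenantId) {
        CURRENT_TENANT.set(tenantId);
    }

    static void clearCurrentTenant() {
        CURRENT_TENANT.remove();
    }

    // Role of the identifier resolver: fall back to the default tenant
    // when nobody is logged in.
    static String resolveCurrentTenantIdentifier() {
        String tenant = CURRENT_TENANT.get();
        return tenant != null ? tenant : DEFAULT_TENANT;
    }

    // Role of selectConnectionProvider(String): look up the tenant's pool.
    Pool selectPool(String tenantId) {
        Pool pool = pools.get(tenantId);
        if (pool == null) {
            throw new IllegalArgumentException("Unknown tenant: " + tenantId);
        }
        return pool;
    }
}
```

Before login, resolveCurrentTenantIdentifier() returns the default tenant and selectPool hands back the login database pool. After login, the stored tenant identifier selects that tenant's own pool.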

Lastly, all we need to define is a custom ConnectionProvider to hold a DataSource.

Tuesday, February 24, 2015

Use Reflection with Spring Data JPA Specification to enable filtering on all attributes of entities

Problem:

We have a large number of domain entities that we want to expose to the front end. The front end would like users to be able to filter those entities by almost all of their attributes.

Also, when the entities form one-to-one or one-to-many relations, the users should be able to filter on the attributes of the related entities as well.

If we use the Spring Data JpaRepository interface, we can filter entities using query methods.

However, if the User entity has 5 attributes, there are 2^5 - 1 = 31 non-empty combinations of those attributes that the user could filter by, which means 31 methods have to be defined. We would also have to decide in the controller class which method to invoke based on the URL parameters. The situation is the same with @Query.

Solution:

So instead we use JpaSpecificationExecutor for our repositories and pass a Specification instance to the findAll() method of the entity repositories. We create a BaseSpecification class to transform URL filter parameters into predicates.

First, we try to make every attribute of the entities filterable. To do that, we have to read the attribute name from the URL query string and test whether the query key is one of the entity's attributes. We achieve this by using the Reflection API.

But we also want to be able to filter on attributes of the entities that are associated with the entity being filtered. We design the API as such: http://localhost:8080/demo/api/user?address.street=5th will find all users whose address has the attribute street with the value 5th.
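
A minimal sketch of that reflection check is shown below. It is not the original BaseSpecification code: the User and Address classes are hypothetical stand-ins for real entities, and the resolver only walks declared fields, so inherited attributes and the element type of one-to-many collections are not handled.

```java
import java.lang.reflect.Field;

class AttributePathResolver {
    // Hypothetical entities, for illustration only.
    static class Address { String street; String city; }
    static class User { String name; Address address; }

    // Walks a query key such as "address.street" one segment at a time.
    // Returns the type of the final attribute, or null when some segment
    // is not a declared field of the class reached so far.
    static Class<?> resolve(Class<?> entity, String queryKey) {
        Class<?> current = entity;
        for (String segment : queryKey.split("\\.")) {
            try {
                Field field = current.getDeclaredField(segment);
                current = field.getType();
            } catch (NoSuchFieldException e) {
                return null; // not a filterable attribute
            }
        }
        return current;
    }
}
```

With these stand-in classes, resolve(User.class, "address.street") yields String.class, while resolve(User.class, "street") yields null because street is not an attribute of User itself.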

At this point the front end raised a new requirement: the legacy front end code couldn't provide the name of the entity associated with the entity being filtered. So to find the users whose address has the attribute street with the value 5th, the query string looks like this: http://localhost:8080/demo/api/user?street=5th

In this case we have to search whether the URL query key is one of the attributes of the filtered entity or of the entities associated with it. We can do this search using either Breadth First Search or Depth First Search.

Here the search outputs a Pair. The Path object is later used to build the Criteria Predicate, and the Class object tells us the type of the attribute used for filtering. If the type is Date, for example, we can generate Date-specific Criteria Predicates.
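
The search can be sketched with a breadth-first traversal over declared fields. This is only an approximation of the approach described above: instead of a JPA Path it returns the dotted attribute path as a string, the Found class stands in for the Pair, and the User and Address entities are hypothetical. The rule for deciding which field types are associated entities is also an assumption; real code would more likely look at the JPA mapping annotations.

```java
import java.lang.reflect.Field;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

class AttributeSearch {
    // Hypothetical entities, for illustration only.
    static class Address { String street; String city; }
    static class User { String name; Address address; }

    // Stand-in for the Pair: dotted path to the attribute plus its type.
    static final class Found {
        final String path;
        final Class<?> type;
        Found(String path, Class<?> type) { this.path = path; this.type = type; }
    }

    // A class waiting to be scanned, with the path prefix that leads to it.
    private static final class Node {
        final Class<?> type;
        final String prefix;
        Node(Class<?> type, String prefix) { this.type = type; this.prefix = prefix; }
    }

    // Assumption: any type that is not a primitive, array, enum, or JDK
    // class is treated as an associated entity worth descending into.
    static boolean looksLikeEntity(Class<?> c) {
        return !c.isPrimitive() && !c.isArray() && !c.isEnum()
                && !c.getName().startsWith("java.");
    }

    // Breadth-first search for an attribute named key, starting at root.
    // Returns null when no reachable class declares such an attribute.
    static Found find(Class<?> root, String key) {
        Deque<Node> queue = new ArrayDeque<>();
        Set<Class<?>> visited = new HashSet<>(); // guards against cyclic associations
        queue.add(new Node(root, ""));
        visited.add(root);
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            for (Field field : node.type.getDeclaredFields()) {
                if (field.isSynthetic()) {
                    continue; // skip compiler-generated fields
                }
                String path = node.prefix + field.getName();
                if (field.getName().equals(key)) {
                    return new Found(path, field.getType());
                }
                Class<?> type = field.getType();
                if (looksLikeEntity(type) && visited.add(type)) {
                    queue.add(new Node(type, path + "."));
                }
            }
        }
        return null;
    }
}
```

With these stand-in classes, find(User.class, "street") reports the path address.street with type String. Breadth-first order means an attribute on the filtered entity itself is preferred over one with the same name on an associated entity.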