For a while now we have been working with a large enterprise client, helping them migrate their on-premises workloads to the cloud. As added value to the process, they are also migrating their legacy development processes to the modern, agile DevOps approach. And of course, they have built a modern Continuous Integration/Continuous Delivery (CI/CD) pipeline consisting of Bitbucket, Jenkins, Artifactory, Puppet and some relevant testing frameworks. “It is all great!”, you would say, “so what is the problem?”

Because I am on all kinds of mailing lists for this client, I noticed recently that my dedicated email inbox started getting more and more emails related to the CI/CD pipeline: unexpected Jenkins build failures, artifacts that cannot be downloaded, server outages and so on. You already guessed it – emails that report problems with the CI/CD pipeline and prevent development teams from doing their job.

I don’t want to go into the details of what exactly went wrong with this client; I will only say that a year ago, when we designed the pipeline, there were a few things in the design that never made it into the implementation. The more surprising part for me, though, is that if you search the internet for CI/CD pipelines, you will get the exact picture of what our client has in production. The problem is that all the literature about CI/CD is narrowly focused on how the code is delivered to its destination, while the operational, security and business sides of the CI/CD pipeline are completely neglected.

Let’s step back and take a look at how a CI/CD pipeline is currently implemented in the enterprise. Looking at the picture below, there are a few typical components included in the pipeline:

  • Code repository – typically this is some Git flavor
  • Build tools like Maven
  • Artifacts repository – most of the time this is Nexus or Artifactory
  • CI automation or orchestration server – typically Jenkins in the enterprise
  • Configuration management and deployment automation tools like Puppet, Chef or Ansible
  • Various test automation tools depending on the project requirements and frameworks used
  • Internal or external cloud for deploying the code

Typical CI/CD Pipeline Components

The above set of CI/CD components is absolutely sufficient for getting the code from the developer’s desktop to the front-end servers, and the process can be completely automated. But those components do not answer a few very important questions:

  • Are all components of my CI/CD pipeline up and running?
  • Are the components performing according to my initial expectations (sometimes documented as SLAs)?
  • Who is committing code, scheduling builds and deployments?
  • Is the feature quality increasing from one test to another or is it decreasing?
  • How much does each component cost me?
  • What is the overall cost of operating the pipeline? Per day, per week or per month? What about per deployment?
  • Which components can be optimized in order to achieve faster time to deployment and lower cost?

These are questions that none of the typical components listed above can answer holistically. Jenkins may be able to send you a notification if a particular job fails, but it will not tell you how much the build costs you. Artifactory may store all your artifacts, but it will not tell you if you are running out of storage or give you the cost of that storage. The test tools can give you individual test reports but rarely build trends based on feature or product.

Hence, in our implementations of CI/CD pipelines we always include three additional components, as shown in the picture below:

  • Monitoring and Alerting Component, used to collect data from every other component of the pipeline. Its purpose is to make sure the pipeline is running uninterrupted and to collect the data used for business reporting. If there are anomalies, alerts are sent to the affected parties.
  • Security Component, used not only to ensure consistent access policies but also to provide auditing capabilities for requirements like HIPAA, PCI or SOX.
  • Business Dashboarding and Reporting Component, used to provide financial and project information to business users and management.
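To make this concrete, the monitoring component does not have to start big. Below is a minimal health-check sketch in Python; the component URLs are hypothetical, and the HTTP fetcher is passed in so the logic can be exercised without a live pipeline:

```python
# Minimal health-check sketch for the Monitoring and Alerting component.
# The component URLs are hypothetical; the HTTP fetcher is injectable so
# the check logic can run without real Jenkins/Artifactory servers.

def check_components(components, fetch):
    """Map component name -> 'up' or 'down' based on an HTTP status code."""
    status = {}
    for name, url in components.items():
        try:
            status[name] = "up" if fetch(url) == 200 else "down"
        except Exception:
            status[name] = "down"  # unreachable counts as down
    return status

def alerts(status):
    """Names of the components that should trigger an alert."""
    return sorted(name for name, state in status.items() if state != "up")

# Example with a stubbed fetcher (no network involved):
fake_codes = {"https://jenkins.internal/login": 200,
              "https://artifactory.internal/api/system/ping": 503}
status = check_components(
    {"jenkins": "https://jenkins.internal/login",
     "artifactory": "https://artifactory.internal/api/system/ping"},
    fake_codes.get,
)
# status -> {"jenkins": "up", "artifactory": "down"}
```

In a real deployment the fetcher would issue HTTP requests (for example with `urllib.request.urlopen`) on a schedule, and the alert list would feed email or chat notifications.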

Advanced CI/CD Pipeline Components

The way CI/CD pipelines are currently designed and implemented is yet another proof that we as technologists neglect important aspects of the technologies we design. Security, reliability, and business (project and financial) reporting are very important to CI/CD pipeline users, and we should make sure those are included in the design from the get-go and not implemented as an afterthought.

Since the sprawl of mobile apps and web services began, the need to create new usernames and passwords for each app or service has become annoying and, as it turned out, decreases overall security. Hence we decided to base our authentication on the popular social media platforms (Facebook, Twitter, and Google) but wanted to make sure that we protect the authentication tokens on our side. Maybe in a later post I will go into more detail about the pros and cons of this approach, but for now I would like to concentrate on the technical side.

Here are the constraints or internal requirements we had to work with:

  • We need to support multiple social media platforms for authentication (Facebook, Twitter, and Google at minimum)
  • We need to support the web as well as mobile clients
  • We need to pass authentication information to our APIs but we also need to follow the REST guidelines for not maintaining state on the server side.
  • We need to make sure that we validate the social media auth token when it first reaches our APIs
  • We need to invalidate our own token after some time
The flow of events is shown in the following picture, and the step-by-step explanations are below.

Authenticate with the Social Media site

The first step (step 1 in the flow) is to authenticate with the Social Media site. They all use OAuth; however, each implementation varies and the information you receive back differs quite a lot. For details on how to implement OAuth authentication with each of the platforms, take a look at the platform documentation. Here are links to some:

Note that those describe the authentication with their APIs, but in general the process is the same for clients. The ultimate goal here is to retrieve an authentication token that can be used to verify the user is who she or he claims to be.

We use the Tornado web server, which has built-in authentication handlers for the above services as well as a generic OAuth handler that can be used to implement authentication with other services supporting OAuth.
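As an illustration, a login handler built on Tornado's Google OAuth2 mixin might look like the sketch below. The handler name, redirect URI, and settings keys are assumptions for this example, not our production code:

```python
# Sketch of a Tornado login handler using the built-in GoogleOAuth2Mixin.
# The redirect URI and the "google_oauth" application settings entry
# (holding the client key/secret) are hypothetical placeholders.
import tornado.web
from tornado.auth import GoogleOAuth2Mixin

class GoogleLoginHandler(tornado.web.RequestHandler, GoogleOAuth2Mixin):
    async def get(self):
        redirect_uri = "https://www.ourproducturl.com/auth/google"  # hypothetical
        if self.get_argument("code", None):
            # Step 2: exchange the authorization code for an access token
            user = await self.get_authenticated_user(
                redirect_uri=redirect_uri,
                code=self.get_argument("code"),
            )
            self.write(user)  # contains the access token and user info
        else:
            # Step 1: send the user to Google's consent screen
            self.authorize_redirect(
                redirect_uri=redirect_uri,
                client_id=self.settings["google_oauth"]["key"],
                scope=["profile", "email"],
                response_type="code",
            )
```

The Facebook and Twitter mixins follow the same pattern with platform-specific parameters.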

Once the user authenticates with the service, the client receives information about the user as well as an access token (step 2 in the diagram) that can be used to validate the identity of the user. As mentioned above, each social media platform returns different information in the form of a JSON object. Here are anonymized examples for the three services:

It is worth mentioning some differences related to the expiration times. Depending on how you do the authentication, you may receive short-lived or long-lived tokens, and you should pay attention to the expiration times. For example, Twitter may respond with an access token that never expires ("x_auth_expires":"0"), while long-lived tokens for Facebook expire in ~60 days. The expiration time is given in seconds and is approximate, which means it may not be exactly 60 minutes or 60 days but a bit less.

Authenticate with the API

Now that the user has authenticated with the Social Media site, we need to make sure that she also exists in our user database before we issue a standardized token that we can handle in our APIs.

We created login APIs for each of the Social Media platforms, as follows:

GET https://api.ourproducturl.com/v1.0/users/facebook/{facebook_user_id}
GET https://api.ourproducturl.com/v1.0/users/google/{google_user_id}
GET https://api.ourproducturl.com/v1.0/users/twitter/{twitter_user_id}

Based on which Social Media service was used to authenticate the user, the client submits a GET request to one of those APIs, including the authorization response from step 2 in the Authorization header of the request (step 3 in the diagram). It is important that the communication for this request is encrypted (i.e. use HTTPS) because the access token should not be revealed to the public.

On the server side, a few things happen. After extracting the Authorization header from the request, we validate the token with the Social Media service (step 4).

Here are the URLs that you can use to validate the tokens:

  • Facebook (as well as documentation)
    https://graph.facebook.com/debug_token?input_token={token-to-inspect}&access_token={app-token-or-admin-token}
  • Google (as well as documentation)
    https://www.googleapis.com/oauth2/v3/tokeninfo?access_token={token-to-inspect}
  • Twitter (as well as documentation)
    https://api.twitter.com/1/account/verify_credentials.json?oauth_access_token={token-to-inspect}

If the token is valid, we compare the ID extracted from the Authorization header with the one specified in the URL. If either of those two checks fails, we return a 401 Unauthorized response to the client. If both checks pass, we do a lookup in our user database to find the user with the specified Social Media ID (step 5 in the diagram) and retrieve her record. We also retrieve information about her group memberships so that we can authorize each one of the functional calls later on. If we cannot find the user in our database, we return a 404 Not Found response to the client.
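Steps 4 and 5 can be sketched for Google as follows (the other platforms differ mainly in the validation URL and the field that carries the user ID). The fetcher is injectable so the check can be exercised without calling Google; Google's tokeninfo endpoint reports the account ID in the `sub` claim:

```python
# Sketch of server-side token validation (step 4) for Google, using only
# the standard library. The fetch function is injectable for testing.
import json
import urllib.request

GOOGLE_TOKENINFO = "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token={}"

def validate_google_token(token, expected_user_id, fetch=None):
    """Return True if the token is valid and belongs to expected_user_id."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=5) as resp:
                return json.loads(resp.read().decode("utf-8"))
    try:
        info = fetch(GOOGLE_TOKENINFO.format(token))
    except Exception:
        return False  # network error or rejected token -> treat as invalid
    # Google reports the account ID in the "sub" claim
    return info.get("sub") == expected_user_id
```

When this returns False, the API responds with 401 Unauthorized, as described above.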

Create API Token

For the purposes of our APIs, we decided to use encrypted JWT tokens. We include the following information in the JWT token:

  • User information like ID, first name and last name, email, address, city, state, zip code
  • Group membership for the user including roles
  • The authentication token for the Social Media service the user authenticated with
  • Expiration time (we settled on 60 minutes expiration)

Before we send this information back to the client (step 8 in the diagram), we encrypt it (step 7) using an encryption key or secret that we keep in Azure Key Vault (step 6). The JWT token is sent back to the client in the Authorization header.
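Token issuance (steps 6 through 8) can be sketched with the PyJWT library. Note that PyJWT signs tokens (JWS) rather than encrypting them; making the payload unreadable to clients (JWE) would require an additional library. The claim names below are illustrative, the 60-minute expiration follows the description above, and the secret stands in for the key retrieved from Azure Key Vault:

```python
# Sketch of issuing and reading our API token with PyJWT (signed, not
# encrypted). Claim names are illustrative; the secret stands in for the
# key kept in Azure Key Vault.
import datetime
import jwt  # pip install pyjwt

def issue_api_token(user, social_token, secret):
    payload = {
        "sub": user["id"],
        "name": user["name"],
        "groups": user["groups"],       # used later for authorization
        "social_token": social_token,   # kept so we could extend sessions
        "exp": datetime.datetime.utcnow() + datetime.timedelta(minutes=60),
    }
    return jwt.encode(payload, secret, algorithm="HS256")

def read_api_token(token, secret):
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens
    return jwt.decode(token, secret, algorithms=["HS256"])
```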

Call the functional APIs

Now we have replaced the access token the client received from the Social Media site with a JWT token that our application can understand and use for authentication and authorization purposes. Each request to the functional APIs (step 9 in the diagram) is required to have the JWT token in the Authorization header. Each API handler has access to the encryption key that is used to decrypt the token and extract the information from it (step 10).

Here are the checks we do before every request is handled (step 11):

  • If the token is missing we return 401 Unauthorized to the client
  • If the user ID in the URL doesn’t match the user ID stored in the JWT token, we return 401 Unauthorized to the client. All API requests for our product are executed in the context of the user
  • If the JWT token has expired, we return 401 Unauthorized to the client. For now, we decided to expire the JWT token every 60 minutes and require the client to re-authenticate with the APIs. In the future, we may decide to extend the token for another 60 minutes, or until the Social Media access token expires, so that we can avoid user dissatisfaction from frequent logins. This is why we designed the JWT token to also store the Social Media access token
  • If the user has no right to perform a certain operation, we return 403 Forbidden to the client, denoting that the operation is forbidden for this user

A few notes on the implementation. Because we use Python, we can easily implement all the authentication and authorization checks using decorators, which makes our API handlers much easier to read and also enables easy extension in the future (for example, extending the validity of the JWT token). Python also has an easy-to-use JWT library, available on GitHub at https://github.com/jpadilla/pyjwt.
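A simplified sketch of such a decorator is below. The request object here is a plain dict standing in for a web framework's handler state, and all names are illustrative:

```python
# Sketch of decorator-based authentication checks (steps 10-11).
# `request` is a plain dict standing in for real handler state:
# "claims" holds the decoded JWT payload, "user_id" the ID from the URL.
import functools

class AuthError(Exception):
    def __init__(self, status):
        self.status = status

def authenticated(handler):
    @functools.wraps(handler)
    def wrapper(request, *args, **kwargs):
        claims = request.get("claims")
        if claims is None:
            raise AuthError(401)  # missing or invalid token
        if claims.get("sub") != request.get("user_id"):
            raise AuthError(401)  # token does not match the URL's user
        return handler(request, *args, **kwargs)
    return wrapper

@authenticated
def get_expenses(request):
    # The actual handler body only runs after the checks pass
    return {"status": 200, "user": request["claims"]["sub"]}
```

Stacking additional decorators (for expiration or role checks) keeps each handler body free of boilerplate.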

Some additional resources that you may find useful when implementing JWTs are:


You may be wondering why I chose Python as the language to teach you software engineering practices. There are tons of other languages one can use for that purpose, languages that are much sexier than Python. Well, I certainly have my reasons, and here is a summary:

  • First of all, Python is a very easy language to learn, which makes it a good choice for beginners
  • Python is an interpreted programming language, which means that you receive immediate feedback from the commands you type
  • Python supports both functional and object-oriented approaches to programming, which is good if you don’t know which path you want to choose
  • Python is a versatile language that can be used to develop all kinds of applications, hence it is used by people in various roles. Here are some:
    • Front-end developers can use it to implement dynamic functionality on websites
    • Back-end developers can use it to implement cloud-based services, APIs and communicate with other services
    • IT people can use it to develop infrastructure, application deployment and all kinds of other automation
    • Data scientists can use it to create data models, parse data or implement machine learning algorithms

As you can see, Python is a programming language that, if you become good at it, can enable multiple paths for your career. Learning the language as well as establishing good development habits will open many doors for you.

For the past twenty or so years, since I started my career in technology in 1996, almost every book I have read about programming, although providing detailed coverage of the particular programming language it was written about, lacked crucial information educating the reader on how to become a good Software Engineer. Learning a programming language from such a book is like learning the syntax and the grammar of a foreign language but never understanding the traditions of the native speakers, the idioms they use, or how to express yourself without offending them. Yes, you can speak the language, but you will have a lot of work to do before you start to fit in.

Learning the letters, the words and how to construct a sentence is just a small part of learning a new language. This is also true for programming languages. Knowing the syntax, the data types, and the control structures will not make you a good software engineer. It is surprising to me that so many books and college classes concentrate only on those things while neglecting fundamental topics like how to design an application, how to write maintainable and performant code, and how to debug, troubleshoot, package or distribute it. The lack of understanding in those areas not only makes new programmers inefficient but also establishes bad habits that are hard to change later on.

I’ve seen thousands and thousands of lines of undocumented code; whole applications that log no errors, where nobody can figure out where they break; web pages that take 20 minutes to load; and plain silly code that calls a function to sum two numbers (something that can be achieved simply with a plus sign). Hence I decided to write a book that not only explains the Python language in a simple and understandable way but also teaches the fundamental practices of software engineering. A book that, after reading it, will have you ready to jump in and develop high-quality, performant and maintainable code that meets the requirements of your customers. A book that any person can pick up and use to learn how to become a Software Engineer.

I intentionally use the term Software Engineer because I want to emphasize that developing high-quality software involves a lot more than just writing code. I wanted to write a book that will prepare you to be a Software Engineer, not simply a Coder or Programmer. I hope that with this book I have achieved this goal and helped you, the reader, advance your career.

With our first full-time developer on board, I had to put some structure around the tools and services we will use to manage our work. In general I don’t want to be too prescriptive about what tools they should use to get the job done, but it is good to set some guidelines for the tool set and outline the mandatory and optional ones. For our development we’ve made the following choices:

  • Microsoft Azure as Cloud Provider
  • TornadoWeb and Python 2.7 as a runtime for our APIs and frontend
  • DocumentDB and Azure storage for our storage tier
  • Azure Machine Learning and Microsoft Cognitive Services for machine learning

Well, those are the mandatory things, but as I mentioned in my previous post, How to Build a Great Software Development Team?, software development is more than just technology. Nevertheless, we had to decide on a toolset to at least start with, so here is the list:

1. Slack

My first impression of Slack was lukewarm, and I preferred the more conservative UI of HipChat. However, compared to HipChat, Slack offered multi-team capability right from the beginning, which allowed me to communicate not only with my team but also at client sites and with the advisory team for North Seattle College. In addition, HipChat introduced quite a few bugs in its latest versions, which made team communication quite unreliable and unproductive, and this totally swayed the decision to go with Slack. After some time I got used to Slack’s UI and started liking it, and now it is an integral part of our team’s communication.

2. Outlook 2016

For my personal email I use Google Apps with a custom domain; however, I’ve been a long-time Outlook user, and with the introduction of Office 365 I think the value for the money is in Microsoft’s favor. Managing multiple email accounts and calendars and scheduling in-person or online meetings using the GoToMeeting and Skype for Business plugins is a snap with Outlook. With the added benefit of using Word, Excel and PowerPoint as part of the subscription, Office 365 is a no-brainer. We use Office 365 E3, which gives each one of us the full set of Office capabilities.

3. Dropbox

Sending files via email is an archaic approach, although I see it still being widely done. For that purpose we have set up Dropbox for the team. I have created shared folders for the leadership team as well as for each one of the team members, allowing them to easily share files with each other. We settled on Dropbox Pro for the leadership team and the free Dropbox tier for the team members. In the future we are considering a move to the Business Edition.

4. Komodo Edit

I have been a long-time fan of Komodo. It is a very lightweight IDE that offers highlighting and type-assist for a number of programming languages like Python, HTML5, JavaScript and CSS3. It also allows you to extend the functionality with third-party plugins offering rich capabilities. I use it for most of my development.

5. Visual Studio Code

Visual Studio Code is the new cross-platform IDE from Microsoft. It is a lightweight IDE similar to Sublime Text, and it offers a lot of nice features that can be very helpful if you develop for Azure. It has built-in debugging and IntelliSense, and a plugin extensibility model with a growing number of plugin vendors. It is a great tool for creating markdown documents, debugging with breakpoints from within the IDE, and more. Visual Studio Code is an alternative to Visual Studio that allows you to develop for Azure on platforms other than Windows. If you are a Visual Studio fan but don’t want to pay a hefty amount of money, you can give Visual Studio Community Edition a try (unfortunately available for Windows only). Here is a Visual Studio Editions comparison chart that you may find useful.

6. Visual Studio Online

Managing the development project is crucial for the success of your team. The functionality that Visual Studio Online offers for keeping backlogs, tracking sprint work items and reporting is comparable to, if not better than, Jira, and if you are bound to the Microsoft ecosystem it is the obvious choice. For our small team we rely almost entirely on the free edition, and it gives us all the functionality we need to manage the work.

7. Docker

Being able to deploy a complete development environment with the click of a button is crucial for development productivity. Creating a Docker Compose template consisting of two TornadoWeb workers and an NGINX load balancer in front (a configuration very similar to what we plan to use in production) takes less than an hour with Docker, and it dramatically reduces the operational overhead for developers. It also closely mimics the production configuration, which means the probability of introducing bugs caused by environment differences is much lower.
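A Compose file for such an environment can be as small as the sketch below. The build context, service names, and the NGINX config path are assumptions for illustration:

```yaml
# Sketch of a docker-compose.yml for the local dev environment.
# The ./app build context and ./nginx.conf path are hypothetical.
version: "2"
services:
  web1:
    build: ./app                # TornadoWeb worker 1
  web2:
    build: ./app                # TornadoWeb worker 2
  nginx:
    image: nginx
    ports:
      - "80:80"
    volumes:
      # nginx.conf defines an upstream pointing at web1 and web2
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - web1
      - web2
```

A single `docker-compose up` then brings up both workers and the load balancer.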

With the introduction of Docker for Windows, all of the above became much easier to do on the Windows desktop, which is an added benefit.

8. Draw.IO

Last but not least, being able to visually communicate your application or system design is essential for successful development projects. For that purpose we use Draw.IO. In addition to the standard block diagrams and flowcharts, it offers Azure- and AWS-specific diagrams, creation of UI mockups, and even UML if you want to go that far.

Armed with the above set of tools you are well prepared to move fast with your development project on a lean budget.

For a while I have been looking for a good sample application in Python that I can use for training purposes. The majority of the sample applications available online cover a certain topic like data structures or string manipulation, but so far I have not found one that takes a more holistic approach. For Basic Python Developer Training I would like to use a real-life application that not only covers various areas of the language syntax and structures but can also teach good software development practices. There are minimum requirements for a Software Developer that I believe need to be taught in basic development classes, and the projects used in such classes need to make sure that those minimum requirements are met.

For our new developers training I decided to use a simple Expense Reports application with very basic requirements:

  • I should be able to store receipts information into a file
  • The following information about the receipt should be stored
    • Date
    • Store
    • Amount
    • Tags
  • I should be able to generate a report for my expenses based on the following information
    • Date range
    • Store
    • Tags

My goal with this application is to teach junior developers a few things:

  • Python Language Concepts like data types, control structures etc. as well as a bit more complex concepts like data structures, data manipulation, data conversion, file input and output and so on
  • Code Maintainability Practices like naming conventions, comments and code documentation, modularity etc.
  • Basic Application Design including requirements analysis and clarification
  • Basic User Experience concepts like UI designs, user prompts, input feedback, output formatting etc.
  • Application Reliability including error and exception handling
  • Testing that includes unit and functional testing
  • Troubleshooting that includes debugging and logging
  • Interoperability for interactions with other applications
  • Delivery including packaging and distribution

I have started a Python Expenses Sample GitHub project for the application, where I will check in the code from the weekly classes as well as instructions on how to use it.


We are looking to hire a few interns for the summer, and this got me thinking about what approach I should take to provide a great experience for them as well as get some value out of it for us. The culture that I am trying to establish in my company is based on the premise that software development is more than writing code. I know that this is an overused cliché, but if you look around there are thousands of educational institutions and private companies that concentrate on teaching just that – how to write code – while neglecting everything else that is involved in software development.

For that purpose I decided to create a crash course on good software development practices and run our new hires through it. Having been involved in quite a few technology projects over the last 20+ years, and having seen a lot of failures and successes, I have developed my own approach that may or may not fit the latest trends but has served me well. Also, having managed one of the best-performing teams in the Microsoft Windows Division during Windows 7 (yes, I can claim that :)), I can say that I have some experience with building great teams.

So, my goal is for our interns to be real software developers by the end of the summer, and for that experience they will get paid instead of spending money. Now, here are the things that I want them to know by the end of the summer:

Team

The team is the most important part of software development. The few important things that I want to teach them are that they need to work together, help each other, solve problems together, and NOT throw things over the fence because something is not their area of responsibility. If they learn this, I will have accomplished my job as their mentor (I am kidding 🙂 but yes, I think there are too many broken teams around the world).

As software developers, they are responsible for the product and the customer experience, no matter whether they write the SQL scripts, the APIs or the client UI. If there is a problem with the code, they need to work with their peers to troubleshoot and solve the problem. If one of their peers has difficulty implementing something and they know the answer, they should help them move to the next level, and not keep it to themselves because they are scared that she or he will take their job.

And one more thing – politics are strictly forbidden! 

Communication

Communication is key. The first thing I standardize in each one of my projects is the communication channels for the team. And this is not as simple as using Slack for everything; it includes regular meetings, who manages the meetings, where documents are saved, what the naming conventions for folders and files are, when to use which tool (Slack, email, others) and so on.

Being able to effectively communicate does not mean strictly defining organizational hierarchies; it means keeping everyone informed and being transparent.

Development Process

As a friend of mine once said: “Try to build a car with agile!” We always jump to the latest and greatest but often forget the good parts from the past. Yes, I will teach them agile – Scrum or Kanban, it doesn’t really matter; what is important is that they feel comfortable with their tasks and are able to deliver. And don’t forget – there will be design tasks for everything. This brings me to:

Software Design

Software design is an integral part of software development. There are a few parts of the design that I put emphasis on:

  • User Interface (UI) design
    They need to be able to understand what purpose wire-frames serve; what redlining is; when we need one or the other; what good UI design patterns are and how to find them; and so on
  • Systems design
    They need to understand how each individual system interacts with the rest, how each system is configured and how their implementation is impacted
  • Software components design
    Every piece of software has dependencies, and they need to learn how to map those dependencies, how different components interact with each other, and where things can break. Things like libraries, packaging, code repository structure etc. all play a role here

Testing

The best way to learn good development practices is to test your own software. Yes, that is correct – eat your own dogfood! And I am not talking about testing just the piece of code you wrote (so-called unit testing) but the whole experience you worked on, as well as the whole product.

By learning how to better test their code, my developers will not only see the results of their work but will next time be more cognizant of the experience they are developing and how they can improve it.

Build and Deployment

Manually building infrastructure and deploying code is painful and a waste of time. I want my software developers to think about automated deployment from the beginning. As a small company, we cannot afford a dedicated operations team whose only job is to copy bits from one place to another and build environments.

Using tools like Puppet, Chef, Ansible or Salt is not new to the world, but having people manually create virtual machines still seems to be very common. Learning how to automate their work will allow them to deliver more in less time and become better performers.

Operations

Operating the application is the final step in software development. Being able to easily troubleshoot issues and fix bugs is crucial for the success of every development team. Incorporating things like logging, performance metrics, analytical data and so on from the beginning is something I want my developers to learn from the start.
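As a small example of what building in logging and performance metrics from the start can look like in Python (the logger name and metric format are illustrative):

```python
# Sketch of baked-in logging and timing metrics. The logger name and the
# log-line format are illustrative placeholders.
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("ourproduct.api")

@contextmanager
def timed(operation):
    """Log failures and how long an operation took."""
    start = time.time()
    try:
        yield
    except Exception:
        log.exception("%s failed", operation)
        raise
    finally:
        log.info("%s took %.1f ms", operation, (time.time() - start) * 1000)

def create_expense(record):
    # Every handler wraps its work so timing data is collected by default
    with timed("create_expense"):
        return {"status": 201, "record": record}
```

Timing data collected this way can later feed the usage and business-insights reporting described below.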

One of the areas I particularly emphasize is the business insights (BI) part of software development. Incorporating code that collects usage information will not only help them better understand the features they implemented but, most importantly, will prevent them from writing things nobody uses.

The list above is a very rough plan for the crash course I have planned for our interns. As it progresses I will post more details on how it goes, what they learned, what tools we use and so on. I started sketching things out on the mindmap above and it is growing pretty fast.

It will be an interesting experience 🙂