Recently I had to design the backup infrastructure for a client’s cloud workloads to ensure compliance with the Business Continuity and Disaster Recovery standards they had set. However, following traditional IT practices in the cloud quite often poses certain challenges. The scenario we had to satisfy is best shown in the picture below:

Agent-Based Backup Architecture

The picture is quite simple:

  1. Application servers have a backup agent installed
  2. The backup agent submits the data that needs to be backed up to the media server in the cloud
  3. The cloud media server submits the data to the on-premises backup infrastructure, where the backups are stored on long-term storage according to the policy

This is a very standard architecture for many of the current backup tools and technologies.

Some of the specifics in the architecture above are that:

  • The application servers and the cloud media server live in different accounts or VPCs (in AWS terminology), or in different subscriptions or virtual networks (in Microsoft Azure terminology)
  • The connectivity between the cloud and on-premises is established through Direct Connect or ExpressRoute, and logically those links are also treated as separate VPCs or virtual networks

This architecture would be perfectly fine if the application servers were long-lived. However, we were transitioning the application team to a more agile DevOps process, which meant that they would use automation to replace the application servers with every new deployment (for more information take a look at the Blue/Green Deployment White Paper published on our company’s website). This, though, didn’t fit well with the traditional process used by the IT team managing the on-premises NetBackup infrastructure. The main issue was that every time one of the application servers got terminated, somebody from the on-prem IT team would get paged for a failed backup and trigger an unnecessary investigation.

One option for solving the problem, presented to us by the on-premises IT team, was to use a traditional job-scheduling solution to trigger a script that creates the backup and submits it to the media server. This approach doesn’t require them to manually whitelist the IP addresses of the application servers in their centralized backup tool and doesn’t generate error events, but it involves additional tools that would require much more infrastructure and license fees. Another option was to keep the old application servers running longer so that the backup team has enough time to remove the IPs from the whitelist. This, though, required manual intervention on both sides (ours and the on-prem IT team’s) and was prone to errors.

The approach we decided to go with required a little bit more infrastructure but was fully automatable and was relatively cheap compared to the other two options. The picture below shows the final architecture.

The only difference here is that instead of running the backup agents on the actual application instances, we run just one backup agent on a separate instance that has an unlimited lifespan and doesn’t get terminated with every release. This can be a much smaller instance than the ones used for hosting the application, which saves some cost; its only role is to host the backup agent, hence no other connections to it should be allowed. The daily backups for the applications are stored on a shared drive that is accessible from the instance hosting the agent, and this shared drive is automatically mounted on the new instances during each deployment. Depending on whether you deploy this architecture in AWS or Azure, you can use EFS or Azure Files for the implementation.
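
As a rough illustration, here is a minimal sketch of what the deployment automation could do on every new application instance to mount the shared backup drive (using EFS over NFS as an example). The file system DNS name and mount point are placeholders, and the script assumes a Linux instance with the NFS client installed:

import subprocess

EFS_DNS_NAME = "fs-12345678.efs.us-east-1.amazonaws.com"  # placeholder file system
MOUNT_POINT = "/mnt/backups"

def mount_backup_share():
    # Create the mount point and mount the shared drive where daily backups are written,
    # so the instance hosting the backup agent can pick them up.
    subprocess.check_call(["mkdir", "-p", MOUNT_POINT])
    subprocess.check_call([
        "mount", "-t", "nfs4",
        "-o", "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2",
        "%s:/" % EFS_DNS_NAME,
        MOUNT_POINT,
    ])

if __name__ == "__main__":
    mount_backup_share()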

Here are the benefits that we achieved with this architecture:

  • Complete automation of the process that supports Blue/Green deployments
  • No changes in the already existing backup infrastructure managed by the IT team using traditional IT processes
  • Predictable, relatively low cost for the implementation

This was a good case study in bridging modern DevOps practices and traditional IT processes to achieve the common goal of continuous application backup.

It is surprising to me that every day I meet developers who do not have a basic understanding of how computers work. Recently I got into an argument about whether this understanding is necessary to become a good cloud software engineer, and my opponent’s main point was that “modern languages and frameworks take care of lots of stuff behind the scenes, hence you don’t need to know about those”. Although the latter is true, it does not release us (the people who develop software) from the responsibility to think when we write software.

The best analogy I can think of is the recent stories with Tesla’s autopilot – just because it is called “autopilot” doesn’t mean it will not run you into a wall. Similar to a Tesla driver, as a software engineer it is your responsibility to understand how your code is executed (i.e., where your car is taking you), and if you don’t have a basic understanding of how computers work (i.e., how to drive a car, or common sense in general :)), you will not know whether it runs well.

If you want to become an advanced Cloud Software Engineer, there are certain things you need to understand in order to develop applications that run on multiple machines in parallel, across many geographical regions, using third-party services and so on. Here is an initial list of things that, I believe, are essential for Cloud Software Engineers to know.

First of all, every Software Engineer (Cloud, Web, Desktop, Mobile etc.) needs to understand the fundamentals of computing. Things like number systems, character encoding, bits and bytes are essential knowledge for Software Engineers. You also need to understand how operating on bits and bytes is different from operating on decimal digits, and what issues you can face.
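
As a quick Python illustration of the kind of issues meant here – binary floating point and character encodings both behave differently from what decimal intuition suggests:

print(0.1 + 0.2)                  # 0.30000000000000004, not 0.3
print(0.1 + 0.2 == 0.3)           # False - a classic source of subtle bugs

text = u"naïve"
print(len(text))                  # 5 characters
print(len(text.encode("utf-8")))  # 6 bytes - 'ï' takes two bytes in UTF-8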

Understanding computer hardware is also important for you as a Software Engineer. In the age of virtualization, one could say this is nonsense, but knowing that data is fetched from permanent storage into operating memory before your application can process it may be quite important for data-heavy applications. It will also help you decide what size virtual machine you need – one with more CPU, more memory, more local storage, or all of the above.

Basic knowledge of Operating Systems – particularly processes, execution threads, and environment settings – is another thing Software Engineers must learn; without it, how would you be able to implement or configure an application that supports multiple concurrent users?
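
A small, hypothetical Python example of why threads matter: updating shared state from several threads without a lock can silently lose updates, exactly the kind of issue that appears once multiple users hit your application at the same time:

import threading

counter = 0
lock = threading.Lock()

def increment_many(times):
    # Without the lock, concurrent increments of the shared counter can
    # interleave and some of them get lost.
    global counter
    for _ in range(times):
        with lock:
            counter += 1

threads = [threading.Thread(target=increment_many, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock; often less if you remove it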

Networking basics like IP addresses, the Domain Name System (DNS), routing and load balancing are used every day in the cloud. Not knowing those terms and how they work is quite often the reason websites and services go down.

Last, but not least, security is very important in order to protect your users from malicious activities. Things like encryption and certificate management are must-know topics for Software Engineers developing cloud-based applications.

You don’t need to be an expert in the topics above in order to be a good Cloud Software Engineer, but you need to understand how each of them impacts your application and be able to tweak your code accordingly. In the next few posts, I will go over the minimum knowledge you must obtain in order to have a solid background as a Cloud Software Engineer. For more in-depth information you can research each one of the topics using your favorite search engine.

 

Since the sprawl of mobile apps and web services began, the need to create a new username and password for each app or service has become annoying and, as it turned out, it also decreases overall security. Hence we decided to bet our authentication on the popular social media platforms (Facebook, Twitter, and Google) but wanted to make sure that we protect the authentication tokens on our side. Maybe in a later post I will go into more detail about the pros and cons of this approach, but for now I would like to concentrate on the technical side.

Here are the constraints or internal requirements we had to work with:

  • We need to support multiple social media platforms for authentication (Facebook, Twitter, and Google at minimum)
  • We need to support the web as well as mobile clients
  • We need to pass authentication information to our APIs but we also need to follow the REST guidelines for not maintaining state on the server side.
  • We need to make sure that we validate the social media auth token when it first reaches our APIs
  • We need to invalidate our own token after some time
The flow of events is shown in the following picture, and the step-by-step explanations are below.

Authenticate with the Social Media site

The first step (step 1) in the flow is to authenticate with the Social Media site. They all use OAuth; however, each implementation varies and the information you receive back differs quite a lot. For details on how to implement the OAuth authentication with each one of the platforms, take a look at the platform documentation. Here are links to some:

Note that those describe the authentication with their APIs, but in general the process is the same for clients. The ultimate goal here is to retrieve an authentication token that can be used to verify the user is who she or he claims to be.

We use the Tornado web server, which has built-in authentication handlers for the above services as well as a generic OAuth handler that can be used to implement authentication with other services supporting OAuth.
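
For illustration, here is a minimal login-handler sketch along the lines of the Tornado documentation, using the Google mixin. The redirect URI and the google_oauth application settings are assumptions, not our actual configuration:

import tornado.gen
import tornado.web
from tornado.auth import GoogleOAuth2Mixin

class GoogleLoginHandler(tornado.web.RequestHandler, GoogleOAuth2Mixin):
    @tornado.gen.coroutine
    def get(self):
        redirect_uri = "https://www.ourproducturl.com/auth/google"  # placeholder callback URL
        if self.get_argument("code", False):
            # Exchange the authorization code for the user profile and access token
            user = yield self.get_authenticated_user(
                redirect_uri=redirect_uri,
                code=self.get_argument("code"))
            self.write(user)  # contains the access_token used in the next steps
        else:
            # Send the user to Google's consent screen
            yield self.authorize_redirect(
                redirect_uri=redirect_uri,
                client_id=self.settings["google_oauth"]["key"],
                scope=["profile", "email"],
                response_type="code",
                extra_params={"approval_prompt": "auto"})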

Once the user authenticates with the service, the client receives information about the user as well as an access token (step 2 in the diagram) that can be used to validate the identity of the user. As mentioned above, each social media platform returns different information in the form of a JSON object. Here are anonymized examples for the three services:

It is worth mentioning some differences related to the expiration times. Depending on how you do the authentication you may receive short-lived or long-lived tokens, and you should pay attention to the expiration times. For example, Twitter may respond with an access token that never expires ("x_auth_expires":"0"), while long-lived tokens for Facebook expire in ~60 days. The expiration time is given in seconds and it is approximate, which means it may not be exactly 60 mins or 60 days but a bit less.

Authenticate with the API

Now that the user has authenticated with the Social Media site, we need to make sure that she also exists in our user database before we issue a standardized token that we can handle in our APIs.

We created login APIs for each one of the Social Media platforms as follows:

GET https://api.ourproducturl.com/v1.0/users/facebook/{facebook_user_id}
GET https://api.ourproducturl.com/v1.0/users/google/{google_user_id}
GET https://api.ourproducturl.com/v1.0/users/twitter/{twitter_user_id}

Based on which Social Media service was used to authenticate the user, the client submits a GET request to one of those APIs, including the authorization response from step 2 in the Authorization header of the request (step 3 in the diagram). It is important that the communication for this request is encrypted (i.e., use HTTPS) because the access token should not be revealed to the public.

On the server side, a few things happen. After extracting the Authorization header from the request, we validate the token with the Social Media service (step 4).

Here are the URLs that you can use to validate the tokens:

  • Facebook (as well as documentation)
    https://graph.facebook.com/debug_token?input_token={token-to-inspect}&access_token={app-token-or-admin-token}
  • Google (as well as documentation)
    https://www.googleapis.com/oauth2/v3/tokeninfo?access_token={token-to-inspect}
  • Twitter (as well as documentation)
    https://api.twitter.com/1/account/verify_credentials.json?oauth_access_token={token-to-inspect}

If the token is valid, we compare the ID extracted from the Authorization header with the one specified in the URL. If either of those two checks fails, we return a 401 Unauthorized response to the client. If we pass both checks, we do a lookup in our user database to find the user with the specified Social Media ID (step 5 in the diagram) and retrieve her record. We also retrieve information about her group participation so that we can do authorization later on for each one of the functional calls. If we cannot find the user in our database, we return a 404 Not Found response to the client.
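
As a rough sketch of the server-side validation in steps 4 and 5, using Google as the example and the requests library (the function and parameter names are hypothetical):

import requests

GOOGLE_TOKENINFO_URL = "https://www.googleapis.com/oauth2/v3/tokeninfo"

def validate_google_token(access_token, user_id_from_url):
    # Step 4: ask Google whether the access token is valid.
    resp = requests.get(GOOGLE_TOKENINFO_URL, params={"access_token": access_token})
    if resp.status_code != 200:
        return False  # invalid or expired token -> respond with 401 Unauthorized
    token_info = resp.json()
    # The Google user ID in the token ("sub") must match the ID in the login URL.
    return token_info.get("sub") == user_id_from_url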

Create API Token

For the purposes of our APIs, we decided to use encrypted JWT tokens. We include the following information in the JWT token:

  • User information like ID, first name and last name, email, address, city, state, zip code
  • Group membership for the user including roles
  • The authentication token for the Social Media service the user authenticated with
  • Expiration time (we settled on 60 minutes expiration)

Before we send this information back to the client (step 8 in the diagram) we encrypt it (step 7) using an encryption key or secret that we keep in Azure Key Vault (step 6). The JWT token is sent back to the client in the Authorization header.
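
Here is a minimal sketch of building such a token with the PyJWT library mentioned later in this post. The claim names and the HS256 signing are illustrative assumptions; note that PyJWT signs the token, so producing a truly encrypted token (JWE) would require an additional library:

import datetime
import jwt  # pip install pyjwt

def create_api_token(user, groups, social_token, secret):
    payload = {
        "sub": user["id"],                 # user ID
        "name": user["name"],
        "email": user["email"],
        "groups": groups,                  # group membership, including roles
        "social_token": social_token,      # kept so we can extend the session later
        "exp": datetime.datetime.utcnow() + datetime.timedelta(minutes=60),
    }
    return jwt.encode(payload, secret, algorithm="HS256")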

Call the functional APIs

Now we have replaced the access token the client received from the Social Media site with a JWT token that our application can understand and use for authentication and authorization purposes. Each request to the functional APIs (step 9 in the diagram) is required to have the JWT token in the Authorization header. Each API handler has access to the encryption key that is used to decrypt the token and extract the information from it (step 10).

Here are the checks we do before every request is handled (step 11):

  • If the token is missing we return 401 Unauthorized to the client
  • If the user ID in the URL doesn’t match the user ID stored in the JWT token we return 401 Unauthorized to the client. All API requests for our product are executed in the context of the user
  • If the JWT token has expired we return 401 Unauthorized to the client. For now, we decided to expire the JWT token every 60 minutes and require the client to re-authenticate with the APIs. In the future, we may decide to extend the token for another 60 minutes, or until the Social Media access token expires, so that we can avoid the user dissatisfaction caused by frequent logins. This is why we overdesigned the JWT token to also store the Social Media access token
  • If the user has no right to perform a certain operation we return 403 Forbidden to the client, denoting that the operation is forbidden for this user

A few notes on the implementation. Because we use Python, we can easily implement all the authentication and authorization checks using decorators, which makes our API handlers much easier to read and also enables easy extension in the future (for example, extending the validity of the JWT token). Python also has an easy-to-use JWT library available on GitHub at https://github.com/jpadilla/pyjwt.
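
A rough sketch of such a decorator for a Tornado handler method is shown below. The jwt_secret application setting and the user_id URL argument are assumptions used for illustration:

import functools
import jwt

def authenticated(handler_method):
    # Decorator for handler methods that require a valid JWT token.
    @functools.wraps(handler_method)
    def wrapper(self, *args, **kwargs):
        token = self.request.headers.get("Authorization", "").replace("Bearer ", "", 1)
        if not token:
            self.send_error(401)          # missing token
            return
        try:
            claims = jwt.decode(token, self.settings["jwt_secret"], algorithms=["HS256"])
        except jwt.ExpiredSignatureError:
            self.send_error(401)          # expired - client must re-authenticate
            return
        except jwt.InvalidTokenError:
            self.send_error(401)          # tampered or malformed token
            return
        if claims.get("sub") != kwargs.get("user_id"):
            self.send_error(401)          # URL user must match the token user
            return
        self.current_claims = claims      # make the claims available to the handler
        return handler_method(self, *args, **kwargs)
    return wrapper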

Some additional resources that you may find useful when implementing JWT web tokens are:

 

Over the last few days, I was looking for a way to automate our deployment environments on Azure and also investigating automation frameworks for a customer. The debate was between Terraform and Ansible, and the following article from Gruntwork did a really good job of tilting the scales towards Terraform. We had similar considerations to the folks at Gruntwork, so everything matched well. Now, the task was to get Terraform working with Azure, which turned out to be a small challenge compared to AWS.

For those of you interested in the background, here is the Terraform documentation for the Azure provider, which is pretty good but misses a small piece about assigning a role, as described in this StackOverflow post. Of course, the post points to the Azure documentation about using the CLI to assign the role to the principal, which for me ended up with the following error:

Principals of type Application cannot validly be used in role assignments.

At the end of the day, because of time pressure, I wasn’t able to figure out the CLI way to do that, but there is a way to do it through the Azure Management Portal, so here are the steps with visuals:

Create Application Registration in Your Azure Subscription

  • Go to the new Azure Portal at http://portal.azure.com and select Azure Active Directory in the navigation pane on the left:

  • Select App Registrations from the tasks blade

  • Click on the Add button at the top of the blade and fill in the information for the Terraform app. You can choose any name for the Name field as well as any valid URL string for the Sign-on URL field. Click on the Create button to create the app.

  • Click on the newly created app and in the Settings blade select Required Permissions

  • Click on the Add button at the top of the blade

  • In Step 1 Select an API select the Windows Azure Service Management API and click on the Select button

  • In Step 2 Select Permissions select Access Azure Service Management as organization users (preview) and click on the Select button

  • Click on the Done button to complete the flow

Now your App Registration is complete; however, you still need to assign a role to your application. Here is how this is done.

Assign a Role for Terraform App to Use ARM

Assigning a role to your application is done at the subscription level in the Azure Portal.

  • Select Subscriptions in the navigation pane on the left

  • Select the subscription where you have registered the app and select Access Control (IAM) in the task blade

  • Click on the Add button at the top of the blade and in Step 1 Select a role choose the most appropriate role for your Terraform application

Although you may be tempted to choose Owner in this step, I would suggest thinking your security policies through and selecting a role with more restrictive access. For example, if you have DevOps people running Terraform scripts, you may want to give them the Contributor role and prevent them from managing user access. Also, if you have a database team that only needs to manage Azure SQL and DocumentDB, you may restrict them to SQL DB Contributor and DocumentDB Account Contributor. A list of built-in RBAC roles for Azure is available here.

  • In Step 2 Add Users type the name of your app in the search field and select it from the list. Click on the Select button to confirm

  • Click the OK button to complete the flow

Collecting ARM Credentials Information for Terraform

In order for Terraform to connect to Azure and manage the resources using Azure Resource Manager you need to collect the following information:

  • Subscription ID
  • Client ID, also known as Application ID in Azure terminology
  • Client Secret, also known as Key in Azure terminology
  • Tenant ID, also known as Directory ID in Azure terminology

Here is where to obtain this information from.

Azure Subscription ID

Click on Subscriptions in the navigation pane -> Select the subscription where you created the Terraform app and copy the GUID highlighted in the picture below.

Azure Client ID

What Terraform refers to as Client ID is actually the Application ID for the app that you just registered. You can get it by selecting Azure Active Directory -> App registrations -> select the name of the app you just registered and copy the GUID highlighted in the picture below.

Azure Client Secret

What Terraform refers to as Azure Client Secret is a Key that you create in your App registration. Follow these steps to create the key:

  • From Azure Active Directory -> App registrations select the application that you just created and then select Keys in the Settings blade

  • Fill in the Key description, select Duration and click on the Save button at the top of the blade. The Key value will be shown after you click the Save button.

Note: Copy and save the key value immediately. If you navigate away from the blade you will not be able to see the value anymore. You can delete the key and create a new one in the future if you lose the value.

Azure Tenant ID

The last piece of information you will need to connect Terraform to Azure Resource Manager is a Tenant ID, which is also known as Directory ID in Azure terminology. This is actually the GUID used to identify your Azure Active Directory.

Select Azure Active Directory and scroll down to show the Properties in the tasks blade. Select Properties and copy the GUID highlighted on the picture below.

The Terraform documentation describes a different method to obtain the Tenant ID, which involves showing the OAuth Authorization Endpoint for the application you just created and copying the GUID from the URL. I think their approach is a bit more error-prone, but if you feel comfortable in your copy/paste abilities you may want to give it a try.
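
Once you have all four values, Terraform’s Azure provider can pick them up from the ARM_* environment variables. Purely as a sketch, a small Python wrapper that exports them (with placeholder values) and runs a plan could look like this:

import os
import subprocess

# Placeholder values - substitute the GUIDs and key collected in the steps above.
ARM_CREDENTIALS = {
    "ARM_SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000",
    "ARM_CLIENT_ID": "00000000-0000-0000-0000-000000000000",   # Application ID
    "ARM_CLIENT_SECRET": "your-app-registration-key",          # Key value
    "ARM_TENANT_ID": "00000000-0000-0000-0000-000000000000",   # Directory ID
}

env = dict(os.environ, **ARM_CREDENTIALS)
subprocess.check_call(["terraform", "plan"], env=env)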

I hope that by describing this somewhat convoluted registration process I will help you be more productive managing your resources on Azure.

Sharding the data in your big data storage is often not a trivial problem to solve. Sooner or later you will discover that the sharding schema you used initially may not be the right one long term. Fortunately, we stumbled upon this quite early in our development and decided to redesign our data storage tier before filling it with lots of data, which would have made the reshuffling quite complex.

In our particular case, we started with Microsoft Azure’s DocumentDB, which limits its collection size to 250GB unless you explicitly ask Microsoft for more. DocumentDB provides automatic partitioning of the data but selecting the partition key can still be a challenge, hence you may find the exercise below useful.

The scenario we were trying to plan for was related to our personal finance application, which allows users to save receipt information. Briefly, the flow is as follows: a user captures a receipt with her phone, we convert the receipt to JSON and store the JSON in DocumentDB; users can be part of a group (for example, a family can be a group), and the receipts are associated with the group. Here is the simple relationship model:

zenxpense-data-model

The expectation is that users will upload receipts every day, and they will use the application (we hope:)) for years to come. We did the following estimates for our data growth:

  • We would like to have a sharding schema that can support our needs for the next 5 years; we don’t know what our user growth will be over that period, but like every other startup we hope it will be something north of 10 million users 🙂
  • A single user or family saves about 50 receipts a week, which results in approximately 2,500 receipts a year, or 12,500 over 5 years
  • A single receipt requires about 10KB of storage. We can also store a summary of the receipt, which requires about 250 bytes of storage (but we will still need to store the full receipt somewhere)

Additionally, we don’t need to store the user and group documents in separate collections (i.e., we could put all three document types in the same collection), but we decided to do so in order to allow easier access to that data for authentication, authorization, and visualization purposes. With all that said, we are left to deal mostly with the receipts data, which will grow at a faster pace. Based on the numbers above, a single collection can store 25M receipts or 1B summaries. Thus we started looking at different approaches to shard the data.

Using the assumptions above, you can easily come up with some staggering numbers. For example, for the first year we should plan for:

2M users * 2,500 receipts/each * 10KB per receipt = 50TB of storage

That may even call into question the choice of DocumentDB (or any other NoSQL database) as the data storage tier. Nevertheless, the point of this post is how to shard the data and what process we went through to do that.
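
For reference, here is the back-of-the-envelope math as a small Python script, using the assumptions above and decimal units so the results match the figures in this post:

RECEIPTS_PER_YEAR = 2500            # ~50 receipts a week per user/family
YEARS = 5
RECEIPT_KB = 10                     # full receipt JSON
SUMMARY_KB = 0.25                   # ~250 byte summary
COLLECTION_GB = 250                 # DocumentDB collection size limit

users_year_one = 2 * 10**6
first_year_tb = users_year_one * RECEIPTS_PER_YEAR * RECEIPT_KB / 10.0**9
receipts_per_collection = COLLECTION_GB * 10**6 // RECEIPT_KB
summaries_per_collection = int(COLLECTION_GB * 10**6 / SUMMARY_KB)
groups_per_collection = COLLECTION_GB * 10**6 // (RECEIPTS_PER_YEAR * YEARS * RECEIPT_KB)

print(first_year_tb)             # 50.0 TB of full receipts in year one
print(receipts_per_collection)   # 25,000,000 receipts per collection
print(summaries_per_collection)  # 1,000,000,000 summaries per collection
print(groups_per_collection)     # 2,000 groups per collection over 5 years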

In the explanations below, I will use DocumentDB terminology, but you can easily translate the concepts to any other database or storage technology.

Sharding by tenant identifier

One obvious approach for sharding the data is to use the tenant identifier (user_id or group_id in our case) as the key. Thus we would have a single collection where we store the mapping information and multiple collections that store the receipts for a range of groups. As shown in the picture below, based on group_id we can look up the name of the collection where the receipts for this group are stored using the map collection, and then query the resulting collection to retrieve any receipt that belongs to the group.
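
In code, the lookup logic is roughly the following (the collection names and group ID ranges are made up for illustration; the map itself would live in its own collection):

SHARD_MAP = [
    {"min_group_id": 1,    "max_group_id": 2000, "collection": "receipts-001"},
    {"min_group_id": 2001, "max_group_id": 4000, "collection": "receipts-002"},
]

def collection_for_group(group_id):
    # Find the receipts collection that holds the given group's receipts.
    for entry in SHARD_MAP:
        if entry["min_group_id"] <= group_id <= entry["max_group_id"]:
            return entry["collection"]
    raise LookupError("no shard mapped for group %s" % group_id)

print(collection_for_group(2742))  # receipts-002 - query this collection for the group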

sharding-by-group-id

Using this approach, though, and taking into account our estimates, each collection will be able to support only 2,000 groups.

2,000 groups * 2,500 receipts/year * 5 years * 10KB = 250GB

Assuming linear user growth over 5 years results in 2M users for the first year, which in the best case will be 500K groups (4 users per family, for example) or 250 collections. The whole problem is that we will need to create a new collection for every 2,000 groups even though the previous one is less than 20% full. That is a bit expensive, considering we don’t know what the growth of our user base and the use of our product will be.

A preferred approach would be to create a new collection only when the previous one becomes full.

Sharding by timestamp

Because the receipts are submitted over time, another feasible approach is to shard the data by timestamp. We would end up with a picture similar to the one above; however, instead of using group_id as the partition key, we would use the timestamp – receipts with timestamps in a particular range would be stored in a single partition.

In this case, we would have problems pulling out all the receipts for a particular group, but considering that this is a very rare (though still possible) scenario, the trade-off may be warranted. Searching for a receipt by its properties would also be a challenge, because we would need to scan every collection. For everyday use, though, users will request the receipts from the last week or month, which results in a query to a single collection.

The good side of this approach is that we need to create a new collection only when the previous one fills up, which means we will not be paying for unused space.

Multi-tier sharding

The previous two approaches assume that there is a single tier for sharding the data. Another approach would be to have two (or more) tiers. In our case, this would look something like this:

multi-tier-sharding

Using this approach we will store the receipt summaries in the first shard tier, which will allow us to save more receipts in a smaller number of collections. We will be able to search by group_id to identify the receipts we need and then pull the full receipt if the user requests it. If we run the numbers it will look something like this for the first year:

2M users -> 500K groups -> 6.25B receipts -> 250 partitions + 7 intermediate partitions

However, we can support 80,000 groups with a single intermediate collection (instead of 2,000 as in the previous case), and we will fill both the summary and the full-receipt collections before a new one is created. Also, the number of collections will grow much more slowly if our user base grows fast.

The multi-tier sharding approach can also be done using the timestamps or the receipt identifiers as keys for the intermediate collection.

Sharding by receipt identifier

Sharding by receipt_id is obviously the simplest way to shard the data; however, it may not be feasible in a scenario like ours, because receipts are retrieved based on the group_id, and it would require querying every collection to retrieve all the receipts or to find a particular receipt belonging to a group. Well, that is only the case if the NoSQL provider does not offer automatic partitioning functionality, but because DocumentDB does, our problem turned out to be no problem 🙂 Nevertheless, you need to consider all the implications while choosing the partition key.

As I mentioned above, we started with DocumentDB as our choice for storing the data, but after running the numbers we may reconsider. DocumentDB is a great choice for storing JSON data and offers amazing features for partitioning and querying it; however, looking at our projections, the cost of using it may turn out to be quite high.

You may be wondering why I chose Python as the language to teach you software engineering practices. There are tons of other languages one can use for that purpose, languages that are much sexier than Python. Well, I certainly have my reasons, and here is a summary:

  • First of all, Python is a very easy language to learn, which makes it a good choice for beginners
  • Python is an interpreted programming language, which means that you receive immediate feedback from the commands you type
  • Python supports both functional and object-oriented approaches to programming, which is good if you don’t know which path you want to choose (see the short example after this list)
  • Python is a versatile language that can be used to develop all kinds of applications, hence it is used by people in various roles. Here are some:
    • Front-end developers can use it to implement dynamic functionality on websites
    • Back-end developers can use it to implement cloud-based services and APIs and to communicate with other services
    • IT people can use it to automate infrastructure builds, application deployments and all kinds of other tasks
    • Data scientists can use it to create data models, parse data or implement machine learning algorithms

As you can see Python is a programming language that, if you become good at it, can enable multiple paths for your career. Learning the language as well as establishing good development habits will open many doors for you.
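
To illustrate the point about programming styles, here is the same trivial task – totaling the even numbers in a list – written in a functional style and in an object-oriented style:

numbers = [1, 2, 3, 4, 5, 6]

# Functional style: compose filter and sum
total = sum(filter(lambda n: n % 2 == 0, numbers))
print(total)  # 12

# Object-oriented style: wrap the behavior in a class
class EvenSummer(object):
    def __init__(self, values):
        self.values = values

    def total(self):
        return sum(n for n in self.values if n % 2 == 0)

print(EvenSummer(numbers).total())  # 12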

For the past twenty or so years, since I started my career in technology in 1996, almost every book I have read about programming, although providing detailed coverage of the particular programming language it was written about, lacked crucial information educating the reader on how to become a good Software Engineer. Learning a programming language from such a book is like learning the syntax and the grammar of a foreign language but never understanding the traditions of the native speakers, the idioms they use, or how to express yourself without offending them. Yes, you can speak the language, but you will have a lot of work to do before you start to fit in.

Learning the letters, the words and how to construct a sentence is just a small part of learning a new language. This is also true for programming languages. Knowing the syntax, the data types, and the control structures will not make you a good software engineer. It is surprising to me that so many books and college classes concentrate only on those things while neglecting fundamental topics like how to design an application, how to write maintainable and performant code, and how to debug, troubleshoot, package and distribute it. The lack of understanding in those areas not only makes new programmers inefficient but also establishes bad habits that are hard to change later on.

I’ve seen thousands and thousands of lines of undocumented code, whole applications that log no errors so nobody can figure out where they break, web pages that take 20 minutes to load, and plain silly code that calls a function to sum two numbers (something that can be achieved simply with a plus sign). Hence I decided to write a book that not only explains the Python language in a simple and understandable way but also teaches the fundamental practices of software engineering. A book that will, after you read it, have you ready to jump in and develop high-quality, performant and maintainable code that meets the requirements of your customers. A book that any person can pick up and use to learn how to become a Software Engineer.

I intentionally use the term Software Engineer because I want to emphasize that developing high-quality software involves a lot more than just writing code. I wanted to write a book that will prepare you to be a Software Engineer, and not simply a Coder or Programmer. I hope that with this book I achieved this goal and helped you, the reader, to advance your career.

With our first full-time developer on board, I had to put some structure around the tools and services we will use to manage our work. In general, I don’t want to be too prescriptive about the tools they should use to get the job done, but it is good to set some guidelines for the toolset and outline the mandatory and optional pieces. For our development we’ve made the following choices:

  • Microsoft Azure as Cloud Provider
  • TornadoWeb and Python 2.7 as a runtime for our APIs and frontend
  • DocumentDB and Azure storage for our storage tier
  • Azure Machine Learning and Microsoft Cognitive Services for machine learning

Well, those are the mandatory things, but as I mentioned in my previous post How to Build a Great Software Development Team?, software development is more than just technology. Nevertheless, we had to decide on a toolset to at least start with, so here is the list:

1. Slack

My first impression of Slack was lukewarm, and I preferred the more conservative UI of HipChat. However, compared to HipChat, Slack offered multi-team capability right from the beginning, which allowed me to use it not only with my team but also for communication at a client site and with the advisory team for North Seattle College. In addition, HipChat introduced quite a few bugs in its latest versions, which made team communication quite unreliable and non-productive, and this totally swayed the decision towards Slack. After some time I got used to Slack’s UI and started liking it, and now it is an integral part of our team’s communication.

2. Outlook 2016

For my personal email I use Google Apps with a custom domain; however, I have long been an Outlook user, and with the introduction of Office 365 I think the value for money is in Microsoft’s favor. Managing multiple email accounts and calendars and scheduling in-person or online meetings using the GoToMeeting and Skype for Business plugins is a snap with Outlook. With the added benefit of using Word, Excel and PowerPoint as part of the subscription, Office 365 is a no-brainer. We use Office 365 E3, which gives each one of us the full set of Office capabilities.

3. Dropbox

Sending files via email is an archaic approach, although I see it still being widely done. For that purpose we have set up Dropbox for the team. I have created shared folders for the leadership team as well as for each one of the team members, allowing them to easily share files with each other. We settled on Dropbox Pro for the leadership team and the free Dropbox tier for the team members. In the future we are considering a move to the Business edition.

4. Komodo Edit

I have been a long-time fan of Komodo. It is a very lightweight IDE that offers highlighting and type-assist for a number of programming languages like Python, HTML5, JavaScript and CSS3. It also allows you to extend the functionality with third-party plugins offering rich capabilities. I use it for most of my development.

5. Visual Studio Code

Visual Studio Code is the new cross-platform IDE from Microsoft. It is a lightweight IDE similar to Sublime Text, and it offers a lot of nice features that can be very helpful if you develop for Azure. It has built-in debugging, IntelliSense, and a plugin extensibility model with a growing number of plugin vendors. It is a great tool for creating Markdown documents, debugging with breakpoints from within the IDE, and more. Visual Studio Code is an alternative to Visual Studio that allows you to develop for Azure on platforms other than Windows. If you are a Visual Studio fan but don’t want to pay a hefty amount of money, you can give Visual Studio Community Edition a try (unfortunately available for Windows only). Here is a Visual Studio editions comparison chart that you may find useful.

6. Visual Studio Online

Managing the development project is crucial for the success of your team. The functionality that Visual Studio Online offers for keeping backlogs, tracking sprint work items and reporting is comparable to, if not better than, Jira’s, and if you are bound to the Microsoft ecosystem it is the obvious choice. For our small team we rely almost entirely on the free edition, and it gives us all the functionality we need to manage the work.

7. Docker

Being able to deploy a complete development environment with the click of a button is crucial for development productivity. Creating a Docker Compose template consisting of two TornadoWeb workers and an NGINX load balancer in front (a configuration very similar to what we plan to use in production) takes less than an hour with Docker, and it reduces the operational overhead for developers many times over. Not only that, but it also closely mimics the production configuration, which means the probability of introducing bugs caused by environment differences is practically zero.

With the introduction of Docker for Windows all the above became much easier to do on Windows Desktop, which is an added benefit.

8. Draw.IO

Last but not least, being able to visually communicate your application or system design is essential for successful development projects. For that purpose we use Draw.IO. In addition to the standard block diagrams and flowcharts, it offers Azure- and AWS-specific shapes, the creation of UI mockups, and even UML if you want to go that far.

Armed with the above set of tools you are well prepared to move fast with your development project on a lean budget.

For a while I have been looking for a good sample application in Python that I can use for training purposes. The majority of the sample applications available online cover a certain topic like data structures or string manipulation, but so far I have not found one that takes a more holistic approach. For basic Python developer training I would like to use a real-life application that covers the language syntax and structures but can also teach good software development practices. There are minimum requirements for a Software Developer that I believe need to be taught in basic development classes, and the projects used in such classes need to make sure that those minimum requirements are met.

For our new developers’ training I decided to use a simple Expense Reports application with very basic requirements:

  • I should be able to store receipts information into a file
  • The following information about the receipt should be stored
    • Date
    • Store
    • Amount
    • Tags
  • I should be able to generate a report for my expenses based on the following information
    • Date range
    • Store
    • Tags

My goal with this application is to teach junior developers a few things:

  • Python Language Concepts like data types, control structures etc. as well as a bit more complex concepts like data structures, data manipulation, data conversion, file input and output and so on
  • Code Maintainability Practices like naming conventions, comments and code documentation, modularity etc.
  • Basic Application Design including requirements analysis and clarification
  • Basic User Experience concepts like UI designs, user prompts, input feedback, output formatting etc.
  • Application Reliability including error and exception handling
  • Testing that includes unit and functional testing
  • Troubleshooting that includes debugging and logging
  • Interoperability for interactions with other applications
  • Delivery including packaging and distribution

I have started a Python Expenses Sample GitHub project for the application, where I will check in the code from the weekly classes as well as instructions on how to use it.
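
As a purely hypothetical illustration of the first requirement (this is not the code from the GitHub project), storing a receipt in a file could start out as simply as this:

import csv
import datetime

FIELDS = ["date", "store", "amount", "tags"]

def save_receipt(path, date, store, amount, tags):
    # Append one receipt per line to a CSV file.
    with open(path, "a") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writerow({
            "date": date.isoformat(),
            "store": store,
            "amount": "%.2f" % amount,
            "tags": ";".join(tags),
        })

save_receipt("expenses.csv", datetime.date(2017, 3, 14), "Grocery Store", 42.50, ["food", "weekly"])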

 

We are looking to hire a few interns for the summer, and this got me thinking about what approach I should take to provide a great experience for them as well as get some value for us. The culture I am trying to establish in my company is based on the premise that software development is more than writing code. I know this is an overused cliché, but if you look around there are thousands of educational institutions and private companies that concentrate on teaching just that – how to write code – while neglecting everything else involved in software development.

For that purpose I decided to create a crash course in good software development practices and run our new hires through it. Having been involved in quite a few technology projects over the last 20+ years, and having seen lots of failures and successes, I have developed my own approach that may or may not fit the latest trends but has served me well. Also, having managed one of the best-performing teams in the Microsoft Windows Division during Windows 7 (yes, I can claim that :)), I can say that I have some experience with building great teams.

So, my goal is for our interns to be real software developers by the end of the summer, and for that experience they will get paid instead of spending money. Now, here are the things I want them to know at the end of the summer:

Team

The team is the most important part of software development. The few important things I want to teach them are that they need to work together, help each other, solve problems together, and NOT throw things over the fence because something is not their area of responsibility. If they learn this, I will have accomplished my job as their mentor (I am kidding 🙂 but yes, I think there are too many broken teams around the world).

As software developers, they are responsible for the product and the customer experience, no matter whether they write the SQL scripts, the APIs or the client UI. If there is a problem with the code, they need to work with their peers to troubleshoot and solve it. If one of their peers has difficulty implementing something and they know the answer, they should help that person move to the next level, and not keep it to themselves because they are scared that she or he will take their job.

And one more thing – politics are strictly forbidden! 

Communication

Communication is key. The first thing I standardize in each one of my projects is the communication channels for the team. And this is not as simple as using Slack for everything; it covers regular meetings, who manages the meetings, where documents are saved, what the naming conventions for folders and files are, when to use which tool (Slack, email, others) and so on.

Being able to communicate effectively does not mean strictly defining organizational hierarchies; it means keeping everyone informed and being transparent.

Development Process

As a friend of mine once said: “Try to build a car with agile!” We always jump to the latest and greatest but often forget the good parts from the past. Yes, I will teach them agile – Scrum or Kanban, it doesn’t really matter; what is important is that they feel comfortable with their tasks and are able to deliver. And don’t forget – there will be design tasks for everything. This brings me to:

Software Design

Software design is an integral part of software development. There are a few parts of the design that I put emphasis on:

  • User Interface (UI) design
    They need to understand what purpose wireframes serve, what redlining is, when we need one or the other, what good UI design patterns are and how to find them, and so on
  • Systems design
    They need to understand how each individual system interacts with the rest, how each system is configured and how their implementation is impacted
  • Software components design
    Every piece of software has dependencies, and they need to learn how to map those dependencies, how different components interact with each other, and where things can break. Things like libraries, packaging, code repository structure etc. all play a role here

Testing

The best way to learn good development practices is to test your own software. Yes, this is correct – eat your own dog food! And I am not talking about testing just the piece of code you wrote (the so-called unit testing) but the whole experience you worked on, as well as the whole product.

By learning how to better test their code, my developers will not only see the results of their work but will also be more cognizant next time of the experience they are developing and how they can improve it.

Build and Deployment

Manually building infrastructure and deploying code is painful and a waste of time. I want my software developers to think about automated deployment from the beginning. As a small company we cannot afford a dedicated Operations team whose only job is to copy bits from one place to another and build environments.

Using tools like Puppet, Chef, Ansible or Salt is not new to the world, but having people manually create virtual machines still seems to be very common. Learning how to automate their work will allow them to deliver more in less time and become better performers.

Operations

Operating the application is the final step in software development. Being able to easily troubleshoot issues and fix bugs is crucial for the success of every development team. Incorporating things like logging, performance metrics, analytical data and so on from the beginning is something I want my developers to learn from the start.

One of the areas I particularly emphasize is the Business Insights (BI) part of software development. Incorporating code that collects usage information will not only help them better understand the features they implemented but, most importantly, will prevent them from writing things nobody uses.
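
As a minimal sketch of the kind of instrumentation meant here (the event fields and log file name are made up), logging a structured “feature used” event is often enough to start with:

import json
import logging
import time

logging.basicConfig(filename="usage.log", level=logging.INFO)

def track_usage(feature, user_id):
    # Write one structured usage event per feature invocation.
    logging.info(json.dumps({
        "event": "feature_used",
        "feature": feature,
        "user": user_id,
        "timestamp": time.time(),
    }))

track_usage("expense_report", user_id=42)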

The list above is a very rough plan for the crash course I have in mind for our interns. As it progresses, I will post more details on how it goes, what they learned, what tools we use and so on. I started sketching things out on the mind map above, and it is growing pretty fast.

It will be an interesting experience 🙂