Security Bug Bounty, the good, the bad, the ugly

About a month ago, like any other morning, I woke up and checked my phone while still in bed. But unlike other mornings, I had a flood of emails from Exceptional about unhandled exceptions in Factor.io. It was clear to me that an intruder was looking for security vulnerabilities.

Luckily this was a white-hat hacker. He later got in touch with me and reported his findings. While there were a few functional bugs as well as some “best practices” we didn’t adhere to, there was only one bug that really needed to be fixed. Once we fixed the bug, I sent him a t-shirt.

About a week ago I saw the picture of this white-hat hacker on Facebook wearing the t-shirt. I think this was the first step in unleashing the fury of hackers.

A number of other white-hat hackers saw the opportunity for a free t-shirt and social credit, so they too started hacking away. I got another flood of 20 reports.

One of those hackers suggested I set up an account with HackerOne, which I did, as it makes managing bugs and relationships much easier.

I failed to realize that once the Factor.io profile went live on HackerOne, a ton of hackers would start hacking away on Factor.io. And sure enough, they did. Over the weekend, more than 200 new accounts were created.

The Good

  • Two different HTML injection vulnerabilities, though neither had a clear security impact. In both cases a user was able to enter values that would later render the injected HTML on the client. In both cases the user could only attack their own account.
  • A few functional bugs were identified too. For example, the password recovery screen after a user is locked out had a functional bug.
  • Lastly, there were a few decent “best practices” recommendations.
  • I can now say that Factor.io has been sufficiently pen tested.

The Bad

  • There were about 150 issues reported in total. Of those, only about 5 warranted a fix. Another 5 or so were practices that probably should be fixed but didn’t have any direct security implications. As a result, I have already spent a few days just managing the weak reports, which left too little time to work on the higher-quality ones.

The Ugly

  • One hacker resorted to name calling because I didn’t accept his bug for the bounty.
  • Another wrote “…please make this public so that other security researchers can have a look at it and know your poor security skills.” My retort was “Please don’t be rude”. I had to practice being humble, despite having managed major security initiatives at Microsoft impacting hundreds of thousands of people and millions (if not billions) of dollars.
  • Even though DOS attacks were out of scope for the bounty, there were numerous DOS attacks that took place. As a result, Factor.io did experience a total of 3 hours of downtime over the past few days. Furthermore we had to spend extra money to up our email server plan as we reached our mail limit.

IFTTT/Zapier vs Factor.io

When I describe the basic concept of Factor.io to a developer, they say “Oh, so it’s like IFTTT for Dev and DevOps”. As such, when I talk to developers I call it “IFTTT for Dev and DevOps”. When talking to investors I use Zapier instead of IFTTT, as they are more familiar with Zapier. Inevitably they ask: “If you are Zapier for Dev and DevOps, then what is the difference between Zapier and Factor.io?” Here are the key differences between the two services:

  • Delivery: Factor.io is primarily self-hosted. We have Factor.io Core, which is the open-source edition, as well as the Pro version for enterprises. Zapier and IFTTT, on the other hand, are delivered exclusively as hosted services (IFTTT also as a mobile app).
  • To code or not to code: While we make the Factor.io Syntax as easy as possible, defining a workflow (example) requires coding. In fact, we once supported a drag-and-drop interface, which we replaced with a code-based definition based on developer feedback. Zapier, on the other hand, is specifically designed to never require code.
  • Open or Closed source: Factor.io Core is open source though there is proprietary functionality in Factor.io Pro, Zapier/IFTTT are hosted services (thus closed).
  • Integrations: As you can see from Zapier’s list of integrations, they primarily integrate with social and business cloud apps. Factor.io, on the other hand, only integrates with developer tools and services like Github, Heroku, SSH, Travis, etc. Furthermore, Factor.io can integrate with on-prem products like Github Enterprise, TeamCity, Bamboo, and even custom in-house apps.
  • Workflow complexity: As the name suggests, If-This-Then-That is designed to have a single trigger tied to a single action. Factor.io, on the other hand, allows you to define a trigger which executes a sequence of chained if-this-then-thats as well as parallel execution. Furthermore, Factor.io workflows are implemented as a Ruby-based DSL, enabling devs to use Ruby’s functionality for more complex situations like error handling, conditions, enumerations, etc.
  • Pricing: Zapier’s pricing is a monthly subscription between $0 and $99, which pivots on the number of workflows (“zaps”), tasks per month, and availability of premium integrations. Factor.io has a similar business model for Factor.io Cloud; however, the primary Factor.io business is the Pro offering, which is an annual contract at a MUCH higher price point.

The only thing these two really have in common is the high-level concept of integration using triggers and actions. This is what makes the pitch “IFTTT for Dev and DevOps” so easy to digest. However, the similarities end there, as the two have different customers, different use cases, and different business models. Overall, we are very different kinds of businesses.

I love and use both Zapier and IFTTT, and hopefully one day Zapier’s and IFTTT’s engineering teams will use Factor.io too.

Security Vulnerabilities from Bug Bounty Program

A couple of days ago we launched the Security Bug Bounty Program. Around 20 different white-hat hackers have been pounding away. Over 40 issues were reported; however, most were non-issues, low-priority, or duplicates. But there were a few that we accepted and rewarded with a bounty. Here we will discuss these issues.

Before I go into the list, I want to thank all the hackers out there (and many customers) who have reported various product and security issues. I can confidently say they have made Factor.io a far better product. Thank you!

Autocomplete settings on credit card form

The credit card form in Factor.io didn’t set the autocomplete attributes. The report said that autocomplete should be turned off. However, the autocomplete recommendations for jQuery.payment (Stripe’s library) actually say to keep it on for every field except the CVC. We sided with Stripe’s guidance and made that change, but still rewarded the bounty for bringing this to our attention.

Replay of password reset token

When you use the “Password Reset” capability in Factor.io you are sent an email with a reset token. The problem with the token is that it can work more than once. The proper implementation is for the token to be one-time-use-only and to have an expiration period as well. This issue is not yet fixed and we are investigating the fix now.
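As a rough illustration of what “one-time-use with an expiration” means, here is a minimal sketch. It is not the Factor.io fix: the ResetTokenStore class, the in-memory hash, and the one-hour lifetime are all assumptions made for the example.

```ruby
require 'securerandom'

# Hypothetical in-memory store; a real app would persist tokens in its database.
class ResetTokenStore
  TTL_SECONDS = 60 * 60 # assumed one-hour expiration period

  def initialize
    @tokens = {} # token => { user_id:, expires_at: }
  end

  # Creates a random token for the user and remembers when it expires.
  def issue(user_id, now: Time.now)
    token = SecureRandom.urlsafe_base64(32)
    @tokens[token] = { user_id: user_id, expires_at: now + TTL_SECONDS }
    token
  end

  # Returns the user id for a valid token, or nil. The token is deleted on
  # the first attempt, so replaying it a second time always fails.
  def redeem(token, now: Time.now)
    entry = @tokens.delete(token)
    return nil if entry.nil? || now > entry[:expires_at]
    entry[:user_id]
  end
end
```

The important detail is that redeem deletes the token before checking anything else, so a second request with the same token finds nothing.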

Brute force of password

This is actually a relatively low-priority bug. Theoretically someone could brute force a user’s password. Realistically that is very difficult to do given the roughly 500 ms processing time of requests and the limited number of supported concurrent connections. Trying every random 8-character alphanumeric password would take far more than a lifetime.

  • Possible passwords: (26+26+10)^8 = 218,340,105,584,896
  • Password attempt rate: 100 concurrent connections with a 0.5 s response time = 200 passwords/sec
  • Seconds to find password: 218,340,105,584,896/200 = 1,091,700,527,924
  • Years to find password: 1,091,700,527,924/31,536,000 = ~34,600
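The estimate can be reproduced with a few lines of Ruby. The inputs are the ones from the list above; the only assumption added here is a 365-day year for the seconds-to-years conversion.

```ruby
# Brute-force estimate for a random 8-character alphanumeric password.
ALPHABET_SIZE    = 26 + 26 + 10        # lowercase + uppercase + digits
PASSWORD_LENGTH  = 8
CONNECTIONS      = 100                 # concurrent connections
RESPONSE_SECONDS = 0.5                 # processing time per request
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year

POSSIBLE_PASSWORDS = ALPHABET_SIZE**PASSWORD_LENGTH
ATTEMPTS_PER_SEC   = CONNECTIONS / RESPONSE_SECONDS
SECONDS_TO_EXHAUST = POSSIBLE_PASSWORDS / ATTEMPTS_PER_SEC
YEARS_TO_EXHAUST   = SECONDS_TO_EXHAUST / SECONDS_PER_YEAR

puts "Possible passwords:  #{POSSIBLE_PASSWORDS}"
puts "Attempts per second: #{ATTEMPTS_PER_SEC.round}"
puts "Years to try them all: #{YEARS_TO_EXHAUST.round}"
```

This is the time to try every possible password. On average an attacker would find a random password after about half of that.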

While this would be very difficult to perform on Factor.io, especially since we monitor other metrics that would indicate such an attack, we believed it was good practice to put countermeasures in place.

As such, we implemented locks on the account. After 4 bad attempts you are warned of being locked out, after the 5th the account is locked. It is automatically unlocked after an hour. It can also be unlocked via email.
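The lockout rules above can be sketched as a small state tracker. This is only an illustration of the rules, not our actual implementation: the LoginGuard class and its method names are made up for the example.

```ruby
# Hypothetical lockout tracker: warn on the 4th failure, lock on the 5th,
# and unlock automatically after an hour.
class LoginGuard
  MAX_ATTEMPTS = 5
  LOCK_SECONDS = 60 * 60

  def initialize
    @failures  = Hash.new(0) # email => consecutive failed attempts
    @locked_at = {}          # email => time the lock started
  end

  def locked?(email, now: Time.now)
    started = @locked_at[email]
    return false unless started
    return true if now - started < LOCK_SECONDS
    unlock(email) # the hour has passed, so lift the lock
    false
  end

  # Records a failed login and returns :ok, :warning, or :locked.
  def record_failure(email, now: Time.now)
    return :locked if locked?(email, now: now)
    @failures[email] += 1
    if @failures[email] >= MAX_ATTEMPTS
      @locked_at[email] = now
      :locked
    elsif @failures[email] == MAX_ATTEMPTS - 1
      :warning
    else
      :ok
    end
  end

  # Also the hook for "unlock via email" and for a successful login.
  def unlock(email)
    @failures.delete(email)
    @locked_at.delete(email)
  end
end
```

With a lock like this in place, an attacker gets at most 5 guesses per account per hour, which makes the brute-force estimate above even less practical.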

Session replay (won’t fix)

Log on to Factor.io, copy your session cookies, log out of Factor.io, paste in your session cookies, and even though you are logged out you are able to get access back into the service. I wrote Why session-replay is a won’t fix bug explaining why we are not planning on fixing this issue. This was by far the most reported issue (had about 10 reports).

Circumvent credit card form

If you try to create a new premium organization in Factor.io you are asked to provide credit card information for that organization. We use Stripe to process the credit card: the Stripe JS library returns a token, which is then passed to the server-side form. The server is supposed to validate that token and only then upgrade the plan. However, in our case we had a bug. When the form was submitted without a valid credit card, it returned an error due to the bad credit card, but the organization was still created. This was a design failure on our part, as some of the validation was placed in the controller in an incorrect sequence. We fixed it by moving that validation into the model.
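To illustrate the idea of keeping the check in the model, here is a minimal sketch in plain Ruby. It is not our actual code: the Organization class, the create_premium method, and the validator callable (standing in for the Stripe token check) are all hypothetical.

```ruby
# Hypothetical model: a premium organization can only be built after the
# card token has been validated, regardless of what the controller does.
class Organization
  class InvalidCard < StandardError; end

  attr_reader :name, :plan

  # `validator` is any callable that returns true for a good card token.
  def self.create_premium(name, card_token, validator)
    raise InvalidCard, 'card could not be validated' unless validator.call(card_token)
    new(name, :premium)
  end

  def initialize(name, plan)
    @name = name
    @plan = plan
  end
end
```

Because the check happens before the object is built, a controller cannot end up with a created organization and a failed card at the same time.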

Clickjacking

Create a friendly-looking website with a button the user wants to click, then overlay it with a fully transparent iframe containing Factor.io. The attacker can trick you into taking actions on Factor.io without you knowing it. The fix is simple: send the X-Frame-Options header in your HTTP responses to instruct the browser not to render the site inside an iframe. We used Twitter’s Secure Headers gem to fix this.
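For readers who don’t want a gem, the same header can be added by hand. The following is a minimal Rack-style middleware sketch, not what the Secure Headers gem does internally; the FrameOptions class name is made up for the example.

```ruby
# Hypothetical Rack-style middleware that adds X-Frame-Options to every
# response. 'DENY' forbids all framing; 'SAMEORIGIN' would still allow the
# site to frame itself.
class FrameOptions
  def initialize(app, value = 'DENY')
    @app   = app
    @value = value
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers.merge('X-Frame-Options' => @value), body]
  end
end
```

In a Rack app this would be enabled with `use FrameOptions` in config.ru.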

Security Bug Bounty Program

We are excited to announce the Factor.io Security Bug Bounty Program. We’ve had a lot of white-hat hackers hacking away on Factor.io over the past week so we wanted to formalize the process a bit.

Reward: Awesome T-Shirt

These aren’t just any t-shirts. These are vertically integrated (end-to-end manufactured) in the United States. The screen printing was handmade locally here in Portland, OR.

Sorry we don’t have very deep pockets, so we can only reward with swag and credit, but no cash.

Criteria

Not all bugs are created equal. As any developer or security expert will tell you, security is a sliding scale. While remote execution is the gold star vulnerability, most issues are less risky. By defining these criteria we are attempting to draw a line in the sand between the security vulnerabilities we prioritize with a reward and everything else.

  • New issues only - A new issue is any issue that hasn’t already been reported by the community or by internal review. i.e. We aren’t tracking the bug already.
  • Remote elevation of privilege - Bugs that enable the intruder to perform actions that they do not have the privilege to perform. However, “remote” means that you must assume that the intruder does not have access to the victim’s computer.
    • qualified: One recent valid vulnerability was identified where a regular user was able to get a list of admin users (i.e. Factor.io devs).
    • not qualified: Session replay after logout is an example of an unqualified bug.
  • Multiplied DOS - Denial of Service attacks are fair game, however, only if they have a multiplier. By “multiplier”, we mean that a single event can trigger downtime. Any flood attack doesn’t count in this case.
  • Proof of concept - provide a proof of concept, either a video demonstrating the vulnerability, code, or guide. In other words, we are looking for vulnerabilities, not threats.
  • Spoofing - If you can take action on behalf of another user without their consent, you are spoofing their identity. This is very similar to EOP, but to act on behalf of another user, not just an elevated user. Again, the attack must be remote.
  • Information Disclosure - You can disclose PII (personally identifiable information) without knowing their PII beforehand. That is, getting a list of users is fair game; however, finding out that they have an account by providing the email address (e.g. reset password), is out-of-play.
  • Scope - Only https://factor.io/ is in scope for the bounty.
  • Final Judgement - While we tried to define the criteria explicitly, I want to reserve the right to make the final judgement whether the bug qualifies.

Reporting issues

To report an issue, email security@factor.io

Feedback?

This is just a “stake in the ground” of a program. I welcome any feedback to the criteria.

Build your own chat bot with Gitter.im

Per request from our customers we’ve added support for Gitter.im, the Github-centric chat service. And now you can build your own chat bot with Gitter.im.

What is Gitter.im?

Services like Hipchat and Campfire are popular amongst developers; however, they are designed to be generic team chat services used by all sorts of teams. Gitter.im is a little different. While it provides the general chat capabilities you would expect out of the client, it has some awesome developer-specific features you won’t find anywhere else. For example, you can set up chat rooms around repos, reference commits in your messages, and even include Markdown in your messages.

What can you do with Gitter.im + Factor.io

Now that we’ve added support for Gitter.im, you can integrate Gitter with the rest of your development workflow. For example, you can type “deploy qa” in your Gitter channel for a given repo to automatically deploy the qa branch to the QA environment. Here is a Gist to do exactly that with Github and Heroku.

To get started:

  1. Sign up
  2. Click on “Service” and activate Github, Heroku, and Gitter services
  3. Click “Create Workflow” and select “Custom Workflow”
  4. In the definition paste in the contents of this gist. Replace the appropriate values. And save the repo.
  5. Now in the Gitter.im room you should be able to type in “deploy master” to deploy to the appropriate environment.

Cool, huh? With the Gitter.im Service integration you can receive messages as well as send messages to your room. You can also specify regular expressions in the filter option to match specific messages and pull out values. In other words, you can create your own chat bot capable of pulling code from Github/BitBucket/GitLab, compiling with Jekyll/Middleman, and deploying to a server via SSH, Heroku, or BitBalloon.
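To give a feel for the kind of regular expression such a filter could use, here is a small plain-Ruby sketch that matches a “deploy <branch>” message and pulls out the branch name. The pattern and the parse_deploy helper are assumptions for illustration; how the expression is passed to the Gitter filter option is not shown here.

```ruby
# Hypothetical pattern for a "deploy <branch>" chat command. The named
# group captures the branch (letters, digits, '_', '.', '/' and '-').
DEPLOY_COMMAND = /\Adeploy\s+(?<branch>[\w.\/-]+)\z/

# Returns the branch name, or nil when the message is not a deploy command.
def parse_deploy(message)
  match = DEPLOY_COMMAND.match(message.strip)
  match && match[:branch]
end

puts parse_deploy('deploy qa').inspect
puts parse_deploy('what is for lunch?').inspect
```

Anchoring the pattern with \A and \z keeps ordinary conversation that merely mentions “deploy” from triggering a deployment.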

Happy hacking

Docker Garbage Collector

We use Docker extensively at Factor.io to power two different components. First, it is used to host Factor.io workflows. Second, it is used as an isolated build environment for when you include things like Middleman or Jekyll builds in your workflows.

In both cases we manage the Docker containers and images so that they start, stop, and clean up after themselves. However, bugs or deployments sometimes prevent the cleanup. As such, we created a Docker Garbage Collector for Factor.io powered by Factor.io (we like meta).

Here is the workflow:

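# Every 10 minutes, clean up leftover Docker containers and images over SSH.
listen 'timer', 'every', minutes: 10 do |timer_info|
  # Remove stopped containers (docker rm refuses to remove running ones).
  remove_stopped  = 'sudo docker rm $(sudo docker ps -a -q)'
  # Remove untagged ("<none>") images.
  remove_untagged = 'sudo docker rmi $(sudo docker images -a | grep "^<none>" | awk \'{print $3}\')'
  commands = [remove_stopped, remove_untagged]
  run 'ssh', 'execute', commands: commands, host: 'docker-01.factor.io', username: 'ubuntu'
end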

As you can see, this runs two different commands: one to remove stopped containers, and a second to remove untagged images. For this example we run it on the server ‘docker-01.factor.io’. However, you could just as well use the AWS or Rackspace connectors to list all your Docker servers dynamically so you don’t need to hard-code those addresses.

In some way this is a glorified cron. Unlike cron, this is fully hosted and doesn’t require you to configure the cron job on the server. It also can run on multiple servers (even in parallel). And lastly, you get the log output of these commands across all the servers in real-time.

If you want to try this out, just create a new workflow in Factor.io, choose “Custom Workflow”, and paste in the code block provided.

New Mini Feature: fixed Web Hook URIs

Now you can create a new Web Hook listener in Factor.io that listens on a fixed URI that looks something like https://connector.factor.io/v0.3/2/my-hook.

If you ever used the web hooks listener in Factor.io you may have noticed that the URL generated for the web hook is dynamically defined. This means that every time the workflow starts it gets a new URI. Every time Factor.io redeploys one of a couple components the workflow may restart; this ends up happening about twice a day. This can be a problem when you configure other tools to try to hit that web hook.

Now you can create a web hook like this and specify the ‘id’ parameter.

listen 'web', 'hook', id:'my-id' do |hook_info|
  info "received hook"
end

This will create two different web hooks: one for this very specific instance, and another at /v0.3/[user_id]/[hook_id], where user_id is your user ID and hook_id is the value specified in ‘id’ when you created the workflow. This second “fixed” URL can be used by multiple workflows. When you hit that address, all of your workflows listening on that address will be triggered.

Deploy from a git tag with Capistrano 3

Do you use Capistrano and Github? Using git tags is a great way to tag your code for deployment.

I recently read “Deploy from a Git tag with Capistrano” by Nathan Hoad. It was great, but it was written for Capistrano pre-v3. As such, I’d like to update that blog post for Capistrano 3.

Use git tag to create a new release:

git tag -a 02.16.2014.01

You can list your tags like this:

git tag
02.12.2014.01
02.16.2014.01

Now make sure you push the tags to the remote repository:

git push origin --tags

Now, in your Capistrano deploy script (note that this is new for v3):

ask :branch, proc { `git tag`.split("\n").last }

This replaces the set :branch, 'master' code you had before. This syntax is MUCH shorter than the pre-v3 version.

On deployment, this will ask “Please enter branch: |02.16.2014.01|”.
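One caveat worth knowing: git tag lists tags in lexicographic order, so with a MM.DD.YYYY.NN naming scheme the last tag in the list is not always the newest one (a December 2013 tag sorts after a February 2014 tag). Below is a minimal sketch of a helper that picks the newest tag by date instead. The latest_release_tag name is made up, and it assumes every release tag follows the MM.DD.YYYY.NN format used above.

```ruby
# Returns the chronologically newest tag from a list of MM.DD.YYYY.NN tags.
# Tags that don't match the format are ignored.
def latest_release_tag(tags)
  tags.grep(/\A\d{2}\.\d{2}\.\d{4}\.\d{2}\z/).max_by do |tag|
    month, day, year, build = tag.split('.').map(&:to_i)
    [year, month, day, build] # compare year first, then month, day, build
  end
end

# Possible use in the deploy script (untested sketch):
#   ask :branch, proc { latest_release_tag(`git tag`.split("\n")) }
```

If your tags all fall within one calendar year, the simpler one-liner above behaves the same way.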

Bonus. You can use Factor.io to automatically kick off deployments when your code is tagged and pushed to Github, Bitbucket or Gitlab.

Better than cron

Cron was first introduced in Version 7 Unix in 1979. Being 35 years old, it is no surprise that it is the de facto standard time-based scheduler. But there are drawbacks when it comes to its usage in highly distributed and highly agile environments.

Why not cron?

  • Logs are either dumped to you in email, hidden away on disk, or lost completely.
  • No native way of running remote jobs.
  • Making any changes requires SSHing into a box or a sequence of steps to get it updated through configuration management systems (e.g. Chef/Puppet).
  • Difficult to manage across numerous systems.

Why Factor.io is better

  • Central place to define the “crontab”
  • No need to SSH into boxes to get started
  • No need to make changes to code, checking-in, deploying, just to update a time period or command
  • Run on multiple servers at once
  • Get all the logs in a single place without logging into the servers
  • Super easy to set up
  • Self documenting.

Factor.io isn’t perfect… yet!

The current model for defining workflows using the drag-and-drop interface doesn’t allow for conditions. This means you can’t define actions on different conditions (e.g. pass/fail). The good news is that the release of the new model for defining workflows is just around the corner. You’ll be able to programmatically define workflows with conditions and all the other great stuff Ruby DSL provides.

Deploying using Capistrano, roles and tags

We use Chef for provisioning servers, Capistrano for deployments, and most of our code is written in Ruby.

We wanted to fully automate `cap production deploy` so that the servers would be self-identifying. Here is how we did it.

First, we used server tags in AWS and Chef to define the role of a given server during the provisioning process. This is how you can tag a server using knife ec2 server create. More docs from Chef here.

knife ec2 server create --tags role=factord ...

Secondly, we setup our production.rb file in Capistrano like this…

This uses the Fog gem to get the list of servers from AWS. We then identify the role of each server using the tags we defined in the previous step, and use the “role” directive in Capistrano to register the server and its corresponding role. And that’s it. Now when we run `cap production deploy` it gets the current list of running servers and assigns each server its role.
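The approach described above can be sketched roughly as follows. This is not our original production.rb but a minimal illustration of the idea: the hosts_by_role helper is made up, the region and credential variables are placeholders, and the Fog and Capistrano wiring in the trailing comment is an untested assumption about how the pieces fit together.

```ruby
# Groups running servers by their "role" tag.
# `servers` is any list whose items respond to #state, #tags and #dns_name,
# which is assumed to match the server objects returned by Fog for AWS.
def hosts_by_role(servers)
  servers
    .select { |s| s.state == 'running' && s.tags['role'] }
    .group_by { |s| s.tags['role'] }
    .transform_values { |group| group.map(&:dns_name) }
end

# In config/deploy/production.rb this could be wired up along these lines:
#
#   require 'fog'
#   compute = Fog::Compute.new(provider: 'AWS', region: 'us-west-2',
#                              aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
#                              aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'])
#   hosts_by_role(compute.servers).each do |role_name, hosts|
#     role role_name.to_sym, hosts
#   end
```

Keeping the grouping in a plain helper means it can be checked without touching AWS, while the deploy file only handles the lookup and the role declarations.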