Transparent HTTP proxy with filter by user agent using golang

This is day 10 out of 100 Days of golang coding!

Idea for today is to build transparent http proxy with ability to filter traffic. As for a filter I will use user agent which is common practice to filter traffic.

So, for the task I have used two go modules: user_agent and goproxy:

go get github.com/mssola/user_agent
go get gopkg.in/elazarl/goproxy.v1

First I have to setup a proxy and set a decide which hosts I want to match:

proxy := goproxy.NewProxyHttpServer()
proxy.Verbose = true
proxy.OnRequest(goproxy.ReqHostMatches(regexp.MustCompile("^.*$"))).DoFunc(
// skipped function body for now
)
log.Fatal(http.ListenAndServe(":8080", proxy))

Regex “^.*$” is set to match all hosts.

Second I setup user agent parser and filter by bot and browser:

func(r *http.Request, ctx *goproxy.ProxyCtx) (*http.Request, *http.Response) {
  //parse user agent string
  ua := user_agent.New(r.UserAgent())
  bro_name, _ := ua.Browser()
  if ua.Bot() || bro_name == "curl" {
    return r, goproxy.NewResponse(r,
      goproxy.ContentTypeText, http.StatusForbidden,
      "Don't waste your time!")
  }
  return r, nil
}

That’s all for coding. Now take a look at test cases:

Use case 1 – curl command no user agent set

Use case 2 – curl with normal browser user agent

http_proxy=http://127.0.0.1:8080 curl -i -H"User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" http://twitter.com

In that case we have got requests from twitter server which was 301 to https.

Use case 3 – curl with bot user agent

http_proxy=http://127.0.0.1:8080 curl -i -H"User-Agent:Googlebot" http://twitter.com

Full version of proxy with verbose information of user agent parsing:

package main

import (
	"fmt"
	"log"
	"net/http"
	"regexp"

	"github.com/mssola/user_agent"
	"gopkg.in/elazarl/goproxy.v1"
)

func main() {
	proxy := goproxy.NewProxyHttpServer()
	proxy.Verbose = true
	proxy.OnRequest(goproxy.ReqHostMatches(regexp.MustCompile("^.*$"))).DoFunc(
		func(r *http.Request, ctx *goproxy.ProxyCtx) (*http.Request, *http.Response) {
			//parse user agent string
			ua := user_agent.New(r.UserAgent())
			fmt.Printf("Is mobile: %v\n", ua.Mobile()) // => false
			fmt.Printf("Is bot: %v\n", ua.Bot())       // => false
			fmt.Printf("Mozilla: %v\n", ua.Mozilla())  // => "5.0"

			fmt.Printf("Platform: %v\n", ua.Platform()) // => "X11"
			fmt.Printf("OS: %v\n", ua.OS())             // => "Linux x86_64"

			nameE, versionE := ua.Engine()
			fmt.Printf("Engine: %v\n", nameE)            // => "AppleWebKit"
			fmt.Printf("Engine version: %v\n", versionE) // => "537.11"

			nameB, versionB := ua.Browser()
			fmt.Printf("Browser: %v\n", nameB)            // => "Chrome"
			fmt.Printf("Browser version: %v\n", versionB) // => "23.0.1271.97"

			if ua.Bot() || nameB == "curl" {
				return r, goproxy.NewResponse(r,
					goproxy.ContentTypeText, http.StatusForbidden,
					"Don't waste your time!")
			}
			return r, nil
		})
	log.Fatal(http.ListenAndServe(":8080", proxy))
}

 

Source code available at GitHub.