This is day 10 out of 100 Days of golang coding!
Idea for today is to build transparent http proxy with ability to filter traffic. As for a filter I will use user agent which is common practice to filter traffic.
So, for the task I have used two go modules: user_agent and goproxy:
go get github.com/mssola/user_agent
go get gopkg.in/elazarl/goproxy.v1
First I have to setup a proxy and set a decide which hosts I want to match:
proxy := goproxy.NewProxyHttpServer()
proxy.Verbose = true
proxy.OnRequest(goproxy.ReqHostMatches(regexp.MustCompile("^.*$"))).DoFunc(
// skipped function body for now
)
log.Fatal(http.ListenAndServe(":8080", proxy))
Regex “^.*$” is set to match all hosts.
Second I setup user agent parser and filter by bot and browser:
func(r *http.Request, ctx *goproxy.ProxyCtx) (*http.Request, *http.Response) {
//parse user agent string
ua := user_agent.New(r.UserAgent())
bro_name, _ := ua.Browser()
if ua.Bot() || bro_name == "curl" {
return r, goproxy.NewResponse(r,
goproxy.ContentTypeText, http.StatusForbidden,
"Don't waste your time!")
}
return r, nil
}
That’s all for coding. Now take a look at test cases:
Use case 1 – curl command no user agent set
Use case 2 – curl with normal browser user agent
http_proxy=http://127.0.0.1:8080 curl -i -H"User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" http://twitter.com
In that case we have got requests from twitter server which was 301 to https.
Use case 3 – curl with bot user agent
http_proxy=http://127.0.0.1:8080 curl -i -H"User-Agent:Googlebot" http://twitter.com
Full version of proxy with verbose information of user agent parsing:
package main
import (
"fmt"
"log"
"net/http"
"regexp"
"github.com/mssola/user_agent"
"gopkg.in/elazarl/goproxy.v1"
)
func main() {
proxy := goproxy.NewProxyHttpServer()
proxy.Verbose = true
proxy.OnRequest(goproxy.ReqHostMatches(regexp.MustCompile("^.*$"))).DoFunc(
func(r *http.Request, ctx *goproxy.ProxyCtx) (*http.Request, *http.Response) {
//parse user agent string
ua := user_agent.New(r.UserAgent())
fmt.Printf("Is mobile: %v\n", ua.Mobile()) // => false
fmt.Printf("Is bot: %v\n", ua.Bot()) // => false
fmt.Printf("Mozilla: %v\n", ua.Mozilla()) // => "5.0"
fmt.Printf("Platform: %v\n", ua.Platform()) // => "X11"
fmt.Printf("OS: %v\n", ua.OS()) // => "Linux x86_64"
nameE, versionE := ua.Engine()
fmt.Printf("Engine: %v\n", nameE) // => "AppleWebKit"
fmt.Printf("Engine version: %v\n", versionE) // => "537.11"
nameB, versionB := ua.Browser()
fmt.Printf("Browser: %v\n", nameB) // => "Chrome"
fmt.Printf("Browser version: %v\n", versionB) // => "23.0.1271.97"
if ua.Bot() || nameB == "curl" {
return r, goproxy.NewResponse(r,
goproxy.ContentTypeText, http.StatusForbidden,
"Don't waste your time!")
}
return r, nil
})
log.Fatal(http.ListenAndServe(":8080", proxy))
}
Source code available at GitHub.