Transparent HTTP proxy with filter by user agent using golang

This is day 10 out of 100 Days of golang coding!

The idea for today is to build a transparent HTTP proxy that can filter traffic. As the filter I will use the User-Agent header, which is a common way to filter traffic.

For this task I used two Go packages, user_agent and goproxy:

go get github.com/mssola/user_agent
go get gopkg.in/elazarl/goproxy.v1

First, I set up the proxy and decide which hosts to match:

proxy := goproxy.NewProxyHttpServer()
proxy.Verbose = true
proxy.OnRequest(goproxy.ReqHostMatches(regexp.MustCompile("^.*$"))).DoFunc(
// skipped function body for now
)
log.Fatal(http.ListenAndServe(":8080", proxy))

The regex “^.*$” matches every host, so all requests go through the handler.

Second, I set up the user agent parser and block bots and curl:

func(r *http.Request, ctx *goproxy.ProxyCtx) (*http.Request, *http.Response) {
  // parse the user agent string
  ua := user_agent.New(r.UserAgent())
  browserName, _ := ua.Browser()
  // reject bots and curl with 403 Forbidden; everything else is forwarded
  if ua.Bot() || browserName == "curl" {
    return r, goproxy.NewResponse(r,
      goproxy.ContentTypeText, http.StatusForbidden,
      "Don't waste your time!")
  }
  return r, nil
}

That’s all the code. Now let’s look at some test cases:

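Use case 1 – curl with its default user agent

http_proxy=http://127.0.0.1:8080 curl -i http://twitter.com

Without -H, curl sends its own user agent (curl/<version>), which the parser reports as browser “curl”, so the proxy should answer with 403 Forbidden and “Don't waste your time!”.

Use case 2 – curl with a normal browser user agent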

http_proxy=http://127.0.0.1:8080 curl -i -H"User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" http://twitter.com

In this case the request passes the filter and we get the response from the Twitter server, which is a 301 redirect to HTTPS.

Use case 3 – curl with bot user agent

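http_proxy=http://127.0.0.1:8080 curl -i -H"User-Agent:Googlebot" http://twitter.com

Googlebot is detected by ua.Bot(), so the proxy should reject this request with 403 Forbidden as well.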

The full version of the proxy, which also prints verbose user agent parsing information:

package main

import (
	"fmt"
	"log"
	"net/http"
	"regexp"

	"github.com/mssola/user_agent"
	"gopkg.in/elazarl/goproxy.v1"
)

func main() {
	proxy := goproxy.NewProxyHttpServer()
	proxy.Verbose = true
	proxy.OnRequest(goproxy.ReqHostMatches(regexp.MustCompile("^.*$"))).DoFunc(
		func(r *http.Request, ctx *goproxy.ProxyCtx) (*http.Request, *http.Response) {
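			// parse the user agent string; the "// =>" values below are examples for
			// "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11"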
			ua := user_agent.New(r.UserAgent())
			fmt.Printf("Is mobile: %v\n", ua.Mobile()) // => false
			fmt.Printf("Is bot: %v\n", ua.Bot())       // => false
			fmt.Printf("Mozilla: %v\n", ua.Mozilla())  // => "5.0"

			fmt.Printf("Platform: %v\n", ua.Platform()) // => "X11"
			fmt.Printf("OS: %v\n", ua.OS())             // => "Linux x86_64"

			nameE, versionE := ua.Engine()
			fmt.Printf("Engine: %v\n", nameE)            // => "AppleWebKit"
			fmt.Printf("Engine version: %v\n", versionE) // => "537.11"

			nameB, versionB := ua.Browser()
			fmt.Printf("Browser: %v\n", nameB)            // => "Chrome"
			fmt.Printf("Browser version: %v\n", versionB) // => "23.0.1271.97"

			if ua.Bot() || nameB == "curl" {
				return r, goproxy.NewResponse(r,
					goproxy.ContentTypeText, http.StatusForbidden,
					"Don't waste your time!")
			}
			return r, nil
		})
	log.Fatal(http.ListenAndServe(":8080", proxy))
}

 

Source code available at GitHub.
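If you want to try the idea without third-party packages, here is a minimal standard-library sketch. It is not how goproxy works internally. It makes two substitutions. First, httputil.ReverseProxy with an empty Director is used as a forward proxy for plain HTTP: a client talking to a proxy sends an absolute URL, so the request can be forwarded unchanged. Second, a crude substring check (blockedUA, a hypothetical helper) replaces the user_agent parser. HTTPS (CONNECT) tunnelling is not handled.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"strings"
)

// blockedUA is a hypothetical stand-in for the user_agent checks:
// it lowercases the header and looks for a few bot/tool markers.
// This is much cruder than ua.Bot() and can misfire on unusual UAs.
func blockedUA(userAgent string) bool {
	ua := strings.ToLower(userAgent)
	for _, marker := range []string{"bot", "crawler", "spider", "curl"} {
		if strings.Contains(ua, marker) {
			return true
		}
	}
	return false
}

// newFilteringProxy returns a handler acting as a forward proxy for plain
// HTTP. Proxied requests carry an absolute URL (http://host/path), so the
// empty Director leaves it untouched and ReverseProxy sends the request to
// the original host. Non-proxy requests (path-only URLs) end up as 502.
func newFilteringProxy() http.Handler {
	forward := &httputil.ReverseProxy{Director: func(*http.Request) {}}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if blockedUA(r.UserAgent()) {
			http.Error(w, "Don't waste your time!", http.StatusForbidden)
			return
		}
		forward.ServeHTTP(w, r)
	})
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", newFilteringProxy()))
}
```

The same curl commands as in the test cases above can be pointed at this version on port 8080.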

Counting word frequency from standard input with Go

Hi guys,
today is day 8 out of 100 days of code.

The task for today is to count word frequency in data read from stdin. The input is scanned with a bufio.Scanner that splits it into words (bufio.ScanWords).
Let’s see the code:

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	// map to store word frequencies
	words := make(map[string]int)
	// read from standard input
	scanner := bufio.NewScanner(os.Stdin)
	// ask the scanner to split the input into words for us
	scanner.Split(bufio.ScanWords)
	count := 0
	// scan the input
	for scanner.Scan() {
		// get the next token (a word) and update its frequency
		words[scanner.Text()]++
		count++
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
	fmt.Printf("Total words: %d\n", count)
	fmt.Printf("Words frequency: \n")
	// todo: sort words by frequency (map values) for nicer output
	for k, v := range words {
		fmt.Printf("%s:%d\n", k, v)
	}
}

The results are stored in the map words. Go maps have no defined key order, so the bonus task is to sort the words by frequency (that is, by map value).

See source code at GitHub

Simple example of packages and imports in golang

This is day 7 out of 100 days of golang coding.

At my friends’ request, this post is about packages, files and imports.

Packages in Go play the same role as modules and libraries in other languages. To make a package you need to decide two things: the package path and the package name.

In my example the package name is helloworld, and because it is day 7 it lives in the day7 folder. The full import path is github.com/vorozhko/go-tutor/day7/helloworld.
The import path is resolved relative to $GOPATH/src. $GOPATH is ~/go by default, but it can be set to any other directory, e.g. in your .bashrc file.

All files in the same directory that declare the same package name belong to one package, and they share private and public variables as if they were a single file. To make a variable or function public (exported), its name must start with a capital letter. Let’s see an example:

hello.go

 

// Package helloworld - test example
package helloworld

import "fmt"

// prefix - define default prefix 
const prefix = "Hello World!"

// say - print string
func say(str string) {
	fmt.Print(str)
}

world.go

// Package helloworld - test example
package helloworld

// SayName prints str with the predefined prefix
// prefix and say are visible inside the helloworld package
func SayName(str string) {
	say(prefix + str)
}

Function ‘say’ and constant ‘prefix’ are only visible inside package helloworld.
Function SayName starts with a capital letter, so it is exported and visible outside the package.

You can import the new package from anywhere in your workspace with the following code:

main.go

package main

import "github.com/vorozhko/go-tutor/day7/helloworld"

func main() {
	helloworld.SayName(" - test")
}

To import a package you use its full path relative to your $GOPATH/src folder.
From main we can call helloworld.SayName, because it is exported.

To manage the imports in your Go files, it is recommended to use the goimports tool. It automatically adds missing import lines and removes unused ones, and it also formats code the way gofmt does. It is supported by many code editors.

 

Very basic web crawler in golang

Hi guys,

this is day 6 out of 100 days of code in Go.

The following code recursively fetches the internal links of a given website, starting from its root page.

What I want to do next is to put goroutines into action, because it would be nice to run the fetching in parallel. Any suggestions on how to do it?

package main

//todo:
// - every link must be visited only once [done]
// - keep a map of visited links [done]
// - fix links and page concatenation [done]
// - extract domain from request uri to simplify crawling [done]
// - rework how internal links are selected for crawling
// - fix how crawling settings are set like depth and maxLinks

import (
	"flag"
	"fmt"
	"io"
	"log"
	"net/http"
	"time"

	"golang.org/x/net/html"
)

// map of visited links to prevent visiting a page twice
var visitedLinks map[string]bool
var baseURL = flag.String("url", "", "start url")

func main() {
	visitedLinks = make(map[string]bool)
	flag.Parse()

	if *baseURL == "" {
		log.Fatal("--url parameter is required")
	}

	// start crawling from the root page; links are paths relative to baseURL
	crawl("/")
}

func crawl(link string) {
	//check if link already visited
	if visitedLinks[link] {
		return
	}
	//set link as visited
	visitedLinks[link] = true
	fmt.Printf("Crawling %s ..................\n\n", *baseURL+link)
	resp, err := http.Get(*baseURL + link)
	if err != nil {
		log.Fatal(err)
	}
	// collect the links and close the body before recursing, so the
	// connection is not held open for the rest of the crawl
	links := getLinks(resp.Body)
	resp.Body.Close()

	linkCounter := 0
	for _, href := range links {
		//todo: rework how links are selected
		if len(href) > 0 && href[0] == '/' && // only internal links
			href != link { //skip current page
			if len(href) > 1 && href[1] == '/' { //skip external links which start with //
				continue
			}
			linkCounter++
			//fmt.Printf("Found: %s\n", href)
			crawl(href)
			time.Sleep(time.Second * 1)
		}
	}
}

//Collect all links from response body and return it as an array of strings
func getLinks(body io.Reader) []string {
	var links []string
	z := html.NewTokenizer(body)
	for {
		tt := z.Next()

		switch tt {
		case html.ErrorToken:
			return links
		case html.StartTagToken, html.EndTagToken:
			token := z.Token()
			if "a" == token.Data {
				for _, attr := range token.Attr {
					if attr.Key == "href" {
						links = append(links, attr.Val)
					}

				}
			}

		}
	}
}

Source code on github

Get all links from html page with go lang

Hi guys,
this is day 5 out of 100 days of code!

Today I coded an HTML href link parser. It is part of a web crawler project, which I will post about in the following days.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"golang.org/x/net/html"
)

func main() {
	resp, err := http.Get("https://golang.org/")
	if err != nil {
		log.Fatal(err)
	}
	// always close the response body when done with it
	defer resp.Body.Close()
	for _, v := range getLinks(resp.Body) {
		fmt.Println(v)
	}
}

//Collect all links from response body and return it as an array of strings
func getLinks(body io.Reader) []string {
	var links []string
	z := html.NewTokenizer(body)
	for {
		tt := z.Next()

		switch tt {
		case html.ErrorToken:
			//todo: the links list shouldn't contain duplicates
			return links
		case html.StartTagToken, html.EndTagToken:
			token := z.Token()
			if "a" == token.Data {
				for _, attr := range token.Attr {
					if attr.Key == "href" {
						links = append(links, attr.Val)
					}

				}
			}

		}
	}
}

Checking current time with Go channels

This is day 4 out of 100 days of code.
Two code snippets today. The first is a very simple Go channel example.
A goroutine gets the current system time and sends it to a channel; on the other end the message is received from the channel and printed.

package main

import (
	"fmt"
	"time"
)

func main() {
	messages := make(chan string)
	go func() {
		for {
			messages <- fmt.Sprintf("Time now %s", time.Now()) // Println below adds the newline
			time.Sleep(100 * time.Millisecond)
		}
	}()
	for {
		msg := <-messages
		fmt.Println(msg)
	}
}

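The second snippet checks whether one string is the reverse of the other. Strictly speaking this is not an anagram check: an anagram is any rearrangement of the letters (e.g. “listen” and “silent”), while this code only accepts exact reversals (e.g. “abc” and “cba”).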

package main

import (
	"flag"
	"fmt"
)

func main() {
	str1 := flag.String("first", "", "first string for anagram check")
	str2 := flag.String("second", "", "second string for anagram check")
	flag.Parse()
	checkForAnagrams(*str1, *str2)
}

// checkForAnagrams reports whether str2 is str1 reversed (compared byte by byte)
func checkForAnagrams(str1, str2 string) {
	if len(str1) != len(str2) {
		fmt.Printf("%s and %s are not anagrams\n", str1, str2)
		return
	}

	for i := 0; i < len(str1); i++ {
		if str1[i] != str2[len(str2)-i-1] {
			fmt.Printf("%s and %s are not anagrams\n", str1, str2)
			return
		}
	}
	fmt.Printf("%s and %s are anagrams\n", str1, str2)
}
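For a real anagram check, a common approach is to count the letters of both strings. Two strings are anagrams when every rune appears the same number of times in each. In this sketch isAnagram is a hypothetical name. The comparison is case-sensitive and also counts spaces and punctuation.

```go
package main

import (
	"flag"
	"fmt"
)

// isAnagram reports whether a and b contain the same runes with the same counts.
func isAnagram(a, b string) bool {
	counts := make(map[rune]int)
	for _, r := range a {
		counts[r]++
	}
	for _, r := range b {
		counts[r]--
		if counts[r] < 0 { // b has a rune more often than a
			return false
		}
	}
	for _, c := range counts {
		if c != 0 { // a has a rune more often than b
			return false
		}
	}
	return true
}

func main() {
	str1 := flag.String("first", "", "first string for anagram check")
	str2 := flag.String("second", "", "second string for anagram check")
	flag.Parse()
	fmt.Printf("%s and %s are anagrams: %v\n", *str1, *str2, isAnagram(*str1, *str2))
}
```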

 

Recursive search through tree of files with golang

This is day 3 out of 100 days of code in go lang.

This code snippet searches recursively through a directory tree. One possible improvement is to replace the custom recursive function readFiles with filepath.Walk.

Full code:

package main

import (
	"flag"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"os"
	"strings"
)

var path = flag.String("path", "", "file path to search in")
var search = flag.String("search", "", "search string to look for")

func main() {
	flag.Parse()
	fi, err := os.Stat(*path)
	if err != nil {
		log.Fatal(err)
	}
	//normalize the directory path so it ends with exactly one slash
	if fi.Mode().IsDir() {
		*path = strings.TrimRight(*path, "/") + "/"
		readFiles(*path, *search)
	} else {
		log.Fatal("path must be a directory, but file was provided: ", *path)
	}
}

func readFiles(path, search string) {
	files, err := ioutil.ReadDir(path)
	if err != nil {
		log.Fatal(err)
	}
	for _, file := range files {
		fullpath := path + file.Name()
		if file.Mode().IsDir() {
			readFiles(fullpath+"/", search)
		} else if file.Mode().IsRegular() {
			searchInFile(fullpath, search)
		}
	}
}

func searchInFile(fullpath, search string) {
	data, err := ioutil.ReadFile(fullpath)
	if err != nil {
		log.Fatal(err)
	}
	// detect the content type so that non-text files can be skipped
	fileType := http.DetectContentType(data)
	if !strings.HasPrefix(fileType, "text/") {
		// skip all non-text files
		return
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.Contains(line, search) {
			fmt.Printf("%s: %s\n", fullpath, line)
		}
	}
}

Source code at Github

Very simple grep tool in Go – search for a substring in files

Hi guys,
here is day 2 of 100 days of code.

This time it is a very simple implementation of a grep tool. It searches for a string in the files of a given directory and reports the matching lines.
What I learned is that strings.Index is very useful for working with text, and that http.DetectContentType is a handy way to detect binary files.
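A quick look at what http.DetectContentType returns. It sniffs at most the first 512 bytes and returns a MIME type. The sketch checks for a "text/" prefix through a hypothetical looksLikeText helper. That is an assumption of this example, and it also treats HTML as text. Matching only "text/plain" would report an HTML file as binary.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// looksLikeText reports whether data sniffs as any text/* MIME type.
func looksLikeText(data []byte) bool {
	return strings.HasPrefix(http.DetectContentType(data), "text/")
}

func main() {
	fmt.Println(http.DetectContentType([]byte("hello\n")))                       // text/plain; charset=utf-8
	fmt.Println(http.DetectContentType([]byte("<html><body>hi</body></html>"))) // text/html; charset=utf-8
	fmt.Println(http.DetectContentType([]byte("\x89PNG\r\n\x1a\n")))            // image/png
}
```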

Full code:

package main

import (
	"flag"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"path/filepath"
	"strings"
)

func main() {
	path := flag.String("path", "", "file path to search in")
	search := flag.String("search", "", "search string to look for")
	flag.Parse()

	files, err := ioutil.ReadDir(*path)
	if err != nil {
		log.Fatal(err)
	}
	for _, file := range files {
		if file.Mode().IsDir() {
			//to do: traverse directories recursively
			continue
		}
		// file.Name() is only the base name, so join it with the search directory
		fullpath := filepath.Join(*path, file.Name())
		data, err := ioutil.ReadFile(fullpath)
		if err != nil {
			log.Fatal(err)
		}
		// detect the content type to tell text from binary content
		fileType := http.DetectContentType(data)
		for _, line := range strings.Split(string(data), "\n") {
			if strings.Contains(line, *search) {
				if strings.HasPrefix(fileType, "text/") {
					fmt.Printf("%s: %s\n", fullpath, line)
				} else {
					//best guess it is binary file
					//no need to go through all lines in binary file
					fmt.Printf("Binary file %s matches\n", fullpath)
					break
				}
			}
		}
	}
}

Source code at Github

Simple example of golang os.Args and flag module

Hi all,
this is day 1 of 100 days in which I code one Go program a day. Wish me good luck!

In this example I will show how to parse command line arguments with the flag package and with the os.Args slice.

The flag package is very powerful and can be as simple as this:

//...
import "flag"
//...
name := flag.String("user", "", "user name")
flag.Parse()
fmt.Printf("user name is %s\n", *name)

To get the “same” functionality with os.Args you have to do some work. In the following example I added support for --user, -user and -user=<value> (or --user=<value>) arguments.

//...
import (
	"os"
	"strings"
)
//...
var userArg string
for index, arg := range os.Args {
	pattern := "-user="
	x := strings.Index(arg, pattern)
	if x > -1 {
		userArg = arg[x+len(pattern):]
		continue
	}
	// the value is the next argument, if there is one
	if (arg == "-user" || arg == "--user") && index+1 < len(os.Args) {
		userArg = os.Args[index+1]
		continue
	}
}
fmt.Printf("user name is %s\n", userArg)

Full code:

package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

func main() {
	name := flag.String("user", "", "user name")
	flag.Parse()
	fmt.Printf("user name is %s\n", *name)

	var userArg string
	for index, arg := range os.Args {
		pattern := "-user="
		x := strings.Index(arg, pattern)
		if x > -1 {
			userArg = arg[x+len(pattern):]
			continue
		}
		// the value is the next argument, if there is one
		if (arg == "-user" || arg == "--user") && index+1 < len(os.Args) {
			userArg = os.Args[index+1]
			continue
		}
	}
	fmt.Printf("user name is %s\n", userArg)
}

See the latest code examples on GitHub.

Continuous integration and deployment with Google Cloud Builder and Kubernetes

A Continuous Integration (CI) pipeline for containers has several basic steps. Let’s see what they are:

Set up a trigger

Listen for changes in a repository (GitHub, Bitbucket) such as a pull request, a new tag or a new branch.

This is a basic step for any CI/CD tool, and with Google Cloud Container Builder it is a fairly trivial task to set up. Check out the Container Registry – Build Triggers tool in the Google Cloud Console.

Build an image

When a change to the repository occurs, we want to start a build of a new Docker container image for that change. Good practice is to tag the new image with the branch name and the git commit hash, e.g. master-00covfefe.

With Cloud Builder you have two choices: use a Dockerfile or a cloudbuild.yaml file. With the Dockerfile option the steps are predetermined and don’t give you much flexibility.
With cloudbuild.yaml you can customise every step of your pipeline.
In the following example the first step builds the image from the Dockerfile, and the second step tags the new image with the branch-revision pattern (remember master-00covfefe):

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: [ 'build', '-t', 'eu.gcr.io/$PROJECT_ID/my-nodejs-app', '.' ]

- name: 'gcr.io/cloud-builders/docker'
  args: [ 'tag', 'eu.gcr.io/$PROJECT_ID/my-nodejs-app', 'eu.gcr.io/$PROJECT_ID/my-nodejs-app:$BRANCH_NAME-$REVISION_ID']

Push new image to Container Registry

One important note: cloudbuild.yaml has a special “images” field which publishes images to the registry, but it is only processed after all steps have finished. So, in order to perform a deployment step, you need to push the image in a separate step.

- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'eu.gcr.io/$PROJECT_ID/my-nodejs-app:$BRANCH_NAME-$REVISION_ID']

Deploy new image to Kubernetes

When the new image is in the registry, it’s time to trigger the deployment step. In this example it is a deployment to a Kubernetes cluster.
This step requires the Google Cloud Builder service account to have Edit permissions on the Kubernetes cluster. In Google Cloud it is the account with the “@cloudbuild.gserviceaccount.com” domain. You need to give that account Edit access to Kubernetes in the IAM console.
The second requirement is to specify the zone and cluster in cloudbuild.yaml using env variables. They tell the kubectl command which cluster to deploy to.

- name: 'gcr.io/cloud-builders/kubectl'
  args: ['set', 'image', 'deployment/my-nodejs-app-deployment', 'my-nodejs-app=eu.gcr.io/$PROJECT_ID/my-nodejs-app:$BRANCH_NAME-$REVISION_ID']
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=europe-west1-d'
  - 'CLOUDSDK_CONTAINER_CLUSTER=staging-cluster'

What next

At this point the CI/CD job is done. Possible next steps to improve your pipeline can be:

  1. Send a notification to Slack or HipChat to let everyone know that a new version was deployed.
  2. Run user acceptance tests to check that all features work as expected.
  3. Run load and stress tests to check that the new version has no performance degradation.
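For the first item, here is a sketch of posting a message to a Slack incoming webhook from Go. It assumes you run it from a small notifier image added as an extra build step, which Cloud Builder does not provide out of the box. Incoming webhooks accept a JSON body with a text field. The webhook URL comes from Slack's app configuration, and SLACK_WEBHOOK_URL is a hypothetical environment variable name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// notifySlack posts text to a Slack incoming webhook URL.
func notifySlack(webhookURL, text string) error {
	body, err := json.Marshal(map[string]string{"text": text})
	if err != nil {
		return err
	}
	resp, err := http.Post(webhookURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("slack webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	webhookURL := os.Getenv("SLACK_WEBHOOK_URL") // hypothetical variable name
	if webhookURL == "" {
		fmt.Fprintln(os.Stderr, "SLACK_WEBHOOK_URL is not set")
		os.Exit(1)
	}
	if err := notifySlack(webhookURL, "Deployed my-nodejs-app to staging-cluster"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```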

Full cloudbuild.yaml file example

steps:
#build steps
- name: 'gcr.io/cloud-builders/docker'
  args: [ 'build', '-t', 'eu.gcr.io/$PROJECT_ID/my-nodejs-app', '.' ]

- name: 'gcr.io/cloud-builders/docker'
  args: [ 'tag', 'eu.gcr.io/$PROJECT_ID/my-nodejs-app', 'eu.gcr.io/$PROJECT_ID/my-nodejs-app:$BRANCH_NAME-$REVISION_ID']

- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'eu.gcr.io/$PROJECT_ID/my-nodejs-app:$BRANCH_NAME-$REVISION_ID']

#deployment step
- name: 'gcr.io/cloud-builders/kubectl'
  args: ['set', 'image', 'deployment/my-nodejs-app-deployment', 'my-nodejs-app=eu.gcr.io/$PROJECT_ID/my-nodejs-app:$BRANCH_NAME-$REVISION_ID']
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=europe-west1-d'
  - 'CLOUDSDK_CONTAINER_CLUSTER=staging-cluster'

#images published after all steps finish (two tags: latest and branch-revision)
images:
- 'eu.gcr.io/$PROJECT_ID/my-nodejs-app'
- 'eu.gcr.io/$PROJECT_ID/my-nodejs-app:$BRANCH_NAME-$REVISION_ID'

#tags for container builder
tags:
  - "frontend"
  - "nodejs"
  - "dev-team-1"