Count words frequency reading standard input with go

Hi guys,
today is day 8 out of 100 days of code.

The task for today is to count words frequency reading data from stdin.  Input is scanned using bufio *Scanner with a split by words(ScanWords).
Lets see the code:

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	//map to store words frequency
	words := make(map[string]int)
	//we will read from file
	in := bufio.NewReader(os.Stdin)
	scanner := bufio.NewScanner(in)
	//we ask scanner to split input by words for us
	scanner.Split(bufio.ScanWords)
	count := 0
	//scan the inpurt
	for scanner.Scan() {
		//get input token - in our case a word and update it's frequence
		words[scanner.Text()]++
		count++
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
	fmt.Printf("Total words: %d\n", count)
	fmt.Printf("Words frequency: \n")
	//todo: sort words by values for nice print
	for k, v := range words {
		fmt.Printf("%s:%d\n", k, v)
	}
}

Results are stored in a map words. The property of map that it doesn’t have defined order of keys, so the bonus here is to sort words map by frequency of words(by values).

See source code at GitHub

Simple example of packages and imports in golang

This is day 7 out of 100 days of golang coding.

By requests of my friends this post is about packages, files and imports.

Packages in go is the same as modules and libraries in other languages. To make a package you need two decide two things: package path and package name.

In my example package name is helloworld and because it is day7 it is inside day7 folder. Full path looks like: github.com/vorozhko/go-tutor/day7/helloworld.
Full path is calculated based on $GOPATH variable which is ~/go by default, but can be set to any other in .bashrc file.

All files with the same package name will belong to one package and will share private and public variable like it is one single file. To make variable or function public it’s name must start with capital letter. Lets see an example:

hello.go

 

// Package helloworld - test example
package helloworld

import "fmt"

// prefix - define default prefix 
const prefix = "Hello World!"

// say - print string
func say(str string) {
	fmt.Print(str)
}

world.go

// Package helloworld - test example
package helloworld

// SayName print str text with predefined prefix
// prefix and say is visible inside helloworld package 
func SayName(str string) {
	say(prefix + str)
}

Function ‘say’ and constant  ‘prefix’ only visible inside package helloworld.
Function SayName will be visible outside of the package.

You can import new package from any place in your workspace using following code:

main.go

package main

import "github.com/vorozhko/go-tutor/day7/helloworld"

func main() {
	helloworld.SayName(" - test")
}

To import a package you have to use full path to the package relatively to your $GOPATH folder.
From here we can call helloworld.SayName, because it is visible outside of the package.

To manage imports in your go files it is recommended to use goimports tool. It will automatically insert package declaration in import as necessary. It also supported by many code editors.

 

very basic web crawler on golang

Hi guys,

this is day 6 out 100 days of code on go lang.

Following code implement recursive fetching of internal links on given web page.

What I want to do next is to add goroutines in action, because fetching process is nice to have run in parallel. Any suggestion how to do it?

package main

//todo:
// - every link must be visited only once [done]
// - keep a map of visited links [done]
// - fix links and page concatination [done]
// - extract domain from request uri to simplify crawling [done]
// - rework how internal links are selected for crawling
// - fix how crawling settings are set like depth and maxLinks

import (
	"flag"
	"fmt"
	"io"
	"log"
	"net/http"
	"time"

	"golang.org/x/net/html"
)

//hash of visited links to prevent double visit
var visitedLinks map[string]bool
var baseURL = flag.String("url", "", "start url")

func main() {
	visitedLinks = make(map[string]bool)
	flag.Parse()

	if *baseURL == "" {
		log.Fatal("--url paramters is required")
	}

	visitedLinks[*baseURL] = false

	//set parameters for crawling
	crawl("/")
}

func crawl(link string) {
	//check if link already visited
	if visitedLinks[link] {
		return
	}
	//set link as visited
	visitedLinks[link] = true
	fmt.Printf("Crawling %s ..................\n\n", *baseURL+link)
	resp, err := http.Get(*baseURL + link)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	linkCounter := 0
	for _, href := range getLinks(resp.Body) {
		//todo: rework how links are selected
		if len(href) > 0 && string(href[0]) == "/" && // only internal links
			href != link { //skip current page
			if len(href) > 1 && href[1] == '/' { //skip external links which start with //
				continue
			}
			linkCounter++
			//fmt.Printf("Found: %s\n", href)
			crawl(href)
			time.Sleep(time.Second * 1)
		}
	}
}

//Collect all links from response body and return it as an array of strings
func getLinks(body io.Reader) []string {
	var links []string
	z := html.NewTokenizer(body)
	for {
		tt := z.Next()

		switch tt {
		case html.ErrorToken:
			return links
		case html.StartTagToken, html.EndTagToken:
			token := z.Token()
			if "a" == token.Data {
				for _, attr := range token.Attr {
					if attr.Key == "href" {
						links = append(links, attr.Val)
					}

				}
			}

		}
	}
}

Source code on github

Get all links from html page with go lang

Hi guys,
this is day 5 out of 100 days of code!

Today I have coded html href links parser. This is part of web crawler project about which I will post in following days.

package main

import (
	"io"
	"log"
	"net/http"
        "fmt"

	"golang.org/x/net/html"
)

func main() {
    resp, err := http.Get("https://golang.org/")
    if err != nil {
        log.Fatal(err)
    }
    for _, v := range getLinks(resp.Body) {
        fmt.Println(v)
    }
}

//Collect all links from response body and return it as an array of strings
func getLinks(body io.Reader) []string {
	var links []string
	z := html.NewTokenizer(body)
	for {
		tt := z.Next()

		switch tt {
		case html.ErrorToken:
			//todo: links list shoudn't contain duplicates
			return links
		case html.StartTagToken, html.EndTagToken:
			token := z.Token()
			if "a" == token.Data {
				for _, attr := range token.Attr {
					if attr.Key == "href" {
						links = append(links, attr.Val)
					}

				}
			}

		}
	}
}

checking current time with go lang channels

This is day 4 out of 100 days of code.
Two code snippets today. First is very simple go channel.
It get current system time and post it to a channel then on other end message retrieved out of the channel and printed.

package main

import (
	"fmt"
	"time"
)

func main() {
	messages := make(chan string)
	go func() {
		for {
			messages <- fmt.Sprintf("Time now %s\n", time.Now())
			time.Sleep(100 * time.Millisecond)
		}
	}()
	for {
		msg := <-messages
		fmt.Println(msg)
	}
}

Second code snippet is a program which check two strings for anagram. When one string is equal reverse of other it is anagram.

package main

import (
	"flag"
	"fmt"
)

func main() {
	str1 := flag.String("first", "", "first string for anagram check")
	str2 := flag.String("second", "", "second string for anagram check")
	flag.Parse()
	checkForAnagrams(*str1, *str2)
}

func checkForAnagrams(str1, str2 string) {
	if len(str1) != len(str2) {
		fmt.Printf("%s and %s are not anagrams", str1, str2)
		return
	}

	for i := 0; i < len(str1); i++ {
		if str1[i] != str2[len(str2)-i-1] {
			fmt.Printf("%s and %s are not anagrams", str1, str2)
			return
		}
	}
	fmt.Printf("%s and %s are anagrams", str1, str2)
}

 

Recursive search through tree of files with golang

This is day 3 out of 100 days of code in go lang.

This code snippet does recursive search through directory tree. One improvement which can be made is to replace custom recursive function readFiles with filepath.Walk

Full code:

package main

import (
	"flag"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"os"
	"strings"
)

var path = flag.String("path", "", "file path to search in")
var search = flag.String("search", "", "search string to look for")

func main() {
	flag.Parse()
	fi, err := os.Stat(*path)
	if err != nil {
		log.Fatal(err)
	}
	//fix path if directory
	if fi.Mode().IsDir() {
		*path = strings.TrimRight(*path, "/") + "/"
		readFiles(*path, *search)
	} else {
		log.Fatal("path must be a directory, but file was provided: ", *path)
	}
}

func readFiles(path, search string) {
	files, err := ioutil.ReadDir(path)
	if err != nil {
		log.Fatal(err)
	}
	for _, file := range files {
		fullpath := path + file.Name()
		if file.Mode().IsDir() {
			readFiles(fullpath+"/", search)
		} else if file.Mode().IsRegular() {
			searchInFile(fullpath, search)
		}
	}
}

func searchInFile(fullpath, search string) {
	data, err := ioutil.ReadFile(fullpath)
	if err != nil {
		log.Fatal(err)
	}
	//need to check for file type to detect filter off non-text files
	fileType := http.DetectContentType(data)
	if strings.Index(fileType, "text") == -1 {
		//skip all non text files
		return
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.Index(line, search) > -1 {
			fmt.Printf("%s: %s\n", fullpath, line)
		}
	}
}

Source code at Github

very simple grep tool in GO – search substring in files

Hi guys,
here is day 2 of 100 days of code.

This time it is very simple implementation of grep tool. What it does it search string in specific directory and report matching lines.
What have I learned is that strings.Index is very useful to work with text and http.DetectContentType was the way to detect binary files.

Full code:

package main

import (
	"flag"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"strings"
)

func main() {
	path := flag.String("path", "", "file path to search in")
	search := flag.String("search", "", "search string to look for")
	flag.Parse()

	files, err := ioutil.ReadDir(*path)
	if err != nil {
		log.Fatal(err)
	}
	for _, file := range files {
		if file.Mode().IsDir() {
			//to do: traverse directories recursively
			continue
		}
		data, err := ioutil.ReadFile(file.Name())
		if err != nil {
			log.Fatal(err)
		}
		//need to check for file type to detect binary content
		fileType := http.DetectContentType(data)
		for _, line := range strings.Split(string(data), "\n") {
			if strings.Index(line, *search) > -1 {
				if strings.Index(fileType, "text/plain") > -1 {
					fmt.Printf("%s: %s\n", file.Name(), line)
				} else {
					//best guess it is binary file
					//no need to go through all lines in binary file
					fmt.Printf("Binary file %s matches\n", file.Name())
					break
				}
			}
		}
	}
}

Source code at Github

Simple example of golang os.Args and flag module

Hi all,
this is day 1 of 100 days I am coding one go program a day. Wish me good luck!

In this example I will show how to parse command line arguments with flag module and os.Args array.

Flag module is very powerful and can be as simple as this:

//...
import "flag"
//...
name := flag.String("user", "", "user name")
flag.Parse()
fmt.Printf("user name is %s\n", *name)

To have the “same” functionality with os.Args you have to do some work. In following example I added support for –user, -user and user=<value> arguments.

//...
import "os"
//...
var userArg string
for index, arg := range os.Args {
	pattern := "-user="
	x := strings.Index(arg, pattern)
	if x > -1 {
		userArg = arg[x+len(pattern):]
		continue
	}
	if arg == "-user" || arg == "--user" {
		userArg = os.Args[index+1]
		continue
	}
}
fmt.Printf("user name is %s", userArg)

Full code:

package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

func main() {
	name := flag.String("user", "", "user name")
	flag.Parse()
	fmt.Printf("user name is %s\n", *name)

	var userArg string
	for index, arg := range os.Args {
		pattern := "-user="
		x := strings.Index(arg, pattern)
		if x > -1 {
			userArg = arg[x+len(pattern):]
			continue
		}
		if arg == "-user" || arg == "--user" {
			userArg = os.Args[index+1]
			continue
		}
	}
	fmt.Printf("user name is %s", userArg)
}

Latest code examples see on GitHub