Sunday, 12 June 2016

goquery's ReplaceWithHtml sees an HTML entity where there is none

I am trying to replace a link (an a element with an href) with a string, using goquery's ReplaceWithHtml() function. The href contains query parameters, one of which is &region=. Somehow goquery treats the leading &reg as the HTML entity for the registered trademark sign (®), although the trailing ; is missing:

got :   <html><head></head><body>\href{http://www.nytimes.com/action=keypress®ion=FixedLeft}{Text}</body></html>
                                                                          ---^---
want:   <html><head></head><body>\href{http://www.nytimes.com/action=keypress&region=FixedLeft}{Text}</body></html>

Minimal working example:

package main

import (
        "fmt"
        "strings"

        "github.com/PuerkitoBio/goquery"
        "golang.org/x/net/html"
)

func main() {

        test := `<a href="http://www.nytimes.com/action=keypress&amp;region=FixedLeft">Text</a>`

        node, _ := html.Parse(strings.NewReader(test))
        doc := goquery.NewDocumentFromNode(node)

        convertLink(doc)

        got, _ := doc.Html()
        want := `<html><head></head><body>\href{http://www.nytimes.com/action=keypress&region=FixedLeft}{Text}</body></html>`

        fmt.Println("got :   " + got)
        fmt.Println("want:   " + want)
}

func convertLink(doc *goquery.Document) {
        //html, _ := doc.Html()
        //fmt.Println("Before : " + html)

        doc.Find("a").Each(func(_ int, s *goquery.Selection) {
                href, _ := s.Attr("href")
                text := s.Text()

                // "\\href" yields a literal backslash; "\h" is not a valid Go escape sequence.
                replace := "\\href{" + href + "}{" + text + "}"

                //fmt.Println("Replace:                             " + replace)

                s.ReplaceWithHtml(replace)
        })

        //html, _ = doc.Html()
        //fmt.Println("After  :    " + html)
        //fmt.Println("")
}

If you uncomment the debug lines, you can see that the string replace is still correct. Only after the call to ReplaceWithHtml does the document doc contain the registered trademark sign in place of &reg.

What am I doing wrong?
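
A likely explanation, offered as a sketch rather than a confirmed diagnosis: ReplaceWithHtml parses its argument as HTML markup, and HTML parsers accept a few legacy entities such as &reg in text content even without the trailing semicolon. The example below uses only the standard library's html package to imitate that decoding step. The helper names decodeAsText and escapeThenDecode are hypothetical, and the replace string is a stand-in for the one built in convertLink.

```go
package main

import (
	"fmt"
	"html"
)

// decodeAsText imitates how character references are decoded in HTML text
// content. html.UnescapeString accepts legacy entities such as "&reg" even
// when the trailing semicolon is missing.
func decodeAsText(s string) string {
	return html.UnescapeString(s)
}

// escapeThenDecode escapes the string first, so that "&" becomes "&amp;"
// and comes back out of the decoding step as a plain "&".
func escapeThenDecode(s string) string {
	return html.UnescapeString(html.EscapeString(s))
}

func main() {
	// Hypothetical stand-in for the replace string built in convertLink.
	replace := `\href{http://www.nytimes.com/action=keypress&region=FixedLeft}{Text}`

	fmt.Println("decoded as-is:", decodeAsText(replace))
	fmt.Println("escaped first:", escapeThenDecode(replace))
}
```

If this explanation holds, the analogous change in the example above would be to pass html.EscapeString(replace) to ReplaceWithHtml. Note that doc.Html() would then presumably serialize the ampersand as &amp;, whereas doc.Text() should return the plain & character.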
