pushState and SEO

javascript web-applications seo hashbang pushstate

Many people have been saying to use pushState rather than hashbangs.

What I don't understand is, how would you be search-engine friendly without using hashbang?

Presumably your pushState content is generated by client-side JavaScript code.

The scenario is this:

I'm on example.com. My user clicks a link: href="example.com/blog"

pushState captures the click, updates the URL, grabs a JSON file from somewhere, and creates the listing of blog posts in the content area.
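The client-side flow described above could be sketched roughly as follows. This is only an illustration: the function name navigate, the /api prefix, the data-push attribute, and the #content element are all assumptions, not part of any particular framework.

```javascript
// Hypothetical client-side navigation: update the URL, fetch JSON, render a list.
// The history, fetchJson and render collaborators are passed in so the logic
// can run outside a browser as well.
async function navigate(path, { history, fetchJson, render }) {
  // Change the address bar without a page reload.
  history.pushState({ path }, '', path);
  // Assumed convention: the JSON for /blog lives at /api/blog.json.
  const posts = await fetchJson('/api' + path + '.json');
  // Titles are assumed to be plain text that is safe to place in HTML.
  render(posts.map((post) => '<li>' + post.title + '</li>').join(''));
}

// Browser wiring (skipped when no DOM is present).
if (typeof document !== 'undefined') {
  document.addEventListener('click', (event) => {
    const link = event.target.closest('a[data-push]');
    if (!link) return;
    event.preventDefault();
    navigate(link.getAttribute('href'), {
      history: window.history,
      fetchJson: (url) => fetch(url).then((res) => res.json()),
      render: (html) => { document.querySelector('#content').innerHTML = html; },
    });
  });
}
```

Nothing in this sketch produces HTML on the server, which is exactly the situation the question is asking about.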

With hashbangs, Google knows to go to the escaped_fragment URL to get their static content.

With pushState, Google just sees nothing as it can't use the JavaScript code to load the JSON and subsequently create the template.

The only way to do it I can see is to render the template on the server side, but that completely negates the benefits of pushing the application layer to the client.

So am I getting this right, pushState is not SEO friendly for client-side applications at all?

Note to future readers: this question is obsolete. Read the official Google statement - in short, Googlebot supports JavaScript now.

Stephen Ostermiller

Is pushState bad if you need search engines to read your content?

No, the talk about pushState is geared around accomplishing the same general process as hashbangs, but with better-looking URLs. Think about what really happens when you use hashbangs...

You say:

With hashbangs, Google knows to go to the escaped_fragment URL to get their static content.

So in other words,

1. Google sees a link to example.com/#!/blog
2. Google requests example.com/?_escaped_fragment_=/blog
3. You return a snapshot of the content the user should see
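The URL mapping in these steps can be sketched as a small function. This is an approximation of what the crawler did under the (now deprecated) AJAX-crawling scheme, not Google's actual code; the function name and the exact set of escaped characters are assumptions.

```javascript
// Hypothetical helper: turn a hashbang URL into the URL a crawler would request.
function toEscapedFragmentUrl(prettyUrl) {
  const marker = prettyUrl.indexOf('#!');
  // URLs without a hashbang are requested as-is.
  if (marker === -1) return prettyUrl;

  const base = prettyUrl.slice(0, marker);
  const fragment = prettyUrl.slice(marker + 2);
  // Append to an existing query string if there is one.
  const separator = base.includes('?') ? '&' : '?';
  // Percent-escape characters that would change the meaning of the query string.
  const escaped = fragment.replace(/[%#&+\s]/g, (ch) =>
    '%' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0'));

  return base + separator + '_escaped_fragment_=' + escaped;
}
```

For example, toEscapedFragmentUrl('http://example.com/#!/blog') gives 'http://example.com/?_escaped_fragment_=/blog', matching step 2.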

As you can see, it already relies on the server. If you aren't serving a snapshot of the content from the server, then your site isn't getting indexed properly.

So how will Google see anything with pushState?

With pushState, Google just sees nothing as it can't use the JavaScript to load the JSON and subsequently create the template.

Actually, Google will see whatever it can request at site.example/blog. A URL still points to a resource on the server, and clients still obey this contract. Of course, for modern clients, JavaScript has opened up new possibilities for retrieving and interacting with content without a page refresh, but the contracts are the same.

So the intended elegance of pushState is that it serves the same content to all users, old and new, JS-capable and not, but the new users get an enhanced experience.

How do you get Google to see your content?

1. The Facebook approach: serve the same content at the URL site.example/blog that your client app would transform into when you push /blog onto the state. (Facebook doesn't use pushState yet that I know of, but they do this with hashbangs.)
2. The Twitter approach: redirect all incoming URLs to the hashbang equivalent. In other words, a link to "/blog" pushes /blog onto the state. But if it's requested directly, the browser ends up at #!/blog. (For Googlebot, this would then route to _escaped_fragment_ as you want. For other clients, you could pushState back to the pretty URL.)
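As a rough sketch of the first approach, a server can answer /blog with a full HTML page for direct requests (including crawlers) and with JSON when the client app asks for data. The names respond and renderBlogHtml, the in-memory posts array, and the use of the Accept header to tell the two cases apart are all assumptions made for illustration.

```javascript
// Hypothetical in-memory data, standing in for a real data store.
const posts = [{ title: 'First post' }, { title: 'Second post' }];

// Build the same list the client app would build after pushing /blog.
function renderBlogHtml(items) {
  const list = items.map((p) => '<li>' + p.title + '</li>').join('');
  return '<!doctype html><title>Blog</title><ul id="content">' + list + '</ul>';
}

// Decide what to send for a given path and Accept header.
function respond(path, acceptHeader) {
  if (path !== '/blog') {
    return { status: 404, type: 'text/plain', body: 'Not found' };
  }
  // The client app asks for JSON; everyone else gets rendered HTML.
  if ((acceptHeader || '').includes('application/json')) {
    return { status: 200, type: 'application/json', body: JSON.stringify(posts) };
  }
  return { status: 200, type: 'text/html', body: renderBlogHtml(posts) };
}
```

With this arrangement a crawler, a page reload, and a JavaScript-free browser all receive the blog listing at /blog, while the client app can still fetch the same data as JSON.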

So do you lose the _escaped_fragment_ capability with pushState?

In a couple of different comments, you said

escaped fragment is completely different. You can serve pure unthemed content, cached content, and not be put under the load that normal pages are. The ideal solution is for Google to either do JavaScript sites or implement some way of knowing that there's an escaped fragment URL even for pushstate sites (robots.txt?).

The benefits you mentioned are not isolated to _escaped_fragment_. That it does the rewriting for you and uses a specially-named GET param is really an implementation detail. There is nothing really special about it that you couldn't do with standard URLs — in other words, rewrite /blog to /?content=/blog on your own using mod_rewrite or your server's equivalent.
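The rewrite mentioned above is normally done in server configuration such as mod_rewrite, but the rule itself is simple enough to sketch as a function. The name rewriteToContentParam is hypothetical, and the content parameter is taken from the example in the text.

```javascript
// Hypothetical rewrite rule: map a pretty path onto a single entry point
// that receives the original path in a "content" parameter.
function rewriteToContentParam(requestPath) {
  // Leave the root and already-rewritten requests alone.
  if (requestPath === '/' || requestPath.startsWith('/?')) return requestPath;
  return '/?content=' + requestPath;
}
```

Here rewriteToContentParam('/blog') yields '/?content=/blog', which the server can answer with unthemed or cached content just as it would for an _escaped_fragment_ request.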

What if you don't serve server-side content at all?

If you can't rewrite URLs and serve some kind of content at /blog (or whatever state you pushed into the browser), then your server is really no longer abiding by the HTTP contract.

This is important because a page reload (for whatever reason) will pull content at this URL. (See https://wiki.mozilla.org/Firefox_3.6/PushState_Security_Review — "view-source and reload will both fetch the content at the new URI if one was pushed.")

It's not that drawing user interfaces once on the client side and loading content via JS APIs is a bad goal; it's just that HTTP and URLs don't really account for it, and it's basically not backward-compatible.

At the moment, this is the exact thing that hashbangs are intended for — to represent distinct page states that are navigated on the client and not on the server. A reload, for example, will load the same resource which can then read, parse, and process the hashed value.
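Reading the state back out of the hash after a reload might look something like this. The function name routeFromHash and the choice of '/' as the default route are assumptions.

```javascript
// Hypothetical helper: extract the client-side route from a hashbang fragment.
function routeFromHash(hash) {
  // No hashbang means the default state.
  if (!hash.startsWith('#!')) return '/';
  const route = hash.slice(2);
  // Normalize so that '#!blog' and '#!/blog' name the same state.
  return route.startsWith('/') ? route : '/' + route;
}
```

In a browser this would typically be called with window.location.hash on page load and again on each hashchange event. The server only ever sees the part of the URL before the hash, so it serves the same resource every time.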

It just happens to be that they have also been used (notably by Facebook and Twitter) to change the history to a server-side location without a page refresh. It is in those use cases that people are recommending abandoning hashbangs for pushState.

If you render all content client-side, you should think of pushState as part of a more convenient history API, and not a way out of using hashbangs.

@Harry - Did you read the rest of my answer? A URL is a URL - meaning a resource locator. Does the server believe that content exists at site.com/blog? If not, then it doesn't exist to Search Engines. The purpose of pushState is not to work around that. It's for convenience. Hashbangs don't fix this either, and _escaped_fragment_ is a complicated workaround that still relies on the server having a snapshot of the JS generated content (seen by normal users, as you put it). pushState actually simplifies all of this.

@Harry - Until URLs are designed to serve client side content, they still refer to a resource on the server, and clients will treat them that way, including bots. It doesn't mean your goal to do as much as possible on the client is an invalid one, but for the time being it might have to be accomplished using (ugly) hashbangs. I've updated my answer for your use case.

@Harry First of all I'm only going off of what Google says they do for _escaped_fragment_, and I don't know what you do specifically. But from what Google says, I assume you must be serving some kind of content by the server when you see that query param. In your case it would require some trickery, but you could serve some <noscript> content or something else from /blog and then have JS build the page you want. Or, you could attempt to detect bots and intentionally serve entirely different content.

Once again the correct and best answer is not picked as correct... bad, bad.

If I have a link like: <a href="product/productName" onclick="showProduct(product)">A product</a>, and the onclick starts with "preventDefault()", then AJAXly loads the new content about the product into the page, and I make sure that the link ".../product/productName" will load a version of the page where the specific product content will be included on the response from the server --- so, the site will still work dynamically but will also still have static content available by going to a product link directly right? No need for pushState or hashbang in this way, no?

Nicole

What about using the meta tag that Google suggests for those who don't want hash-bangs in their URLs: <meta name="fragment" content="!">

See here for more info: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
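Under that scheme, a page carrying the meta tag was requested by the crawler with an empty _escaped_fragment_ parameter appended to its normal URL, so the server has to recognize both the empty and the hashbang-derived forms. A minimal sketch of that check follows; the function name and the site.example base used for parsing are assumptions.

```javascript
// Hypothetical server-side check: which route, if any, should be served
// as a static snapshot for this request?
function snapshotRouteFor(requestUrl) {
  // A base is needed to parse relative request URLs; its value is arbitrary here.
  const url = new URL(requestUrl, 'http://site.example');
  if (!url.searchParams.has('_escaped_fragment_')) return null;

  const fragment = url.searchParams.get('_escaped_fragment_');
  // Empty value: the page used the meta tag, so the path itself is the route.
  // Non-empty value: the route came from a #! fragment.
  return fragment === '' ? url.pathname : fragment;
}
```

A request for /blog?_escaped_fragment_= maps to '/blog', a request for /?_escaped_fragment_=/blog also maps to '/blog', and an ordinary request for /blog returns null so the normal JavaScript page is served.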

Unfortunately I don't think Nicole clarified the issue that I thought the OP was having. The problem is simply that we don't know who we are serving content to if we don't use the hashbang. pushState does not solve this for us. We don't want search engines telling end users to navigate to some URL that spits out unformatted JSON. Instead, we create URLs (that trigger other calls to more URLs) that retrieve data via AJAX and present it to the user in the manner we prefer. If the user is not a human, then as an alternative we can serve an HTML snapshot, so that search engines can properly direct human users to the URL where they would expect to find the requested data (and in a presentable manner). But the ultimate challenge is how to determine the type of user. Yes, we can possibly use .htaccess or something similar to rewrite the URL for search engine bots we detect, but I'm not sure how foolproof and futureproof this is. It may also be possible that Google could penalize people for doing this sort of thing, but I have not researched it fully. So the (pushState + Google's meta tag) combo seems to be a likely solution.

@NickC, OK, I see, so now I think that a better solution is to display the content initially without any JS. But at the top of your JS (after the page has loaded and the DOM is ready), have some code run immediately to hide the HTML content that was initially displayed or replace it with the JS enhancement. For example, I use jQuery datagrids, so I would display an HTML table first, then load the JS immediately to transform/hide/replace the normal tabular data with the JS grid version. Then from that point on, any other AJAX requests can be served as JSON, paired with the URL updating via pushState.

How is your experience with the solution you suggested? Did Google index this 'temporary' HTML? Does it show up properly in the relevant google search? Also does it not mean that the experience is a little 'jittery' as the initial HTML page is 'refreshed' with html generated by JS?

@NileshKale Here's the solution I worked up and it accomplishes the job very well: stackoverflow.com/questions/22824991/…. I just pass an HTML table and also jqgrid with the JSON equivalent (to what's in the HTML). SEO reads the HTML, and the user gets an upgraded experience and all subsequent requests via ajax. Using pushstate, I can update the URL based on how the user sorts/pages the grid (without needing a hashbang). This allows the user to save the URL and get back to the same results.
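Keeping the grid's sort and page state in the URL could be sketched like this. The parameter names sort and page, the default values, and both function names are assumptions; they are not taken from the linked solution.

```javascript
// Hypothetical helper: encode the grid state into a bookmarkable URL.
function gridStateToUrl(basePath, state) {
  const params = new URLSearchParams();
  params.set('sort', state.sort);
  params.set('page', String(state.page));
  return basePath + '?' + params.toString();
}

// Hypothetical helper: recover the grid state from a URL, with assumed defaults.
function urlToGridState(url) {
  const query = url.includes('?') ? url.slice(url.indexOf('?') + 1) : '';
  const params = new URLSearchParams(query);
  return {
    sort: params.get('sort') || 'date',
    page: Number(params.get('page') || '1'),
  };
}
```

In a browser, each sort or paging action would call something like history.pushState(state, '', gridStateToUrl('/products', state)). The server would need to read the same parameters so that a direct visit to the saved URL returns the matching HTML table.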

I'll try in a few days to do an EDIT on my answer to better explain.

The AJAX-crawling scheme is now deprecated: developers.google.com/webmasters/ajax-crawling/docs/…. It is advised to change sites that use it: plus.google.com/+JohnMueller/posts/LT4fU7kFB8W

Peter Mortensen

All interesting talk about pushState and #!, and I still cannot see how pushState replaces #!'s purpose as the original poster asks.

Our solution to make our 99% JavaScript-based Ajax site/application SEOable is using #!, of course. Since client rendering is done via HTML, JavaScript and PHP, we use the following logic in a loader controlled by our page landing. The HTML files are totally separated from the JavaScript and PHP because we want the same HTML in both (for the most part). The JavaScript and PHP do mostly the same thing, but the PHP code is less complicated, as the JavaScript provides a much richer user experience.

The JavaScript uses jQuery to inject into HTML the content it wants. PHP uses PHPQuery to inject into the HTML the content it wants - using 'almost' the same logic, but much simpler as the PHP version will only be used to display an SEOable version with SEOable links and not be interacted with like the JavaScript version.

All three components that make up a page (page.htm, page.js and page.php) exist for anything that uses the escaped fragment, which tells the loader whether to load the PHP version in place of the JavaScript version. The PHP version doesn't need to exist for non-SEOable content (such as pages that can only be seen after user login). It is all straightforward.

I'm still puzzled how some front-end developers get away with developing great sites (with the richness of Google Docs) without using server-side technologies in conjunction with browser ones... If JavaScript is not even enabled, then our 99% JavaScript solution will of course not do anything without the PHP in place.

It is possible to have a nice URL to land on a PHP served page and redirect to a JavaScript version if JavaScript is enabled, but that is not nice from a user perspective since users are the more important audience.

On a side note: if you are making just a simple website that can function without any JavaScript, then I can see pushState being useful if you want to progressively enhance your user experience from simple statically rendered content into something better. But if you want to give your user the best experience from the get-go (let's say your latest game written in JavaScript, or something like Google Docs), then its use for this solution is somewhat limiting, as gracefully falling back can only go so far before the user experience is painful compared to the vision of the site.
