Wednesday, December 10, 2014

Cloudify3 Blueprint + Plugin for dummies - 3 minutes, 7 steps

Cloudify 3 has been out for a while now, and it is creating a lot of buzz.
I decided to give it a go and write my first blueprint and plugin.
Cloudify has great documentation, but I found their "quick start" to qualify for TL;DR classification.
So I decided to write my own version, and to make it for "dummy" level users such as myself.

In this post you will create everything from scratch. Please note that Cloudify also has a plugin template which they recommend using. But then again, how will you learn without getting your hands dirty?
This post assumes you have a cloudify manager available!
If you don't, know that getting one might not be an easy task. Some people get it working in under 10 minutes, and some (like me) take over 2 hours.
The Cloudify team said they will have a "try it now" version for the manager - so you will have one available within a single click.
Currently, the best way to get one working is to get their vagrant resources and run it.

So how do I write a plugin + blueprint under 3 minutes?

Step 1 - start your project

When I start my project I like to do the following:

  • create a folder
  • run git init
  • add .gitignore file with .idea and .cloudify in it
  • add LICENSE file to it
  • add a file (content below). Cloudify uses pip to install your plugin
  • create a directory for your sources, and write a file in it. This file has only 1 line __author__ = 'guym'.
The file that pip uses looks something like this:
__author__ = 'guym'

from setuptools import setup

setup(
    # other arguments (name, version, packages and so on) are not shown in this listing
    description='Playground plugins',
)

Step 2 - Write your plugin

Writing your plugin is easy. Simply add a python file under your sources directory.
A plugin file is simply python. Define a function in it - later we will map it to the blueprint.

from cloudify import ctx
from cloudify.decorators import operation

@operation
def create(**kwargs):
    ctx.logger.info('ECHO PLUGIN: create running')
Congratulations - you wrote your plugin! Next we will define it, but first, now is the time to commit your code!
Cloudify uses URLs to get your plugin and install it, so you need an online repository to host it.
I use GitHub.

Step 3 - Define your plugin

Add a plugin.yaml file. This is the definition file for your plugin.
The file is very simple:

#               MY PLUGINS                   #

    executor: central_deployment_agent

You need to change the URL to your git repository.

Step 4 - test you did everything right by running pip install

Let's test that you defined everything correctly by running the following commands:

virtualenv test-plugin
source test-plugin/bin/activate
pip install
You will need to change the URL to match your scenario.

You are now ready to start using your plugin.
The Cloudify team suggests a different way to accomplish this than I do, but I found this one better for me as a beginner.

Step 5 - Write a small blueprint that uses your plugin

  • Create a folder for the blueprint
  • Create file blueprint.yaml
The blueprint.yaml file will look something like this
tosca_definitions_version: cloudify_dsl_1_0


    type: mogi.Echo
    derived_from: cloudify.nodes.Root
          implementation: mogi.lib.echo.create   
You can move the "node_types" part to your plugin's yaml file if you wish, and then deliver a plugin with some node types that use it.
You will need to change the "import" URL to your yaml file.

Step 6 - upload, deploy, execute

I assume you know this part.
If you don't, you will need to go to the blueprint's directory and do the following:
  • Install cloudify CLI
  • Run cfy init in your blueprint's directory
  • run cfy use -t MANAGER_IP to connect to your manager
  • run cfy upload -b myblueprint blueprint.yaml to upload the blueprint
  • run cfy deployments create -d mydeployment -b myblueprint to create a deployment from your blueprint
  • run cfy executions start -d mydeployment -w install to install your deployment

Step 7 - validate success

Assuming all went well, you should see the log line ECHO PLUGIN: create running in the console.
Yay! It worked!


The new cloudify3 project looks exciting. We were able to add a plugin to it fairly quickly.
Since a plugin is python code - the options are endless.

Wednesday, October 29, 2014

json parse is insane

every project i've been in so far had the same stupid bug with JSON.parse.
at the beginning someone got a string instead of an object and they used JSON.parse to convert it to an object.
then after a while, someone fixed it upstream, and now JSON.parse is getting an object and fails with a very cryptic error message.
i always make sure my code is safe and i write it as such:
item =;
if ( typeof(item) === 'string' ) {
   item = JSON.parse(item);
}
however, each time i write this piece of code, it seems to me like it is a bug in javascript.
seems to me like 1 of 2 things should happen:
  • JavaScript should throw an error saying you cannot parse an object
  • JavaScript should simply return the object as there's nothing to parse.
i think the latter is better and more aligned with the rest of JavaScript behavior.
but what happens now is simply insane - JavaScript implicitly converts the object to a string using toString and tries to parse that.
which is insane because nowhere in the world of JavaScript is toString meant to return JSON.
this is why you have JSON.stringify to begin with.
so if anything at all JSON.parse should use JSON.stringify instead of toString
- but what would be the point of that? simply return the object you got.
another reason why this is insane is that toString returns [object Object]
which ironically enough starts like an array (which is a valid input for JSON.parse) and so the error developers get is unexpected token o.
and the last reason for insanity is that this has been the situation for quite a while now.
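here is a minimal sketch of that behavior, runnable in node. the variable names are only for illustration, and the exact error message differs between engines, so the sketch only looks at the error type:

```javascript
// an object that was already parsed somewhere upstream
var item = { name: 'guy' };

// JSON.parse first converts its argument to a string ...
var asString = String(item);

// ... and then tries to parse that string, which is not valid JSON
var errorName = null;
try {
    JSON.parse(item);
} catch (e) {
    errorName = e.name;
}

console.log(asString, errorName);
```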
i tried to see what other libraries are doing with this insanity :
turns out that jQuery doesn't try to fix this issue. $.parseJSON is just as insane.
lodash does not offer anything in this matter.
i know that angular behaves nicely - like i expected - but for projects that don't use angular, it would be an overkill to use angular just for this.
other than that, I could not find any references to this problem anywhere.
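for projects without angular, one option is a tiny standalone helper that behaves the way i expected. this is only a sketch: parseIfString is a hypothetical name, not part of any library.

```javascript
// hypothetical helper: parse strings, hand anything else back untouched
function parseIfString( value ) {
    if ( typeof value === 'string' ) {
        return JSON.parse(value);
    }
    return value;
}

var fromString = parseIfString('{"a":1}');
var original = { a: 1 };
var fromObject = parseIfString(original);

console.log(fromString.a, fromObject === original);
```

since it does not touch JSON.parse itself, it only helps in code that calls the helper explicitly.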
this problem seems to me a lot like the console.log problem - that it does not exist in some browsers - and should have a similar fix.
my current recommendation is to fix this issue by replacing the JSON.parse method with something like
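JSON.parseString = JSON.parse;
JSON.parse = function ( o ) {
    if ( typeof(o) === 'string' ){
        // apply needs the "this" value first, then the arguments (this also forwards an optional reviver)
        return JSON.parseString.apply(JSON, arguments);
    } else {
        return o;
    }
};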
but i keep getting strong objections on such a solution as it is intrusive. what do you think? leave comments below.

Wednesday, October 22, 2014

seo with phantomjs part 3

this article is part of a series of articles explaining about seo and single page applications.
the stack i use contains node, angular, nginx etc.
in this part we are tying up all the loose ends.
in this article you will have a single page application that supports seo.

writing the sitemap.xml and adding index.html to your paths

before we reach the final part of hooking it all together there are 2 seo things we should do.
the first one is to add index.html to your path.
it will make your life easier when handling redirects, the default index page and so on.
it is not a requirement, but i recommend it and i assume you applied this in the rest of the post.
plus - developers are not usually aware of this, but not specifying index.html will cause problems when dealing with iframes.
i am not going to dwell on this here, but only mention that i had 2 iframes in my application that did not work until i added index.html to the src attribute.

the other thing is adding a sitemap.
adding a sitemap tells the crawlers which pages they should crawl.
it improves the search results.
i strongly recommend using a sitemap and not relying on the crawling behavior of following links.


adding index.html is done on your front-server and is quite easy.
all you need to do is add a redirect rule from / to /index.html
in nginx it looks like so :

rewrite ^/$ $scheme://$host/index.html break;


sitemaps are xml files that are returned when you request the /sitemap.xml path.
you can also expose them on a different path, and then submit sitemaps to the search engines - this is usually what i do.

you can maintain a file called sitemap.xml, but what is the fun in that?
if you have public content generated at runtime you should use an auto-generated sitemap.
simply use the sitemap module from npm.
it has a pretty straightforward api.
even though you can omit the index.html from the path as crawlers will now be redirected automatically,
i recommend you specify it anyway.
your output should look like so :

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url> <loc>!/public/item/53f0f7b250dab2f71901abf8/intro</loc> <lastmod>2014-09-03</lastmod> <changefreq>hourly</changefreq> <priority>0.5</priority> </url>
</urlset>

note that a sitemap has a size limit.
it can contain up to 50K entries.
when i helped implement seo for a site, i preferred to publish the last 10K records that were updated.
my reasons were:

  • i don't want the crawlers to crawl the entire site every time
  • i don't want to construct a huge sitemap. it will consume a lot of memory and might crash the system
i might be wrong about these 2 assumptions, but i preferred to take the safe road.
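to make the "last N updated records" idea concrete, here is a rough sketch that builds the xml by hand. it does not use the sitemap module's api. the record shape ({ url, updated }), the example.com urls and the function names are all assumptions made for illustration.

```javascript
// hypothetical record shape: { url: string, updated: Date }
function buildSitemap( records, limit ) {
    var latest = records
        .slice() // copy, so the caller's array is not reordered
        .sort(function ( a, b ) { return b.updated - a.updated; }) // newest first
        .slice(0, limit);

    var entries = latest.map(function ( record ) {
        return '<url>' +
            '<loc>' + escapeXml(record.url) + '</loc>' +
            '<lastmod>' + record.updated.toISOString().substring(0, 10) + '</lastmod>' +
            '</url>';
    });

    return '<?xml version="1.0" encoding="UTF-8"?>\n' +
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
        entries.join('\n') + '\n</urlset>';
}

// urls may contain characters that are not allowed as-is inside xml
function escapeXml( text ) {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;').replace(/'/g, '&apos;');
}

var xml = buildSitemap([
    { url: 'http://example.com/index.html#!/public/item/1/intro', updated: new Date('2014-09-03T00:00:00Z') },
    { url: 'http://example.com/index.html#!/public/item/2/intro', updated: new Date('2014-09-01T00:00:00Z') },
    { url: 'http://example.com/index.html#!/public/item/3/intro', updated: new Date('2014-09-02T00:00:00Z') }
], 2);
console.log(xml);
```

with a limit of 2, only the two most recently updated records end up in the output.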

hooking it all up

to hook everything up, you need to tell your front server to redirect all requests with _escaped_fragment_ to your backend.
since we are dealing with a single page application these requests will actually be to an index.html file - as there is no other route.
in nginx you can add the following:

location ~ /index.html {
    if ($args ~ "_escaped_fragment_") {
        rewrite ^(.*)$ /backend/crawler;
    }
}
just change /backend/crawler to your path.

in your express code map this url to the code that uses phantom.

app.get('/backend/crawler', function(req, res){
    var url = req.param('_escaped_fragment_');
    url = req.absoluteUrl('/index.html#!' + decodeURIComponent(url) );
    logger.info('prerendering url : ' + url ) ;

    var phantom = require('phantom');
    phantom.create(function (ph) {

        // safety valve: make sure the phantomjs process exits after 30 seconds
        setTimeout( function(){
            try {
                ph.exit();
            } catch (e) {
                logger.debug('unable to close phantom',e);
            }
        }, 30000);

        return ph.createPage(function (page) {
            page.open(url, function ( status ) {
                if ( status === 'fail'){
                    res.send(500,'unable to open url');
                }else {
                    page.evaluate(function () {
                        return document.documentElement.innerHTML;
                    }, function (result) {
                        res.send( result);
                    });
                }
            });
        });
    });
});


there are 2 things in this script that might seem weird
the first is the method absoluteUrl. i assign this method to the request in a middleware.
this is the implementation

exports.origin = function origin( req, res, next){
    var _origin = req.protocol + '://' +req.get('Host')  ;
    req.origin = _origin;

    // expects a URL from root "/some/page" which will result in "protocol://host:port/some/page"
    req.absoluteUrl = function( relativeUrl ){
        return _origin + relativeUrl;
    };

    // hand over to the next middleware / route
    next();
};

the other thing to note is the 30 seconds timeout that i have, in which i invoke exit on phantom.
this is a safety valve.
since this code spawns new phantomjs processes, i want to make sure these processes will die eventually.
i had the unfortunate opportunity to see it go haywire and bring the machine to 100% cpu.

on the same note, i suggest you add killall phantomjs to your start commands so that every time you stop/start your application, it will make sure no orphan phantomjs processes are left.

so now there is only 1 thing left.
take a url from the sitemap, replace the #! with ?_escaped_fragment_= and use wget or curl on it to see if you get the entire html.

if this worked, you can also go to facebook and try to share the url.

The end

thank you for reading this series. i hope it helped you.
please feel free to comment and give feedback.

Wednesday, October 15, 2014

seo with phantomjs part 2

this article is part of a series of articles explaining about seo and single page applications.
the stack i use contains node, angular, nginx etc.
in this part we are focusing on crawlers and single page applications

identifying a crawler and providing the prerender version

so now that we know how to generate a prerendered version of a page using phantomjs
all we need to do is identify a crawler and redirect it to the prerendered version.
turns out this is the tricky part..

url fragments

turns out a lot of people don't know this part so i decided to take a minute and explain..
urls have a general structure of http://domain:port/path/?query#fragment
the part we are interested in this post is the fragment.
if you are dealing with angularjs, you know that part very well.
a lot of developers do not know that fragments are client side only and will not reach the backend.
so you cannot write code in the backend to check if there is a fragment..
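a small sketch that shows the split, using the URL class that is global in browsers and in recent node versions. the url itself is made up for the example.

```javascript
var parsed = new URL('http://domain:8080/path/?query=1#fragment');

// the part the browser actually sends in the request to the server
var sentToServer = parsed.pathname + parsed.search;

// the part that never leaves the browser
var clientOnly = parsed.hash;

console.log(sentToServer, clientOnly);
```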

another important thing with fragment you should know is that when you change it in javascript, it does not cause a refresh to the entire page.
if you change any other part in the url, you will see the entire page refreshes.
but fragments will not do that..
and so - single page applications, like the ones that use angularjs, rely heavily on fragments.
this method allows them to keep a state on the url without reloading the page.
saving the state is important - it allows you to copy paste the url and send it to someone else - and not refreshing the page gives you a nice user experience

it is important to also note that recently, since html5, browsers now support changing the url without refreshing the entire page..
and so no need for fragments anymore..
in angularjs application you can simply define: $locationProvider.html5Mode(true)

personally, i am still not confident enough to use the html5mode, so i keep using fragments. more on this soon
however - you should consider using html5 mode as some crawlers support only that method.

and so the single page applications live happily ever after.. until seo comes to the picture..

how do crawlers handle single page applications ?

from the name you might think that crawling a single page application is very easy - as there is only a single page.. but that is misleading.
in fact there are a lot of pages in a single page application, but all of them are loaded lazily in the background, except for one that is not - the single page itself.
this actually causes a lot of issues to crawlers as they do not have that "lazy background" feature as it requires running javascript and invoking ajax calls.

so when a crawler comes to a single page application it should somehow create a request to a prerendered version of the page.

google's solution to the problem

along came google and declared a new standard.

if the url contains '#!' (hash-bang), this hash bang will be replaced with _escaped_fragment_
and so if your url looks like so http://domain:port/path/?query#!fragment (note the ! that was not there before) it will be crawled as
http://domain:port/path/?query&_escaped_fragment_=escaped_fragment where escaped_fragment is essentially the fragment with the special characters escaped, as those characters have a different meaning when they do not come after a hash.
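to see the mapping in action, here is a rough sketch of what a crawler does to a hash-bang url. toEscapedFragmentUrl is a hypothetical helper name, and using encodeURIComponent is an assumption on my part: it escapes more characters than the scheme strictly requires, which is harmless as long as the backend decodes the value again.

```javascript
// hypothetical helper: convert a hash-bang url to its _escaped_fragment_ form
function toEscapedFragmentUrl( url ) {
    var index = url.indexOf('#!');
    if ( index < 0 ) {
        return url; // no hash-bang, nothing to convert
    }
    var base = url.substring(0, index);
    var fragment = url.substring(index + 2);
    // use & if the url already has a query string, ? otherwise
    var separator = base.indexOf('?') >= 0 ? '&' : '?';
    return base + separator + '_escaped_fragment_=' + encodeURIComponent(fragment);
}

var withQuery = toEscapedFragmentUrl('http://domain:8080/path/?query#!/item/1');
var withoutQuery = toEscapedFragmentUrl('http://domain:8080/index.html#!/item/1');
console.log(withQuery);
console.log(withoutQuery);
```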

html 5 mode

another, more modern option today is to use html 5 mode.
this essentially tells angularjs to stop using this format http://domain:port/path/?query#fragment and to start using http://domain:port/fragment.
browsers can now support changing the url without refreshing, the backend receives the entire path, and everyone is happy..
i chose not to use this method as it is relatively new and there is still some trust i need to have in this before i use it.

but not all crawlers follow google's standard.
if you'll try to share your page on linkedin you will have problems unless you use html5.
you can still expose specific urls for sharing, but it would be nicer to have it right out of the box.
i encourage you to try using html5 mode.

adding the hash-bang

now comes the sad part of adding a '!' to all your urls..
define the following to angular : $locationProvider.html5Mode(false).hashPrefix('!'); and go over all the links you wrote and change them.

for backward compatibility, you should also add a script to your header to redirect from # to #!:

  try {
      if ( window.location.hash.indexOf('#/') === 0 ){
          window.location.hash = '#!/' + window.location.hash.substring(2);
      }
  } catch (e) {
      console.log('unable to redirect to new url',e);
  }
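if you want to check the redirect rule without a browser, the same logic can be pulled into a small pure function. this is only a sketch, and toHashBang is a hypothetical name.

```javascript
// hypothetical helper: '#/some/page' becomes '#!/some/page', anything else is left alone
function toHashBang( hash ) {
    if ( hash.indexOf('#/') === 0 ) {
        return '#!/' + hash.substring(2);
    }
    return hash;
}

var oldStyle = toHashBang('#/public/item/1');
var newStyle = toHashBang('#!/public/item/1');
console.log(oldStyle, newStyle);
```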


Next time

the next article will help you set up a sitemap and serve prerendered version of your pages
it basically applies everything we learned until now.
that will be the last article in this series.

Wednesday, October 8, 2014

seo with phantomjs part 1

this article is part of a series of articles explaining about seo and single page applications.
the stack i use contains node, angular, nginx etc.
in this part we are focusing on phantomjs and how to use it to prerender a page with javascript.

phantomjs to the rescue

phantomjs is a headless browser: it runs in memory, with no graphical display required.
you install it by running npm -g install phantomjs and then verify it is available by running phantomjs --version.
since it is a browser, it can do whatever a browser can such as render css, execute javascript and so on

first thing i did was write a small snippet of code testing out phantomjs.
here is a great snippet from the official phantomjs site, which you find when searching for "phantomjs get html":

var webPage = require('webpage');
var page = webPage.create();
page.open('', function (status) {
  var content = page.content;
  console.log('Content: ' + content);
  // without this the phantomjs process keeps running
  phantom.exit();
});

so i wrote a file called phantom_example.js and obviously made the same mistake as everyone else
and i ran node phantom_example.js.
and i got the following error

    throw err;
Error: Cannot find module 'phantom'
    at Function.Module._resolveFilename (module.js:338:15)
    at Function.Module._load (module.js:280:25)
    at Module.require (module.js:364:17)
    at require (module.js:380:17)
    at Object.<anonymous> (/full/path/scripts/phantom_example.js:2:19)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
after digging around i found the obvious solution.
phantomjs is a command line tool, and not a library i can require.
so running the command phantomjs phantom_example.js resolved it for me and i got the html.

running this from within my server

so this script required me to run phantomjs while i wanted to get the same result without leaving the server.
this was even simpler.
turns out there are a lot of libraries that do just that
my personal favorite is phantom.
these libraries essentially run the phantomjs commandline for you.
so when you invoke them, you will see a phantomjs process running in the background
and so some of them might require you to pre-install phantomjs.
phantom is a library that requires it to be installed.
here is a script with phantom that does the same thing. running node phantom_example.js will produce the right result.
var phantom = require('phantom');
var url = '';
phantom.create(function (ph) {
    return ph.createPage(function (page) {
        page.open(url, function ( status ) {
            if ( status === 'fail'){
                console.log('unable to open url', status);
                ph.exit();
            }else {
                page.evaluate(function () {
                    return document.documentElement.innerHTML;
                }, function (result) {
                    console.log( result);
                    // close the background phantomjs process
                    ph.exit();
                });
            }
        });
    });
});


Next part

the next article will talk about crawlers and single page applications, where we will understand the problem and the 2 solutions introduced by google and html5.

Wednesday, October 1, 2014

seo with phantomjs

angularjs, seo, nginx, phantomjs, facebook share, node and sitemaps - in 60 minutes or less
when i was asked about how to make an angular site seo friendly, i was shocked to discover that
even though googlebot is supposed to support javascript, angular apps still have placeholders
where values should be, making your search result display as {{title}}.
really? 2014 is almost over, and we have to deal with prerendering still? omg..
as i was getting dizzy with the thought of having some jade template engine in my beautiful mean stack code,
i decided to risk everything and write a solution with phantomjs.
you will not believe how simple it is.
i was then shocked again to discover that there are services doing just that, and they charge a lot of money!
i was unimpressed by services like
and what they offer. especially when i knew i was going to have a lot of pages soon.
and besides, why pay when it is so darn easy?

sharing on facebook doesn't work either, so who cares about google crawler?

even if google has javascript support, i want to be able to share my pages on facebook and other social networks..
so i need a better solution.

in the next couple of articles i will talk in depth about how to add seo support for single page applications.

Monday, November 4, 2013

Gruntfile.js - adding another HTML file to usemin

Adding an HTML to the Gruntfile usemin

Recently I started using node and with it yo, grunt and bower.
It is nice to get a quick kickstart
But now when I have to add/modify something in the build process,
I get stumped a lot.

You usually have a single base html file called index.html
and then you have Angular with ng-view to change content
thus generating a Single Page Application.

However, I always find it necessary to have error pages which are self contained,
which means index.html is not involved.

While yo's generators take care of index.html, your error page does not load correctly.
The reason for this is the usemin task in grunt which turns your
href attributes to point to the minified version of the file.
For example, if index.html has the following in the header

<!-- build:css({.tmp,app}) styles/main.css -->
<link rel="stylesheet" href="styles/main.css">
<link rel="stylesheet" href="styles/page1.css">
<!-- endbuild -->  
grunt usemin will turn it into this
<link rel="stylesheet" href="styles/1b62fe48.main.css">
note that page1 is not included in the output, and that is because the new main.css
contains them both.

So the question is what should I do if I have index2.html?
How would I get it to work here too?


The trick is to look at useminPrepare which by default looks like so

useminPrepare: {
    html: '<%= yeoman.app %>/index.html',
    options: {
        dest: '<%= yeoman.dist %>'
    }
}

If you modify the html field to include some other file, that file will be picked up by the build process too.
You do that by simply turning that field into an array like so:

useminPrepare: {
    html: ['<%= yeoman.app %>/index.html','<%= yeoman.app %>/index2.html'],
    options: {
        dest: '<%= yeoman.dist %>'
    }
}

Assuming your index2.html has something similar to index.html

<!-- build:css({.tmp,app}) styles/main2.css -->
<link rel="stylesheet" href="styles/main2.css">
<link rel="stylesheet" href="styles/page2.css">
<!-- endbuild -->  
it will get picked up and processed accordingly.