RosaeNLG Tutorial for English

This is the documentation for version 4.2.2, which is not the latest version. Consider upgrading to 4.4.0.

Our goal

This tutorial will guide you through the basic use of RosaeNLG. You will apply RosaeNLG to a simple use case: generating natural language texts that describe OnePlus smartphones.

You should read a basic Pug tutorial as a prerequisite.

Our OnePlus phone data will come from this site.

Node environment setup

This tutorial focuses on server-side rendering using node.js, but you can also run your templates client-side in the browser.

With the integrated editor you will also be able to run the tutorial directly in your browser.

You can skip this part if you are familiar with node.js environment setup as it’s completely standard.

install node.js and npm in your environment
create a tutorial folder somewhere
npm init and just accept whatever it says/asks
npm install rosaenlg will download rosaenlg and end up with something like + rosaenlg@x.x.x
create a tuto.js file, just put console.log("hello NLG"); inside of it
node tuto.js should output hello NLG (PS that’s not really Natural Language Generation yet)

Initial data

Our initial data. Put it in your tuto.js file.

let phones = [
  {
    name: 'OnePlus 5T',
    colors: ['Black', 'Red', 'White'],
    displaySize: 6,
    screenRatio: 80.43,
    battery: 3300,
  },
  {
    name: 'OnePlus 5',
    colors: ['Gold', 'Gray'],
    displaySize: 5.5,
    screenRatio: 72.93,
    battery: 3300,
  },
  {
    name: 'OnePlus 3T',
    colors: ['Black', 'Gold', 'Gray'],
    displaySize: 5.5,
    screenRatio: 73.15,
    battery: 3400,
  },
];

Plumbing & first texts

You need the rosaenlg library, so add this at the beginning of your tuto.js file:

const rosaenlgPug = require('rosaenlg');

In the same file, call a pug template (we will create the template just after):

let res = rosaenlgPug.renderFile('tuto.pug', {
    language: 'en_US',
    phone: phones[0],
    cache: true
});
console.log( res );

This will render the tuto.pug template. Parameters:

choosing a language (here language: 'en_US') is mandatory.
cache: true tells Pug that it does not need to recompile the template at each call (in practice it is faster).
for the other properties you can organize them as you want; here we just put a phone property with our first phone.

Create a tuto.pug file with this content:

p #{phone.name}

This first template is just standard Pug syntax: we output the name of the phone.

When you render the template (using node tuto.js) you should get:
<p>OnePlus 5T</p>

(ok, it’s not really NLG yet)

List elements with the eachz structure

Let’s talk about the colors of the phone: we want to output Available colors are aaa, bbb and ccc.

Create a mixin dedicated to listing colors (in your tuto.pug file):

mixin colors
  | the phone's available colors are
  eachz color in phone.colors with { separator:',', last_separator:'and', end:'.' }
    | #{color}

eachz is a RosaeNLG structure. It’s like a foreach loop, with additional NLG features.
{ separator:',', last_separator:'and', end:'.' } tells eachz that:
- the standard separator is the comma
- the word and should be used between the last two colors
- we should end with a dot
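
The three parameters above can be pictured with a small plain-JavaScript sketch. This is only an illustration of the assembly logic, not RosaeNLG's implementation (which also handles spacing and capitalization); joinList is a hypothetical helper name.

```javascript
// Illustration only: NOT RosaeNLG's code. joinList is a hypothetical helper
// that mimics what separator, last_separator and end ask for.
function joinList(items, { separator = ',', last_separator = 'and', end = '' } = {}) {
  if (items.length === 0) return '';
  if (items.length === 1) return items[0] + end;
  // Every element but the last is joined with the standard separator.
  const head = items.slice(0, -1).join(separator + ' ');
  // The last element is introduced by last_separator, then the end mark is appended.
  return `${head} ${last_separator} ${items[items.length - 1]}${end}`;
}

const params = { separator: ',', last_separator: 'and', end: '.' };
console.log(joinList(['Black', 'Red', 'White'], params)); // Black, Red and White.
console.log(joinList(['Gold', 'Gray'], params)); // Gold and Gray.
```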

Call the mixin:

p #{phone.name} . #[+colors]

Run it. Output should be: OnePlus 5T. The phone’s available colors are Black, Red and White.

See how RosaeNLG managed the spacing between words and the automatic capitalization. This is called "Surface Realization" in NLG.

Now we are doing Natural Language Generation 🚀

Looping on all the phones

Let’s generate some text for each phone. In your tuto.js file:

const res = rosaenlgPug.renderFile('tuto.pug', {
  language: 'en_US',
  phones: phones,
  cache: true,
});
console.log(res);

In tuto.pug:

- let phone;
each phoneElt in phones
  - phone = phoneElt;
  p #{phone.name} . #[+colors]

Here we have put the main loop directly in the Pug template. In real cases, it is better to loop outside (directly in the JavaScript caller), as this allows an easy reset of RosaeNLG and Pug between each rendering, which is much better for performance.

You should get:
OnePlus 5T. The phone’s available colors are Black, Red and White.
OnePlus 5. The phone’s available colors are Gold and Gray.
OnePlus 3T. The phone’s available colors are Black, Gold and Gray.
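
The advice about looping in the JavaScript caller could look roughly like the sketch below. It assumes the single-phone template shown earlier (p #{phone.name} . #[+colors], which reads a phone property); renderEachPhone is a hypothetical helper, and the render argument stands for a function that wraps rosaenlgPug.renderFile.

```javascript
// Sketch only: renderEachPhone is a hypothetical helper, not part of RosaeNLG.
// `render` is any function (templatePath, options) => string.
function renderEachPhone(render, templatePath, phones) {
  // One independent rendering per phone, each with its own options object.
  return phones.map((phone) =>
    render(templatePath, { language: 'en_US', phone: phone, cache: true })
  );
}

// Intended usage in tuto.js (assumes rosaenlg is installed and tuto.pug
// expects a `phone` property):
//   const rosaenlgPug = require('rosaenlg');
//   const render = (path, options) => rosaenlgPug.renderFile(path, options);
//   renderEachPhone(render, 'tuto.pug', phones).forEach((text) => console.log(text));
```

Passing the renderer as a parameter keeps the loop independent of the library, so it can be tried with a stub before wiring in the real template.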

Basic synonyms

Readers love when texts are not repetitive. Let’s add some very basic synonyms: tints and tones are synonyms of colors. Change your colors mixin:

mixin colors
  | the phone's available #[+syn('colors', 'tints', 'tones')]
  | are
  ...

Run it multiple times and you should have different outputs.

More synonyms

The syn mixin is perfect for words or parts of sentences. But let’s say we want to create some introduction texts, and that we want to have diversity.

Let’s put all these different introductions in a dedicated mixin:

mixin intro
  synz
    syn
      | the #{phone.name} is really a fantastic phone.
    syn
      | i really love the new #{phone.name}.
    syn
      | #{phone.name} : a great phone !

The synz > syn structure simply lists synonymic alternatives. You can put whatever you want in each alternative (in each syn): conditions, more synonyms etc.

Let’s call this new mixin:

mixin printPhone
  | #[+intro] .
  | #[+colors] .

- let phone;
each phoneElt in phones
  - phone = phoneElt;
  p #[+printPhone]

You should get:
I really love the new OnePlus 5T. The phone’s available tints are Black, Red and White.
I really love the new OnePlus 5. The phone’s available tints are Gold and Gray.
OnePlus 3T: a great phone! The phone’s available tones are Black, Gold and Gray.

Intros are chosen randomly so you might have repetitions.

List parts of a sentence

Let’s talk about the display: physical size and screen-to-body ratio. We want to output something like it has a physical size of 6 inches and a screen-to-body ratio of 80.43 %. We could build a big static sentence, but structuring the code will give us more flexibility.

Let’s cut our big sentence into chunks, one for each property:

mixin display
  itemz { separator:',', last_separator:'and' }
    item
      | a physical size of #[+value(phone.displaySize)] inches
    item
      | a screen-to-body ratio of #[+value(phone.screenRatio)] %

value is a mixin that will output the value respecting the locale.
itemz > item is much like synz > syn, except that it will not choose one alternative, but list all the items.
The js object after itemz tells RosaeNLG how to assemble elements. It is mandatory. separator and last_separator work exactly the same way as in the eachz structure.

Do not forget to call this mixin:

mixin printPhone
  | #[+intro] .
  | #[+colors] .
  | #[+display] .

The result is not that bad, but the beginning of the text is missing. Let’s fix that:

mixin display
  itemz { begin_with_general: 'it has a display with', separator:',', last_separator:'and' }
    item
      | a physical size of #[+value(phone.displaySize)] inches
    item
      | a screen-to-body ratio of #[+value(phone.screenRatio)] %

begin_with_general tells RosaeNLG what the texts should begin with. You could have put it outside the mixin (just before), but it’s good practice to put it inside: for instance, when the list of the elements is empty, RosaeNLG will not output the begin_with_general content.

You should get better texts:
The OnePlus 5T is really a fantastic phone. The phone’s available tones are Black, Red and White. It has a display with a physical size of 6 inches and a screen-to-body ratio of 80.43 %.
OnePlus 5: a great phone! The phone’s available tones are Gold and Gray. It has a display with a physical size of 5.5 inches and a screen-to-body ratio of 72.93 %.
OnePlus 3T: a great phone! The phone’s available colors are Black, Gold and Gray. It has a display with a physical size of 5.5 inches and a screen-to-body ratio of 73.15 %.
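
Why it helps to keep the introduction inside the list can be shown with a plain-JavaScript sketch. It only imitates the behaviour described above (no items, no introduction) and is not RosaeNLG's implementation; assemble is a hypothetical helper name.

```javascript
// Illustration only: NOT RosaeNLG's code. assemble is a hypothetical helper.
function assemble(items, { begin_with_general = '', separator = ',', last_separator = 'and' } = {}) {
  // Nothing to list: the introduction is dropped as well.
  if (items.length === 0) return '';
  const list = items.length === 1
    ? items[0]
    : `${items.slice(0, -1).join(separator + ' ')} ${last_separator} ${items[items.length - 1]}`;
  return `${begin_with_general} ${list}`.trim();
}

const params = { begin_with_general: 'it has a display with', separator: ',', last_separator: 'and' };
console.log(assemble(['a physical size of 6 inches', 'a screen-to-body ratio of 80.43 %'], params));
// it has a display with a physical size of 6 inches and a screen-to-body ratio of 80.43 %
console.log(assemble([], params) === ''); // true
```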

You can add some diversity by randomly changing the order of the output by adding the mix parameter:

mixin display
  itemz { begin_with_general: 'it has a display with', separator:',', last_separator:'and', mix:true }
    item
      | a physical size of #[+value(phone.displaySize)] inches
    item
      | a screen-to-body ratio of #[+value(phone.screenRatio)] %

The OnePlus 5T is really a fantastic phone. The phone’s available colors are Black, Red and White. It has a display with a screen-to-body ratio of 80.43 % and a physical size of 6 inches.
The OnePlus 5 is really a fantastic phone. The phone’s available tints are Gold and Gray. It has a display with a physical size of 5.5 inches and a screen-to-body ratio of 72.93 %.
I really love the new OnePlus 3T. The phone’s available colors are Black, Gold and Gray. It has a display with a physical size of 5.5 inches and a screen-to-body ratio of 73.15 %.
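
Conceptually, mix shuffles the items before they are assembled. The sketch below shows one common way to shuffle, a Fisher-Yates shuffle, in plain JavaScript. How RosaeNLG actually randomizes is not described here, so mixItems and its use of Math.random are assumptions made for illustration.

```javascript
// Illustration only: NOT RosaeNLG's code. mixItems is a hypothetical helper.
// `random` defaults to Math.random and can be replaced for reproducible runs.
function mixItems(items, random = Math.random) {
  const out = items.slice(); // work on a copy, leave the input untouched
  // Fisher-Yates: walk from the end, swapping each slot with a random earlier one.
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// The order changes from run to run, the content never does.
console.log(mixItems(['colors', 'display', 'battery']));
```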

Even more variety

First let’s add some text about the battery:

  | this phone has a battery of #[+value(phone.battery)] mAh .

Now we have a decent volume of text. But we would like to have more variability: we always talk about colors, the display, and the battery, in this order, but it could be in any order. Let’s put all our text chunks in an itemz > item structure, and add a mix:

mixin phone_chunks
  itemz {separator: '.', end:'.', mix:true}
    item
      | #[+colors]
    item
      | #[+display]
    item
      | this phone has a battery of #[+value(phone.battery)] mAh

mixin printPhone
  | #[+intro] .
  | #[+phone_chunks]

Referring expressions

There is a hidden structure behind the way we talk about the phone:

The first time we talk about it we use the name of the phone.
The next times we use either the phone, it, or this phone.

This is called referring expressions in NLG. The first time we talk about something we use its representant (its full name), and afterwards we use a referring expression. We want RosaeNLG to take care of that automatically.
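
The idea can be sketched in plain JavaScript before turning to the RosaeNLG way of doing it. This is a mental model only, not RosaeNLG's implementation; makeMentioner, ref and refexpr are hypothetical names chosen for the sketch.

```javascript
// Illustration only: NOT RosaeNLG's code. All names here are hypothetical.
function makeMentioner() {
  const alreadyMentioned = new Set();
  return function mention(obj, ref, refexpr) {
    // Later mentions use the referring expression...
    if (alreadyMentioned.has(obj)) return refexpr(obj);
    // ...while the first mention uses the representant (the full name).
    alreadyMentioned.add(obj);
    return ref(obj);
  };
}

const mention = makeMentioner();
const phone = { name: 'OnePlus 5T' };
const ref = (p) => p.name;
const refexpr = () => 'the phone';

console.log(mention(phone, ref, refexpr)); // OnePlus 5T
console.log(mention(phone, ref, refexpr)); // the phone
```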

Let’s create 2 mixins, one for each kind of representation:

mixin phone_ref(obj, params)
  | #{obj.name}

mixin phone_refexpr(obj, params)
  | #[+syn('the phone', 'this phone', 'it')]

The first parameter, obj, is the phone itself. #{obj.name} is exactly the same as #{phone.name}.

We also have to register them:

- let phone;
each phoneElt in phones
  - phone = phoneElt;

  p
    -
      phone.ref = phone_ref;
      phone.refexpr = phone_refexpr;
    | #[+printPhone]

Now we can use them everywhere:

mixin colors
  | #[+value(phone)]'s available #[+syn('colors', 'tints', 'tones')]
  | are
  ...

mixin intro
  synz
    syn
      | the #[+value(phone)] is really a fantastic phone.
    syn
      | i really love the new #[+value(phone)].
    syn
      | #[+value(phone)] : a great phone !

In the phone_chunks mixin:

      | #[+value(phone)] has a battery of #[+value(phone.battery)] mAh

We have to change the structure for it has a display with, as we cannot put a value directly in the begin_with_general parameter. It has to be a string or a mixin:

mixin itHasADisplay
  | #[+value(phone)] has a display with
...
  itemz { begin_with_general: itHasADisplay, separator:',', last_separator:'and', mix:true }

This is what you should get:
OnePlus 5T: a great phone! It has a display with a physical size of 6 inches and a screen-to-body ratio of 80.43 %. It has a battery of 3300 mAh. It’s available tones are Black, Red and White.
I really love the new OnePlus 5. This phone has a display with a physical size of 5.5 inches and a screen-to-body ratio of 72.93 %. The phone has a battery of 3300 mAh. This phone’s available tints are Gold and Gray.
The OnePlus 3T is really a fantastic phone. This phone’s available colors are Black, Gold and Gray. It has a display with a physical size of 5.5 inches and a screen-to-body ratio of 73.15 %. This phone has a battery of 3400 mAh.

It’s pretty decent, but there’s one issue: you can trigger It’s available tones are which is wrong. It should be either the phone’s, this phone’s, or its.

Conditional texts

We could use different techniques to address that, but a pretty straightforward solution is just to forbid the use of it at this specific place.

Let’s add a flag when calling the referring expression: we just don’t want it to be triggered:

  | #[+value(phone, {'NOT_IT':true})]'s available #[+syn('colors', 'tints', 'tones')]

Now we have to:

catch this flag in our referring expression mixin
use the synz > syn structure instead of syn to be able to write the condition

mixin phone_refexpr(obj, params)
  synz
    syn
      | the phone
    syn
      | this phone
    syn
      if !hasFlag(params, 'NOT_IT')
        | it

Generate the texts and you should see that the It’s have disappeared.

When an empty synonym is triggered (which can happen here), RosaeNLG will just choose another one.
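
The fallback on empty alternatives can be pictured as follows. This is a plain-JavaScript sketch of the behaviour described above, not RosaeNLG's implementation; chooseSynonym is a hypothetical helper, and drawing with Math.random is an assumption.

```javascript
// Illustration only: NOT RosaeNLG's code. chooseSynonym is a hypothetical helper.
// Each alternative is a function returning a text, possibly the empty string.
function chooseSynonym(alternatives, random = Math.random) {
  const remaining = alternatives.slice();
  while (remaining.length > 0) {
    // Draw one alternative at random and remove it from the pool.
    const index = Math.floor(random() * remaining.length);
    const text = remaining.splice(index, 1)[0]();
    if (text !== '') return text; // a non-empty alternative wins
    // Empty alternative: loop again and try another one.
  }
  return '';
}

const notIt = true; // plays the role of the NOT_IT flag
const alternatives = [
  () => 'the phone',
  () => 'this phone',
  () => (notIt ? '' : 'it'),
];
console.log(chooseSynonym(alternatives)); // never empty here: 'the phone' or 'this phone'
```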

Still we can have this kind of output:
The OnePlus 5T is really a fantastic phone. This phone has a display with a screen-to-body ratio of 80.43 % and a physical size of 6 inches. This phone’s available tints are Black, Red and White. This phone has a battery of 3300 mAh.

This phone appears 3 times here, which is not perfect. How could we avoid that?

Change synonym mode

Instead of choosing synonyms randomly, we can just trigger them in sequence. This will avoid close repetitions:

mixin phone_refexpr(obj, params)
  synz {mode:'sequence'}
    syn
      ...

Now we should have fewer repetitions in our synonyms for the phone.
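
The difference with random choice is easy to see in a plain-JavaScript sketch. This is a mental model, not RosaeNLG's implementation; makeSequence is a hypothetical helper, and starting at the first alternative and wrapping around at the end are assumptions of the sketch.

```javascript
// Illustration only: NOT RosaeNLG's code. makeSequence is a hypothetical helper.
function makeSequence(alternatives) {
  let next = 0;
  return function pick() {
    const choice = alternatives[next];
    // Move on to the following alternative, wrapping around (an assumption).
    next = (next + 1) % alternatives.length;
    return choice;
  };
}

const pick = makeSequence(['the phone', 'this phone', 'it']);
console.log(pick()); // the phone
console.log(pick()); // this phone
console.log(pick()); // it
```

With a sequence, two consecutive mentions never use the same alternative as long as there are at least two of them, which is what removes the close repetitions.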

Fancier sentences and "has said"

Let’s generate a fancier sentence combining display size and battery capacity: The phone has a display with a screen-to-body ratio of 73.15 % and a physical size of 5.5 inches along with a battery of 3400 mAh.

This is quite straightforward:

| #[+display]
| along with a battery of #[+value(phone.battery)] mAh

The problem is, we don’t want to talk about the battery twice. We could just remove the standard battery sentence (The phone has a battery of 3400 mAh), but let’s try to trigger the battery sentence only if we have not talked about the battery before. This is where hasSaid and recordSaid come in.

    item
      | #[+display]
      
      if !hasSaid('BATTERY')
        | along with a battery of #[+value(phone.battery)] mAh
        recordSaid('BATTERY')
    item
      if !hasSaid('BATTERY')
        | #[+value(phone)] has a battery of #[+value(phone.battery)] mAh
        recordSaid('BATTERY')

The hasSaid/recordSaid pattern, here used twice, is the following: if we haven’t talked about something:

We talk about it
We record that we talked about it

You must use these built-in mechanisms and not rely on your own variables or hashmaps that you would set along text generation, as RosaeNLG goes back and forth in the text rendering.

You also need a deleteSaid('BATTERY') in the main loop, as we must talk about the battery for each phone.
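
The semantics of the three calls can be pictured with a toy registry in plain JavaScript. It is only a mental model: the functions below are stand-ins written for this sketch, not RosaeNLG's own, and in templates you should keep using the built-in ones for the reason given above.

```javascript
// Toy stand-ins for illustration only: NOT RosaeNLG's implementation.
const said = new Set();
const recordSaid = (key) => { said.add(key); };
const hasSaid = (key) => said.has(key);
const deleteSaid = (key) => { said.delete(key); };

// Talks about the battery only if it has not been mentioned yet.
function describeBattery(phone) {
  if (hasSaid('BATTERY')) return '';
  recordSaid('BATTERY');
  return `a battery of ${phone.battery} mAh`;
}

for (const phone of [{ battery: 3300 }, { battery: 3400 }]) {
  console.log(describeBattery(phone)); // the battery text
  console.log(describeBattery(phone) === ''); // true: already said for this phone
  deleteSaid('BATTERY'); // reset so that the next phone can talk about it again
}
```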

You should get those nice sentences:
OnePlus 5T: a great phone! The phone has a battery of 3300 mAh. This phone’s available tints are Black, Red and White. It has a display with a physical size of 6 inches and a screen-to-body ratio of 80.43 %.
OnePlus 5: a great phone! The phone has a battery of 3300 mAh. This phone has a display with a physical size of 5.5 inches and a screen-to-body ratio of 72.93 %. The phone’s available colors are Gold and Gray.

Congratulations!

Sincere Congratulations! 🎆

Even more

We have gone through some aspects of NLG with this tutorial.

There are some other features you can explore, for instance:

automatic a / an : a apple ⇒ an apple, a hour ⇒ an hour
agreement of verbs (especially the irregular ones)
agreement of words: tomato ⇒ tomatoes
etc.

Final version of the code

tuto.js

const rosaenlgPug = require('rosaenlg');
let phones = [
  {
    name: 'OnePlus 5T',
    colors: ['Black', 'Red', 'White'],
    displaySize: 6,
    screenRatio: 80.43,
    battery: 3300,
  },
  {
    name: 'OnePlus 5',
    colors: ['Gold', 'Gray'],
    displaySize: 5.5,
    screenRatio: 72.93,
    battery: 3300,
  },
  {
    name: 'OnePlus 3T',
    colors: ['Black', 'Gold', 'Gray'],
    displaySize: 5.5,
    screenRatio: 73.15,
    battery: 3400,
  },
];
const res = rosaenlgPug.renderFile('tuto.pug', {
  language: 'en_US',
  phones: phones,
  cache: true,
});
console.log(res);

tuto.pug

//- Copyright 2019 Ludan Stoecklé
//- SPDX-License-Identifier: Apache-2.0

//- tag::displayMixin[]
mixin display
  itemz { begin_with_general: itHasADisplay, separator:',', last_separator:'and', mix:true }
    item
      | a physical size of #[+value(phone.displaySize)] inches
    item
      | a screen-to-body ratio of #[+value(phone.screenRatio)] %
//- end::displayMixin[]

//- tag::colorsMixin[]
mixin colors
  //- tag::colorsMixinNotIt[]
  | #[+value(phone, {'NOT_IT':true})]'s available #[+syn('colors', 'tints', 'tones')]
  //- end::colorsMixinNotIt[]
  | are
  eachz color in phone.colors with { separator:',', last_separator:'and', end:'.' }
    | #{color}
//- end::colorsMixin[]

//- tag::introMixin[]
mixin intro
  synz
    syn
      | the #[+value(phone)] is really a fantastic phone.
    syn
      | i really love the new #[+value(phone)].
    syn
      | #[+value(phone)] : a great phone !
//- end::introMixin[]

//- tag::mixinItHasADisplay[]
mixin itHasADisplay
  | #[+value(phone)] has a display with
//- end::mixinItHasADisplay[]

mixin phone_chunks
  itemz {separator: '.', end:'.', mix:true}
    item
      | #[+colors]
    //- tag::hasSaid[]
    item
      | #[+display]
      
      if !hasSaid('BATTERY')
        | along with a battery of #[+value(phone.battery)] mAh
        recordSaid('BATTERY')
    item
      if !hasSaid('BATTERY')
        | #[+value(phone)] has a battery of #[+value(phone.battery)] mAh
        recordSaid('BATTERY')
    //- end::hasSaid[]

mixin phone_ref(obj, params)
  | #{obj.name}

mixin phone_refexpr(obj, params)
  synz {mode:'sequence'}
    syn
      | the phone
    syn
      | this phone
    syn
      if !hasFlag(params, 'NOT_IT')
        | it

//- tag::phoneMixin[]
mixin printPhone
  | #[+intro] .
  | #[+phone_chunks]
//- end::phoneMixin[]

//- tag::main[]
- let phone;
each phoneElt in phones
  - phone = phoneElt;
  
  p
    -
      phone.ref = phone_ref;
      phone.refexpr = phone_refexpr;
    | #[+printPhone]
    deleteSaid('BATTERY')
//- end::main[]
