RosaeNLG Tutorial for English
Node environment setup
This tutorial focuses on server-side rendering using Node.js, but you can also run your templates client-side in the browser.
With the integrated editor you can also run the tutorial directly in your browser.
You can skip this part if you are familiar with node.js environment setup as it’s completely standard.
- install node.js and npm in your environment
- create a tutorial folder somewhere
- npm init and just accept whatever it says/asks
- npm install rosaenlg will download rosaenlg and end up with something like + rosaenlg@x.x.x
- create a tuto.js file, just put console.log("hello NLG"); inside of it
- node tuto.js should output hello NLG
(PS that’s not really Natural Language Generation yet)
Initial data
Here is our initial data. Put it in your tuto.js file.
let phones = [
  {
    name: 'OnePlus 5T',
    colors: ['Black', 'Red', 'White'],
    displaySize: 6,
    screenRatio: 80.43,
    battery: 3300,
  },
  {
    name: 'OnePlus 5',
    colors: ['Gold', 'Gray'],
    displaySize: 5.5,
    screenRatio: 72.93,
    battery: 3300,
  },
  {
    name: 'OnePlus 3T',
    colors: ['Black', 'Gold', 'Gray'],
    displaySize: 5.5,
    screenRatio: 73.15,
    battery: 3400,
  },
];
Plumbing & first texts
You need the rosaenlg library, so add this at the beginning of your tuto.js file:
const rosaenlgPug = require('rosaenlg');
In the same file, call a pug template (we will create the template just after):
let res = rosaenlgPug.renderFile('tuto.pug', {
  language: 'en_US',
  phone: phones[0]
});
console.log( res );
This will render the tuto.pug template. Parameters:
- choosing a language (here language: 'en_US') is mandatory.
- cache: true (optional; it is not set in this first call but is used in the later ones) tells Pug that it does not need to recompile the template at each call (in practice it is faster).
- for the other properties you can organize them as you want; here we just put a phone property with our first phone.
Create a tuto.pug file with this content:
p #{phone.name}
This first template is just standard Pug syntax: we output the name of the phone.
When you render the template (using node tuto.js) you should get:
<p>OnePlus 5T</p>
(ok, it’s not really NLG yet)
List elements with the eachz structure
Let’s talk about the colors of the phone: we want to output Available colors are aaa, bbb and ccc.
Create a mixin dedicated to listing colors (in your tuto.pug file):
mixin colors
  | the phone's available colors are
  eachz color in phone.colors with { separator:',', last_separator:'and', end:'.' }
    | #{color}
- eachz is a RosaeNLG structure. It’s like a foreach loop, with additional NLG features.
- { separator:',', last_separator:'and', end:'.' } tells eachz that:
  - the standard separator is the comma
  - and should be used between the last two colors
  - we should end with a dot
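To make the three parameters concrete, here is a minimal plain-JavaScript sketch of the assembly rules just described. joinList is a hypothetical helper written for this illustration; it is not part of the RosaeNLG API and it ignores everything else eachz does (spacing cleanup, capitalization, etc.).

```javascript
// Hypothetical helper: mimics how separator, last_separator and end
// are described to combine. Not RosaeNLG's implementation.
function joinList(items, { separator = ',', last_separator = 'and', end = '' } = {}) {
  if (items.length === 0) return '';
  if (items.length === 1) return items[0] + end;
  // All elements but the last are joined with the standard separator.
  const head = items.slice(0, -1).join(separator + ' ');
  const last = items[items.length - 1];
  // The last separator goes between the two last elements.
  return `${head} ${last_separator} ${last}${end}`;
}

console.log(joinList(['Black', 'Red', 'White'], { end: '.' })); // Black, Red and White.
console.log(joinList(['Gold', 'Gray'], { end: '.' }));          // Gold and Gray.
```

In the real template, eachz does this work for you, so a helper like this is only useful for understanding the parameters.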
Call the mixin:
p #{phone.name} . #[+colors]
Run it. Output should be: OnePlus 5T. The phone’s available colors are Black, Red and White.
See how RosaeNLG managed the spacing between words and the automatic capitalization. This is called "Surface Realization" in NLG.
Now we are doing Natural Language Generation 🚀
Looping on all the phones
Let’s generate some text for each phone. In your tuto.js file:
const res = rosaenlgPug.renderFile('tuto.pug', {
  language: 'en_US',
  phones: phones,
  cache: true,
});
console.log(res);
In tuto.pug:
- let phone;
each phoneElt in phones
  - phone = phoneElt;
  p #{phone.name} . #[+colors]
Here we have put the main loop directly in the Pug template. In real cases, it is better to loop outside (directly in the JavaScript caller), as this allows an easy reset of RosaeNLG and Pug between each rendering, which is much better for performance.
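As a rough sketch of what looping in the JavaScript caller could look like: renderEach below is a hypothetical helper, and its render argument is an assumption standing in for a call such as (template, options) => rosaenlgPug.renderFile(template, options). A stub renderer is used so the example runs without RosaeNLG installed; the template is assumed to read a single phone property, as in the first version of tuto.pug.

```javascript
// Hypothetical helper: one rendering call per phone, driven from JavaScript.
function renderEach(render, template, phones) {
  return phones.map((phone) =>
    render(template, { language: 'en_US', phone: phone, cache: true })
  );
}

// Stub renderer (assumption) so the sketch is runnable on its own.
// With RosaeNLG you would pass a function that calls rosaenlgPug.renderFile.
const fakeRender = (template, options) => `<p>${options.phone.name}</p>`;

const texts = renderEach(fakeRender, 'tuto.pug', [
  { name: 'OnePlus 5T' },
  { name: 'OnePlus 5' },
]);
console.log(texts.join('\n'));
```

The rest of this tutorial keeps the loop inside the template for simplicity.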
You should get:
OnePlus 5T. The phone’s available colors are Black, Red and White.
OnePlus 5. The phone’s available colors are Gold and Gray.
OnePlus 3T. The phone’s available colors are Black, Gold and Gray.
Basic synonyms
Readers love when texts are not repetitive. Let’s add some very basic synonyms: tints and tones are synonyms of colors.
Change your colors mixin:
mixin colors
  | the phone's available #[+syn('colors', 'tints', 'tones')]
  | are
  ...
Run it multiple times and you should have different outputs.
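Conceptually, a random synonym choice boils down to something like the sketch below. pickSynonym is a hypothetical plain-JavaScript function, not the syn mixin itself, and the injectable random parameter is only there to make the behavior testable.

```javascript
// Hypothetical helper: pick one alternative at random.
// `random` defaults to Math.random and must return a number in [0, 1).
function pickSynonym(alternatives, random = Math.random) {
  const index = Math.floor(random() * alternatives.length);
  return alternatives[index];
}

const word = pickSynonym(['colors', 'tints', 'tones']);
console.log(`the phone's available ${word} are ...`);
```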
More synonyms
The syn mixin is perfect for words or parts of sentences. But let’s say we want to create some introduction texts, and that we want to have diversity.
Let’s put all these different introductions in a dedicated mixin:
mixin intro
  synz
    syn
      | the #{phone.name} is really a fantastic phone.
    syn
      | i really love the new #{phone.name}.
    syn
      | #{phone.name} : a great phone !
The synz > syn structure simply lists synonymic alternatives. You can put whatever you want in each alternative (in each syn): conditions, more synonyms etc.
Let’s call this new mixin:
mixin phone
  | #[+intro] .
  | #[+colors] .

- let phone;
each phoneElt in phones
  - phone = phoneElt;
  p #[+phone]
You should get:
I really love the new OnePlus 5T. The phone’s available tints are Black, Red and White.
I really love the new OnePlus 5. The phone’s available tints are Gold and Gray.
OnePlus 3T: a great phone! The phone’s available tones are Black, Gold and Gray.
Intros are chosen randomly so you might have repetitions.
List parts of a sentence
Let’s talk about the display: physical size and screen-to-body ratio. We want to output something like it has a physical size of 6 inches and a screen-to-body ratio of 80.43 %. We could build a big static sentence, but structuring the code will give us more flexibility.
Let’s cut our big sentence into chunks, one for each property:
mixin display
  itemz { separator:',', last_separator:'and' }
    item
      | a physical size of #[+value(phone.displaySize)] inches
    item
      | a screen-to-body ratio of #[+value(phone.screenRatio)] %
- value is a mixin that will output the value respecting the locale.
- itemz > item is much like synz > syn, except that it will not choose one alternative, but list all the items.
- The js object after itemz tells RosaeNLG how to assemble elements. It is mandatory. separator and last_separator work exactly the same way as in the eachz structure.
Do not forget to call this mixin:
mixin phone
  | #[+intro] .
  | #[+colors] .
  | #[+display] .
The result is not that bad, but the beginning of the text is missing. Let’s fix that:
mixin display
  itemz { begin_with_general: 'it has a display with', separator:',', last_separator:'and' }
    item
      | a physical size of #[+value(phone.displaySize)] inches
    item
      | a screen-to-body ratio of #[+value(phone.screenRatio)] %
begin_with_general tells RosaeNLG what the texts should begin with. You could have put it outside the mixin (just before), but it’s a good practice to put it inside: for instance, when the list of elements is empty, RosaeNLG will not output the begin_with_general content.
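The empty-list behavior is the main reason to keep the opening text inside the structure. Here is a small plain-JavaScript sketch of that idea; assemble is a hypothetical function written for this illustration and only approximates what itemz is described to do.

```javascript
// Hypothetical helper: list the items, prefixed by an opening text,
// but output nothing at all when there is nothing to list.
function assemble(items, { begin_with_general = '', separator = ',', last_separator = 'and' } = {}) {
  const kept = items.filter((item) => item !== '');
  if (kept.length === 0) return ''; // empty list: the opening text is dropped too
  const body =
    kept.length === 1
      ? kept[0]
      : `${kept.slice(0, -1).join(separator + ' ')} ${last_separator} ${kept[kept.length - 1]}`;
  return begin_with_general ? `${begin_with_general} ${body}` : body;
}

const options = { begin_with_general: 'it has a display with' };
console.log(assemble(['a physical size of 6 inches', 'a screen-to-body ratio of 80.43 %'], options));
console.log(JSON.stringify(assemble([], options))); // ""
```

If the opening text were written before the structure instead, an empty list would leave a dangling "it has a display with" in the output.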
You should get better texts:
The OnePlus 5T is really a fantastic phone. The phone’s available tones are Black, Red and White. It has a display with a physical size of 6 inches and a screen-to-body ratio of 80.43 %.
OnePlus 5: a great phone! The phone’s available tones are Gold and Gray. It has a display with a physical size of 5.5 inches and a screen-to-body ratio of 72.93 %.
OnePlus 3T: a great phone! The phone’s available colors are Black, Gold and Gray. It has a display with a physical size of 5.5 inches and a screen-to-body ratio of 73.15 %.
You can add some diversity by randomly changing the order of the output by adding the mix parameter:
mixin display
  itemz { begin_with_general: 'it has a display with', separator:',', last_separator:'and', mix:true }
    item
      | a physical size of #[+value(phone.displaySize)] inches
    item
      | a screen-to-body ratio of #[+value(phone.screenRatio)] %
The OnePlus 5T is really a fantastic phone. The phone’s available colors are Black, Red and White. It has a display with a screen-to-body ratio of 80.43 % and a physical size of 6 inches.
The OnePlus 5 is really a fantastic phone. The phone’s available tints are Gold and Gray. It has a display with a physical size of 5.5 inches and a screen-to-body ratio of 72.93 %.
I really love the new OnePlus 3T. The phone’s available colors are Black, Gold and Gray. It has a display with a physical size of 5.5 inches and a screen-to-body ratio of 73.15 %.
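The effect of mix can be pictured as shuffling the items before they are assembled. The sketch below uses a standard Fisher-Yates shuffle in plain JavaScript; whether RosaeNLG shuffles exactly this way is an assumption, and mixItems is a hypothetical name.

```javascript
// Hypothetical helper: return a shuffled copy of the items (Fisher-Yates).
// `random` defaults to Math.random and must return a number in [0, 1).
function mixItems(items, random = Math.random) {
  const result = items.slice(); // do not modify the caller's array
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

const chunks = ['a physical size of 6 inches', 'a screen-to-body ratio of 80.43 %'];
console.log(mixItems(chunks).join(' and '));
```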
Even more variety
First let’s add some text about the battery:
| this phone has a battery of #[+value(phone.battery)] mAh .
Now we have a decent volume of text. But we would like to have more variability: we always talk about colors, the display, and the battery, in this order, but it could be in any order. Let’s put all our text chunks in an itemz > item structure, and add a mix:
mixin phone_chunks
  itemz {separator: '.', end:'.', mix:true}
    item
      | #[+colors]
    item
      | #[+display]
    item
      | this phone has a battery of #[+value(phone.battery)] mAh

mixin phone
  | #[+intro] .
  | #[+phone_chunks]
Referring expressions
There is a hidden structure behind the way we talk about the phone:
- The first time we talk about it we use the name of the phone.
- The next times we use either the phone, it, or this phone.
This is called referring expressions in NLG. The first time we talk about something we use its representant (here, the name of the phone); afterwards we use a referring expression. We want RosaeNLG to take care of that automatically.
Let’s create 2 mixins, one for each kind of representation:
mixin phone_ref(obj, params)
  | #{obj.name}

mixin phone_refexpr(obj, params)
  | #[+syn('the phone', 'this phone', 'it')]
The first parameter, obj, is the phone itself. #{obj.name} is exactly the same as #{phone.name}.
We also have to register them:
- let phone;
each phoneElt in phones
  - phone = phoneElt;
  p
    - phone.ref = 'phone_ref'; phone.refexpr = 'phone_refexpr';
    | #[+phone]
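The mechanism behind ref and refexpr can be pictured with a small plain-JavaScript sketch: remember which objects have already been mentioned, and switch representation after the first mention. makeMentioner is a hypothetical function for illustration only; it says nothing about how RosaeNLG implements this internally.

```javascript
// Hypothetical helper: build a function that uses `ref` for the first
// mention of an object and `refexpr` for every later mention.
function makeMentioner(ref, refexpr) {
  const alreadyMentioned = new Set();
  return function mention(obj) {
    if (!alreadyMentioned.has(obj)) {
      alreadyMentioned.add(obj);
      return ref(obj);
    }
    return refexpr(obj);
  };
}

const mention = makeMentioner(
  (phone) => phone.name, // plays the role of phone_ref
  () => 'the phone'      // plays the role of phone_refexpr
);

const onePlus = { name: 'OnePlus 5T' };
console.log(mention(onePlus)); // OnePlus 5T
console.log(mention(onePlus)); // the phone
```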
Now we can use them everywhere:
mixin colors
  | #[+value(phone)]'s available #[+syn('colors', 'tints', 'tones')]
  | are
  ...

mixin intro
  synz
    syn
      | the #[+value(phone)] is really a fantastic phone.
    syn
      | i really love the new #[+value(phone)].
    syn
      | #[+value(phone)] : a great phone !
In the phone_chunks mixin:
| #[+value(phone)] has a battery of #[+value(phone.battery)] mAh
We have to change the structure for the it has a display with, as we cannot put a value directly in the begin_with_general structure. It has to be a string or a mixin:
mixin itHasADisplay
  | #[+value(phone)] has a display with

...
  itemz { begin_with_general: 'itHasADisplay', separator:',', last_separator:'and', mix:true }
This is what you should get:
OnePlus 5T: a great phone! It has a display with a physical size of 6 inches and a screen-to-body ratio of 80.43 %. It has a battery of 3300 mAh. It’s available tones are Black, Red and White.
I really love the new OnePlus 5. This phone has a display with a physical size of 5.5 inches and a screen-to-body ratio of 72.93 %. The phone has a battery of 3300 mAh. This phone’s available tints are Gold and Gray.
The OnePlus 3T is really a fantastic phone. This phone’s available colors are Black, Gold and Gray. It has a display with a physical size of 5.5 inches and a screen-to-body ratio of 73.15 %. This phone has a battery of 3400 mAh.
It’s pretty decent, but there’s one issue: the text can contain It’s available tones are, which is wrong. It should be either the phone’s, this phone’s, or its.
Conditional texts
We could use different techniques to address that, but a pretty straightforward solution is just to forbid the use of it at this specific place.
Let’s add a flag when calling the referring expression: we just don’t want it to be triggered:
| #[+value(phone, {'NOT_IT':true})]'s available #[+syn('colors', 'tints', 'tones')]
Now we have to:
- catch this flag in our referring expression mixin
- use the synz > syn structure instead of syn to be able to write the condition
mixin phone_refexpr(obj, params)
  synz
    syn
      | the phone
    syn
      | this phone
    syn
      if !hasFlag(params, 'NOT_IT')
        | it
Generate the texts and you should see that the It’s have disappeared.
When an empty synonym is triggered (which can happen here), RosaeNLG will just choose another one.
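The combination of a flag and an alternative that may turn out empty can be sketched in plain JavaScript as follows. isFlagSet and phoneRefexpr are hypothetical names, and filtering out the empty alternatives before choosing is a simplification of the "choose another one" behavior described above.

```javascript
// Hypothetical stand-in for a flag check on the params object.
function isFlagSet(params, flag) {
  return Boolean(params && params[flag]);
}

// Hypothetical helper: choose a referring expression, skipping alternatives
// that produce no text. `random` must return a number in [0, 1).
function phoneRefexpr(params, random = Math.random) {
  const alternatives = [
    () => 'the phone',
    () => 'this phone',
    () => (isFlagSet(params, 'NOT_IT') ? '' : 'it'), // empty when the flag is set
  ];
  const candidates = alternatives.map((alt) => alt()).filter((text) => text !== '');
  return candidates[Math.floor(random() * candidates.length)];
}

console.log(`${phoneRefexpr({ NOT_IT: true })}'s available colors are ...`);
```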
Still we can have this kind of output:
The OnePlus 5T is really a fantastic phone. This phone has a display with a screen-to-body ratio of 80.43 % and a physical size of 6 inches. This phone’s available tints are Black, Red and White. This phone has a battery of 3300 mAh.
We have 3 times This phone here which is not perfect. How could we avoid that?
Change synonym mode
Instead of choosing synonyms randomly, we can just trigger them in sequence. This will avoid close repetitions:
mixin phone_refexpr(obj, params)
  synz {mode:'sequence'}
    syn
    ...
Now we should have fewer repetitions in our synonyms for the phone.
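Triggering alternatives in sequence can be pictured as a simple round-robin. The sketch below is plain JavaScript with a hypothetical makeSequence helper; that the sequence wraps around to the first alternative after the last one is an assumption of this sketch.

```javascript
// Hypothetical helper: return a picker that walks through the
// alternatives in order and starts over after the last one.
function makeSequence(alternatives) {
  let next = 0;
  return function pick() {
    const choice = alternatives[next];
    next = (next + 1) % alternatives.length;
    return choice;
  };
}

const pickRefexpr = makeSequence(['the phone', 'this phone', 'it']);
console.log(pickRefexpr()); // the phone
console.log(pickRefexpr()); // this phone
console.log(pickRefexpr()); // it
```

Compared with a random choice, two consecutive picks can never be identical as long as there are at least two alternatives.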
Fancier sentences and "has said"
Let’s generate a fancier sentence combining display size and battery capacity: The phone has a display with a screen-to-body ratio of 73.15 % and a physical size of 5.5 inches along with a battery of 3400 mAh.
This is quite straightforward:
| #[+display]
| along with a battery of #[+value(phone.battery)] mAh
The problem is, we don’t want to talk about the battery twice. We could just remove the standard battery sentence (The phone has a battery of 3400 mAh), but let’s try to trigger the battery sentence only if we have not talked about the battery before. This is where hasSaid and recordSaid come in.
item
  | #[+display]
  if !hasSaid('BATTERY')
    | along with a battery of #[+value(phone.battery)] mAh
    recordSaid('BATTERY')
item
  if !hasSaid('BATTERY')
    | #[+value(phone)] has a battery of #[+value(phone.battery)] mAh
    recordSaid('BATTERY')
The hasSaid/recordSaid pattern, here used twice, is the following: if we haven’t talked about something:
- We talk about it
- We record that we talked about it
You must use these built-in mechanisms and not rely on your own variables or hashmaps that you would set along text generation, as RosaeNLG goes back and forth in the text rendering.
You also need a deleteSaid('BATTERY') in the main loop, as we must talk about the battery for each phone.
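To illustrate only the semantics of the three calls (check, record, reset), here is a plain-JavaScript sketch outside of any template. makeSaidRegistry and batteryText are hypothetical; inside your templates keep using the built-in hasSaid, recordSaid and deleteSaid, since a plain variable like the Set below would not survive RosaeNLG going back and forth during rendering.

```javascript
// Hypothetical registry mimicking the meaning of the three built-ins.
function makeSaidRegistry() {
  const said = new Set();
  return {
    hasSaid: (key) => said.has(key),
    recordSaid: (key) => { said.add(key); },
    deleteSaid: (key) => { said.delete(key); },
  };
}

const registry = makeSaidRegistry();

// Talk about the battery only if it has not been mentioned yet.
function batteryText(phone) {
  if (registry.hasSaid('BATTERY')) return '';
  registry.recordSaid('BATTERY');
  return `a battery of ${phone.battery} mAh`;
}

const firstPhone = { battery: 3300 };
console.log(batteryText(firstPhone)); // a battery of 3300 mAh
console.log(batteryText(firstPhone)); // (empty: already said)
registry.deleteSaid('BATTERY');       // reset before the next phone
console.log(batteryText({ battery: 3400 })); // a battery of 3400 mAh
```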
You should get those nice sentences:
OnePlus 5T: a great phone! The phone has a battery of 3300 mAh. This phone’s available tints are Black, Red and White. It has a display with a physical size of 6 inches and a screen-to-body ratio of 80.43 %.
OnePlus 5: a great phone! The phone has a battery of 3300 mAh. This phone has a display with a physical size of 5.5 inches and a screen-to-body ratio of 72.93 %. The phone’s available colors are Gold and Gray.
Even more
We have gone through some aspects of NLG with this tutorial.
There are some other features you can explore, for instance:
- automatic a / an: a apple ⇒ an apple, a hour ⇒ an hour
- agreement of verbs (especially the irregular ones)
- agreement of words: tomato ⇒ tomatoes
- etc.
Final version of the code
tuto.js
const rosaenlgPug = require('rosaenlg');

let phones = [
  {
    name: 'OnePlus 5T',
    colors: ['Black', 'Red', 'White'],
    displaySize: 6,
    screenRatio: 80.43,
    battery: 3300,
  },
  {
    name: 'OnePlus 5',
    colors: ['Gold', 'Gray'],
    displaySize: 5.5,
    screenRatio: 72.93,
    battery: 3300,
  },
  {
    name: 'OnePlus 3T',
    colors: ['Black', 'Gold', 'Gray'],
    displaySize: 5.5,
    screenRatio: 73.15,
    battery: 3400,
  },
];

const res = rosaenlgPug.renderFile('tuto.pug', {
  language: 'en_US',
  phones: phones,
  cache: true,
});
console.log(res);
tuto.pug
//- tag::displayMixin[]
mixin display
  itemz { begin_with_general: 'itHasADisplay', separator:',', last_separator:'and', mix:true }
    item
      | a physical size of #[+value(phone.displaySize)] inches
    item
      | a screen-to-body ratio of #[+value(phone.screenRatio)] %
//- end::displayMixin[]

//- tag::colorsMixin[]
mixin colors
  //- tag::colorsMixinNotIt[]
  | #[+value(phone, {'NOT_IT':true})]'s available #[+syn('colors', 'tints', 'tones')]
  //- end::colorsMixinNotIt[]
  | are
  eachz color in phone.colors with { separator:',', last_separator:'and', end:'.' }
    | #{color}
//- end::colorsMixin[]

//- tag::introMixin[]
mixin intro
  synz
    syn
      | the #[+value(phone)] is really a fantastic phone.
    syn
      | i really love the new #[+value(phone)].
    syn
      | #[+value(phone)] : a great phone !
//- end::introMixin[]

//- tag::mixinItHasADisplay[]
mixin itHasADisplay
  | #[+value(phone)] has a display with
//- end::mixinItHasADisplay[]

mixin phone_chunks
  itemz {separator: '.', end:'.', mix:true}
    item
      | #[+colors]
    //- tag::hasSaid[]
    item
      | #[+display]
      if !hasSaid('BATTERY')
        | along with a battery of #[+value(phone.battery)] mAh
        recordSaid('BATTERY')
    item
      if !hasSaid('BATTERY')
        | #[+value(phone)] has a battery of #[+value(phone.battery)] mAh
        recordSaid('BATTERY')
    //- end::hasSaid[]

mixin phone_ref(obj, params)
  | #{obj.name}

mixin phone_refexpr(obj, params)
  synz {mode:'sequence'}
    syn
      | the phone
    syn
      | this phone
    syn
      if !hasFlag(params, 'NOT_IT')
        | it

//- tag::phoneMixin[]
mixin phone
  | #[+intro] .
  | #[+phone_chunks]
//- end::phoneMixin[]

//- tag::main[]
- let phone;
each phoneElt in phones
  - phone = phoneElt;
  p
    - phone.ref = 'phone_ref'; phone.refexpr = 'phone_refexpr';
    | #[+phone]
  deleteSaid('BATTERY')
//- end::main[]