Synonyms
Using synonyms to vary your output is good practice. For some background on the approaches, read Ehud Reiter’s article about synonyms in NLG.
You can output simple synonyms (words, etc.) with syn, and complex ones (which use mixins, other synonyms, etc.) using a synz > syn structure.
The algorithm that chooses the synonym to output works as follows:
- It is either random (nothing fancy, but efficient) or sequential (one alternative after the other), depending on the mode.
- It eliminates empty alternatives.
- You can ask the algorithm to globally choose the best alternative.
You should not use your own random numbers in your mixins, because it will break RosaeNLG’s ability to predict the next outputs. More about RosaeNLG and random numbers.
Basic synonyms using syn
The syn mixin is perfect for very basic synonyms.
Arguments can be single words, multiple words or anything, but not mixins. Please note that the argument is not an array: the mixin takes a variable number of arguments.
With this mixin the choice is always random. Use a synz > syn structure if you want more options, like sequential output.
The syn_fct function
syn_fct is not a mixin but a standard JavaScript function. Its argument is an array.
It is useful when you want random arguments in some other mixins. Remember: do not use your own random function.
Will randomly output the apple or an apple:
Complex synonyms with synz > syn structure
First example
When each synonymic alternative is complex text using mixins, syn doesn’t fit. You have to use the synz > syn structure.
You can put sentences, words or whatever you want in each syn.
Note on empty alternatives
RosaeNLG will always try to find a non-empty alternative in a synz > syn structure. When it triggers an alternative that turns out to be empty, it will go back and try to find another one.
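The retry behavior described above can be illustrated with a small standalone sketch. This is not RosaeNLG's implementation: pickNonEmpty is a hypothetical helper, each alternative is modeled as a function returning text, and the random source is passed in as a parameter so the behavior can be reproduced.

```javascript
// Hypothetical helper (not part of RosaeNLG): pick a random alternative
// and fall back to another one when the pick renders as empty.
function pickNonEmpty(alternatives, rng = Math.random) {
  // Work on a copy so alternatives already tried can be discarded.
  const remaining = alternatives.slice();
  while (remaining.length > 0) {
    const index = Math.floor(rng() * remaining.length);
    const text = remaining[index](); // each alternative is a function returning text
    if (text.trim() !== '') {
      return text;
    }
    remaining.splice(index, 1); // empty: discard it and try another one
  }
  return ''; // every alternative was empty
}
```

In this sketch an alternative that rendered as empty is never tried twice, so the loop always terminates.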
Choose randomly but try not to repeat
Pure random mode is the default, but it has a drawback: the same alternative can trigger again, leading to inharmonious text.
To avoid that, use mode:'once'. It will trigger each alternative randomly, but will try not to repeat the same alternative. When all alternatives have been triggered, it will reset.
In general, you should favor once over the default random mode.
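To make the once behavior concrete, here is a minimal sketch of a chooser that picks randomly without repeating until every alternative has been used. The name makeOnceChooser and its structure are assumptions made for illustration; RosaeNLG's internal logic may differ.

```javascript
// Hypothetical sketch of a "once"-style chooser; names are illustrative only.
function makeOnceChooser(alternatives, rng = Math.random) {
  let unused = alternatives.slice();
  return function nextOnce() {
    if (unused.length === 0) {
      unused = alternatives.slice(); // every alternative was triggered: reset
    }
    const index = Math.floor(rng() * unused.length);
    // Remove the pick from the pool so it is not repeated before the reset.
    return unused.splice(index, 1)[0];
  };
}
```

Calling the returned function three times on three alternatives yields each of them exactly once, in random order, before the pool is refilled.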
Force a specific synonym to trigger
To force a specific synonym to trigger, use synz {force:3} (to trigger the 3rd one):
This is useful while developing.
If the forced alternative is empty, it will not be triggered (a non-empty one will be triggered instead).
Weights of each alternative
If you want to favor one alternative over the others, you can put a higher weight on the one you prefer:
The "the one I prefer" alternative will be triggered much more often (its probability is 3/5).
weight must be a strictly positive integer.
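The effect of weights can be sketched as a weighted random draw. The code below is an illustration only, not RosaeNLG's code: pickWeighted is a hypothetical function, and the weights 3, 1 and 1 are an assumption chosen because they give the preferred alternative a probability of 3/5.

```javascript
// Hypothetical weighted pick: each alternative is chosen with
// probability weight / (sum of all weights).
function pickWeighted(alternatives, rng = Math.random) {
  const total = alternatives.reduce((sum, alt) => sum + alt.weight, 0);
  let threshold = rng() * total;
  for (const alt of alternatives) {
    threshold -= alt.weight;
    if (threshold < 0) {
      return alt.text;
    }
  }
  // Guard against floating-point rounding at the upper boundary.
  return alternatives[alternatives.length - 1].text;
}

// Assumed weights: 3 / (3 + 1 + 1) = 3/5 for the preferred alternative.
const weightedAlternatives = [
  { text: 'the one I prefer', weight: 3 },
  { text: 'another one', weight: 1 },
  { text: 'yet another one', weight: 1 },
];
```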
It is generally bad practice to use weight and mode: 'once' in the same structure: once an alternative has been triggered, it will be avoided whatever its weight.
Choose each synonym alternative one after the other
Sometimes random is not the right way. You might prefer to trigger the first alternative, then the second one, etc. Set the mode parameter to sequence to do that.
When called 5 times, will output: first second third fourth first
The weight parameter is meaningless in sequence mode.
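Sequence mode amounts to a round-robin over the alternatives. The sketch below is a standalone illustration with a hypothetical makeSequenceChooser function; it assumes four alternatives named first to fourth, which is consistent with the output quoted above.

```javascript
// Hypothetical round-robin chooser illustrating sequence mode.
function makeSequenceChooser(alternatives) {
  let position = 0;
  return function nextInSequence() {
    const pick = alternatives[position];
    // Wrap around to the first alternative after the last one.
    position = (position + 1) % alternatives.length;
    return pick;
  };
}

const nextInSequence = makeSequenceChooser(['first', 'second', 'third', 'fourth']);
const outputs = [];
for (let i = 0; i < 5; i++) {
  outputs.push(nextInSequence());
}
console.log(outputs.join(' ')); // first second third fourth first
```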
Global synonym mode
Possible values for mode are:
- random (default)
- sequence
- once
By default, synonyms are chosen randomly (random), and you can locally change this behavior using the sequence or once mode. But you can also change the behavior globally using defaultSynoMode.
When you have changed defaultSynoMode, you can still change the default behavior locally using another mode.
Using once as defaultSynoMode and setting sequence locally is a popular setting.
Choosing the best alternative globally with choosebest
Introduction and first example
The standard synonym algorithm is, and should be, good enough for most usages. When there are inelegant repetitions in the generated texts, the first reflex should be to make local fixes using {mode:'sequence'}.
choosebest works as follows:
- it generates dozens of texts for a section, whatever its size or what it contains
- it chooses the textual alternative that contains the fewest close repetitions
For instance, if stone, gem and jewel are synonyms, the ranking from best to worst is: stone gem jewel / stone gem stone / stone stone gem / stone stone stone.
Let’s take a first example:
eachz i in [1,2,3] with {separator: ' '}
  synz
    syn
      | stone
    syn
      | jewel
    syn
      | gem
If you run that, you will randomly get gem jewel jewel or stone gem stone etc. - sometimes gem jewel stone if you are lucky.
Let’s use choosebest:
It will generate the same text 100 times and take the best alternative. Unless you are very unlucky, you will get gem jewel stone (still in a random order).
Usage
You can put choosebest anywhere to optimize synonyms in a section of text, but you should use it at the paragraph level.
choosebest has a heavy impact on performance, as the texts are generated multiple times. Use it cautiously, only when required.
You cannot nest choosebest structures. However, within the same template you can use multiple choosebest structures one after the other, for instance one per paragraph.
Advanced options
How it works
The scoring algorithm works like this:
- single words are extracted using a tokenizer (wink-tokenizer) and lowercased
- stopwords are removed (you can customize the list of stopwords)
- when the same word appears multiple times, it raises the score depending on the distance between the two occurrences (if the occurrences are close, it raises the score a lot).
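The three steps above can be approximated in a few lines. This sketch rests on assumptions: it uses a simple regular expression instead of wink-tokenizer, and it assumes each repetition adds 1 / distance to the score (lower is better). The exact formula used by RosaeNLG is not given here, and repetitionScore is a hypothetical name.

```javascript
// Hypothetical scoring sketch: lower score means fewer close repetitions.
function repetitionScore(text, stopWords = []) {
  const stop = new Set(stopWords.map((word) => word.toLowerCase()));
  // Step 1: extract words and lowercase them (a regex stands in for a real tokenizer).
  const words = (text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [])
    // Step 2: remove stopwords.
    .filter((word) => !stop.has(word));
  // Step 3: penalize repeated words, more heavily when occurrences are close.
  const lastSeen = new Map();
  let score = 0;
  words.forEach((word, position) => {
    if (lastSeen.has(word)) {
      score += 1 / (position - lastSeen.get(word)); // assumed penalty: 1 / distance
    }
    lastSeen.set(word, position);
  });
  return score;
}
```

With this assumed penalty, stone gem jewel scores 0, stone gem stone scores 0.5, stone stone gem scores 1 and stone stone stone scores 2, which reproduces the best-to-worst ranking given earlier for these four texts.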
Max attempt
To set the maximum number of attempts to find the best alternative:
- among local parameter: choosebest {among:20}
- defaultAmong global parameter: rosaenlgPug.render(myTemplate, { language: 'en_US', defaultAmong:10 })
- default is 5
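The role of the number of attempts can be shown with a generic best-of-N loop. This is a sketch under assumptions, not RosaeNLG's code: chooseBestOf, generate and score are hypothetical names, and the default of 5 simply mirrors the default stated above.

```javascript
// Hypothetical best-of-N selection: generate `among` candidates and keep
// the one with the lowest score (lower score = fewer repetitions).
function chooseBestOf(generate, score, among = 5) {
  let best = null;
  let bestScore = Infinity;
  for (let attempt = 0; attempt < among; attempt++) {
    const candidate = generate();
    const candidateScore = score(candidate);
    if (candidateScore < bestScore) {
      best = candidate;
      bestScore = candidateScore;
    }
  }
  return best;
}
```

A higher value explores more candidates and is more likely to find a text without repetitions, at the cost of generating the text that many more times.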
Stop words customization
You can locally customize the list of stop words with:
- stop_words_add string[]: list of stopwords to add to the standard stopwords list (NB: stop_words_add will be automatically lowercased)
- stop_words_remove string[]: list of stopwords to remove from the standard stopwords list
- stop_words_override string[]: replaces the standard stopword list (which is per language)
will output newStopWord newStopWord AAA newStopWord BBB.
choosebest param
  synz
    syn
      | thus thus thus AAA BBB
    syn
      | AAA AAA
will output AAA AAA, because thus is no longer considered a stop word.
The standard list of stop words per language is here.
Force identical elements
Sometimes you want two or more words to be considered identical in terms of synonyms, even if they are not. This is often the case for plurals (diamonds and diamond), as there is no integrated lemmatizer, or for similar words like phone, cellphone and smartphone.
Use identicals string[][] with lists of words that should be considered identical:
will output diamonds and pearl systematically.
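One way to picture the effect of identicals is as a normalization step applied to words before repetitions are counted. The sketch below is an assumption about how such a mapping could work, not a description of RosaeNLG internals; normalizeIdenticals is a hypothetical helper that maps every word of a group to the first word of that group.

```javascript
// Hypothetical normalization: words listed in the same group are replaced
// by the group's first word, so they count as repetitions of each other.
function normalizeIdenticals(words, identicals) {
  const canonical = new Map();
  for (const group of identicals) {
    const representative = group[0].toLowerCase();
    for (const word of group) {
      canonical.set(word.toLowerCase(), representative);
    }
  }
  return words.map((word) => {
    const lowered = word.toLowerCase();
    return canonical.has(lowered) ? canonical.get(lowered) : lowered;
  });
}

const identicalGroups = [
  ['diamond', 'diamonds'],
  ['phone', 'cellphone', 'smartphone'],
];
const normalized = normalizeIdenticals(['Diamonds', 'pearl', 'diamond'], identicalGroups);
// normalized is ['diamond', 'pearl', 'diamond']: the two diamond forms now collide.
```

After this step, a text containing both diamonds and diamond would be scored as containing a repetition, so an alternative mixing diamonds and pearl would be preferred.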