Text Simplification using Natural Language Processing
Certain activities in society require precise language and technical terms. In other cases, for historical reasons, specific fields of knowledge have developed ways of communicating and expressing ideas that are a barrier for the uninitiated. Legal, administrative and health language are examples.
The problem is worse for certain groups of people (those with a low level of education, those with learning difficulties, people who are deaf, blind or deaf-blind, and those whose mother tongue is not Spanish), who are gradually being pushed out of the common spaces of social life. Simplified content also benefits all of us, because it reduces the time needed to understand a text.
Government activity has a major impact on our lives, and awareness that its message must reach the public seems to be gradually growing. Legislative initiatives around the world reflect this, such as the United States' Plain Writing Act of 2010 (“An Act To enhance citizen access to Government information and services by establishing that Government documents issued to the public must be written clearly, and for other purposes”), and similar initiatives in the European Union (“How to write clearly”, Directorate-General for Translation, European Commission). In Spain, the Government of Aragon has drafted the “Manual de estilo de lenguaje claro: Criterios para la redacción de textos claros en internet” (Plain Language Style Manual: Criteria for Writing Clear Texts on the Internet).
Simplification is not an easy problem to solve, because what is simple for some may not be simple for others. We can approach the idea of “simple” through what a text allows us to do.
Thus, a text is simple if, when we access it, we can:
- Find what we need.
- Understand what we find the first time we read or hear it.
- Use what we find to meet our needs.
When writing, several techniques help achieve this:
- Organise the content with the reader in mind.
- Address the reader as “you”, to reduce the distance from the author.
- Avoid the passive voice as much as possible.
- Use short sentences and paragraphs.
- Use common, everyday words.
- Use layout resources that make the information easy to follow: text boxes, lists, tables, highlighted headings…
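Some of these guidelines can be checked automatically. As an illustration, here is a minimal Python sketch of a checker for the “short sentences” guideline. It assumes a naive punctuation-based sentence splitter and a hypothetical limit of 20 words per sentence; neither comes from the guidelines themselves, and real texts (abbreviations, numbered items) would need a proper sentence tokenizer.

```python
import re

# Hypothetical threshold: the guideline only says "short sentences".
MAX_WORDS = 20


def split_sentences(text: str) -> list[str]:
    """Naively split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def long_sentences(text: str, max_words: int = MAX_WORDS) -> list[tuple[int, str]]:
    """Return (word_count, sentence) for every sentence longer than max_words."""
    flagged = []
    for sentence in split_sentences(text):
        # \w+ is Unicode-aware in Python 3, so accented Spanish words count too.
        n_words = len(re.findall(r"\w+", sentence))
        if n_words > max_words:
            flagged.append((n_words, sentence))
    return flagged


if __name__ == "__main__":
    sample = (
        "Submit the form. The application, once it has been reviewed by the "
        "competent department and all supporting documents have been verified, "
        "will be resolved within the period established by the regulations."
    )
    for count, sentence in long_sentences(sample):
        print(f"{count} words: {sentence}")
```

On the sample, only the second sentence is flagged. A checker like this can only point out candidates: deciding how to split a long sentence still requires rewriting, which is where the techniques below come in.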
But what can we do with everything that has already been written?
Could existing text be simplified automatically, so that the information already produced becomes accessible?
What would this simplification consist of? As Horacio Saggion explains in Automatic Text Simplification, “automatic text simplification is the process of transforming a text into a text that, while ideally conveying the same message, is easier to read and understand by a wider audience”.
The good news is that yes, text can be simplified with language technologies. It is still a developing field with open problems, but natural language processing (NLP) techniques already make it possible to do things like:
- Grammar correction: detecting and correcting grammatical errors, such as agreement errors, incorrect sentence structure or misuse of punctuation. This helps keep the text coherent and understandable.
- Language simplification: analysing complex sentences or paragraphs and rewriting them in clearer, more accessible language. These systems can identify complicated structures or technical terminology and suggest simpler ways of expressing the information.
- Improving coherence: analysing the relationship between sentences and paragraphs to detect logical inconsistencies, unnecessary repetition or abrupt transitions between ideas. This makes the text easier to understand and more fluent.
- Synthesis and summarisation: condensing large amounts of text into a shorter, clearer form. This is useful for giving an overview of a topic or presenting information in a more digestible way.
- Detecting and suggesting appropriate vocabulary: identifying words or terms that may be complicated or unfamiliar to readers and suggesting clearer, more widely understood alternatives. This improves comprehension of the text.
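To make the last point more concrete, here is a minimal Python sketch of lexical simplification: flag words whose frequency falls below a threshold and replace them with the most frequent simpler synonym. The frequency table, synonym list and threshold are small invented examples used only for illustration; a real system would rely on corpus frequencies, a curated lexicon or a language model, and would also have to handle word sense and inflection.

```python
import re

# Hypothetical toy resources (invented values, occurrences per million words).
WORD_FREQUENCY = {
    "use": 900.0, "start": 400.0, "help": 700.0, "end": 500.0,
    "utilise": 3.0, "commence": 2.0, "assistance": 40.0, "terminate": 4.0,
}
SIMPLER_SYNONYMS = {
    "utilise": ["use"],
    "commence": ["start", "begin"],
    "terminate": ["end"],
}
RARE_THRESHOLD = 10.0  # hypothetical cut-off for a "complex" word


def is_complex(word: str) -> bool:
    """A word is complex if it is in the table and rarer than the threshold.

    Words missing from the toy table are left alone.
    """
    freq = WORD_FREQUENCY.get(word)
    return freq is not None and freq < RARE_THRESHOLD


def suggest_replacements(text: str) -> dict[str, str]:
    """Map each complex word found in text to its most frequent simpler synonym."""
    suggestions = {}
    for token in re.findall(r"\w+", text.lower()):
        if is_complex(token) and token in SIMPLER_SYNONYMS:
            # The most frequent candidate is assumed to be the most familiar one.
            suggestions[token] = max(
                SIMPLER_SYNONYMS[token], key=lambda w: WORD_FREQUENCY.get(w, 0.0)
            )
    return suggestions


def simplify(text: str) -> str:
    """Apply the suggestions, keeping an initial capital letter if present."""
    replacements = suggest_replacements(text)

    def swap(match: re.Match) -> str:
        word = match.group(0)
        new = replacements.get(word.lower())
        if new is None:
            return word
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"\w+", swap, text)


if __name__ == "__main__":
    print(simplify("Applicants must commence the procedure and utilise form A."))
```

Run on the example sentence, the sketch rewrites “commence” as “start” and “utilise” as “use”. Because it substitutes words without looking at context, it illustrates why this remains an open research problem rather than a solved one.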
There is still a great deal of research to be done and many problems to solve. At LHF Labs we believe there is a great opportunity to make communication between the Administration and citizens more accessible. This would benefit everyone: it would reduce the social exclusion of people who struggle to deal with institutions or to understand how public affairs affect them, reduce citizens' disaffection with their Administration and, among other things, help to reduce errors in administrative procedures.
What do you think? Would you like to have a translator between citizens and the Administration?