Repository with sources and generator of https://larlet.fr/david/

widont.py 2.7KB

import re


def widont(text):
    """Replaces the space between the last two words in a string with ``&nbsp;``
    Works in these block tags ``(h1-h6, p, li, dd, dt)`` and also accounts for
    potential closing inline elements ``a, em, strong, span, b, i, mark``

    Extracted from:
    https://github.com/mintchaos/typogrify/blob/
    20f693cbbb232ebc27733d9f1721a2cf1e7b25e3/typogrify/filters.py#L315-L368

    >>> widont('A very simple test')
    'A very simple&nbsp;test'

    Single word items shouldn't be changed

    >>> widont('Test')
    'Test'
    >>> widont(' Test')
    ' Test'
    >>> widont('<ul><li>Test</p></li><ul>')
    '<ul><li>Test</p></li><ul>'
    >>> widont('<ul><li> Test</p></li><ul>')
    '<ul><li> Test</p></li><ul>'
    >>> widont('<p>In a couple of paragraphs</p><p>paragraph two</p>')
    '<p>In a couple of&nbsp;paragraphs</p><p>paragraph&nbsp;two</p>'
    >>> widont('<h1><a href="#">In a link inside a heading</i> </a></h1>')
    '<h1><a href="#">In a link inside a&nbsp;heading</i> </a></h1>'
    >>> widont('<h1><a href="#">In a link</a> followed by other text</h1>')
    '<h1><a href="#">In a link</a> followed by other&nbsp;text</h1>'

    Empty HTMLs shouldn't error

    >>> widont('<h1><a href="#"></a></h1>')
    '<h1><a href="#"></a></h1>'
    >>> widont('<div>Divs get no love!</div>')
    '<div>Divs get no love!</div>'
    >>> widont('<pre>Neither do PREs</pre>')
    '<pre>Neither do PREs</pre>'
    >>> widont('<div><p>But divs with paragraphs do!</p></div>')
    '<div><p>But divs with paragraphs&nbsp;do!</p></div>'

    Adaptations:

    * add the mark element as a potential inline
    * avoid insertion of a nbsp if the sentence is a single word

    >>> widont("<p>Avec <mark>mon ami Marc.</mark></p>")
    '<p>Avec <mark>mon ami&nbsp;Marc.</mark></p>'
    >>> widont("Vraiment. Bien.")
    'Vraiment. Bien.'
    """
    widont_finder = re.compile(
        r"""((?:</?(?:a|em|span|strong|i|b|mark)[^>]*>)|[^<>\s\.]) # must be preceded by an approved inline opening or closing tag or a non-tag, non-space, non-period character
        \s+ # the space to replace
        ([^<>\s]+ # must be followed by non-tag non-space characters
        \s* # optional white space!
        (</(a|em|span|strong|i|b|mark)>\s*)* # optional closing inline tags with optional white space after each
        ((</(p|h[1-6]|li|dt|dd)>)|$)) # end with a closing p, h1-6, li, dt, dd or the end of the string
        """,
        re.VERBOSE,
    )
    output = widont_finder.sub(r"\1&nbsp;\2", text)
    return output
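
The substitution relies on two capture groups: group 1 holds whatever immediately precedes the last space (an approved inline tag or a single ordinary character), and group 2 holds the last word together with any closing tags. The sketch below is not part of widont.py; it reproduces the same pattern so it can run on its own, and adds a helper named `show_groups` (a hypothetical name used only for illustration) to make those two groups visible.

```python
import re

# Same pattern as in widont.py, compiled once at module level so the
# sketch is self-contained. The module-level constant is an assumption
# of this sketch; the original compiles the pattern inside the function.
WIDONT_FINDER = re.compile(
    r"""((?:</?(?:a|em|span|strong|i|b|mark)[^>]*>)|[^<>\s\.]) # inline tag, or one non-tag, non-space, non-period character
    \s+ # the space to replace
    ([^<>\s]+ # the last word
    \s* # optional white space
    (</(a|em|span|strong|i|b|mark)>\s*)* # optional closing inline tags
    ((</(p|h[1-6]|li|dt|dd)>)|$)) # closing block tag or end of string
    """,
    re.VERBOSE,
)


def widont(text):
    """Replace the space before the last word with a non-breaking space entity."""
    return WIDONT_FINDER.sub(r"\1&nbsp;\2", text)


def show_groups(text):
    """Hypothetical helper: return (group 1, group 2) of the first match, or None."""
    match = WIDONT_FINDER.search(text)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    sample = "<p>Avec <mark>mon ami Marc.</mark></p>"
    print(show_groups(sample))  # ('i', 'Marc.</mark></p>')
    print(widont(sample))       # <p>Avec <mark>mon ami&nbsp;Marc.</mark></p>
    print(show_groups("Vraiment. Bien."))  # None: a period before the space blocks the match
```

Since the behaviour of the function is documented through doctests, running `python -m doctest widont.py` against the original file should be enough to check it after any change to the pattern.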