New Lines to Paragraphs
WordPress, the software behind this site, uses a function called autop(), originally written by Photo Matt, to convert new lines to proper HTML paragraphs. It didn’t really handle some common HTML structures too well, so tonight I took a stab at hacking it a bit. Code follows.
function wpautop($pee, $br=1) {
    // one shared pattern for block-level tags, so the extracted blocks and the split parts line up
    $block_re = '!<(table|ul|ol|pre|form|blockquote|h[1-6])[^>]*>.*</\1[^>]*>!s';
    // replace existing line breaks with newlines
    $pee = preg_replace('|<br\s*/>|', "\n", $pee);
    // make other kinds of line ends into unix-style newlines
    $pee = preg_replace("/(\r\n|\r)/", "\n", $pee);
    // remove duplicate newlines
    $pee = preg_replace("/\n\n+/", "\n\n", $pee);
    // extract block-tagged content
    preg_match_all($block_re, $pee, $blocks);
    // split out non-block-tagged content
    $split_pee = preg_split($block_re, $pee);
    $pee = '';
    foreach ($split_pee as $i => $pee_part)
    {
        // make paragraphs
        $pee_part = preg_replace('/\n?(.+?)(?:\n\s*\n|\z)/s', "\t<p>$1</p>\n", $pee_part);
        // under certain strange conditions it could create a P of entirely whitespace - remove it
        $pee_part = preg_replace('|<p>\s*?</p>|', '', $pee_part);
        // optionally make line breaks
        if ($br)
        {
            $pee_part = preg_replace('|(?<!<br />)\s*\n|', "<br />\n", $pee_part);
        }
        // remove unwanted line breaks
        $pee_part = preg_replace('!(</?(?:dl|dd|dt|select|p)[^>]*>)\s*<br />!', "$1", $pee_part);
        $pee_part = preg_replace('!<br />(\s*</?p>)!', '$1', $pee_part);
        // add block-tagged code back in (the last part has no block after it)
        $block = isset($blocks[0][$i]) ? $blocks[0][$i] : '';
        $pee = $pee.$pee_part."\n".$block;
    }
    // replace bare ampersands with the numeric entity
    $pee = preg_replace('/&([^#])(?![a-z]{1,8};)/', '&#038;$1', $pee);
    return $pee;
}
My tactic was to only make paragraphs outside of common block tags. I see it works on much of my stuff – I’m certain it won’t be too hard to break either. Might be a fun problem to tackle in more depth if I get time.
9 Comments »


Interesting. I’ll experiment a bit with this and run it against the test suite, and if it works well, this or something like it may make 1.0.
Comment by Matt — Wed, 24 Dec 2003 @ 02:38 am
I’d be interested to know how it tests too. I tried to make use of your work to actually make the p tags so I wouldn’t have to think through that problem too – but I can’t be sure I didn’t break your logic somewhere.
It could be done a bit faster in PHP 4.3.0, by adding a flag to preg_split and eliminating the preg_match_all.
Comment by cyberhobo — Wed, 24 Dec 2003 @ 08:51 am
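The preg_split idea from the comment above might look something like this. It is only a sketch: it assumes the flag meant is PREG_SPLIT_DELIM_CAPTURE, and split_blocks() is a made-up helper name, not anything in WordPress.

```php
<?php
// Hypothetical helper: split text into alternating plain-text and block
// pieces with a single preg_split call, so no preg_match_all is needed.
function split_blocks($text) {
    // Group 1 wraps the whole block so it comes back as a delimiter;
    // group 2 is the tag name, needed for the \2 backreference.
    $re = '!(<(table|ul|ol|pre|form|blockquote|h[1-6])[^>]*>.*</\2[^>]*>)!s';
    $parts = preg_split($re, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pieces = array();
    // Results repeat in threes: plain text, whole block, tag name.
    foreach ($parts as $i => $part) {
        if ($i % 3 == 0) {
            $pieces[] = array('type' => 'text', 'value' => $part);
        } elseif ($i % 3 == 1) {
            $pieces[] = array('type' => 'block', 'value' => $part);
        }
        // $i % 3 == 2 is the captured tag name; skip it
    }
    return $pieces;
}

$pieces = split_blocks("Intro\n\n<pre>a\n\nb</pre>\n\nOutro");
```

Only the 'text' pieces would then go through the paragraph-making step, with the 'block' pieces passed through untouched.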
I tested this a little, and it does a great job of preserving the pre tags, which is my #1 complaint about the current autop code. The only thing I see it not doing is adding br and p tags within li tags, etc.
If you add that, let me know and I’ll do more testing for you.
Comment by alex — Thu, 15 Jan 2004 @ 02:52 pm
I’ve since found that this code will sometimes run paragraphs together. I don’t have time to debug it now. If you add a blank line between everything you want as a paragraph, it seems to always work.
Comment by cyberhobo — Mon, 08 Mar 2004 @ 11:29 am
This rocks… I was just having this problem last night.
Thanks
Comment by jfred — Tue, 11 May 2004 @ 07:35 am
Your code seems to have magic_quotes on, which adds unneeded escaping of "s. Other than that, time to test this out. Thanks.
Comment by cypherjf — Wed, 29 Jun 2005 @ 07:35 am
You’re right – if you add a get_magic_quotes() test, I’d like to see it.
I haven’t actually been using this code in recent releases of WordPress (1.5.1 here), because it seems to do a better job now. If there is continued interest in this code, I’m curious what the motivation is.
Comment by cyberhobo — Wed, 29 Jun 2005 @ 12:37 pm
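A magic quotes test along the lines discussed above could be sketched like this. It assumes the PHP function meant is get_magic_quotes_gpc(), which has been removed from PHP 8, and maybe_stripslashes() is a hypothetical name used only for illustration.

```php
<?php
// Hypothetical helper: undo magic-quotes escaping only when it is active.
function maybe_stripslashes($text) {
    // get_magic_quotes_gpc() no longer exists in PHP 8, so check for it first.
    if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
        return stripslashes($text);
    }
    // Magic quotes are off (or unavailable): leave the text alone.
    return $text;
}
```

The text would be passed through this before it is displayed or handed to wpautop(), so backslashes are stripped only when PHP added them.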
Is there a way to disable this *feature* altogether from the management GUI in 1.5.1, or will I have to comment the function out?
I don’t need anybody correcting my tagging behind me. I use extensive tagging in my posts, and there is no way one simple function can handle it unless it is an XHTML parser itself.
Comment by Ahmad gharbeia — Thu, 22 Dec 2005 @ 08:40 am
I’ve read that calling remove_filter('the_content', 'wpautop') will do the trick. You could do it before “The Loop” in your template, or make a little plugin that calls it…
Comment by cyberhobo — Thu, 22 Dec 2005 @ 09:11 am
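A little plugin of the kind mentioned above might be as small as this. It is a sketch only: the plugin name and description are invented, and it assumes wpautop is hooked to the_content at the default priority.

```php
<?php
/*
Plugin Name: No Autop (hypothetical example)
Description: Stops WordPress from running wpautop on post content.
*/

// Guard so the file does nothing if loaded outside WordPress.
if (function_exists('remove_filter')) {
    // Detach wpautop from the_content so posts are output as written.
    remove_filter('the_content', 'wpautop');
}
```

Saved in the plugins directory and activated, this would leave post markup alone, so any paragraph tags have to be written by hand.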