Sun, 21 Dec 2003

New Lines to Paragraphs

Posted by cyberhobo at 08:07 pm

WordPress, the software behind this site, uses a function called autop(), originally written by Photo Matt, to convert new lines to proper HTML paragraphs. It didn’t handle some common block-level HTML structures well (tables, lists, pre blocks and the like), so tonight I took a stab at hacking it a bit. Code follows.

function wpautop($pee, $br=1) {
   // replace existing line breaks (<br>, <br/> or <br />) with newlines
   $pee = preg_replace('|<br\s*/?>|i', "\n", $pee);
   // make other kinds of line ends into unix-style newlines
   $pee = preg_replace("/(\r\n|\r)/", "\n", $pee); 
   // remove duplicate newlines
   $pee = preg_replace("/\n\n+/", "\n\n", $pee); 
   // extract block-tagged content; the same pattern is used to match and to split,
   // so each entry in $blocks[0] lines up with a gap between the split parts
   $block_re = '!<(table|ul|ol|li|pre|form|blockquote|h[1-6])[^>]*>.*</\1[^>]*>!s';
   $nm = preg_match_all($block_re, $pee, $blocks);
   // split out non-block-tagged content
   $split_pee = preg_split($block_re, $pee);
   $pee = '';
   foreach ($split_pee as $i => $pee_part)
   {
      // make paragraphs
      $pee_part = preg_replace('/\n?(.+?)(?:\n\s*\n|\z)/s', "\t<p>$1</p>\n", $pee_part); 
      // under certain strange conditions it could create a P of entirely whitespace - remove it 
      $pee_part = preg_replace('|<p>\s*?</p>|', '', $pee_part); 
      // optionally make line breaks
      if ($br) 
      {
         $pee_part = preg_replace('|(?<!<br />)\s*\n|', "<br />\n", $pee_part); 
      }
      // remove unwanted line breaks
      $pee_part = preg_replace('!(</?(?:dl|dd|dt|select|p)[^>]*>)\s*<br />!', "$1", $pee_part);
      $pee_part = preg_replace('!<br />(\s*</?p>)!', '$1', $pee_part);
      // add block-tagged code back in (the last part has no block after it)
      $pee = $pee.$pee_part."\n".(isset($blocks[0][$i]) ? $blocks[0][$i] : '');
   }   
   // replace ampersand thingies
   $pee = preg_replace('/&([^#])(?![a-z]{1,8};)/', '&#038;$1', $pee);
   
   return $pee; 
}

My tactic was to only make paragraphs outside of common block tags. It works on much of my own content, though I’m certain it won’t be hard to break. Might be a fun problem to tackle in more depth if I get time.
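
To illustrate the tactic on a small input, here is a minimal sketch, not the function above. It assumes only pre blocks need protecting, uses a non-greedy match (the posted code uses a greedy one), and the helper name example_paragraphs_outside_pre is made up for this example.

```php
<?php
// Hypothetical helper: wrap text in <p> tags, but leave <pre> blocks alone.
function example_paragraphs_outside_pre($text) {
    // One pattern for both calls, so blocks and gaps stay aligned.
    $block_re = '!<(pre)[^>]*>.*?</\1[^>]*>!s';
    preg_match_all($block_re, $text, $blocks); // $blocks[0]: the <pre> chunks
    $parts = preg_split($block_re, $text);     // $parts: the text around them

    $out = '';
    foreach ($parts as $i => $part) {
        $part = trim($part);
        if ($part !== '') {
            $out .= "<p>" . $part . "</p>\n";
        }
        // The last part has no block following it.
        if (isset($blocks[0][$i])) {
            $out .= $blocks[0][$i] . "\n";
        }
    }
    return $out;
}

echo example_paragraphs_outside_pre(
    "First paragraph.\n\n<pre>\ncode  here\n</pre>\n\nSecond paragraph."
);
```

The text before and after the pre block each become a paragraph, while the whitespace inside the pre block passes through unchanged.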


9 Comments »

  1. Interesting. I’ll experiment a bit with this and run it against the test suite and if it works well this or something like this may make 1.0.

    Comment by Matt — Wed, 24 Dec 2003 @ 02:38 am

  2. I’d be interested to know how it tests too. I tried to make use of your work to actually make the p tags so I wouldn’t have to think through that problem too – but I can’t be sure I didn’t break your logic somewhere.

    It could be done a bit faster in PHP 4.3.0, by adding a flag to preg_split and eliminating the preg_match_all.

    Comment by cyberhobo — Wed, 24 Dec 2003 @ 08:51 am
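
The comment above does not name the flag. One candidate is PREG_SPLIT_DELIM_CAPTURE, which makes preg_split return the captured delimiters along with the pieces between them, so a separate preg_match_all is not needed. The sketch below assumes that flag is the one meant; the helper name example_split_with_blocks and the shortened tag list are made up for illustration.

```php
<?php
// Hypothetical helper: split text into (text, block) pairs in a single pass.
function example_split_with_blocks($text) {
    // Group 1 captures the whole block; group 2 captures the tag name
    // so the closing tag can be matched with the \2 backreference.
    $re = '!(<(pre|ul|ol)[^>]*>.*?</\2[^>]*>)!s';
    $chunks = preg_split($re, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Both groups are returned, so chunks repeat in threes:
    // text, whole block, tag name, text, whole block, tag name, ...
    $pairs = array();
    for ($i = 0; $i < count($chunks); $i += 3) {
        $pairs[] = array(
            'text'  => $chunks[$i],
            'block' => isset($chunks[$i + 1]) ? $chunks[$i + 1] : '',
        );
    }
    return $pairs;
}

print_r(example_split_with_blocks("a<pre>x</pre>b"));
```

Each pair holds a stretch of ordinary text and the block that follows it, which is the same pairing the posted function builds from two separate regex calls.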

  3. I tested this a little; it does a great job of preserving the pre tags, which is my #1 complaint about the current autop code. The only thing I see it not doing is adding br and p tags within li tags, etc.

    If you add that, let me know and I’ll do more testing for you.

    Comment by alex — Thu, 15 Jan 2004 @ 02:52 pm

  4. I’ve since found that this code will sometimes run paragraphs together. I don’t have time to debug it now. If you add a blank line between everything you want as a paragraph, it seems to always work.

    Comment by cyberhobo — Mon, 08 Mar 2004 @ 11:29 am

  5. This rocks… I was just having this problem last night.

    Thanks

    Comment by jfred — Tue, 11 May 2004 @ 07:35 am

  6. Your code seems to have magic_quotes on, adding unneeded escaping of “s. Other than that, time to test this out. Thanks. :)

    Comment by cypherjf — Wed, 29 Jun 2005 @ 07:35 am

  7. You’re right – if you add a get_magic_quotes_gpc() test, I’d like to see it.

    I haven’t actually been using this code in recent releases of WordPress (1.5.1 here), because it seems to do a better job now. If there is continued interest in this code, I’m curious what the motivation is.

    Comment by cyberhobo — Wed, 29 Jun 2005 @ 12:37 pm
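
A magic quotes test of the kind discussed above might look like the sketch below. It is an assumption about what was meant, not code from this post. The helper name example_undo_magic_quotes is made up, and get_magic_quotes_gpc() is guarded with function_exists() because it was removed in PHP 8.

```php
<?php
// Hypothetical helper: undo magic_quotes escaping only when it is active.
function example_undo_magic_quotes($text) {
    // On PHP 8+ the function no longer exists; on 5.4 to 7.4 it always
    // returns false (the @ hides the 7.4 deprecation notice).
    if (function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc()) {
        return stripslashes($text);
    }
    return $text;
}

// With magic quotes off, the text comes back unchanged.
echo example_undo_magic_quotes('preg_replace("/a/", "b", $pee);'), "\n";
```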

  8. Is there a way to disable this *feature* altogether from the management GUI in 1.5.1 or will I have to comment the function out?

    I don’t need anybody correcting my tagging behind me. I use extensive tagging in my posts and there is no way one simple function can handle it unless it is an XHTML parser itself.

    Comment by Ahmad gharbeia — Thu, 22 Dec 2005 @ 08:40 am

  9. I’ve read that calling remove_filter('the_content', 'wpautop') will do the trick. You could do it before “The Loop” in your template, or make a little plugin that calls it…

    Comment by cyberhobo — Thu, 22 Dec 2005 @ 09:11 am
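
The "little plugin" idea could be sketched roughly as follows. This is untested against WordPress 1.5.1. The plugin name and the function example_disable_wpautop are made up for illustration, and it assumes the wpautop filter has already been registered on the_content by the time the plugin file is loaded.

```php
<?php
/*
Plugin Name: Disable wpautop (hypothetical example)
Description: Stops WordPress from auto-formatting post content.
*/

// Returns true if the filter removal was attempted, false when WordPress
// (and therefore remove_filter) is not loaded.
function example_disable_wpautop() {
    if (!function_exists('remove_filter')) {
        return false;
    }
    remove_filter('the_content', 'wpautop');
    return true;
}

example_disable_wpautop();
```

The function_exists() guard only keeps the file from failing when it is run outside WordPress; inside WordPress the call goes straight to remove_filter.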
