New Lines to Paragraphs

WordPress, the software behind this site, uses a function called wpautop(), based on autop() as originally written by Photo Matt, to convert new lines into proper HTML paragraphs. It handled some common HTML structures poorly (tables, lists, and other block-level tags), so tonight I took a stab at hacking it a bit. Code follows.

function wpautop($pee, $br=1) {
   // replace existing line breaks (<br />, <br/> or <br>) with newlines
   $pee = preg_replace('|<br\s*/?>|i', "\n", $pee);
   // make other kinds of line ends into unix-style newlines
   $pee = preg_replace("/(\r\n|\r)/", "\n", $pee); 
   // remove duplicate newlines
   $pee = preg_replace("/\n\n+/", "\n\n", $pee); 
   // extract block-tagged content; one shared pattern keeps the matches and the split pieces aligned
   $block_re = '!<(table|ul|ol|pre|form|blockquote|h[1-6])[^>]*>.*</\1[^>]*>!s';
   $nm = preg_match_all($block_re, $pee, $blocks);
   // split out non-block-tagged content
   $split_pee = preg_split($block_re, $pee);
   $pee = '';
   foreach ($split_pee as $i => $pee_part)
   {
      // make paragraphs
      $pee_part = preg_replace('/\n?(.+?)(?:\n\s*\n|\z)/s', "\t<p>$1</p>\n", $pee_part); 
      // under certain strange conditions it could create a P of entirely whitespace - remove it 
      $pee_part = preg_replace('|<p>\s*?</p>|', '', $pee_part); 
      // optionally make line breaks
      if ($br) 
      {
         $pee_part = preg_replace('|(?<!<br />)\s*\n|', "<br />\n", $pee_part); 
      }
      // remove unwanted line breaks
      $pee_part = preg_replace('!(</?(?:dl|dd|dt|select|p)[^>]*>)\s*<br />!', "$1", $pee_part);
      $pee_part = preg_replace('!<br />(\s*</?p>)!', '$1', $pee_part);
      // add block-tagged code back in (the final piece has no block after it)
      $pee .= $pee_part."\n".(isset($blocks[0][$i]) ? $blocks[0][$i] : '');
   }   
   // replace ampersand thingies
   $pee = preg_replace('/&([^#])(?![a-z]{1,8};)/', '&#038;$1', $pee);
   
   return $pee; 
}

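My tactic was to make paragraphs only outside of common block tags: pull the block-tagged chunks out, run the paragraph logic on the text between them, then stitch everything back together in order. It works on much of my stuff, though I’m certain it won’t be too hard to break either. Might be a fun problem to tackle in more depth if I get time.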
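Here is a minimal sketch of that tactic in isolation, stripped of the line-break and ampersand handling. The function name sketch_paragraphs_outside_blocks() is made up for illustration and is not part of WordPress. It also assumes a lazy match (.*?) for the block contents, which is a swap from the greedy match in the function above: it keeps two separate lists from being swallowed as one block, but it will cut a nested same-named block short.

```php
<?php
// Hypothetical helper for illustration only; not a WordPress function.
function sketch_paragraphs_outside_blocks(string $text): string
{
    // One shared pattern, so match N always sits between split pieces N and N+1.
    // Assumption: lazy ".*?" instead of the greedy ".*" used in wpautop() above.
    $block_re = '!<(table|ul|ol|pre|form|blockquote|h[1-6])[^>]*>.*?</\1\s*>!s';

    // Collect the block-tagged chunks, then the text that lies between them.
    preg_match_all($block_re, $text, $blocks);
    $parts = preg_split($block_re, $text);

    $out = '';
    foreach ($parts as $i => $part) {
        // Blank lines separate paragraphs in the non-block text.
        foreach (preg_split('/\n\s*\n/', trim($part)) as $para) {
            $para = trim($para);
            if ($para !== '') {
                $out .= '<p>' . $para . "</p>\n";
            }
        }
        // Put the untouched block back; the last piece has no block after it.
        if (isset($blocks[0][$i])) {
            $out .= $blocks[0][$i] . "\n";
        }
    }
    return $out;
}

// Example: the list passes through as-is, the text around it gets wrapped.
echo sketch_paragraphs_outside_blocks(
    "Intro text.\n\n<ul>\n<li>One</li>\n</ul>\n\nClosing text."
);
```

Using the same pattern for both preg_match_all() and preg_split() is the part that matters: if the two patterns differ, the pieces and the blocks can fall out of step and get reassembled in the wrong places.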

Categorized in: Web Development
