Global Hosting Talk

Exporting HTML to CSV: help needed

by Feeder on Feb. 28, 2010, under Web Hosting Talk

OK, so I’m scraping content for my site (it’s from NHL 10 for Xbox, nothing devious). I want our stats scraped and then exported to CSV twice a day. If I can figure out how to do the export, I’m sure a cron job could be set up to run it twice a day.

Here’s what I have so far:

Code:

<html>
 <head>
  <title>PHP Scrape</title>
 </head>
 <body> 
 <?php 
   
    // Read html file to be processed into $data variable
    $data = file_get_contents('http://www.easportsworld.com/en_US/clubs/partial/762A0001/129082/members-list');
   
    // Commented regex to extract contents from <div class="scrolling">contents</div>
    //  where "contents" may contain nested <div>s.
    //  The regex uses PCRE's recursive (?1) subexpression syntax to recurse into group 1
    $pattern_long = '{                    # recursive regex to capture contents of "scrolling" DIV
  <div\s+class="scrolling"\s*>          # match the "scrolling" class DIV opening tag
    (                                    # capture "scrolling" DIV contents into $1
      (?:                                # non-cap group for nesting * quantifier
        (?: (?!<div[^>]*>|</div>). )++    # possessively match all non-DIV tag chars
      |                                  # or
        <div[^>]*>(?1)</div>              # recursively match nested <div>xyz</div>
      )*                                  # loop however deep as necessary
    )                                    # end group 1 capture
  </div>                                # match the "scrolling" class DIV closing tag
  }six';  // single-line (dot matches all), ignore case and free spacing modes ON
   
    // short version of same regex
    $pattern_short = '{<div\s+class="scrolling"\s*>((?:(?:(?!<div[^>]*>|</div>).)++|<div[^>]*>(?1)</div>)*)</div>}si';

    $matchcount = preg_match_all($pattern_long, $data, $matches);
    // $matchcount = preg_match_all($pattern_short, $data, $matches);
    echo("<pre>\n");
    if ($matchcount > 0) {
        echo("$matchcount matches found.\n");
        // print_r($matches);
        for ($i = 0; $i < $matchcount; $i++) {
            echo("\nMatch #" . ($i + 1) . ":\n");
            echo($matches[1][$i]); // print the 1st capture group of match number i
        }
    } else {
        echo('No Matches');
    }
    echo("\n</pre>");
  ?>
  </body>
</html>
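For anyone wondering why the recursive (?1) group is needed at all, here is a small self-contained check. The HTML snippet is made up purely for illustration. It shows that a plain lazy .*? stops at the first </div> it sees, while the same pattern as $pattern_short above keeps going until the DIV that actually closes the "scrolling" block.

```php
<?php
// Made-up snippet: the "scrolling" DIV contains a nested DIV, followed by an unrelated DIV
$html = '<p>x</p><div class="scrolling"><div>a<div>b</div></div>c</div><div>tail</div>';

// Naive lazy pattern: stops at the FIRST </div>, so nested content gets cut short
$naive = '{<div\s+class="scrolling"\s*>(.*?)</div>}si';
preg_match($naive, $html, $m);
$lazy = $m[1];

// Recursive pattern (same as $pattern_short): (?1) re-enters group 1 for each nested <div>...</div>
$recursivePattern = '{<div\s+class="scrolling"\s*>((?:(?:(?!<div[^>]*>|</div>).)++|<div[^>]*>(?1)</div>)*)</div>}si';
preg_match($recursivePattern, $html, $m);
$recursive = $m[1];

echo $lazy, "\n";       // <div>a<div>b
echo $recursive, "\n";  // <div>a<div>b</div></div>c
```

The trailing "tail" DIV is never picked up because it lacks class="scrolling".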


The output can be seen on my personal server at http://74.117.63.249/test.php

If anyone could help me create a script that takes this output and exports it to a CSV file, I would greatly appreciate it. I want to keep a historical record of our stats, so the CSV should be appended to (not overwritten) each time the cron job runs.
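Here is a rough sketch of how the export and append could work; it is not a tested drop-in solution. It assumes (the page markup isn't shown in the post) that the "scrolling" DIV holds an HTML table whose first row contains the column names. The function names extract_rows, append_stats_csv and run_export, the file name stats_history.csv, and the cron times are all made up for illustration. Each run adds a scraped_at timestamp column and opens the file in append mode, writing the header only when the file is new, so the CSV builds up a history.

```php
<?php
// SKETCH. ASSUMPTIONS: the "scrolling" DIV contains an HTML <table> whose first
// row holds column names; all function/file names here are hypothetical.
// Example crontab entry (06:00 and 18:00 daily; times and paths are assumptions):
//   0 6,18 * * * /usr/bin/php /path/to/export_stats.php

// Turn every <tr> in $html into an array of plain-text cell values.
function extract_rows(string $html): array
{
    $rows = [];
    preg_match_all('{<tr(?:\s[^>]*)?>(.*?)</tr>}si', $html, $trs);
    foreach ($trs[1] as $tr) {
        preg_match_all('{<t[dh](?:\s[^>]*)?>(.*?)</t[dh]>}si', $tr, $cells);
        $row = [];
        foreach ($cells[1] as $cell) {
            // Drop inner tags (links, spans) and decode entities such as &amp;
            $row[] = trim(html_entity_decode(strip_tags($cell), ENT_QUOTES, 'UTF-8'));
        }
        if ($row !== []) {
            $rows[] = $row;
        }
    }
    return $rows;
}

// Append rows to $path with a leading timestamp column, so each cron run
// adds to the history instead of overwriting it.
function append_stats_csv(string $path, array $header, array $rows, string $timestamp): void
{
    clearstatcache(true, $path);              // filesize() results are cached by PHP
    $isNew = !file_exists($path) || filesize($path) === 0;
    $fh = fopen($path, 'a');                  // 'a' appends and creates the file if missing
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path for appending");
    }
    if ($isNew) {
        fputcsv($fh, array_merge(['scraped_at'], $header), ',', '"', '');
    }
    foreach ($rows as $row) {
        fputcsv($fh, array_merge([$timestamp], $row), ',', '"', '');
    }
    fclose($fh);
}

// Fetch the page, extract the table, append it; returns the number of data rows written.
function run_export(string $url, string $csvPath): int
{
    $data = file_get_contents($url);
    if ($data === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    // Same recursive pattern as $pattern_short in the post
    $pattern = '{<div\s+class="scrolling"\s*>((?:(?:(?!<div[^>]*>|</div>).)++|<div[^>]*>(?1)</div>)*)</div>}si';
    if (!preg_match($pattern, $data, $m)) {
        return 0;
    }
    $rows = extract_rows($m[1]);
    if ($rows === []) {
        return 0;
    }
    $header = array_shift($rows);             // ASSUMPTION: first row holds the column names
    append_stats_csv($csvPath, $header, $rows, date('Y-m-d H:i:s'));
    return count($rows);
}
```

The file cron runs would then end with a call such as run_export('http://www.easportsworld.com/en_US/clubs/partial/762A0001/129082/members-list', __DIR__ . '/stats_history.csv');. If the members list turns out not to be a table, only extract_rows should need changing; regex-based table parsing is fragile, and PHP's DOMDocument is a sturdier option if the markup is messy.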

