Scrape A Facebook Page And Turn It Into WordPress Posts Using PHP

A friend of mine, who runs the Facebook page borkborkiamdoggo, approached me a few months ago wanting to set up a WordPress site built from the page’s posts. Each Facebook post with an image should become a post on the WordPress site, which helps with SEO and drives visibility back to the Facebook page. The site should also be self-sufficient and pick up any new Facebook posts on its own. Sounds like fun! (The site can be found here: borkborkiamdoggo.com)

Getting Started

The first step is to register a user and an application on the Facebook developers site. This provides you with a few critical values that you will need later in the PHP script: the App ID, the App secret, and the Graph API version.

With these values, you can call the Facebook API to fetch post information from the page programmatically.

Setting Up The WordPress Scheduler

In WordPress, you can set up a PHP script to run at scheduled intervals using the function wp_schedule_event. In my case, I want a scheduled script to check the Facebook page for new posts every hour and, if there are posts that exist on Facebook but not on WordPress, turn those into WordPress posts.

WordPress theme’s functions.php file:

require_once("inc/fb-scraper.php");

add_action( 'wp', 'doggo_cron_scrape_from_fb_activation' );
add_action( 'doggo_cron_scrape_from_fb_event', 'StartFacebookUploader' );

function doggo_cron_scrape_from_fb_activation() {
    if ( !wp_next_scheduled( 'doggo_cron_scrape_from_fb_event' ) ) {
        wp_schedule_event( time(), 'hourly', 'doggo_cron_scrape_from_fb_event' );
    }
}

The Code Explained

The scheduled event defined above calls the function StartFacebookUploader() to do the heavy lifting. It uses Facebook’s PHP SDK to fetch posts from the page 100 at a time, loop through them, and add each one as a WordPress post if 1) it hasn’t been added before, and 2) it was posted by one of a predefined set of users.

function StartFacebookUploader()
{
    global $fb, $limitPages, $acceptedFbIds, $wpdb;

    $appId = '[your app id]';
    $appSecret = '[your app secret]';
    $fb = new \Facebook\Facebook([
      'app_id' => $appId,
      'app_secret' => $appSecret,
      'default_graph_version' => 'v2.8',
      'default_access_token' => $appId.'|'.$appSecret, // optional
    ]);

    //start looping through the posts
    $limitPages = 3;
    $acceptedFbIds = array([add any user IDs here]);
    GetFbPosts('/borkborkiamdoggo/feed?fields=id,full_picture,message,created_time,link,from&limit=100');
}

The GetFbPosts() method then takes over for the set of Facebook posts currently being processed. It fetches the Facebook information, and conditionally wires up the WordPress hooks necessary in order to turn each FB post into a WP post.

Ensure that the FB post is by one of the FB page admins and not by a user posting to the page:

//if not posted by us
if (!in_array($post["from"]["id"], $acceptedFbIds))
{
	continue;
}

Also make sure that this FB post has both a picture and a message, so that text-only posts and bare images are skipped:

//not good data on this post
if (!array_key_exists('full_picture', $post) || !array_key_exists('message', $post))
{
	continue;
}
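The two checks above can also be folded into a single predicate, which makes the rule easier to test on its own. This is only a sketch: the function name doggo_should_import_post and the sample IDs are hypothetical, and it assumes the post array has the same shape the Graph API returns for the fields requested earlier.

```php
<?php
// Hypothetical helper (not part of the original script): returns true when
// a post array should be turned into a WordPress post.
function doggo_should_import_post(array $post, array $acceptedFbIds): bool
{
    // Skip posts whose author is missing or is not one of the accepted IDs.
    // IDs are compared as strings because the Graph API returns them as strings.
    $fromId = $post['from']['id'] ?? null;
    $accepted = array_map('strval', $acceptedFbIds);
    if ($fromId === null || !in_array((string) $fromId, $accepted, true)) {
        return false;
    }

    // Require both a picture and a message, mirroring the checks above.
    return isset($post['full_picture'], $post['message']);
}

// Made-up sample data for illustration only.
$acceptedFbIds = ['111'];
$posts = [
    ['id' => 'a', 'from' => ['id' => '111'], 'full_picture' => 'https://example.com/a.jpg', 'message' => 'bork'],
    ['id' => 'b', 'from' => ['id' => '222'], 'full_picture' => 'https://example.com/b.jpg', 'message' => 'woof'],
    ['id' => 'c', 'from' => ['id' => '111'], 'message' => 'text only'],
];

$importable = array_values(array_filter($posts, function (array $post) use ($acceptedFbIds) {
    return doggo_should_import_post($post, $acceptedFbIds);
}));
```

With the sample data, only the post with id 'a' survives the filter: 'b' is from another user and 'c' has no picture.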

I don’t want the scheduler to add the same post over and over, so I store the FB post id in post metadata and check it for duplicates:

//check if exists on site already
$postExists = $wpdb->get_var( $wpdb->prepare( "SELECT COUNT(*) FROM $wpdb->postmeta WHERE meta_key = 'fb_id' AND meta_value = %s", $post["id"] ) );
if ($postExists > 0)
{
	continue;
}

The last tricky part is uploading the FB image as the WP post thumbnail/featured image. This uses the WordPress function media_handle_sideload to turn the image downloaded from FB into a media attachment, which is then set as the post’s featured image:

$attachmentId = media_handle_sideload( $file_array, $postId );
if ( is_wp_error( $attachmentId ) ) {
	@unlink( $file_array['tmp_name'] );
	print_r($attachmentId->get_error_message());
	return $attachmentId;
}

set_post_thumbnail($postId, $attachmentId);
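The $file_array passed to media_handle_sideload needs a 'name' entry, and the full code derives it from the post message. A slightly more defensive way to build that filename might look like the sketch below. The function name doggo_image_filename, the 60-character cap, and the 'facebook-image' fallback are all assumptions for illustration; like the original, it simply assumes the image is a JPEG.

```php
<?php
// Hypothetical helper: build a filesystem-friendly image filename from a
// Facebook post message.
function doggo_image_filename(string $message, int $maxLength = 60): string
{
    // Keep only ASCII letters, digits and spaces.
    $slug = preg_replace('/[^a-z0-9 ]+/i', '', $message);

    // Trim, turn runs of spaces into single hyphens, and lowercase.
    $slug = strtolower(preg_replace('/\s+/', '-', trim($slug)));

    // Cap the length so long messages do not produce unwieldy filenames.
    $slug = trim(substr($slug, 0, $maxLength), '-');

    // Fall back to a fixed name when nothing usable is left (e.g. emoji-only posts).
    if ($slug === '') {
        $slug = 'facebook-image';
    }

    return $slug . '.jpg';
}

$exampleName = doggo_image_filename('Bork bork! I am doggo :)');
// $exampleName is "bork-bork-i-am-doggo.jpg"
```

Capping the length matters because Facebook messages can run to several paragraphs, and the whole message would otherwise end up in the filename.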

Full Code

function GetFbPosts($url, $depth = 0)
{
    global $fb, $limitPages, $acceptedFbIds, $wpdb;
    
    $responseRaw = $fb->get($url);
        
    $responseData = $responseRaw->getDecodedBody();
    
    //flip this array so the newest post is added last, just like FB shows them
    $responseData['data'] = array_reverse($responseData['data']);
    
    foreach($responseData['data'] as $post)
    {
        //if not posted by us
        if (!in_array($post["from"]["id"], $acceptedFbIds))
        {
            continue;
        }

        //not good data on this post
        if (!array_key_exists('full_picture', $post) || !array_key_exists('message', $post))
        {
            continue;
        }

        //check if exists on site already
        $postExists = $wpdb->get_var( $wpdb->prepare( "SELECT COUNT(*) FROM $wpdb->postmeta WHERE meta_key = 'fb_id' AND meta_value = %s", $post["id"] ) );
        if ($postExists > 0)
        {
            continue;
        }
        
        //massage
        $friendlyTitle = preg_replace("/[^a-z0-9 ]+/i", "", $post["message"]);
        $friendlyTitle = strtolower(str_replace(" ", "-", $friendlyTitle)).".jpg";

        // Create post object
        $my_post = array(
          'post_title'    => wp_strip_all_tags( $post["message"] ),
          'post_content'  => $post["message"],
          'post_status'   => 'publish',
          'post_author'   => 3,
          //'post_category' => array( 8,39 )
        );

        // Insert the post into the database
        $postId = wp_insert_post( $my_post );

        // post post processing
        wp_set_object_terms( $postId, 3, 'category');
        $tags  = array(4, 5, 6);
        wp_set_object_terms( $postId, $tags, 'post_tag');

        //custom fields on input
        foreach ($post as $i => $value)
        {
            if (is_array($value))
                continue;
            
            add_post_meta($postId, 'fb_'.$i, $value, true);
        }
        
        add_post_meta($postId, 'fb_from_id', $post["from"]["id"], true);

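        // download_url() and media_handle_sideload() live in wp-admin includes,
        // which are not loaded automatically outside the admin (e.g. during cron)
        require_once ABSPATH . 'wp-admin/includes/file.php';
        require_once ABSPATH . 'wp-admin/includes/media.php';
        require_once ABSPATH . 'wp-admin/includes/image.php';

        $imageUrl = $post["full_picture"];
        $tmp = download_url( $imageUrl );

        // bail out before building the file array if the download failed
        if ( is_wp_error( $tmp ) ) {
            print_r($tmp->get_error_message());
            return $tmp;
        }

        $file_array = array(
            'name' => $friendlyTitle,
            'tmp_name' => $tmp
        );

        $attachmentId = media_handle_sideload( $file_array, $postId ); //https://developer.wordpress.org/reference/functions/media_handle_sideload/
        if ( is_wp_error( $attachmentId ) ) {
            // clean up the temp file if the sideload failed
            @unlink( $file_array['tmp_name'] );
            print_r($attachmentId->get_error_message());
            return $attachmentId;
        }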
        //$value = wp_get_attachment_url( $id );

        set_post_thumbnail($postId, $attachmentId);
        
    }
    
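    // stop when Facebook reports no further page of results
    if (empty($responseData['paging']['next']))
    {
        return;
    }

    $pagingString = $responseData['paging']['next'];
    $urlParts = parse_url($pagingString);
    
    if ($depth <= $limitPages)
    {
        GetFbPosts('/borkborkiamdoggo/feed?'.$urlParts['query'], ++$depth);
    }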
}
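The paging step at the end of GetFbPosts() can be isolated into a small function that returns the query string for the next page, or null when there is none. This is a sketch under the assumption that the decoded response carries the next-page URL in ['paging']['next'], as the code above expects; the function name doggo_next_page_query and the sample URL are made up for illustration.

```php
<?php
// Hypothetical helper: extract the query string of the "next" paging URL
// from a decoded Graph API response, or return null if there is no next page.
function doggo_next_page_query(array $responseData): ?string
{
    if (empty($responseData['paging']['next'])) {
        return null;
    }

    // parse_url() with PHP_URL_QUERY returns only the part after the "?".
    $query = parse_url($responseData['paging']['next'], PHP_URL_QUERY);

    return (is_string($query) && $query !== '') ? $query : null;
}

// Made-up sample response for illustration only.
$sampleResponse = [
    'data' => [],
    'paging' => ['next' => 'https://graph.facebook.com/v2.8/123/feed?limit=100&after=abc'],
];

$nextQuery = doggo_next_page_query($sampleResponse);
// $nextQuery is "limit=100&after=abc"
```

Returning null for a missing page gives the caller one clear condition for ending the recursion, rather than relying on the depth limit alone.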
