Exclude first 5 posts of specific categories in the main loop

I read this article about how to set an offset for one specific category in the loop, and I was wondering whether the same is possible for more than one category.

My problem with the code suggested in the article linked above is that I am using a single widget to show five posts from a category I can set in the widget options by ID.
I cannot wrap my head around how to collect the post_IDs of the posts already looped through.

Situation:

I am using custom widgets on my blog page to show the 5 latest posts in two specified categories – let’s call them categoryA and categoryB.

Answer to @Pieter Goosen’s question in the comments: Yes, I can specify the
category by manually entering an ID in the widget settings and it’s
saved in the DB.

Below these widgets the blog page loops through all my posts and shows them.

Effect:

The main loop on the blog page (the one below the widgets) displays the posts from categoryA and categoryB again. They appear twice because the widgets above have already looped through categories A and B.

Question:

Is it possible to exclude the first five posts from categoryA and categoryB in the main loop? Could it be done with pre_get_posts? Using get_query_var? How?

EDIT

My own approaches

FIRST TRY

I modified my widget codes like this:

global $do_not_duplicate;
while($featured_posts->have_posts()): $featured_posts->the_post(); 
    $do_not_duplicate[] = get_the_ID();

As my widgets are being registered in my functions.php file, I added these lines to it:

global $do_not_duplicate;
$do_not_duplicate = array();
include('widgets/widgets.php');

Finally, in index.php, I added these lines to the main loop:

if(have_posts()) :
    global $do_not_duplicate;
    while(have_posts()): the_post();
        if( in_array( $post->ID, $do_not_duplicate ) ) continue;
        get_template_part('content');
    endwhile;
endif;

Please note: This basically works, but it does not actually remove the posts from the main query; it only skips them. This means the default posts_per_page setting won’t be fulfilled, and it won’t work with Jetpack infinite scroll. Also, using globals is not the way to go.

SECOND TRY

I adapted @birgire’s suggestion and modified it a little. I thought about using the suggested pre_get_posts hook in combination with a foreach loop over the categories used by the widgets, in order to gather the IDs of the posts queried by them.
Here’s my snippet:

add_action('pre_get_posts', 'modified_mainloop', 10);
function modified_mainloop($query) {
    if(!is_admin() && $query->is_home() && $query->is_main_query()) {
        $cats_to_exclude = get_categories('include=3,4,9');
        foreach($cats_to_exclude as $category) {
            if($category->term_id == '9') {
                $num_posts_to_exclude = 3;
            } else {
                $num_posts_to_exclude = 5;
            }
            $posts_to_exclude = get_posts(array(
                'posts_per_page' => $num_posts_to_exclude,
                'category__in'   => array($category->term_id),
                'fields'         => 'ids'
            ));

        }
        $query->set('post__not_in', (array) $posts_to_exclude);

    }
}

Please note: Idea is OK, but code does not work.

Answers:

Here is a concept that you can try out to exclude 5 posts from each of categories 3 and 4, leaving out of that selection any post that also belongs to category 9.

Here is the function: (CAVEAT: Requires PHP 5.4+ as I used new short array syntax. Change as needed)

/**
 * function get_sticky_category_posts()
 *
 * Get an array of post ids according to arguments given.
 *
 * @author Pieter Goosen
 * @see https://wordpress.stackexchange.com/q/187477/31545
 *
 * @param array $args
 */
function get_sticky_category_posts( $args = [] )
{
    /*
     * If the array is empty or not valid, return null
     */
    if ( empty( $args ) || !is_array( $args ) )
        return null;

    $ids = '';

    foreach ( $args as $k=>$v ) {

        /*
         * Check that $v['taxonomy'], $v['included_terms'] and $v['posts_per_page'] are all set. If not, return null
         */ 
        if (    !isset( $v['taxonomy'] )
             || !isset( $v['included_terms'] )
             || !isset( $v['posts_per_page'] )
        )
            return null;

        /*
         * Sanitize and validate user input
         */
        $included_terms = filter_var( $v['included_terms'], FILTER_VALIDATE_INT, ['flags'  => FILTER_FORCE_ARRAY] );
        // sanitize_key() replaces FILTER_SANITIZE_STRING, which is deprecated as of PHP 8.1
        $taxonomy       = sanitize_key( $v['taxonomy'] );

        /*
         * Create tax_query according to whether $v['excluded_terms'] is set
         */
        if ( !isset( $v['excluded_terms'] ) ) {
            $tax_query = [
                [
                    'taxonomy'         => $taxonomy,
                    'terms'            => $included_terms,
                    'include_children' => false,
                ],
            ];
        } else {
            $tax_query = [
                [
                    'taxonomy'         => $taxonomy,
                    'terms'            => $included_terms,
                    'include_children' => false,
                ],
                [
                    'taxonomy'         => $taxonomy,
                    'terms'            => filter_var( $v['excluded_terms'], FILTER_VALIDATE_INT, ['flags'  => FILTER_FORCE_ARRAY] ),
                    'include_children' => false,
                    'operator'         => 'NOT IN'
                ],
            ];
        } // endif ( !isset( $v['excluded_terms'] ) ) statement

        /*
         * Use get_posts to get an array of post ids to exclude
         */
        $posts_to_exclude = get_posts(
            [
                'posts_per_page' => filter_var( $v['posts_per_page'], FILTER_VALIDATE_INT ),
                'fields'         => 'ids',
                'tax_query'      => $tax_query
            ]
        );
        if ( $posts_to_exclude ) {

            /*
             * Concatenate all ids into a string using implode
             */
            $ids .= implode( ' ', $posts_to_exclude );
            $ids .= ' ';
        } //endif ( $posts_to_exclude )

    } //end foreach

    /*
     * If we don't have any ids, thus empty string, return null
     */
    if ( !$ids )
        return null;

    /*
     * Create a single flat array of post ids. Use rtrim to remove trailing white space
     */
    return explode( ' ', rtrim( $ids ) );
}

Here is how it works:

  • You need to pass a multi-dimensional array to the function in the following format:

    $args = [
        0 => ['posts_per_page' => 5, 'taxonomy' => 'category', 'included_terms' => 3, 'excluded_terms' => 9],
        1 => ['posts_per_page' => 5, 'taxonomy' => 'category', 'included_terms' => 4, 'excluded_terms' => 9],
    ];
    $a = get_sticky_category_posts( $args );
    
  • There are four allowed parameters. You have to set posts_per_page, taxonomy and included_terms (an integer or an array of term ids); otherwise the function returns null. The fourth parameter, excluded_terms, is optional and is also an integer or an array of term ids: the terms that a post must not belong to. The call above returns the following:

    • 5 post ids from term 3 of the category taxonomy, leaving out posts that are in both term 3 and term 9

    • 5 post ids from term 4 of the category taxonomy, leaving out posts that are in both term 4 and term 9

    • The returned array from the function (var_dump( $a )) looks something like this:

      array(10) {
        [0]=>
        string(3) "159"
        [1]=>
        string(3) "149"
        [2]=>
        string(3) "129"
        [3]=>
        string(3) "126"
        [4]=>
        string(3) "119"
        [5]=>
        string(2) "76"
        [6]=>
        string(3) "528"
        [7]=>
        string(3) "394"
        [8]=>
        string(3) "147"
        [9]=>
        string(2) "97"
      }
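The ids in that dump are strings because the function joins them with implode() and splits them again with explode(). The following is only a minimal sketch of that round trip, using made-up ids in place of real get_posts() results, to show why a strict comparison against an integer id (such as the value of get_the_ID()) would not match until the values are cast back:

```php
<?php
// Illustrative ids only; in the function these come from get_posts().
$batches = [ [ 159, 149 ], [ 528, 394 ] ];

// Same string round trip as in get_sticky_category_posts().
$ids = '';
foreach ( $batches as $batch ) {
    $ids .= implode( ' ', $batch ) . ' ';
}
$as_strings = explode( ' ', rtrim( $ids ) );

// Cast back to integers if strict comparisons are needed later.
$as_ints = array_map( 'intval', $as_strings );

var_dump( in_array( 159, $as_strings, true ) ); // strict check against string ids
var_dump( in_array( 159, $as_ints, true ) );    // strict check against integer ids
```

The first check fails and the second succeeds, so the cast only matters if you compare the ids strictly somewhere else in your theme.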
      

This array can now be passed to the post__not_in parameter in pre_get_posts to exclude these posts from the main query. You can also use this array of ids anywhere else, for example in a custom query, or for other purposes such as sticky posts on pages other than the home page, as used in this answer 🙂

Here is a use case with pre_get_posts. This will remove the required posts from the main query. (Requires PHP 5.4+ due to the syntax in the function and the use of closures; closures themselves are available from PHP 5.3.)

add_action( 'pre_get_posts', function ( $q )
{
    if (    !is_admin() // Not necessary for home page, included it to be used on any other template like category pages
         && $q->is_main_query() // Make sure this is the main query
         && $q->is_home() // Targets the home page only
    ) {
        if ( function_exists( 'get_sticky_category_posts' ) ) {

            $args = [
                0 => ['posts_per_page' => 5, 'taxonomy' => 'category', 'included_terms' => 3, 'excluded_terms' => 9],
                1 => ['posts_per_page' => 5, 'taxonomy' => 'category', 'included_terms' => 4, 'excluded_terms' => 9],
            ];
            $sticky_cat_posts = get_sticky_category_posts( $args );

            if ( !empty( $sticky_cat_posts ) ) {
                $q->set( 'post__not_in', $sticky_cat_posts );
            }   
        }
    }
}); 
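One thing to keep in mind with the hook above: $q->set() replaces whatever post__not_in value is already on the query, so exclusions added earlier by another plugin or by the theme would be lost. Below is a minimal sketch of merging the two lists instead. The helper name merge_excluded_ids() is hypothetical and not part of WordPress:

```php
<?php
/**
 * Hypothetical helper (not a WordPress function): merge two lists of post
 * ids into one de-duplicated, re-indexed array of positive integers.
 *
 * @param mixed $existing Ids already set on the query (may be '' or an array)
 * @param mixed $new      Ids to add
 * @return array
 */
function merge_excluded_ids( $existing, $new )
{
    // Cast both inputs to arrays so an empty string or null is tolerated
    $merged = array_merge( (array) $existing, (array) $new );

    // Normalise to integers and drop anything that is not a positive id
    $merged = array_map( 'intval', $merged );
    $merged = array_filter( $merged, function ( $id ) {
        return $id > 0;
    } );

    // Remove duplicates and re-index from 0
    return array_values( array_unique( $merged ) );
}
```

Inside the callback, the call would then look like $q->set( 'post__not_in', merge_excluded_ids( $q->get( 'post__not_in' ), $sticky_cat_posts ) ); where $q->get() returns the value currently stored on the query.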

You might try this kind of pre_get_posts hook approach (untested):

add_action( 'pre_get_posts', function( $q )
{
    $exclude_cats = [ 12, 34 ]; // <-- Edit these category ids!

    if(    ! is_admin() 
        && $q->is_main_query()
        && ! is_singular()
        && ! is_category( $exclude_cats ) 
    )
    {
        $exclude_post_ids = get_posts( 
            [
                'fields'         => 'ids',
                'posts_per_page' => 5,
                'category__in'   => $exclude_cats,
            ]
        );
        $q->set( 'post__not_in', (array) $exclude_post_ids ); 
    }
} );

where you might have to adjust this further to your needs. Note that, as written, it excludes the 5 latest posts across both categories combined, not 5 per category.

You might also want to use category slugs instead, but then you would have to set up a proper tax_query, since category__in only supports an array of category ids.
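As a rough sketch of what such a tax_query could look like, here is a small function that only builds the argument array. The function name build_slug_query_args() and the slugs in the usage note are placeholders, not anything from the question:

```php
<?php
/**
 * Hypothetical helper: build get_posts() arguments that select posts by
 * category slug rather than by category id.
 *
 * @param array $slugs    Category slugs
 * @param int   $per_page Number of post ids to fetch
 * @return array
 */
function build_slug_query_args( array $slugs, $per_page = 5 )
{
    return [
        'fields'         => 'ids',
        'posts_per_page' => (int) $per_page,
        'tax_query'      => [
            [
                'taxonomy' => 'category',
                'field'    => 'slug', // match terms by slug instead of term id
                'terms'    => array_values( $slugs ),
            ],
        ],
    ];
}
```

The result could be passed straight to get_posts(), for example get_posts( build_slug_query_args( [ 'news', 'reviews' ] ) ), in place of the category__in arguments used in the hook.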