Here's another approach.
I try to keep the number of items small to help control the fitting algorithm, so I break things up a little differently.

I might use a full-width card design for the initial banner and make it 2 rows high. The card image for this banner would be a large PNG with transparency, showing your branding/opaque art in the upper-left area shown above. The PNG would be 3 times wider and twice as tall as your brand/opaque area, and would fill the card design.
Then you could use a second banner with no URL action to fill that gray space, loading a transparent PNG as its banner card image.
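To make the sizing rule concrete, here is a minimal Python sketch of the arithmetic. The function name `banner_png_layout` and the 640x360 brand area are illustrative assumptions, not values from any platform. It returns the PNG size and the box where the opaque brand art sits; everything outside that box would be left fully transparent.

```python
def banner_png_layout(brand_width: int, brand_height: int):
    """Hypothetical helper: size a full-bleed banner PNG from the brand area.

    Applies the rule above: the PNG is 3x the brand area's width and 2x its
    height, with the opaque brand art anchored in the upper-left corner.
    """
    png_size = (3 * brand_width, 2 * brand_height)
    brand_box = (0, 0, brand_width, brand_height)  # (left, top, right, bottom)
    return png_size, brand_box


# Illustrative brand area only; substitute your own artwork dimensions.
(png_w, png_h), (left, top, right, bottom) = banner_png_layout(640, 360)
opaque_share = (right - left) * (bottom - top) / (png_w * png_h)
print(png_w, png_h, round(opaque_share, 4))
```

Because the PNG is 3x as wide and 2x as tall, the opaque brand art always covers 1/6 of the banner image, whatever the actual pixel sizes are; the other 5/6 is transparent and shows the collection background.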
The rest of the cards are pretty standard.
Then your collection would be set up like this:
Banner
Dummy banner
Content 1 - calls 3x2 card
Content 2 - calls 1x2 card
Content 3 - calls 2x2 card
Content 4 - calls 2x2 card
Content 5 - calls 1x2 card
Content 6...
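To show why choosing card sizes that fit together helps, here is a minimal Python sketch of a hypothetical first-fit placer. It is not the platform's fitting algorithm, and it assumes a 4-column grid with "3x2" read as 3 columns wide by 2 rows tall. `Card` and `first_fit` are illustrative names, and the two banners are left out because their grid sizes aren't specified.

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    width: int   # columns spanned (assumed reading of "3x2")
    height: int  # rows spanned


def first_fit(cards, columns):
    """Place each card at the first free top-left cell, scanning row by row.

    Hypothetical stand-in for a fitting algorithm, not the platform's own.
    Returns {card name: (row, col)} for each card's top-left cell.
    """
    occupied = set()
    placements = {}
    for card in cards:
        if card.width > columns:
            raise ValueError(f"{card.name} is wider than the {columns}-column grid")
        row = 0
        placed = False
        while not placed:
            for col in range(columns - card.width + 1):
                cells = {(row + r, col + c)
                         for r in range(card.height)
                         for c in range(card.width)}
                if not cells & occupied:  # every cell the card needs is free
                    occupied |= cells
                    placements[card.name] = (row, col)
                    placed = True
                    break
            row += 1
    return placements


collection = [
    Card("Content 1", 3, 2),
    Card("Content 2", 1, 2),
    Card("Content 3", 2, 2),
    Card("Content 4", 2, 2),
    Card("Content 5", 1, 2),
]
layout = first_fit(collection, columns=4)
for name, (row, col) in layout.items():
    print(f"{name}: row {row}, col {col}")
```

Under these assumptions, Content 1 and 2 fill rows 0 and 1 exactly (3 + 1 columns), Content 3 and 4 fill rows 2 and 3 (2 + 2 columns), and Content 5 starts row 4, so the grid has no holes for the algorithm to backfill.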
Additional thought:
The banners may be set to have no action, which could leave a large non-active region on the screen, and you might not like that.
The large banner is a reasonable place to have no action, but the gray box in my diagram could be used differently. An alternate treatment would be to combine the banner and the 3x2 card into a single card for this (apparently) important lead item: one full-width card with the image on the right. This card could have transparency in the non-image area to show the collection background.
This approach might further simplify curation of the collection content, which of course should still be driven by metadata and mapping rules.
Just a different two cents of input. Hope it helps.