You posted this in the Python space. Are you looking for a script to do this? If the simple approach Dan mentioned (doing a manual spatial join) does not work for you, there is always a scripting solution. However, that will take some time to complete. How much coding experience do you have, and what have you tried so far?
I assume that you want to obtain statistical information from multiple parcels for each region and add that data to the regions, right?
How large are your datasets? How many features are in each feature class?
As a general outline of what could work, I would be thinking along the lines of:
If the datasets are large, it would be better to use select by location for each region to improve performance. The steps above only apply when the regions do not overlap and each parcel centroid falls inside a single region; otherwise, a parcel will be counted multiple times.
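Here is a minimal plain-Python sketch of the centroid idea (not arcpy). It credits each parcel to the region that contains its centroid and sums a per-parcel value for each region. It assumes the parcel centroids have already been computed. The data layout (ids, vertex lists, one numeric value per parcel) and all function names are hypothetical.

```python
# Sketch: aggregate parcel values per region by testing each parcel centroid
# against the region polygons. Hypothetical data layout; plain Python, no arcpy.

def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices.

    The last vertex need not repeat the first. Points exactly on an edge may go
    either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) to the right cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def summarize_by_region(parcel_centroids, regions):
    """Return {region_id: {"count": n, "sum": total}}.

    parcel_centroids: iterable of (parcel_id, (x, y), value)
    regions: dict of region_id -> ring
    """
    stats = {rid: {"count": 0, "sum": 0.0} for rid in regions}
    for _pid, (x, y), value in parcel_centroids:
        for rid, ring in regions.items():
            if point_in_polygon(x, y, ring):
                stats[rid]["count"] += 1
                stats[rid]["sum"] += value
                break  # assumes non-overlapping regions: first match wins
    return stats


regions = {
    "R1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "R2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
parcels = [
    ("p1", (2, 3), 100.0),
    ("p2", (8, 8), 50.0),
    ("p3", (15, 5), 75.0),
    ("p4", (25, 5), 10.0),  # outside every region, so it is ignored
]
print(summarize_by_region(parcels, regions))
```

The `break` credits a parcel only to the first region containing its centroid, which is the non-overlapping case. Removing it would count the parcel in every region it falls in, which is the double counting mentioned above. The nested loop is also O(parcels × regions), so at millions of features it would need a spatial index or tiling in front of it.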
It seems strange that there isn't a join option that is the opposite of "HAVE_THEIR_CENTER_IN: The features in the join features will be matched if a target feature's center falls within them",
that is, something like "The features in the join features will be matched if the join feature's center falls within the target feature."
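The reverse rule David describes can be sketched in plain Python. A join feature matches a target when the join feature's centroid falls inside the target. This amounts to converting the join features to points first and then testing those points. In the sketch, the centroid comes from the shoelace formula. The function names and data layout are hypothetical. Note that the centroid of a concave polygon can fall outside the polygon itself.

```python
# Sketch of the "reverse" centre match: join feature's centroid inside target.
# Polygons are hypothetical lists of (x, y) vertices keyed by id; no arcpy.

def polygon_centroid(ring):
    """Area-weighted centroid of a simple (non-self-intersecting) polygon."""
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1  # shoelace term for this edge
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # centroid = sum / (6 * area) = sum / (3 * area2); works for either winding
    return cx / (3 * area2), cy / (3 * area2)


def point_in_polygon(x, y, ring):
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def match_join_centers_in_targets(targets, joins):
    """Return {target_id: [join_id, ...]} (one-to-many)."""
    centers = {jid: polygon_centroid(ring) for jid, ring in joins.items()}
    return {
        tid: [jid for jid, (x, y) in centers.items() if point_in_polygon(x, y, ring)]
        for tid, ring in targets.items()
    }


targets = {
    "A1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "A2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
joins = {
    "B1": [(4, 4), (6, 4), (6, 6), (4, 6)],          # centroid (5, 5)
    "B2": [(8, 0), (14, 0), (14, 4), (8, 4)],        # straddles both, centroid (11, 2)
    "B3": [(30, 30), (32, 30), (32, 32), (30, 32)],  # matches nothing
}
print(match_join_centers_in_targets(targets, joins))
```

B2 intersects both targets but is matched only to A2, because that is where its centroid lies.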
Given the size, I would consider tiling the data first. This is a classic case where tiling would be easy, since it should be fairly obvious which polygon belongs where, even with some overlap if needed. The sheer size of the files will bog processing down. With the data tiled, the suggested workflows should handle most situations, and you can deal with the remaining cases separately.
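Below is a minimal sketch of grid tiling. It assumes features are stored as vertex lists keyed by id, which is a hypothetical layout. Each feature goes into every grid cell its bounding box touches. The box can first be grown by a small margin, so features near a tile edge also land in the neighbouring tile. `tile_size` and `margin` are illustrative parameters, not values from this thread.

```python
import math
from collections import defaultdict


def bbox(ring):
    """(xmin, ymin, xmax, ymax) of a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def tile_features(features, tile_size, margin=0.0):
    """Return {(col, row): [feature_id, ...]} for a square grid of tile_size."""
    tiles = defaultdict(list)
    for fid, ring in features.items():
        xmin, ymin, xmax, ymax = bbox(ring)
        # Grow the box by `margin` so edge features also reach neighbouring tiles.
        c0 = math.floor((xmin - margin) / tile_size)
        c1 = math.floor((xmax + margin) / tile_size)
        r0 = math.floor((ymin - margin) / tile_size)
        r1 = math.floor((ymax + margin) / tile_size)
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                tiles[(col, row)].append(fid)
    return dict(tiles)


features = {
    "f1": [(1, 1), (2, 1), (2, 2), (1, 2)],
    "f2": [(9, 1), (11, 1), (11, 2), (9, 2)],  # crosses the x = 10 tile edge
    "f3": [(15, 15), (16, 15), (16, 16), (15, 16)],
}
print(tile_features(features, tile_size=10))
```

Each tile can then be processed on its own. A feature can land in more than one tile, so results need de-duplicating by feature id when the tiles are merged back together.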
As you already mentioned, holding 2M elements in a dictionary will be very slow, since a dictionary needs about three times the size of its data as overhead.
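One way to avoid keeping every parcel in a dictionary is to aggregate on the fly. Read (region_id, value) pairs one at a time, for example row by row from a cursor, and keep only a small accumulator per region. The dictionary then holds one entry per region rather than one per parcel. This is a sketch under that assumption, and the pair format is hypothetical.

```python
import math
from collections import defaultdict


def running_stats(pairs):
    """Aggregate (region_id, value) pairs into count/sum/mean/min/max per region.

    Only one four-number accumulator per region is stored, so memory grows with
    the number of regions, not with the number of parcels read.
    """
    acc = defaultdict(lambda: [0, 0.0, math.inf, -math.inf])  # count, sum, min, max
    for region_id, value in pairs:
        a = acc[region_id]
        a[0] += 1
        a[1] += value
        a[2] = min(a[2], value)
        a[3] = max(a[3], value)
    return {
        rid: {"count": c, "sum": s, "mean": s / c, "min": lo, "max": hi}
        for rid, (c, s, lo, hi) in acc.items()
    }


# A generator stands in for rows read one at a time.
rows = (pair for pair in [("R1", 100.0), ("R1", 50.0), ("R2", 75.0)])
print(running_stats(rows))
```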
In addition to what Dan Patterson already mentioned, if there is any attribute that can be identified in both datasets (like a neighborhood or something a little bigger), you could base the "tiles" on those areas. If there is no common attribute, then tile the area as Dan mentioned (perhaps with a little overlap, just in case) and process those tiles. Running 200k select by location operations on a dataset of 2M features will take a very long time, and that is not the way to go.
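If both datasets share an attribute, the partitioning needs no geometry at all. The sketch below assumes features are dicts and the shared field is called `neighborhood`. Both the field name and the function are hypothetical.

```python
from collections import defaultdict


def partition_by_key(regions, parcels, key):
    """Yield (value, regions, parcels) for each value of `key` found in both
    datasets, so every group can be processed on its own."""
    region_groups = defaultdict(list)
    parcel_groups = defaultdict(list)
    for r in regions:
        region_groups[r[key]].append(r)
    for p in parcels:
        parcel_groups[p[key]].append(p)
    # Only values present in both datasets produce a group.
    for value in sorted(region_groups.keys() & parcel_groups.keys()):
        yield value, region_groups[value], parcel_groups[value]


regions = [{"id": "R1", "neighborhood": "North"}, {"id": "R2", "neighborhood": "South"}]
parcels = [
    {"id": "p1", "neighborhood": "North"},
    {"id": "p2", "neighborhood": "North"},
    {"id": "p3", "neighborhood": "East"},
]
for value, rs, ps in partition_by_key(regions, parcels, "neighborhood"):
    print(value, [r["id"] for r in rs], [p["id"] for p in ps])
```

Values that appear in only one dataset are skipped. Features that straddle two areas still need the separate handling, or the overlap, mentioned above.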
Is it possible to post a part of the data (say 1 region and the corresponding parcels for that region)?
Amen to what David said. Why is there no reverse "have their center in" option? Is there a workaround? Sure. But I have a better workaround: have ESRI actually make logical design decisions and build better software.
You may want to have a look at the geoprocessing tool arcpy.TabulateIntersection_analysis, probably followed by Sort and Summary Statistics to select the best match (the first row per feature).
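The "sort, then keep the first row" step can be sketched in plain Python. The sketch assumes the intersection table has already been read into (zone_id, class_id, area) tuples, a hypothetical layout. It sorts by zone and by descending overlap area, then keeps the first row per zone, so each zone is matched to the class that overlaps it most.

```python
def best_match(rows):
    """Return {zone_id: (class_id, area)} keeping the largest-overlap row per zone.

    Same idea as sorting (zone ascending, area descending) and then keeping the
    first row per zone.
    """
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    best = {}
    for zone_id, class_id, area in ordered:
        best.setdefault(zone_id, (class_id, area))  # first row per zone wins
    return best


rows = [
    ("A1", "B1", 30.0),
    ("A1", "B2", 70.0),
    ("A2", "B2", 5.0),
    ("A2", "B3", 20.0),
]
print(best_match(rows))
```

Python's sort is stable, so on a tie the row that came first in the input wins.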
[Edit: "A's centroids are always within B (but B's centroids are not always in A)" is wrong and should state "B's centroids are always within A (but A's centroids are not always in B)"]
This is not a question:
I have the same problem as David. However, the "do it the opposite way and do an attribute join" solution is not an option for me, because my target features (A) lie only partially within my join features (B), and vice versa. They just intersect, and they also intersect neighbouring features. The only useful spatial relation for joining them is the centroid. A's centroids are always within B (but B's centroids are not always in A). Since I am far from being a coder, I will have to convert the join features into points as David suggests.
In that sense, I would like to suggest that Esri add the needed match option in an upcoming update. It seems fundamental anyway.
Could you explain why you would want to duplicate potentially large B features in order to join attributes of A? When joining this way, you will end up with a duplicated B geometry for every A whose centroid is in B, and the A geometries will be lost.
What is the use case for such a requirement?