Your last approach (spatial joins with the summary field added) is a good one, as noted earlier.
This is a classic case where working with raster data would be easier. Each fishnet polygon could represent one raster cell. If your maps were rasters, they could simply be added together cell by cell, which is a local statistics operation:
Cell Statistics (Spatial Analyst)—ArcGIS Pro | Documentation
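Once the maps are rasters on the same cell grid, a local statistic is just a per-cell reduction across the stack of rasters. Below is a minimal NumPy sketch of that idea, assuming the rasters are already equally shaped 2-D arrays with NaN standing in for NoData. `cell_statistics` and its statistic keywords are hypothetical names that loosely mirror the tool's options; this is not the arcpy.sa API.

```python
import numpy as np


def cell_statistics(rasters, statistic="SUM"):
    """Hypothetical helper: cell-by-cell (local) statistic across 2-D arrays.

    Assumes all inputs share the same shape and that NaN marks NoData.
    NaN is ignored, similar in spirit to ignoring NoData in Cell Statistics.
    Note: np.nansum returns 0 (not NaN) where every input cell is NaN.
    """
    # Stack into shape (n_rasters, rows, cols) so axis 0 runs across rasters
    stack = np.stack([np.asarray(r, dtype=float) for r in rasters])
    funcs = {
        "SUM": np.nansum,
        "MEAN": np.nanmean,
        "MAXIMUM": np.nanmax,
        "MINIMUM": np.nanmin,
    }
    # Reduce along the raster axis, leaving one value per cell
    return funcs[statistic](stack, axis=0)


a = np.array([[1, 2], [3, np.nan]])
b = np.array([[10, 20], [30, 40]])
print(cell_statistics([a, b]))  # per-cell sums: 11, 22, 33, 40
```

Each fishnet cell then carries the combined value of all maps, which is what the spatial join with a summary field produces, without any join.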
An alternative that doesn't require extra extensions is to export the data to NumPy arrays (from the fishnet centroids, specifically the OBJECTID field and the attribute field) and sum the arrays, since your OBJECTIDs are all the same. Local statistical functions can also be done in NumPy and SciPy if you don't have the Spatial Analyst extension.
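A rough sketch of the summing step, assuming each map has already been exported to a structured array with an `OBJECTID` field and one value field (in ArcGIS, `arcpy.da.FeatureClassToNumPyArray(fc, ["OBJECTID", field])` returns that kind of array, and `arcpy.da.ExtendTable` can join a result back). The field name `VALUE` and the helper `sum_by_objectid` are hypothetical; the code only uses NumPy so it runs outside ArcGIS.

```python
import numpy as np


def sum_by_objectid(arrays, value_field):
    """Hypothetical helper: sum one field across structured arrays sharing OBJECTIDs."""
    # Sort each array by OBJECTID so the rows line up cell for cell
    ordered = [np.sort(a, order="OBJECTID") for a in arrays]
    ids = ordered[0]["OBJECTID"]
    for a in ordered[1:]:
        if not np.array_equal(a["OBJECTID"], ids):
            raise ValueError("arrays do not share the same OBJECTIDs")
    # Element-wise sum of the value field across all maps
    total = np.sum([a[value_field] for a in ordered], axis=0)
    # Pack the result back into a structured array keyed by OBJECTID
    out = np.zeros(ids.size, dtype=[("OBJECTID", ids.dtype), (value_field, total.dtype)])
    out["OBJECTID"] = ids
    out[value_field] = total
    return out


dt = [("OBJECTID", "i4"), ("VALUE", "f8")]
map1 = np.array([(1, 2.0), (2, 5.0), (3, 1.0)], dtype=dt)
map2 = np.array([(3, 4.0), (1, 1.0), (2, 0.5)], dtype=dt)
result = sum_by_objectid([map1, map2], "VALUE")
print(result["OBJECTID"], result["VALUE"])  # OBJECTID 1, 2, 3 -> 3.0, 5.5, 5.0
```

Sorting first means the arrays don't have to come out of the export in the same row order; the ID check catches fishnets that don't actually match.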