hi gaetan
this behaviour is (unfortunately) expected and is the result of two things:
1. the file geodatabase format is not efficient when the multipatches of multiple features reference the same image: the image is stored separately for each feature. put differently, there is no shared texture storage per feature class.
2. CityEngine does not do any image/texture cropping - it simply exports texture coordinates and the full image.
we recognize that in your case these two things result in pathological gdb sizes. would you be able to pre-segment (or simply subdivide) the orthophoto into tiles before the export to reduce the redundancy? you could then use CGA to find and assign the corresponding orthophoto tile for each shape.
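to illustrate the tile lookup, here is a minimal python sketch (not CGA, and not anything CityEngine provides). it assumes a regular grid of square tiles over the orthophoto extent, with columns counted from the left edge and rows from the bottom edge. the function name `tiles_for_bbox` and all parameters are made up for this example.

```python
import math

def tiles_for_bbox(bbox, extent, tile_size):
    """Return the (col, row) indices of all tiles overlapped by a footprint.

    bbox, extent: (xmin, ymin, xmax, ymax) in the same map units.
    tile_size: edge length of one square tile in map units.
    Assumption: col 0 / row 0 is the tile at the lower-left corner of extent.
    """
    xmin, ymin, xmax, ymax = bbox
    ex_min, ey_min, ex_max, ey_max = extent

    # number of tiles needed to cover the orthophoto in each direction
    n_cols = math.ceil((ex_max - ex_min) / tile_size)
    n_rows = math.ceil((ey_max - ey_min) / tile_size)

    # index range touched by the footprint, clamped to the grid
    c0 = max(0, math.floor((xmin - ex_min) / tile_size))
    c1 = min(n_cols - 1, math.floor((xmax - ex_min) / tile_size))
    r0 = max(0, math.floor((ymin - ey_min) / tile_size))
    r1 = min(n_rows - 1, math.floor((ymax - ey_min) / tile_size))

    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

a footprint that fits inside one tile then only needs that single tile as its texture; footprints that straddle a tile border return several tiles and would need either overlapping tiles or a larger tile size.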
i'll also reach out to some of our ArcGIS data processing experts. maybe there is a way to "throw away" all pixels which are not referenced by the multipatches, i.e. to crop each texture down to the region actually covered by its texture coordinates.
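to make the cropping idea concrete, here is a rough python sketch of the arithmetic involved. it is not an existing ArcGIS or CityEngine tool. it assumes texture coordinates in the range 0..1 without tiling/wrapping, with v = 0 at the bottom of the image; the function names `crop_region` and `remap_uvs` are hypothetical.

```python
import math

def crop_region(uvs, width, height, pad=1):
    """Pixel box (left, top, right, bottom) covering all uvs.

    right/bottom are exclusive. pad adds a safety margin in pixels.
    Assumption: v points up, pixel rows count down from the top.
    """
    us = [u for u, _ in uvs]
    vs = [v for _, v in uvs]
    left = max(0, math.floor(min(us) * width) - pad)
    right = min(width, math.ceil(max(us) * width) + pad)
    top = max(0, math.floor((1 - max(vs)) * height) - pad)
    bottom = min(height, math.ceil((1 - min(vs)) * height) + pad)
    return left, top, right, bottom

def remap_uvs(uvs, box, width, height):
    """Re-express uvs relative to the cropped image given by box."""
    left, top, right, bottom = box
    crop_w = right - left
    crop_h = bottom - top
    return [
        ((u * width - left) / crop_w,
         1 - ((1 - v) * height - top) / crop_h)
        for u, v in uvs
    ]
```

for example, a feature whose texture coordinates span a quarter of the image in each direction would keep only 1/16 of the pixels (plus padding) after cropping, and its texture coordinates would be rescaled to the cropped image.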
sorry for the mixed answer & kind regards,
simon