I thought about it a little, and I was wondering whether you could first analyze the links between clusters before trying to compute shortest paths between all the nodes.
I guess that there are multiple reasons for clustering to happen: rivers with bridges at specific locations, borders and control points, valleys, etc. Wouldn't it be more efficient to build a solution around the following strategy?
1. Define clusters based on the aforementioned criteria.
2. Build a list of nodes that connect these clusters.
3. Determine the "intra-cluster" shortest paths for each cluster.
4. Based on 2. and 3., build a second ("virtual") network whose nodes are the connection nodes from 2. and whose edge lengths are the shortest-path lengths between these nodes obtained in 3.
Once done, you can determine the shortest path between two nodes by:
- If the two nodes are in the same cluster, it is given by 3.
- If the two nodes are in different clusters, add them to the virtual network, connecting each one to the connection nodes of its own cluster by edges whose lengths are the shortest-path lengths to those nodes. Then compute the shortest path on this extended virtual network.
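To make the strategy above a bit more concrete, here is a minimal sketch in Python (not MATLAB, and not a tested solution for your data). Everything in it is an assumption made for illustration: the names `make_undirected`, `build_index` and `shortest_distance`, the tiny five-node example, the undirected graph with non-negative edge lengths, and the choice to keep the original cross-cluster edges in the virtual network. It also computes intra-cluster distances only from the connection nodes up front, and runs a same-cluster query on demand, rather than storing all pairs.

```python
import heapq
from collections import defaultdict

def dijkstra(adj, source):
    """Shortest distances from source over an adjacency dict {u: {v: w}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def make_undirected(edges):
    """Build a symmetric adjacency dict from (u, v, length) triples."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj

def build_index(adj, cluster_of):
    connectors = defaultdict(set)  # step 2: connection nodes per cluster
    intra = defaultdict(dict)      # edges that stay inside one cluster
    virtual = defaultdict(dict)    # step 4: the "virtual" network
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            if cluster_of[u] == cluster_of[v]:
                intra[u][v] = w
            else:
                connectors[cluster_of[u]].add(u)
                virtual[u][v] = w  # assumption: keep cross-cluster edges as they are
    # Step 3 (reduced): intra-cluster distances from every connection node.
    to_connector = {b: dijkstra(intra, b)
                    for nodes in connectors.values() for b in nodes}
    # Step 4: link connection nodes of the same cluster by their intra-cluster distance.
    for nodes in connectors.values():
        for b in nodes:
            for other in nodes:
                if other != b and other in to_connector[b]:
                    virtual[b][other] = to_connector[b][other]
    return connectors, intra, to_connector, virtual

def shortest_distance(s, t, cluster_of, index):
    connectors, intra, to_connector, virtual = index
    if cluster_of[s] == cluster_of[t]:
        # Same cluster: only paths that stay inside the cluster are considered.
        return dijkstra(intra, s).get(t, float("inf"))
    # Different clusters: extend a copy of the virtual network with s and t.
    ext = {u: dict(nbrs) for u, nbrs in virtual.items()}
    for node in (s, t):
        ext.setdefault(node, {})
        for b in connectors[cluster_of[node]]:
            d = to_connector[b].get(node)
            if d is not None and b != node:
                ext[node][b] = d
                ext[b][node] = d
    return dijkstra(ext, s).get(t, float("inf"))

# Hypothetical example: cluster A = {a1, a2, a3}, cluster B = {b1, b2},
# joined by a single "bridge" a3-b1.
edges = [("a1", "a2", 1), ("a2", "a3", 2), ("a1", "a3", 5),
         ("a3", "b1", 4), ("b1", "b2", 3)]
adj = make_undirected(edges)
cluster_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
index = build_index(adj, cluster_of)

print(shortest_distance("a1", "b2", cluster_of, index))  # 10.0
print(shortest_distance("a1", "a3", cluster_of, index))  # 3.0
```

One caveat: the same-cluster case only looks at paths that stay inside the cluster. If a detour through a neighbouring cluster can be shorter in your network, the same-cluster query should also go through the extended virtual network and keep the smaller of the two results.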
It would have several major advantages, I guess. The obvious one is that it reduces the dimension of your "objects" (matrices, etc.) in memory by splitting them into multiple smaller ones. This would also be relevant for your processing in Matlab, because it would not require a huge amount of contiguous memory (the largest available contiguous block is typically smaller than the total free memory on your machine). Moreover, as these clusters are not represented as directly connected within the same matrix, the smaller matrices together are likely to require significantly less memory than a 40,000x40,000 full/dense matrix (you could see all your smaller matrices as blocks of a 40k x 40k block/sparse matrix). The price to pay would be a bit more algorithmic work to develop compared with the direct, "brute-force" method.
This might not be suited to the data mining that you want to perform, and I am far from being a specialist, so you might want to ask a real computer/network person or wait for my post to be fairly criticized 😉
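As a rough order of magnitude for the memory argument, here is a back-of-the-envelope calculation in Python. The assumptions are mine: distances stored as 8-byte doubles, and a purely hypothetical split into 20 clusters of equal size (your real clusters will be uneven, and the virtual network adds a little on top).

```python
BYTES_PER_DOUBLE = 8  # assumption: double-precision storage

def dense_bytes(n):
    """Memory needed for a dense n x n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE

n_nodes = 40_000
n_clusters = 20                          # hypothetical number of clusters
cluster_size = n_nodes // n_clusters     # 2,000 nodes each, assumed equal

full = dense_bytes(n_nodes)              # one 40k x 40k matrix
per_block = dense_bytes(cluster_size)    # one intra-cluster matrix
blocks_total = n_clusters * per_block    # all intra-cluster matrices together

print(f"full matrix : {full / 1e9:.1f} GB")
print(f"one block   : {per_block / 1e6:.0f} MB")
print(f"all blocks  : {blocks_total / 1e6:.0f} MB")
```

Under these assumptions the full matrix needs about 12.8 GB in one piece, while each block needs 32 MB and all the blocks together need 640 MB, so the total shrinks by a factor equal to the number of clusters.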
Cheers,
Cedric