How to export 40 million records from an SDC file to shapefiles?

07-16-2013 05:53 AM
yunkexiang
New Contributor
Dear All,
I am working on a project where I need to process data in PostGIS.
To do so, I need shapefiles as input.
However, what I have at hand are SDC files, so I need to convert them to shapefiles before I can load the data into PostGIS.

The problem is that the dataset contains 40 million records, which is too many to "save as shapefile" in one go.
What I am doing now is dividing the records in some logical way (such as selecting by state). Once I have a selection, I create a layer from it and export that layer to a shapefile.

Processing 1/50 of the whole dataset takes about 50 minutes to select and another hour or so to export, which is too tedious.

Does anyone know another approach that could speed up this process?
Would ModelBuilder help at all? (I am looking for an automatic way to select and export so that I can leave my computer running overnight.)
Thanks.

Best
0 Kudos
12 Replies
RobertBorchert
Frequent Contributor III
Are there any options other than a shapefile?

i.e. what program do you intend to use to work with PostGIS?

Can you import the SDC files into a personal geodatabase?
0 Kudos
yunkexiang
New Contributor
I am not sure how to deal with SDC files.
Can I export to a file geodatabase?

I think the bottleneck at the moment is converting the SDC file.
0 Kudos
RobertBorchert
Frequent Contributor III
What program do you intend to use with PostGIS to work with the data?

A shapefile may or may not be necessary.

I believe it would go faster if you import to a personal or file geodatabase first.

Also, pause the drawing while you are selecting and exporting.

The time required depends directly on the size of the data file and the power of your computer.

Close all other programs on your computer and have ONLY the program you need running.



0 Kudos
yunkexiang
New Contributor
Thanks!
I will be using pgAdmin3. According to this post "http://workshops.opengeo.org/postgis-intro/loading_data.html", it seems I have to load shapefiles into the system.
So how long will it take to convert from a file geodatabase to a shapefile? Anyway, I will try this method and let you know if it works better.
Also, I am trying to use the Python window to do everything.
So should I use "arcpy.FeatureClassToFeatureClass_conversion" for the geodatabase to shapefile conversion?

Thanks a ton!
Your help means a lot to me!
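
For the loading step described in the linked workshop, a rough sketch of how the per-state shapefiles could be pushed into one PostGIS table is shown here. It only builds the command lines for the `shp2pgsql` loader piped into `psql`; the file names, the table name `public.records`, the database name `gisdb` and SRID 4326 are all placeholder assumptions, not values from this thread.

```python
def build_load_commands(shapefiles, table, database, srid=4326):
    """Return one shell command per shapefile.

    The first command creates the table (-c); the rest append to it (-a).
    -D asks shp2pgsql for the faster PostgreSQL dump format.
    """
    commands = []
    for i, shp in enumerate(shapefiles):
        mode = "-c" if i == 0 else "-a"
        commands.append(
            'shp2pgsql -s {0} {1} -D "{2}" {3} | psql -d {4}'.format(
                srid, mode, shp, table, database))
    return commands


# Hypothetical file names, table and database, for illustration only.
for command in build_load_commands(["Alabama.shp", "Alaska.shp"],
                                   "public.records", "gisdb"):
    print(command)
```

Each printed line can be run in a shell or collected into a batch file and left to run overnight. Building a spatial index once after the last file is loaded is probably cheaper than indexing on every append.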

0 Kudos
RobertBorchert
Frequent Contributor III
I have never tried to work with 40 million records in a shapefile, or in SDC files for that matter, so I could not tell you how long it might take. I have some very, very large databases, but I have never tried to export more than 750,000 records, and never to a shapefile on that scale.

Probably stay away from Python. You're adding too many steps to your procedure. There is probably no way to get around crunching it one state at a time. However, as I mentioned, close all other programs to give more power to your CPU. If you have your files open in ArcGIS, just use the simple symbols and remove all unnecessary features.

A file or personal geodatabase is probably the way to go.

What other formats can pgAdmin utilize? Perhaps try exporting to one of those.

The bottom line is that you have a huge dataset, and it takes a machine with the hardware to handle that.
0 Kudos
WillWhite1
New Contributor
I would personally try the Split tool: use the 40 million records as the input and your states polygons as the split features. From memory, this outputs a new feature class (or shapefile) for each value of the split field (in your case, the state name). I think this is an ArcInfo-level tool, though. You can then import each shapefile into PostGIS.

Alternatively, I would use Python to iterate through each state feature, select all the records that intersect it, and then write the output to a shapefile.
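
As a rough sketch of the Split approach from Python, the call below passes the input features, the split features, the split field and an output workspace to `arcpy.Split_analysis`. The wrapper name `split_by_state` and every path in the comment are hypothetical; the split field is assumed to be a text field on the states layer, and the tool needs the ArcInfo (Advanced) license mentioned above.

```python
def split_by_state(records_fc, states_fc, state_field, out_workspace):
    """Run the Split tool: one output per unique value of state_field."""
    # arcpy is imported inside the function so this file can still be
    # loaded on a machine where ArcGIS is not installed.
    import arcpy

    arcpy.Split_analysis(records_fc, states_fc, state_field, out_workspace)


# Example call (all paths and the field name are placeholders):
# split_by_state(r"C:\data\records.gdb\records",
#                r"C:\temp.gdb\states", "State", r"C:\data\by_state")
```

If the output workspace is a folder, the results should come out as shapefiles. Whether Split copes with 40 million records any faster than a manual selection is something you would have to test.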
0 Kudos
yunkexiang
New Contributor
The idea of using Python to iterate is really intriguing. But what would the code look like?
Currently I manually paste the arcpy selection code into the Python window and then use arcpy to export each state, but I haven't tried a loop or anything like that to get the data.
Could you give me a sense of what the code would look like?
Thanks!



0 Kudos
WillWhite1
New Contributor


Maybe take the selection and export logic that you've been using in the Python window and put it inside a search cursor loop. For example, the following would iterate through each state in a feature class:

import arcpy

fc = r"C:\temp.gdb\states"

# Read the "State" field of every row in the states feature class
with arcpy.da.SearchCursor(fc, "State") as cursor:
    for row in cursor:
        state_name = row[0]  # value of the "State" field for this row
        print(state_name)    # replace with your selection and export code


See http://resources.arcgis.com/en/help/main/10.1/index.html#//018w00000011000000 for more information on the search cursor.
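
To give a sense of what could go inside that loop, here is a minimal sketch that exports one shapefile per state with `arcpy.FeatureClassToFeatureClass_conversion`, which takes a where clause, so the select and export happen in one step. It assumes the big dataset has an attribute field holding the state name under the same field name as the states layer. If it does not, a spatial selection per state would be needed instead. The helper names and all paths are made up for illustration.

```python
def where_clause(delimited_field, value):
    """Build an equality clause, doubling any single quotes in the value."""
    return "{0} = '{1}'".format(delimited_field, value.replace("'", "''"))


def shapefile_name(state_name):
    """Turn a state name into a safe file name, e.g. New_York.shp."""
    safe = "".join(c if c.isalnum() else "_" for c in state_name)
    return safe + ".shp"


def export_states(records_fc, states_fc, state_field, out_folder):
    """Export the records of each state to its own shapefile."""
    import arcpy  # imported here so the helpers above work without ArcGIS

    # Collect the distinct state names first, then close the cursor.
    with arcpy.da.SearchCursor(states_fc, [state_field]) as cursor:
        names = sorted(set(row[0] for row in cursor))

    # Let arcpy pick the right field delimiters for the data source.
    field = arcpy.AddFieldDelimiters(records_fc, state_field)
    for name in names:
        arcpy.FeatureClassToFeatureClass_conversion(
            records_fc, out_folder, shapefile_name(name),
            where_clause(field, name))


# Example call (placeholder paths):
# export_states(r"C:\data\records.gdb\records",
#               r"C:\temp.gdb\states", "State", r"C:\data\by_state")
```

Saved as a standalone script, this can run overnight outside ArcMap, which also avoids the drawing overhead.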
0 Kudos
yunkexiang
New Contributor
This is a great idea. But my one bottleneck is that I have to start with an SDC file as input. There is no syntax for taking an SDC file as input, right?
Thanks for sharing your wisdom!
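
On the SDC question, one thing worth trying is to point the tool at the feature class inside the SDC dataset by its full path and copy it into a file geodatabase once, as suggested earlier in the thread. The sketch below assumes arcpy accepts such a path as a read-only input; that assumption has not been verified here, and the function name and the paths in the comment are hypothetical.

```python
import os


def sdc_to_file_gdb(sdc_feature_class, gdb_folder, gdb_name, out_name):
    """Copy an SDC feature class into a file geodatabase and return its path."""
    import arcpy  # imported here so the file loads without ArcGIS

    gdb_path = os.path.join(gdb_folder, gdb_name)
    # Create the file geodatabase only if it is not there yet.
    if not arcpy.Exists(gdb_path):
        arcpy.CreateFileGDB_management(gdb_folder, gdb_name)
    arcpy.FeatureClassToFeatureClass_conversion(
        sdc_feature_class, gdb_path, out_name)
    return os.path.join(gdb_path, out_name)


# Example call (placeholder paths, assumed SDC layout):
# sdc_to_file_gdb(r"C:\data\records.sdc\records",
#                 r"C:\data", "records.gdb", "records")
```

Once the data sits in the file geodatabase, the per-state selections and exports can read from there instead of decompressing the SDC file each time.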


0 Kudos