GeoAnalytics Big Data HDFS Manifest Format Problem

02-28-2022 04:29 AM
Yusuf_SelçukMehan
New Contributor

We are using ArcGIS GeoAnalytics 10.9.1 and trying to create a big data file share that connects to Oracle BDA 5.16 over HDFS.

When we try to generate the manifest for CSV and Parquet files, we get these errors:

-1-
Failed to load extension 'csv' with format 'delimited': [User] io.fs.delimited.discovery.invalid_sample ListMap(path -> hdfs-44c69445-8a6d-42a7-97fc-58473ed0ed79://***/DENEME/DENEME1/DENEME_ALT.csv) [User] io.fs.delimited.discovery.invalid_sample ListMap(path -> hdfs-44c69445-8a6d-42a7-97fc-58473ed0ed79://***/DENEME/DENEME1/DENEME_ALT.csv)


-2-
com.esri.arcgis.st.spark.ExecutionException: [User] io.fs.delimited.discovery.invalid_sample ListMap(path -> hdfs-44c69445-8a6d-42a7-97fc-58473ed0ed79://***/DENEME/DENEME1/DENEME_ALT.csv) at com.esri.arcgis.st.spark.MessageHandler.throwUserError(MessageHandler.scala:43) at com.esri.arcgis.st.spark.MessageHandler.throwUserError$(MessageHandler.scala:42) at com.esri.arcgis.st.spark.GenericLoggerMessageHandler.throwUserError(MessageHandler.scala:48) at com.esri.arcgis.st.io.fs.format.delim.SchemaDiscoveryUtil$.readSample(SchemaDiscoveryUtil.scala:142) at com.esri.arcgis.st.io.fs.format.delim.SchemaDiscoveryUtil$.discoverFileProperties(SchemaDiscoveryUtil.scala:31) at com.esri.arcgis.st.io.fs.format.DelimitedFileFormat.discoverProperties(DelimitedFileFormat.scala:35) at com.esri.arcgis.st.io.fs.FileFormatDiscovery$.$anonfun$discoverWithoutFormatHint$9(FileFormatDiscovery.scala:130) at com.esri.arcgis.st.io.fs.FileFormatDiscovery$.$anonfun$discoverWithoutFormatHint$9$adapted(FileFormatDiscovery.scala:122) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.generic.TraversableForwarder.foreach(TraversableForwarder.scala:38) at scala.collection.generic.TraversableForwarder.foreach$(TraversableForwarder.scala:38) at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:47) at com.esri.arcgis.st.io.fs.FileFormatDiscovery$.$anonfun$discoverWithoutFormatHint$8(FileFormatDiscovery.scala:122) at com.esri.arcgis.st.io.fs.FileFormatDiscovery$.$anonfun$discoverWithoutFormatHint$8$adapted(FileFormatDiscovery.scala:120) at scala.collection.Iterator.foreach(Iterator.scala:941) at scala.collection.Iterator.foreach$(Iterator.scala:941) at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at 
com.esri.arcgis.st.io.fs.FileFormatDiscovery$.discoverWithoutFormatHint(FileFormatDiscovery.scala:120) at com.esri.arcgis.st.io.fs.FileFormatDiscovery$.$anonfun$discover$3(FileFormatDiscovery.scala:37) at scala.Option.orElse(Option.scala:447) at com.esri.arcgis.st.io.fs.FileFormatDiscovery$.discover(FileFormatDiscovery.scala:37) at com.esri.arcgis.st.io.fs.FileSystemDataSource.discoverDatasetDefinition(FileSystemDataSource.scala:100) at com.esri.arcgis.gae.ags.datastore.manifest.ManifestDataStore.$anonfun$discoverDatasetMetadata$1(ManifestDataStore.scala:48) at scala.util.Try$.apply(Try.scala:213) at com.esri.arcgis.gae.ags.datastore.manifest.ManifestDataStore.discoverDatasetMetadata(ManifestDataStore.scala:47) at com.esri.arcgis.gae.ags.datastore.manifest.gen.ManifestManager.$anonfun$generateManifestDataset$1(ManifestManager.scala:53) at scala.util.Try$.apply(Try.scala:213) at com.esri.arcgis.gae.ags.datastore.manifest.gen.ManifestManager.generateManifestDataset(ManifestManager.scala:44) at com.esri.arcgis.gae.ags.datastore.manifest.gen.ManifestManager.$anonfun$generateManifest$2(ManifestManager.scala:35) at com.esri.arcgis.gae.ags.datastore.manifest.gen.ManifestManager.$anonfun$generateManifest$2$adapted(ManifestManager.scala:29) at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:75) at com.esri.arcgis.gae.ags.datastore.manifest.gen.ManifestManager.$anonfun$generateManifest$1(ManifestManager.scala:29) at scala.util.Try$.apply(Try.scala:213) at com.esri.arcgis.gae.ags.datastore.manifest.gen.ManifestManager.generateManifest(ManifestManager.scala:22) at com.esri.arcgis.gae.gp.fn.FnGenerateManifest$.generateManifest(FnGenerateManifest.scala:135) at com.esri.arcgis.gae.gp.fn.FnGenerateManifest$.execute(FnGenerateManifest.scala:55) at com.esri.arcgis.gae.fn.api.v1.GAFunction.executeFunction(GAFunction.scala:25) at com.esri.arcgis.gae.fn.api.v1.GAFunction.executeFunction$(GAFunction.scala:15) at 
com.esri.arcgis.gae.gp.fn.FnGenerateManifest$.executeFunction(FnGenerateManifest.scala:18) at com.esri.arcgis.gae.gp.GPToGAFunctionAdapter.$anonfun$execute$4(GPToGAFunctionAdapter.scala:159) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.esri.arcgis.st.util.ScopeManager.withScope(ScopeManager.scala:20) at com.esri.arcgis.st.spark.ExecutionContext.withScope(ExecutionContext.scala:23) at com.esri.arcgis.st.spark.ExecutionContext.withScope$(ExecutionContext.scala:23) at com.esri.arcgis.st.spark.GenericExecutionContext.withScope(ExecutionContext.scala:52) at com.esri.arcgis.gae.gp.GPToGAFunctionAdapter.execute(GPToGAFunctionAdapter.scala:157) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at com.esri.arcgis.interop.NativeObjRef.nativeVtblInvokeNative(Native Method) at com.esri.arcgis.interop.NativeObjRef.nativeVtblInvoke(Unknown Source) at com.esri.arcgis.interop.NativeObjRef.invoke(Unknown Source) at com.esri.arcgis.interop.Dispatch.vtblInvoke(Unknown Source) at com.esri.arcgis.system.IRequestHandlerProxy.handleStringRequest(Unknown Source) at com.esri.arcgis.discovery.servicelib.impl.SOThreadBase.handleSoapRequest(SOThreadBase.java:577) at com.esri.arcgis.discovery.servicelib.impl.DedicatedSOThread.handleRequest(DedicatedSOThread.java:310) at com.esri.arcgis.discovery.servicelib.impl.SOThreadBase.startRun(SOThreadBase.java:430) at com.esri.arcgis.discovery.servicelib.impl.DedicatedSOThread.run(DedicatedSOThread.java:166)

-3-
Unable to determine file format handler for 'hdfs-44c69445-8a6d-42a7-97fc-58473ed0ed79://***/DENEME/DENEME1'

MichaelPark
Esri Contributor

This error is usually thrown when the HDFS namenode is accessible to the Enterprise machines, but the datanodes are not. The namenode allows us to list files in a directory, but the actual content of the files is stored on the datanodes. I don't know exactly how Oracle BDA exposes the HDFS API, but it likely behaves the same way. I would check that all of the machines in your BDA are accessible from the Enterprise machine(s): make sure they aren't blocked by a firewall and that the advertised host names/IP addresses can be resolved by the Enterprise machines.
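The check described above can be sketched as a small script run from each Enterprise machine. This is only a rough sketch: the host names in the usage comment are hypothetical placeholders, and the port is an assumption (the default datanode data-transfer port is 9866 on Hadoop 3 and 50010 on Hadoop 2, but a cluster may be configured differently).

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Classify host:port as 'ok', 'unresolved', or 'unreachable'."""
    # Step 1: can this machine resolve the advertised host name?
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return "unresolved"
    # Step 2: can this machine open a TCP connection (not firewalled)?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:
        return "unreachable"


def check_datanodes(hosts, port, timeout=3.0):
    """Return a {host: status} mapping for every datanode host."""
    return {host: check_endpoint(host, port, timeout) for host in hosts}


# Example usage (hypothetical host names, assumed port):
#   print(check_datanodes(["bda-node01.example.com", "bda-node02.example.com"], 9866))
```

A result of "unresolved" points at DNS or hosts-file entries on the Enterprise machine, while "unreachable" points at a firewall or routing problem.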

Yusuf_SelçukMehan
New Contributor

Hi Michael,

Yes, you are right. The HDFS list API returns datanodes with internal IPs, which GeoAnalytics Server can't reach.

There is a private network for Oracle BDA servers.

I've found a topic in the Cloudera community:

https://community.cloudera.com/t5/Support-Questions/HDFS-put-failing-due-to-internal-IP-address-use/...

As a solution, it says we have to enable the "use datanode hostname" setting in the Spark/Hadoop configuration on the client side. In the GeoAnalytics Server admin or Manager pages, I couldn't see any configuration for Spark.
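For reference, on a generic Hadoop client this setting is the `dfs.client.use.datanode.hostname` property in the client's `hdfs-site.xml`. The sketch below only builds that property entry; the helper name is made up for illustration, and whether GeoAnalytics Server exposes a place to apply it is not something this thread establishes.

```python
import xml.etree.ElementTree as ET


def hdfs_client_property(name, value):
    """Build one <property> entry in hdfs-site.xml form (hypothetical helper)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Ask the HDFS client to connect to datanodes by host name
# instead of the internal IP addresses the namenode reports.
snippet = hdfs_client_property("dfs.client.use.datanode.hostname", "true")
print(snippet)
```

In a plain Spark application the same Hadoop property can be passed with the `spark.hadoop.` prefix, that is, as `spark.hadoop.dfs.client.use.datanode.hostname=true`. The datanode host names still have to resolve to reachable addresses on the client machine for this to help.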

Can we do any configuration for Spark in GeoAnalytics?

Thanks.
