data.json missing records

07-12-2021 11:47 AM
GregYoung2
New Contributor III

We use the data.json output from Hub to monitor the status of records on our site.  Today I noticed that the item count had dropped significantly.  Digging in, it appears that the table-based items on our Hub site are being excluded from the data.json output.  Is anyone else experiencing this?

Our site: Ontario GeoHub

Expected record count (as of time of posting): 398

Count in data.json: 349

Example of item not appearing in data.json: https://geohub.lio.gov.on.ca/datasets/bait-licence-and-area/explore

 


Accepted Solutions
ThomasHervey1
Esri Contributor

@GregYoung2 it looks like there is a bug. Some of the items you see in your search results are not being indexed by Hub and therefore do not appear on the DCAT endpoint. We'll look into this shortly.

On a different note, I'm curious to know what exactly you're monitoring this list for. The data.json endpoint is used primarily for content federation on sites like data.gov and the JSON response conforms to the DCAT-US 1.1 specification. So is there additional or alternative information that would help you monitor the catalog?

2 Replies
GregYoung2
New Contributor III

@ThomasHervey1  thank you for the quick reply. It would be great to see this resolved soon.

I have a script that harvests data from Hub/AGOL and populates feature tables that we use in dashboards to track the number of items on our site, show recently added content, and flag records that don't conform to our metadata standards. I've found that the data.json output is the easiest way to programmatically get a full list of the items that belong to just our site. I use it as a starting point, then use the Open Data API (v3) and the Python API to gather more detailed information about each item. I've figured out how to do this with the Open Data API as well, but Esri support has told me not to rely on that one.
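For anyone wanting to run a similar check, here is a minimal sketch of the counting step only, not the actual harvesting script. It assumes the site serves its catalog at `/data.json` and that the response follows the DCAT-US 1.1 layout with a top-level `dataset` array. The URL, the function names, and the choice of fields to check are all assumptions made for illustration.

```python
import json
from urllib.request import urlopen

# Assumed endpoint: the thread only refers to "the data.json output from Hub".
DATA_JSON_URL = "https://geohub.lio.gov.on.ca/data.json"

# Illustrative subset of the dataset fields that DCAT-US 1.1 marks as required.
REQUIRED_FIELDS = ("title", "description", "keyword", "modified", "identifier")


def fetch_catalog(url=DATA_JSON_URL, timeout=30):
    """Download the catalog and parse it as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_catalog(catalog, expected_count=None):
    """Count the datasets in a parsed catalog and list records with empty required fields."""
    datasets = catalog.get("dataset", [])
    incomplete = {}
    for ds in datasets:
        missing = [field for field in REQUIRED_FIELDS if not ds.get(field)]
        if missing:
            key = ds.get("identifier") or ds.get("title") or "<unknown>"
            incomplete[key] = missing
    summary = {"count": len(datasets), "incomplete": incomplete}
    if expected_count is not None:
        # A positive shortfall means fewer records than expected were returned.
        summary["shortfall"] = expected_count - len(datasets)
    return summary
```

Calling `summarize_catalog(fetch_catalog(), expected_count=398)` would compare the live feed against the expected count from the original post. A sudden positive `shortfall` is the kind of drop described above.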