Learning module Predicting Seagrass Habitats With Machine Learning: I'm getting the following Python error trying to train the random forest classifier

04-11-2020 07:02 PM
BrianOrt
New Contributor II

Here's where I am in the module:

Train your random forest classifier

Now that you have split your data, you'll train your random forest classifier using the training data you have created.

  1. Create the variable rfco to show the results of running the RandomForestClassifier command to create 500 trees. Then use the .fit argument to apply the forest results to the training data.
    rfco = RandomForestClassifier(n_estimators = 500, oob_score = True)
    rfco.fit(train_set[predictVars], indicator)
  2. Run the classification again using the test dataset. Create the attribute seagrassPred to store this data with a 1 for occurrence and a 0 for no occurrence.

    The test data is 90 percent of the United States coastal data that was not used to train the model, and will show the accuracy of your prediction.

    seagrassPred = rfco.predict(test_set[predictVars])
  3. Use the results of the classification to check the efficiency of the model by calculating prediction accuracy and estimation error.
    test_seagrass = test_set[classVar].as_matrix()
    test_seagrass = test_seagrass.flatten()
    error = NUM.sum(NUM.abs(test_seagrass - seagrassPred))/len(seagrassPred) * 100
    -------
    Here's the last few entries in my Python log:

    test_set = data.drop(train_set.index)

    indicator, _ = PD.factorize(train_set[classVar[0]])

    print('Training Data Size = ' + str(train_set.shape[0]))
    print('Test Data Size = ' + str(test_set.shape[0]))

    Training Data Size = 1000

    Test Data Size = 9000

    rfco = RandomForestClassifier(n_estimators = 500, oob_score = True)
    rfco.fit(train_set[predictVars], indicator)

    seagrassPred = rfco.predict(test_set[predictVars])

    test_seagrass = test_set[classVar].as_matrix()
    test_seagrass = test_seagrass.flatten()
    error = NUM.sum(NUM.abs(test_seagrass - seagrassPred))/len(seagrassPred) * 100

    Traceback (most recent call last):

      File "<string>", line 1, in <module>

      File "C:\Users\brian\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\lib\site-packages\pandas\core\generic.py", line 5274, in __getattr__

        return object.__getattribute__(self, name)

    AttributeError: 'DataFrame' object has no attribute 'as_matrix'


    I'm concerned it might be a file path problem that I can't do anything about.
    I'm working in ArcPro from a Windows server 2012R2, so the actual file path to Documents may differ from what it would be on a stand-alone machine.
    I'm new to Python but can sort of follow along.

    Kathy Cappelli, I saw on a related thread that you're the person responsible for this cool module.
3 Replies
DanPatterson_Retired
MVP Esteemed Contributor

Is this what you are following?

Predict Seagrass Habitats with Machine Learning | Learn ArcGIS 

There were some bugs in the instructions; I don't know whether they have since been corrected.

https://community.esri.com/thread/223883-i-am-working-on-the-predict-seagrass-habitats-with-machine-... 

BrianOrt
New Contributor II

Dan,

Yes, the Predict Seagrass Habitats... module is what I'm working through, and I did find the thread you refer to prior to posting my question, but thanks anyway for sending it. In that case, the OP's question stemmed from naming an attribute 02 (with zero) instead of O2 (with letter O).

I'm having a different issue: calling the .as_matrix method. I have since found this information, which says that as_matrix was deprecated and is no longer included in recent pandas versions.

scikit learn - Python: Method .as_matrix will be removed in a future version. Use .values instead - ... 

I think perhaps I need to learn how to replace .as_matrix() with .values or .to_numpy() before I can proceed.

The removal of .as_matrix seems to have happened after the publication of the Seagrass module I'm trying to learn from.
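
For anyone hitting the same error, here is a minimal sketch of the replacement in step 3. The small DataFrame, its column names, and the prediction array are hypothetical stand-ins for the lesson's test_set and the output of rfco.predict; only the swap of .as_matrix() for .to_numpy() is the point. The NUM and PD aliases follow the lesson's code.

```python
import numpy as NUM
import pandas as PD

# Hypothetical stand-in for the lesson's test_set; column names are assumptions.
test_set = PD.DataFrame({
    "Present": [1, 0, 1, 1, 0],
    "salinity": [35.1, 30.2, 34.8, 33.9, 29.5],
})
classVar = ["Present"]

# Hypothetical predictions standing in for rfco.predict(test_set[predictVars]).
seagrassPred = NUM.array([1, 0, 0, 1, 0])

# DataFrame.as_matrix() was removed in pandas 1.0.
# to_numpy() returns the same 2-D array, which is then flattened to 1-D.
test_seagrass = test_set[classVar].to_numpy().flatten()

# .values is an attribute (no parentheses) and yields the same array.
same_as_values = NUM.array_equal(test_seagrass, test_set[classVar].values.flatten())

# Percentage of test rows where the prediction disagrees with the observed class.
error = NUM.sum(NUM.abs(test_seagrass - seagrassPred)) / len(seagrassPred) * 100
print("Estimation error (%):", error)
```

Note that .values is written without parentheses; calling it as .values() raises a TypeError because the result is an array, not a method.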

BrianOrt
New Contributor II

Follow-up: I replaced the .as_matrix method with .to_numpy and got... a result. The accuracy estimates were exactly opposite of what they were supposed to have been using .as_matrix, so I would take a wild guess to say that .to_numpy might have reversed two columns in the matrix. By the end of the learning module, my map looked "right." Still, I'd like to know what happened between .as_matrix and .to_numpy.
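
One possible explanation, offered as a guess rather than a confirmed diagnosis: .to_numpy() and .as_matrix() return the same array for a one-column DataFrame, so the flip may come from the earlier PD.factorize call instead. By default, factorize assigns codes in order of first appearance, so if the first row of a random training sample has class 1, that class is encoded as 0 and the predictions come out inverted relative to the test labels. The sketch below uses a made-up Series to show the effect; sort=True is one way to make the codes follow the sorted class values.

```python
import pandas as PD

# Made-up class column whose first value happens to be 1.
observed = PD.Series([1, 0, 0, 1, 0])

# Default behavior: codes follow order of first appearance, so 1 -> code 0.
codes_default, uniques_default = PD.factorize(observed)

# With sort=True, codes follow the sorted unique values, so 0 -> 0 and 1 -> 1.
codes_sorted, uniques_sorted = PD.factorize(observed, sort=True)

print("default codes:", codes_default.tolist(), "uniques:", list(uniques_default))
print("sorted codes: ", codes_sorted.tolist(), "uniques:", list(uniques_sorted))
```

If this is the cause, the error percentage would change from run to run depending on which row lands first in the training sample, which would be consistent with getting "exactly opposite" accuracy on some runs.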