
Learning module Predicting Seagrass Habitats With Machine Learning: I'm getting the following Python error trying to train the random forest classifier

Question asked by brianort on Apr 11, 2020
Latest reply on Apr 23, 2020 by brianort

Here's where I am in the module:

Train your random forest classifier

Now that you have split your data, you'll train your random forest classifier using the training data you have created.

  1. Create the variable rfco to show the results of running the RandomForestClassifier command to create 500 trees. Then use the .fit argument to apply the forest results to the training data.
    rfco = RandomForestClassifier(n_estimators = 500, oob_score = True)
    rfco.fit(train_set[predictVars], indicator)
  2. Run the classification again using the test dataset. Create the attribute seagrassPred to store this data with a 1 for occurrence and a 0 for no occurrence.

    The test data is 90 percent of the United States coastal data that was not used to train the model, and will show the accuracy of your prediction.

    seagrassPred = rfco.predict(test_set[predictVars])
  3. Use the results of the classification to check the efficiency of the model by calculating prediction accuracy and estimation error.
    test_seagrass = test_set[classVar].as_matrix()
    test_seagrass = test_seagrass.flatten()
    error = NUM.sum(NUM.abs(test_seagrass - seagrassPred))/len(seagrassPred) * 100
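To make the error formula in step 3 concrete, here is a minimal sketch using made-up 0/1 labels. The two arrays are hypothetical stand-ins for the module's real data. Only the formula itself comes from the module.

```python
import numpy as NUM  # same alias the module uses

# Hypothetical labels: 1 = seagrass occurrence, 0 = no occurrence.
test_seagrass = NUM.array([1, 0, 1, 1, 0, 0, 1, 0])  # "true" values
seagrassPred = NUM.array([1, 0, 0, 1, 0, 1, 1, 0])   # model predictions

# For 0/1 labels, |actual - predicted| is 1 exactly where the two disagree,
# so the sum counts wrong predictions and the ratio is the error rate.
error = NUM.sum(NUM.abs(test_seagrass - seagrassPred)) / len(seagrassPred) * 100
accuracy = 100 - error

print('Error = ' + str(error) + '%')
print('Accuracy = ' + str(accuracy) + '%')
```

With two mismatches out of eight labels, the error works out to 25 percent and the accuracy to 75 percent.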
    -------
    Here are the last few entries in my Python log:

    test_set = data.drop(train_set.index)

    indicator, _ = PD.factorize(train_set[classVar[0]])

    print('Training Data Size = ' + str(train_set.shape[0]))
    print('Test Data Size = ' + str(test_set.shape[0]))

    Training Data Size = 1000

    Test Data Size = 9000

    rfco = RandomForestClassifier(n_estimators = 500, oob_score = True)
    rfco.fit(train_set[predictVars], indicator)

    seagrassPred = rfco.predict(test_set[predictVars])

    test_seagrass = test_set[classVar].as_matrix()
    test_seagrass = test_seagrass.flatten()
    error = NUM.sum(NUM.abs(test_seagrass - seagrassPred))/len(seagrassPred) * 100

    Traceback (most recent call last):

      File "<string>", line 1, in <module>

      File "C:\Users\brian\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\lib\site-packages\pandas\core\generic.py", line 5274, in __getattr__

        return object.__getattribute__(self, name)

    AttributeError: 'DataFrame' object has no attribute 'as_matrix'
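For reference, the traceback ends inside pandas (`pandas\core\generic.py`), and `DataFrame.as_matrix` was deprecated in pandas 0.23 and removed in pandas 1.0. The sketch below shows the same two lines written with `DataFrame.to_numpy()`, which has been available since pandas 0.24. It assumes the environment runs pandas 1.0 or later, and the column names and values are hypothetical stand-ins for the module's real `test_set` and `classVar`.

```python
import pandas as PD  # same alias the module uses

# Hypothetical stand-ins for the module's classVar and test_set.
classVar = ['Present']
test_set = PD.DataFrame({
    'Present': [1, 0, 1, 0],
    'salinity': [35.1, 30.2, 34.8, 29.9],
})

# to_numpy() returns the selected columns as a 2-D NumPy array,
# which is what as_matrix() returned in older pandas releases.
test_seagrass = test_set[classVar].to_numpy()

# flatten() collapses the (4, 1) array into a 1-D array of labels.
test_seagrass = test_seagrass.flatten()

print(test_seagrass)
```

The `error = ...` line from the module should then work unchanged, since it only needs `test_seagrass` to be a one-dimensional NumPy array.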


    I'm concerned it might be a file path problem that I can't do anything about.
    I'm working in ArcGIS Pro on Windows Server 2012 R2, so the actual file path to Documents may differ from what it would be on a stand-alone machine.
    I'm new to Python but can sort of follow along.

    Kathy Cappelli I saw on a related thread that you're the person responsible for this cool module.
