EMedina-esristaff

Using Mimesis to Generate Mock Data for Testing

Blog Post created by EMedina-esristaff Employee on Feb 13, 2019

Have you ever found yourself in situations where you needed data to test with but either had nothing on hand to suit the purpose or not enough of it? Short on time, you may not want to sift through your existing data sets or search the web for sample data.

 

If tabular data is what you're after, then the Mimesis python package might be for you: Mimesis - Fake Data Generator — Mimesis 3.0.0 documentation. Using this handy package, you can generate as much fake/mock data as you need for your testing.

 

In this post, I want to illustrate how you might combine Mimesis with a few other packages to generate mock data into DBMS tables. These packages include:

  • psycopg2 (used to connect to PostgreSQL)
  • pyodbc (used to connect to MSSQL)
  • cx_Oracle (used to connect to Oracle)

 

To create a nice GUI you can use QT Designer and pyqt5 - I've done that here to better illustrate the use of this tool. You can create a executable from the below example using pyinstaller or you can just choose to pick out functions and create a command-line version. These functions do most of the work:

  • connection
  • createTbl
  • addRecords

 

Last thing to note is that this requires Python 3.

 


Let's suppose we're working with the below code as-is (after installing the 4 required packages):

# Mock Data Generator

import sys
from PyQt5 import QtCore, QtGui, QtWidgets
from PyQt5.QtWidgets import QDialog, QApplication
import psycopg2
import pyodbc
import cx_Oracle
from mimesis import Person
from mimesis import Address

class AppWindow(QDialog):
    def __init__(self):
        super().__init__()
        self.ui = Ui_Dialog()
        self.ui.setupUi(self)
        self.show() 

class Ui_Dialog(object):
    def setupUi(self, Dialog):
        Dialog.setObjectName("Dialog")
        Dialog.resize(400, 300)
        self.stat = 0
        self.buttonBox = QtWidgets.QDialogButtonBox(Dialog)
        self.buttonBox.setGeometry(QtCore.QRect(50, 260, 341, 32))
        self.buttonBox.setOrientation(QtCore.Qt.Horizontal)
        self.buttonBox.setStandardButtons(QtWidgets.QDialogButtonBox.Cancel|QtWidgets.QDialogButtonBox.Ok)
        self.buttonBox.setObjectName("buttonBox")
        self.textEdit = QtWidgets.QTextEdit(Dialog)
        self.textEdit.setGeometry(QtCore.QRect(263, 10, 121, 231))
        self.textEdit.setObjectName("textEdit")
        self.widget = QtWidgets.QWidget(Dialog)
        self.widget.setGeometry(QtCore.QRect(10, 10, 241, 231))
        self.widget.setObjectName("widget")
        self.gridLayout = QtWidgets.QGridLayout(self.widget)
        self.gridLayout.setContentsMargins(0, 0, 0, 0)
        self.gridLayout.setObjectName("gridLayout")
        self.label = QtWidgets.QLabel(self.widget)
        self.label.setMinimumSize(QtCore.QSize(70, 0))
        self.label.setObjectName("label")
        self.gridLayout.addWidget(self.label, 0, 0, 1, 1)
        self.comboBox = QtWidgets.QComboBox(self.widget)
        self.comboBox.setObjectName("comboBox")
        self.gridLayout.addWidget(self.comboBox, 0, 1, 1, 1)
        self.label_2 = QtWidgets.QLabel(self.widget)
        self.label_2.setMinimumSize(QtCore.QSize(70, 0))
        self.label_2.setObjectName("label_2")
        self.gridLayout.addWidget(self.label_2, 1, 0, 1, 1)
        self.lineEdit = QtWidgets.QLineEdit(self.widget)
        self.lineEdit.setObjectName("lineEdit")
        self.gridLayout.addWidget(self.lineEdit, 1, 1, 1, 1)
        self.label_3 = QtWidgets.QLabel(self.widget)
        self.label_3.setMinimumSize(QtCore.QSize(70, 0))
        self.label_3.setObjectName("label_3")
        self.gridLayout.addWidget(self.label_3, 2, 0, 1, 1)
        self.lineEdit_2 = QtWidgets.QLineEdit(self.widget)
        self.lineEdit_2.setObjectName("lineEdit_2")
        self.gridLayout.addWidget(self.lineEdit_2, 2, 1, 1, 1)
        self.label_4 = QtWidgets.QLabel(self.widget)
        self.label_4.setMinimumSize(QtCore.QSize(70, 0))
        self.label_4.setObjectName("label_4")
        self.gridLayout.addWidget(self.label_4, 3, 0, 1, 1)
        self.lineEdit_3 = QtWidgets.QLineEdit(self.widget)
        self.lineEdit_3.setObjectName("lineEdit_3")
        self.gridLayout.addWidget(self.lineEdit_3, 3, 1, 1, 1)
        self.label_5 = QtWidgets.QLabel(self.widget)
        self.label_5.setMinimumSize(QtCore.QSize(70, 0))
        self.label_5.setObjectName("label_5")
        self.gridLayout.addWidget(self.label_5, 4, 0, 1, 1)
        self.lineEdit_4 = QtWidgets.QLineEdit(self.widget)
        self.lineEdit_4.setObjectName("lineEdit_4")
        self.gridLayout.addWidget(self.lineEdit_4, 4, 1, 1, 1)
        self.label_6 = QtWidgets.QLabel(self.widget)
        self.label_6.setMinimumSize(QtCore.QSize(70, 0))
        self.label_6.setObjectName("label_6")
        self.gridLayout.addWidget(self.label_6, 5, 0, 1, 1)
        self.lineEdit_5 = QtWidgets.QLineEdit(self.widget)
        self.lineEdit_5.setObjectName("lineEdit_5")
        self.lineEdit_5.setEchoMode(QtWidgets.QLineEdit.Password)
        self.gridLayout.addWidget(self.lineEdit_5, 5, 1, 1, 1)
        self.label_7 = QtWidgets.QLabel(self.widget)
        self.label_7.setMinimumSize(QtCore.QSize(70, 0))
        self.label_7.setObjectName("label_7")
        self.gridLayout.addWidget(self.label_7, 6, 0, 1, 1)
        self.lineEdit_6 = QtWidgets.QLineEdit(self.widget)
        self.lineEdit_6.setObjectName("lineEdit_6")
        self.gridLayout.addWidget(self.lineEdit_6, 6, 1, 1, 1)
        self.label_8 = QtWidgets.QLabel(self.widget)
        self.label_8.setMinimumSize(QtCore.QSize(70, 0))
        self.label_8.setObjectName("label_8")
        self.gridLayout.addWidget(self.label_8, 7, 0, 1, 1)
        self.lineEdit_7 = QtWidgets.QLineEdit(self.widget)
        self.lineEdit_7.setObjectName("lineEdit_7")
        self.gridLayout.addWidget(self.lineEdit_7, 7, 1, 1, 1)
        self.lineEdit_2.setText('5432')
        self.comboBox.clear()
        self.dbmsList = ['PostgreSQL','MSSQL (Driver 11)','MSSQL (Driver 13)','MSSQL (Driver 17)','SQL Server Express','Oracle 12c+']
        self.comboBox.addItems(self.dbmsList)
        self.comboBox.activated.connect(self.handleActivated)
        self.retranslateUi(Dialog)
        self.buttonBox.accepted.connect(self.button_click)
        self.buttonBox.rejected.connect(Dialog.reject)
        QtCore.QMetaObject.connectSlotsByName(Dialog)

    def handleActivated(self, index):
        if index == 0:
            self.lineEdit_2.setText('5432')
        elif index > 0 and index < 4:
            self.lineEdit_2.setText('1433')
        elif index == 4:
            self.lineEdit_2.setText('')
        elif index == 5:
            self.lineEdit_2.setText('1521')

    def retranslateUi(self, Dialog):
        _translate = QtCore.QCoreApplication.translate
        Dialog.setWindowTitle(_translate("Dialog", "Mock Data Generator"))
        self.label.setText(_translate("Dialog", "DBMS Type: "))
        self.label_2.setText(_translate("Dialog", "Host: "))
        self.label_3.setText(_translate("Dialog", "Port: "))
        self.label_4.setText(_translate("Dialog", "Database: "))
        self.label_5.setText(_translate("Dialog", "Username: "))
        self.label_6.setText(_translate("Dialog", "Password: "))
        self.label_7.setText(_translate("Dialog", "Table Name: "))
        self.label_8.setText(_translate("Dialog", "# of Rows: "))

    def button_click(self):
        self.ctype = self.comboBox.currentIndex()
        self.server = self.lineEdit.text()
        self.port = self.lineEdit_2.text()
        self.dbase = self.lineEdit_3.text()
        self.user = self.lineEdit_4.text()
        self.password = self.lineEdit_5.text()
        self.tbl = self.lineEdit_6.text()
        self.rows = self.lineEdit_7.text()
        self.runFunc()

    def connection(self):
        if self.ctype == 0:
            try:
                self.con = psycopg2.connect(f"host={self.server} dbname={self.dbase} user={self.user} password={self.password} port={self.port}")   
                self.cur = self.con.cursor()
                self.stat = 1
                self.textEdit.append(f"Connected to {self.dbase}.")
            except psycopg2.DatabaseError as error:
                self.textEdit.append(str(error))
        elif self.ctype > 0 and self.ctype < 4:
            if self.ctype == 1:
                driver = '{ODBC Driver 11 for SQL Server}'
            elif self.ctype == 2:
                driver = '{ODBC Driver 13 for SQL Server}'
            elif self.ctype == 3:
                driver = '{ODBC Driver 17 for SQL Server}'
            else:
                self.textEdit.append("Driver number not valid")
                exit()
            try:
                self.con = pyodbc.connect(f'DRIVER={driver};SERVER='+self.server+','+self.port+';DATABASE='+self.dbase+';UID='+self.user+';PWD='+ self.password)
                self.cur = self.con.cursor()
                self.textEdit.append("Connected.")
            except pyodbc.Error as error:
                self.textEdit.append(str(error))

        elif self.ctype == 4:
            driver = '{SQL Server}'
            try:
                self.con = pyodbc.connect(f'DRIVER={driver};SERVER='+self.server+';DATABASE='+self.dbase+';UID='+self.user+';PWD='+ self.password)
                self.cur = self.con.cursor()
                self.stat = 1
                self.textEdit.append(f"Connected to {self.dbase}.")
            except pyodbc.Error as error:
                self.textEdit.append(str(error))

        elif self.ctype == 5:
            try:
                self.con = cx_Oracle.connect(f'{self.user}/{self.password}@{self.server}/{self.dbase}:{self.port}')
            except cx_Oracle.DatabaseError as error:
                self.textEdit.append(str(error))
                self.textEdit.append("If using SYS try another account. SYS is no supported.")
            self.cur = self.con.cursor()
            self.stat = 1
            self.textEdit.append(f"Connected to {self.dbase}.")
        else:
            self.textEdit.append("Invalid type.")

    def runFunc(self):
        if self.stat == 0:
            self.connection()
        if self.tbl == "" or self.rows =="":
            self.textEdit.append("Enter Table information")
        else:
            self.createTbl()
            self.stat = 0
            self.con.close()

    def createTbl(self):
        if self.ctype == 0:
            autoinc = "SERIAL PRIMARY KEY"
            itype = "INT"
            flt = "NUMERIC(10,5)"
        if self.ctype > 0 and self.ctype < 5:
            autoinc = "INTEGER IDENTITY PRIMARY KEY"
            itype = "INT"
            flt = "NUMERIC(10,5)"
        if self.ctype == 5:
            autoinc = "NUMBER GENERATED by default on null as IDENTITY"
            itype = "NUMBER(3)"

        try:
            self.cur.execute(f"CREATE TABLE {self.tbl}(Id {autoinc}, Firstname VARCHAR(50), Lastname VARCHAR(50), Phone VARCHAR(30), Email VARCHAR(50), Address VARCHAR(50), City VARCHAR(50), State VARCHAR (50), Nationality VARCHAR(50), Occupation VARCHAR(50), Age {itype}, Lat NUMERIC(10,5), Lon NUMERIC(10,5))")
            self.addRecords()
            self.con.commit()
        except (psycopg2.DatabaseError, pyodbc.Error, cx_Oracle.DatabaseError) as error:
            self.textEdit.append(str(error))
        else:
            self.textEdit.append("Table created.")

    def addRecords(self):
        p = Person('en')
        a = Address('en')
        l = list()
        for i in range(int(self.rows)):
            firstname = p.name()
            lastname = p.last_name()
            telephone = p.telephone()
            email = p.email()
            address = a.address()
            city = a.city()
            state = a.state()
            lat = a.latitude()
            lon = a.longitude()
            nationality = p.nationality()
            occupation = p.occupation()
            age = p.age()
            l.append(tuple((firstname, lastname, telephone, email, address, city, state, nationality, occupation, age, lat, lon)))

        if self.ctype == 0:
            self.cur.executemany(f"INSERT INTO {self.tbl} (Firstname, Lastname, Phone, Email, Address, City, State, Nationality, Occupation, Age, Lat, Lon) values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)", l)
            self.textEdit.append(f"{self.tbl} populated with {self.rows} rows of Sample Data")
        elif self.ctype > 0 and self.ctype < 5:
            self.cur.executemany(f"INSERT INTO {self.tbl} (Firstname, Lastname, Phone, Email, Address, City, State, Nationality, Occupation, Age, Lat, Lon) values(?,?,?,?,?,?,?,?,?,?,?,?)", l)
            self.textEdit.append(f"{self.tbl} populated with {self.rows} rows of Sample Data")
        elif self.ctype == 5:
            self.cur.executemany(f"INSERT INTO {self.tbl} (Firstname, Lastname, Phone, Email, Address, City, State, Nationality, Occupation, Age, Lat, Lon) values(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12)", l)
            self.textEdit.append(f"{self.tbl} populated with {self.rows} rows of Sample Data")


app = QApplication(sys.argv)
w = AppWindow()
w.show()
sys.exit(app.exec_())

 


 

Execute that and you'll end up with a program that looks like this:

table_run

  • Here, I've connected to a PostgreSQL database and have elected to create a table named mock with 5000 rows of sample data.

 

Switching over to pgAdmin, I see that the table was successfully created and this mock data was generated:

table2

 

Using this tool, I generated a table with the following fields:

  • An auto-incremented id
  • First Name, Last Name, Phone, E-Mail, Address, City, State, Nationality, Occupation, Age, Latitude, and Longitude

 

Don't read too much into the data - it is, after all, mock data. None of the addresses actually exist and the longitude/latitude values won't necessarily land you on solid ground! Still, this data might be useful for testing of joins or data-processing workflows. I created 5000 rows in this example but it is certainly possible to generate 1,000,000 records and beyond.

 

Some notes on usage of this example tool:

  • For MSSQL, select the driver which corresponds to the one installed on the client machine.
  • For Oracle, "Database" is the instance.
    • Note that only Oracle 12c works in this example for generating mock data. This is because an auto-incremented id is created and the column type specified in the code was implemented at version 12c. Therefore, you'll want to modify the CREATE statement if you're testing on an older version of Oracle.
  • For SQL Server Express use this format:
    • Host: machinename\instance
    • Port: leave blank

 

 

That's about it! Hopefully this sample inspires others to experiment with Mimesis!

Outcomes