<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Efficiently Reading and Processing Large CSV Files in Python to Avoid Memory Issues in Python Questions</title>
    <link>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1341976#M69044</link>
    <description>&lt;P&gt;Which Python are you using?&amp;nbsp; I've ingested tens of millions of rows in 64-bit Python 2.7 (64-bit Geoprocessing for ArcMap), and processed 40-60 million rows in 64-bit Python 3 (ArcGIS Pro). Both of those VMs had less than 32GiB RAM available.&lt;/P&gt;&lt;P&gt;- V&lt;/P&gt;</description>
    <pubDate>Thu, 26 Oct 2023 14:20:39 GMT</pubDate>
    <dc:creator>VinceAngelo</dc:creator>
    <dc:date>2023-10-26T14:20:39Z</dc:date>
    <item>
      <title>Efficiently Reading and Processing Large CSV Files in Python to Avoid Memory Issues</title>
      <link>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1341857#M69042</link>
      <description>&lt;P&gt;I'm trying to read a large CSV file, but I keep running into memory issues. How can I efficiently read and process a large CSV file in Python, ensuring that I don't run out of memory?&lt;/P&gt;&lt;P&gt;Please provide a solution or guidance on how to handle large CSV files in Python to avoid memory problems.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Oct 2023 10:32:19 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1341857#M69042</guid>
      <dc:creator>hectorsalamanca</dc:creator>
      <dc:date>2023-10-26T10:32:19Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently Reading and Processing Large CSV Files in Python to Avoid Memory Issues</title>
      <link>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1341863#M69043</link>
      <description>&lt;P&gt;What have you tried?&amp;nbsp; How large are your files? I've read some pretty large files without issues.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Oct 2023 11:22:00 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1341863#M69043</guid>
      <dc:creator>AngelaSchirck</dc:creator>
      <dc:date>2023-10-26T11:22:00Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently Reading and Processing Large CSV Files in Python to Avoid Memory Issues</title>
      <link>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1341976#M69044</link>
      <description>&lt;P&gt;Which Python are you using?&amp;nbsp; I've ingested tens of millions of rows in 64-bit Python 2.7 (64-bit Geoprocessing for ArcMap), and processed 40-60 million rows in 64-bit Python 3 (ArcGIS Pro). Both of those VMs had less than 32GiB RAM available.&lt;/P&gt;&lt;P&gt;- V&lt;/P&gt;</description>
      <pubDate>Thu, 26 Oct 2023 14:20:39 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1341976#M69044</guid>
      <dc:creator>VinceAngelo</dc:creator>
      <dc:date>2023-10-26T14:20:39Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently Reading and Processing Large CSV Files in Python to Avoid Memory Issues</title>
      <link>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1342735#M69057</link>
      <description>&lt;P&gt;Out of curiosity, what sorts of things do you need to do? In the past, I ran into similar issues working with traffic data, and the way I got around it was actually switching to Julia for the pre-processing tasks. Julia has a &lt;A href="https://dataframes.juliadata.org/stable/" target="_blank"&gt;DataFrames.jl&lt;/A&gt;&amp;nbsp;library that is similar to pandas.&lt;/P&gt;
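&lt;P&gt;If you want to stay in Python, pandas can also stream a CSV in chunks instead of loading it whole. A minimal sketch (the file name, chunk size, and per-chunk work are placeholders):&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import pandas as pd

# chunksize makes read_csv return an iterator of DataFrames
# instead of one big frame, so memory use stays bounded
total_rows = 0
for chunk in pd.read_csv("large_dataset.csv", chunksize=100_000):
    total_rows += len(chunk)  # replace with your real per-chunk processing
print(total_rows)&lt;/LI-CODE&gt;</description>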
      <pubDate>Fri, 27 Oct 2023 15:42:51 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1342735#M69057</guid>
      <dc:creator>EarlMedina</dc:creator>
      <dc:date>2023-10-27T15:42:51Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently Reading and Processing Large CSV Files in Python to Avoid Memory Issues</title>
      <link>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1342805#M69060</link>
      <description>&lt;P&gt;Use Python generators:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# A generator expression yields one line at a time instead of
# reading the whole file into memory
rows = (row for row in open("path/to/csv"))

for row in rows:
    process(row)

# Alternatively, as a generator function
from collections.abc import Iterator

def read_csv(path: str, sep: str = ',') -&amp;gt; Iterator[list[str]]:
    # with ensures the file is closed once the generator is exhausted
    with open(path) as f:
        for row in f:
            yield row.strip().split(sep)

for row in read_csv("path/to/csv"):
    process(row)&lt;/LI-CODE&gt;&lt;P&gt;This will be slower than loading the whole file into memory, but it uses a fixed amount of memory: it iterates through the file line by line, loading only the line needed for the current operation.&lt;/P&gt;
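&lt;P&gt;One caveat: a plain split(sep) will mis-parse quoted fields that contain the separator. If your data can have those, here is a minimal variant using the standard library csv module (same placeholder path and process() as above):&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import csv
from collections.abc import Iterator

def read_csv_rows(path: str) -&amp;gt; Iterator[list[str]]:
    # newline='' lets the reader handle newlines embedded in quoted fields
    with open(path, newline='') as f:
        yield from csv.reader(f)

for row in read_csv_rows("path/to/csv"):
    process(row)&lt;/LI-CODE&gt;</description>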
      <pubDate>Fri, 27 Oct 2023 16:57:23 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1342805#M69060</guid>
      <dc:creator>HaydenWelch</dc:creator>
      <dc:date>2023-10-27T16:57:23Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently Reading and Processing Large CSV Files in Python to Avoid Memory Issues</title>
      <link>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1342808#M69061</link>
      <description>&lt;P&gt;In my experience the limit is around 100-200 million rows. I've had to deal with large datasets, such as all addresses in a country, and without generators you tend to run out of memory quickly.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Oct 2023 17:01:17 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1342808#M69061</guid>
      <dc:creator>HaydenWelch</dc:creator>
      <dc:date>2023-10-27T17:01:17Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently Reading and Processing Large CSV Files in Python to Avoid Memory Issues</title>
      <link>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1367784#M69582</link>
      <description>&lt;P&gt;You can read a large CSV using PySpark in Python:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("large_file_read").getOrCreate()
df = spark.read.csv('large_dataset.csv', header=True)&lt;/LI-CODE&gt;&lt;P&gt;You can get more details on this and how to choose the best way to &lt;A title="read large csv" href="https://enodeas.com/how-to-read-large-csv-file-in-python-best-approach/" target="_blank" rel="noopener"&gt;read large csv&lt;/A&gt; here.&lt;/P&gt;
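&lt;P&gt;Note that spark.read.csv is lazy: nothing is actually read until an action runs, and Spark then processes the file partition by partition rather than all at once. A quick usage sketch, assuming a placeholder column name and output path:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Actions like count() trigger the actual, partitioned read
row_count = df.count()

# Filter and write out without ever materializing the whole CSV in memory
df.filter(df["some_column"].isNotNull()).write.parquet("output_dir")&lt;/LI-CODE&gt;</description>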
      <pubDate>Tue, 09 Jan 2024 12:16:29 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/efficiently-reading-and-processing-large-csv-files/m-p/1367784#M69582</guid>
      <dc:creator>SamirSinghaMahapatra</dc:creator>
      <dc:date>2024-01-09T12:16:29Z</dc:date>
    </item>
  </channel>
</rss>