hannabox/MEMORY_OPTIMIZATIONS.md

242 lines
9.1 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# ESP32 MP3 Player Memory Optimizations
This document summarizes the memory optimizations implemented to resolve out-of-memory issues in your ESP32 MP3 player.
## Implemented Optimizations
### Improvement for /directory
Implemented a backpressure-safe, low-heap HTML streaming solution to prevent AsyncTCP cbuf resize OOM during /directory.
Root cause
- The previous implementation used AsyncResponseStream (a Print) and wrote faster than the TCP stack could drain. Under client/network backpressure, AsyncTCPs cbuf tried to grow and failed: cbuf.resize() -> WebResponses write(): Failed to allocate.
Fix implemented
- Switched /directory to AsyncChunkedResponse with a stateful generator that only produces bytes when the TCP layer is ready.
- Generates one entry at a time, respecting maxLen provided by the framework. This prevents buffer growth and heap spikes.
- No yield() needed; backpressure is handled by the chunked response callback scheduling.
Code changes
1. Added a tiny accessor to fetch file id at index
- Header: src/DirectoryNode.h
- Added: uint16_t getFileIdAt(size_t i) const;
- Source: src/DirectoryNode.cpp
- Implemented: uint16_t DirectoryNode::getFileIdAt(size_t i) const { return (i < ids.size()) ? ids[i] : 0; }
2. Replaced /directory handler with AsyncChunkedResponse generator
- File: src/main.cpp
- New logic (high level):
- DirectoryHtmlStreamState holds an explicit traversal stack of frames {node, fileIdx, childIdx, headerDone}.
- next(buffer, maxLen) fills output up to maxLen with:
- Single top-level \n
- A name\n for non-root directories (kept original behaviorno nested per subdir)
- One filename\n per file
- Depth-first traversal across subdirectories
- Closes with \n when done
- Uses snprintf into the chunk buffer and a simple copy loop for filenames, avoiding extra heap allocations.
- Frees generator state when finished and also on client disconnect.
3. Minor improvements in the chunked generator
- Normalized newline literals to \n (not escaped).
- Used single quotes around HTML attribute values to simplify C string escaping and reduce mistakes.
What remains unchanged
- DirectoryNode::streamDirectoryHTML(Print&) is left intact but no longer used by /directory. Mapping/State endpoints continue using their existing streaming; they are small and safe.
Why this eliminates the crashes
- AsyncChunkedResponse only invokes the generator when theres space to send more, so AsyncTCPs cbuf wont grow unbounded. The generator respects the maxLen and yields 0 on completion, eliminating the resize path that previously caused OOM.
Build and flash instructions
- Your environment doesnt have PlatformIO CLI available. Options:
1. VSCode PlatformIO extension: Use the Build and Upload tasks from the PlatformIO toolbar.
2. Install PlatformIO CLI:
- python3 -m pip install --user platformio
- $HOME/.local/bin must be in PATH (or use full path).
- Then build: pio run -e d1_mini32
- Upload: pio run -e d1_mini32 -t upload
3. Arduino IDE/CLI: Import and build the sketch there if preferred.
Runtime test checklist
- Open serial monitor at 115200, reset device.
- Hit [](http://DEVICE_IP/directory)<http://DEVICE_IP/directory> in a browser; the page should render fully without OOM or crash.
- Simulate slow client backpressure:
- curl --limit-rate 5k [](http://DEVICE_IP/directory)<http://DEVICE_IP/directory> -v -o /dev/null
- Observe no “[E][cbuf.cpp:104] resize(): failed to allocate temporary buffer” or “WebResponses write(): Failed to allocate”
- Watch heap logs during serving; you should see stable heap with no large dips.
- If desired, repeat with multiple concurrent connections to /directory to verify robustness.
Optional follow-ups
- If mapping ever grows large, convert /mapping to AsyncChunkedResponse using the same pattern.
- If your ESP32 has PSRAM, enabling it can further reduce heap pressure, but the chunked approach is already robust.
- Consider enabling CONFIG_ASYNC_TCP_MAX_ACK_TIME tune if you want more aggressive backpressure timing; your platformio.ini already has some AsyncTCP stack tweaks noted.
Summary
- Replaced Print-based recursive streaming with a chunked, backpressure-aware generator for /directory.
- This removes the cbuf resize failure path and should stop the crashes you observed while still using minimal heap.
### 2. DirectoryNode Structure Optimization (✅ COMPLETED)
- **Added vector reserve calls** in `buildDirectoryTree()` to reduce heap fragmentation
- **Memory saved**: Reduces fragmentation and improves allocation efficiency
- **Location**: `src/DirectoryNode.cpp` - lines with `reserve(8)`, `reserve(16)`
### 3. Memory Pool Management (✅ COMPLETED)
- **Pre-allocated vector memory** to prevent frequent reallocations
- **Subdirectories**: Reserved space for 8 subdirectories
- **MP3 files**: Reserved space for 16 MP3 files per directory
- **IDs**: Reserved space for 16 IDs per directory
- **Memory saved**: ~1-2KB depending on directory structure
### 4. Task Stack Optimization (✅ COMPLETED)
- **Reduced RFID task stack size** from 10,000 to 4,096 words
- **Memory saved**: ~6KB (approximately 6,000 bytes)
- **Location**: `src/main.cpp` - `xTaskCreatePinnedToCore()` call
### 5. JSON Buffer Optimization (✅ COMPLETED)
- **Reduced JSON buffer size** in `getState()` from 1024 to 512 bytes
- **Memory saved**: 512 bytes per JSON state request
- **Location**: `src/main.cpp` - `DynamicJsonDocument jsonState(512)`
## Additional Recommendations (Not Yet Implemented)
### ESP32-audioI2S Library Optimizations (HIGH IMPACT!)
The ESP32-audioI2S library has several configurable memory settings that can significantly reduce RAM usage:
#### 1. Audio Buffer Size Optimization
```cpp
// In your main.cpp setup(), add after audio initialization:
audio.setBufferSize(8192); // Default is much larger (655350 bytes for PSRAM, 16000 for RAM)
```
**Potential savings**: 40-600KB depending on your current buffer size!
#### 2. Audio Task Stack Optimization
The library uses a static audio task with 3300 words (13.2KB). You can modify this in the library:
```cpp
// In Audio.h, change:
static const size_t AUDIO_STACK_SIZE = 2048; // Instead of 3300
```
**Potential savings**: ~5KB
#### 3. Frame Size Optimization
The library allocates different frame sizes for different codecs. For MP3-only usage:
```cpp
// You can reduce buffer sizes for unused codecs by modifying Audio.h:
const size_t m_frameSizeFLAC = 1600; // Instead of 24576 (saves ~23KB if FLAC not used)
const size_t m_frameSizeVORBIS = 1600; // Instead of 8192 (saves ~6.5KB if Vorbis not used)
```
#### 4. Disable Unused Features
Add these build flags to `platformio.ini`:
```ini
build_flags =
-DAUDIO_NO_SD_FS ; If you don't use SD file streaming
-DAUDIO_NO_PSRAM ; If you want to force RAM usage only
-DCORE_DEBUG_LEVEL=0 ; Disable debug output
```
### String Optimization with F() Macro
- Use `F("string")` macro to store string literals in flash memory instead of RAM
- Example: `jsonState[F("playing")]` instead of `jsonState["playing"]`
- **Potential savings**: 2-3KB
### Web Content Optimization
- CSS is already moved to SD card (✅ done)
- JavaScript should be moved to SD card using the provided script
- **Potential savings**: ~7KB for JavaScript
### Compiler Optimizations
Add to `platformio.ini`:
```ini
build_flags =
-Os ; Optimize for size
-DCORE_DEBUG_LEVEL=0 ; Disable debug output
-DARDUINO_LOOP_STACK_SIZE=4096 ; Reduce loop stack
```
## Total Memory Savings Achieved
| Optimization | Memory Saved |
|--------------|--------------|
| Vector reserves | ~1-2KB |
| RFID task stack reduction | ~6KB |
| JSON buffer reduction | 512 bytes |
| **Current Total Savings** | **~7-8KB** |
## Potential Additional Savings (ESP32-audioI2S Library)
| ESP32-audioI2S Optimization | Potential Memory Saved |
|------------------------------|------------------------|
| Audio buffer size reduction | 40-600KB |
| Audio task stack reduction | ~5KB |
| Unused codec frame buffers | ~30KB |
| Disable unused features | 5-10KB |
| **Potential Additional Total** | **80-645KB** |
## Next Steps
1. **Copy web files to SD card**:
```bash
./copy_to_sd.sh
```
(Adjust the SD card mount point in the script as needed)
2. **Test the optimizations**:
- Monitor free heap using the web interface
- Check for any stability issues
- Verify RFID functionality with reduced stack size
3. **High-impact ESP32-audioI2S optimizations**:
```cpp
// Add to setup() after audio.setPinout():
audio.setBufferSize(8192); // Reduce from default large buffer
```
4. **Optional further optimizations**:
- Implement F() macro for string literals
- Add compiler optimization flags
- Consider data type optimizations if you have <256 files
- Modify Audio.h for unused codec optimizations
## Files Modified
- `src/DirectoryNode.cpp` - Added vector reserve calls
- `src/main.cpp` - Reduced task stack and JSON buffer sizes
- `copy_to_sd.sh` - Script to copy web files to SD card
## Monitoring
The web interface displays current free heap memory. Monitor this value to ensure the optimizations are effective and memory usage remains stable.