Skip to content

Fixes a memory leak issue in the Avro C library under abnormal data scenarios.#3635

Open
kwenzh wants to merge 3 commits intoapache:mainfrom
kwenzh:fix-avroc-mem
Open

Fixes a memory leak issue in the Avro C library under abnormal data scenarios.#3635
kwenzh wants to merge 3 commits intoapache:mainfrom
kwenzh:fix-avroc-mem

Conversation

@kwenzh
Copy link
Copy Markdown

@kwenzh kwenzh commented Jan 27, 2026

What is the purpose of the change

  • Does not properly validate that the string length does not exceed available buffer space, potentially leading to buffer overflow
  • Add a macro definition AVRO_SAFE_READ to release memory upon exception return.

Verifying this change

  • Construct a schema and data that do not match, and call the read_string function to read them.
  • Observe memory usage.
  • Stop reading,Observe if memory usage decreases.

Alternatively, you can use valgrind to check for memory leaks.

Documentation

Comment thread lang/c/src/encoding_binary.c Outdated
Comment thread lang/c/src/encoding_binary.c Outdated
Comment thread lang/c/src/avro/io.h
Comment thread lang/c/src/encoding_binary.c Outdated
Comment thread lang/c/src/encoding_binary.c
Comment thread lang/c/src/io.c
return mem_reader->len - mem_reader->read;
} else if (is_file_io(reader)) {
struct _avro_reader_file_t *file_reader = avro_reader_to_file(reader);
return bytes_available(file_reader);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns only the buffered bytes, not all remaining bytes as the memory io branch above

Copy link
Copy Markdown
Author

@kwenzh kwenzh Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I haven't found a way to get the remaining buffer length for file I/O. My goal is to check the maximum readable length before malloc to avoid memory leaks caused by the length exceeding the limit during avro_read_memory checks. Are there any other good solutions?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns only the buffered bytes, not all remaining bytes as the memory io branch above

maybe use (int64_t) sizeof(reader->buffer) , Am I understanding this correctly?

Comment thread lang/c/src/encoding_binary.c Outdated
// max := r.tail - r.head + 1; if max >= 0 && size > max
max_available = avro_max_read(reader);
if (max_available >= 0 && str_len > max_available) {
avro_set_error("mem io: String length %" PRId64 " is greater than available buffer size %" PRId64,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure to use the same indentation as the rest of the file, i.e. tabs.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

see again?

Comment thread lang/c/src/encoding.h
Comment thread lang/c/src/encoding.h Outdated
@fabiocfabini
Copy link
Copy Markdown

Is anything preventing this particular request to be merged?

@fabiocfabini
Copy link
Copy Markdown

Why can't the fix simply include the addition of the AVRO_READ_OR_FREE macro and replace the AVRO_READ calls in the read_string and read_bytes functions?

@martin-g
Copy link
Copy Markdown
Member

Is anything preventing this particular request to be merged?

There are no active maintainers of the C SDK...

We will need to the community to review and approve the changes before merging it.

@kwenzh
Copy link
Copy Markdown
Author

kwenzh commented Apr 15, 2026

Is this issue confirmed? Are there any other questions regarding the code modification logic?

@fabiocfabini
Copy link
Copy Markdown

Is anything preventing this particular request to be merged?

There are no active maintainers of the C SDK...

We will need to the community to review and approve the changes before merging it.

How can one become such a maintainer? Also, what does it mean the community needs to review?

@fabiocfabini
Copy link
Copy Markdown

Is this issue confirmed? Are there any other questions regarding the code modification logic?

Is there a reasons why the modifications can't simply revolve around replacing AVRO_READ for AVRO_READ_OR_FREE in the read_string and read_bytes functions?

@martin-g
Copy link
Copy Markdown
Member

How can one become such a maintainer?

https://community.apache.org/contributors/becomingacommitter.html

Also, what does it mean the community needs to review?

It means that the users of the C SDK are encouraged to review the PR(s) and comment on them - what is good and bad. Once a PR is approved by a few users we (the maintainers of the other SDKs) can merge it and make it part of the next release.

@kwenzh
Copy link
Copy Markdown
Author

kwenzh commented Apr 16, 2026

Is this issue confirmed? Are there any other questions regarding the code modification logic?

Is there a reasons why the modifications can't simply revolve around replacing AVRO_READ for AVRO_READ_OR_FREE in the read_string and read_bytes functions?

The current modification is as follows, isn't ?

@fabiocfabini
Copy link
Copy Markdown

The current modification is as follows, isn't ?

What is the purpose of the avro_max_read function being introduced? The current error reporting already accurately reports if the string size is bigger then the size of the encoded buffer. What other purpose does it serve?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants