===================================================
Adding reference counters (krefs) to kernel objects
===================================================

:Author: Corey Minyard <minyard@acm.org>
:Author: Thomas Hellstrom <thellstrom@vmware.com>

A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and
presentation on krefs, which can be found at:

  - http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf
  - http://www.kroah.com/linux/talks/ols_2004_kref_talk/

Introduction
============

krefs allow you to add reference counters to your objects.  If you
have objects that are used in multiple places and passed around, and
you don't have refcounts, your code is almost certainly broken.  If
you want refcounts, krefs are the way to go.

To use a kref, add one to your data structures like::

    struct my_data
    {
	.
	.
	struct kref refcount;
	.
	.
    };

The kref can occur anywhere within the data structure.

Initialization
==============

You must initialize the kref after you allocate it.  To do this, call
kref_init as so::

     struct my_data *data;

     data = kmalloc(sizeof(*data), GFP_KERNEL);
     if (!data)
            return -ENOMEM;
     kref_init(&data->refcount);

This sets the refcount in the kref to 1.

Kref rules
==========

Once you have an initialized kref, you must follow these rules:

1) If you make a non-temporary copy of a pointer, especially if
   it can be passed to another thread of execution, you must
   increment the refcount with kref_get() before passing it off::

       kref_get(&data->refcount);

   If you already have a valid pointer to a kref-ed structure (the
   refcount cannot go to zero) you may do this without a lock.

2) When you are done with a pointer, you must call kref_put()::

       kref_put(&data->refcount, data_release);

   If this is the last reference to the pointer, the release
   routine will be called.  If the code never tries to get
   a valid pointer to a kref-ed structure without already
   holding a valid pointer, it is safe to do this without
   a lock.

3) If the code attempts to gain a reference to a kref-ed structure
   without already holding a valid pointer, it must serialize access
   where a kref_put() cannot occur during the kref_get(), and the
   structure must remain valid during the kref_get().

For example, if you allocate some data and then pass it to another
thread to process::

    void data_release(struct kref *ref)
    {
	struct my_data *data = container_of(ref, struct my_data, refcount);
	kfree(data);
    }

    void more_data_handling(void *cb_data)
    {
	struct my_data *data = cb_data;
	.
	. do stuff with data here
	.
	kref_put(&data->refcount, data_release);
    }

    int my_data_handler(void)
    {
	int rv = 0;
	struct my_data *data;
	struct task_struct *task;
	data = kmalloc(sizeof(*data), GFP_KERNEL);
	if (!data)
		return -ENOMEM;
	kref_init(&data->refcount);

	kref_get(&data->refcount);
	task = kthread_run(more_data_handling, data, "more_data_handling");
	if (task == ERR_PTR(-ENOMEM)) {
		rv = -ENOMEM;
		kref_put(&data->refcount, data_release);
		goto out;
	}

	.
	. do stuff with data here
	.
    out:
	kref_put(&data->refcount, data_release);
	return rv;
    }

This way, it doesn't matter in what order the two threads handle the
data; the kref_put() handles knowing when the data is no longer
referenced and releases it.  The kref_get() does not require a lock,
since we already have a valid pointer that we own a refcount for.  The
put needs no lock because nothing tries to get the data without
already holding a pointer.

Note that the "before" in rule 1 is very important.  You should never
do something like::

	task = kthread_run(more_data_handling, data, "more_data_handling");
	if (task == ERR_PTR(-ENOMEM)) {
		rv = -ENOMEM;
		goto out;
	} else
		/* BAD BAD BAD - get is after the handoff */
		kref_get(&data->refcount);

Don't assume you know what you are doing and use the above construct.
First of all, you may not know what you are doing.  Second, you may
know what you are doing (there are some situations where locking is
involved where the above may be legal) but someone else who doesn't
know what they are doing may change the code or copy the code.  It's
bad style.  Don't do it.

There are some situations where you can optimize the gets and puts.
For instance, if you are done with an object and enqueuing it for
something else or passing it off to something else, there is no reason
to do a get then a put::

	/* Silly extra get and put */
	kref_get(&obj->ref);
	enqueue(obj);
	kref_put(&obj->ref, obj_cleanup);

Just do the enqueue.  A comment about this is always welcome::

	enqueue(obj);
	/* We are done with obj, so we pass our refcount off
	   to the queue.  DON'T TOUCH obj AFTER HERE! */

The last rule (rule 3) is the nastiest one to handle.  Say, for
instance, you have a list of items that are each kref-ed, and you wish
to get the first one.  You can't just pull the first item off the list
and kref_get() it.  That violates rule 3 because you are not already
holding a valid pointer.  You must add a mutex (or some other lock).
For instance::

	static DEFINE_MUTEX(mutex);
	static LIST_HEAD(q);
	struct my_data
	{
		struct kref      refcount;
		struct list_head link;
	};

	static struct my_data *get_entry()
	{
		struct my_data *entry = NULL;
		mutex_lock(&mutex);
		if (!list_empty(&q)) {
			entry = container_of(q.next, struct my_data, link);
			kref_get(&entry->refcount);
		}
		mutex_unlock(&mutex);
		return entry;
	}

	static void release_entry(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		list_del(&entry->link);
		kfree(entry);
	}

	static void put_entry(struct my_data *entry)
	{
		mutex_lock(&mutex);
		kref_put(&entry->refcount, release_entry);
		mutex_unlock(&mutex);
	}

The kref_put() return value is useful if you do not want to hold the
lock during the whole release operation.  Say you didn't want to call
kfree() with the lock held in the example above (since it is kind of
pointless to do so).  You could use kref_put() as follows::

	static void release_entry(struct kref *ref)
	{
		/* All work is done after the return from kref_put(). */
	}

	static void put_entry(struct my_data *entry)
	{
		mutex_lock(&mutex);
		if (kref_put(&entry->refcount, release_entry)) {
			list_del(&entry->link);
			mutex_unlock(&mutex);
			kfree(entry);
		} else
			mutex_unlock(&mutex);
	}

This is really more useful if you have to call other routines as part
of the free operations that could take a long time or might claim the
same lock.  Note that doing everything in the release routine is still
preferred as it is a little neater.

The above example could also be optimized using kref_get_unless_zero() in
the following way::

	static struct my_data *get_entry()
	{
		struct my_data *entry = NULL;
		mutex_lock(&mutex);
		if (!list_empty(&q)) {
			entry = container_of(q.next, struct my_data, link);
			if (!kref_get_unless_zero(&entry->refcount))
				entry = NULL;
		}
		mutex_unlock(&mutex);
		return entry;
	}

	static void release_entry(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		mutex_lock(&mutex);
		list_del(&entry->link);
		mutex_unlock(&mutex);
		kfree(entry);
	}

	static void put_entry(struct my_data *entry)
	{
		kref_put(&entry->refcount, release_entry);
	}

This is useful for removing the mutex lock around kref_put() in
put_entry(), but it is important that kref_get_unless_zero() is enclosed
in the same critical section that finds the entry in the lookup table;
otherwise kref_get_unless_zero() may reference already freed memory.
Note that it is illegal to use kref_get_unless_zero() without checking
its return value.  If you are sure (by already having a valid pointer)
that kref_get_unless_zero() will return true, then use kref_get()
instead.
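
For illustration, a rough sketch of the wrong and right patterns
(reusing the entry name from the example above; do_something() is just
a placeholder)::

	/* BAD - return value ignored.  If the refcount was already zero,
	   the object may be on its way to being freed and must not be
	   touched. */
	kref_get_unless_zero(&entry->refcount);

	/* OK - only touch the entry if the get succeeded. */
	if (kref_get_unless_zero(&entry->refcount))
		do_something(entry);

	/* If you already hold a valid pointer (the refcount cannot be
	   zero), use kref_get() instead. */
	kref_get(&entry->refcount);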

Krefs and RCU
=============

The function kref_get_unless_zero() also makes it possible to use RCU
locking for lookups in the above example::

	struct my_data
	{
		struct rcu_head rhead;
		.
		struct kref refcount;
		.
		.
	};

	static struct my_data *get_entry_rcu()
	{
		struct my_data *entry = NULL;
		rcu_read_lock();
		if (!list_empty(&q)) {
			entry = container_of(q.next, struct my_data, link);
			if (!kref_get_unless_zero(&entry->refcount))
				entry = NULL;
		}
		rcu_read_unlock();
		return entry;
	}

	static void release_entry_rcu(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		mutex_lock(&mutex);
		list_del_rcu(&entry->link);
		mutex_unlock(&mutex);
		kfree_rcu(entry, rhead);
	}

	static void put_entry(struct my_data *entry)
	{
		kref_put(&entry->refcount, release_entry_rcu);
	}

But note that the struct kref member needs to remain in valid memory
for an RCU grace period after release_entry_rcu() was called.  That can
be accomplished by using kfree_rcu(entry, rhead) as done above, or by
calling synchronize_rcu() before using kfree(), but note that
synchronize_rcu() may sleep for a substantial amount of time.
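
As a sketch of that alternative (same names as the example above, and
only suitable where the final kref_put() is done in a context that may
sleep), release_entry_rcu() could wait for the grace period explicitly
instead of using kfree_rcu()::

	static void release_entry_rcu(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		mutex_lock(&mutex);
		list_del_rcu(&entry->link);
		mutex_unlock(&mutex);
		/* Wait for all RCU readers that might still see the entry;
		   synchronize_rcu() may sleep. */
		synchronize_rcu();
		kfree(entry);
	}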