Redis In-Depth Series, Part 7: Expiration Strategy

Author: Nick

June 21, 2022

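This installment is no longer a data structure walkthrough; the main data structures have mostly been covered. This one fills in gaps, or rather covers some important, basic concepts that are easy to overlook. Many Redis article series skip them, and so do people learning Redis, on the assumption that studying the source means studying the main data structures and "algorithms" and nothing more. Redis is mainly used as a high-performance cache, so what does a cache need to get right? First, access speed: if reading from it is no faster than reading from the database, it has no reason to exist. Second, the literal meaning of "cache": the data is there only to speed up reads, so in most scenarios it needs to be refreshed and expired. That brings me to the first point I want to cover (phew).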

Redis expiration strategy

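How does Redis expire cached keys? Take a guess first. The most naive approach is to set a timer for every key that has an expiration time and delete the key when the timer fires. That cleans up most promptly, but it is obviously too expensive. The alternatives are the two strategies Redis actually uses: lazy expiration and periodic expiration. Lazy expiration checks whether a key has expired at the moment it is used. On a get, for example, Redis first checks whether the key has already expired; if it has, the key is deleted and an empty result is returned, and if not, the value is returned normally. The main code is: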

/* This function is called when we are going to perform some operation
 * in a given key, but such key may be already logically expired even if
 * it still exists in the database. The main way this function is called
 * is via lookupKey*() family of functions.
 *
 * The behavior of the function depends on the replication role of the
 * instance, because slave instances do not expire keys, they wait
 * for DELs from the master for consistency matters. However even
 * slaves will try to have a coherent return value for the function,
 * so that read commands executed in the slave side will be able to
 * behave like if the key is expired even if still present (because the
 * master has yet to propagate the DEL).
 *
 * In masters as a side effect of finding a key which is expired, such
 * key will be evicted from the database. Also this may trigger the
 * propagation of a DEL/UNLINK command in AOF / replication stream.
 *
 * The return value of the function is 0 if the key is still valid,
 * otherwise the function returns 1 if the key is expired. */
int expireIfNeeded(redisDb *db, robj *key) {
    if (!keyIsExpired(db,key)) return 0;

    /* If we are running in the context of a slave, instead of
     * evicting the expired key from the database, we return ASAP:
     * the slave key expiration is controlled by the master that will
     * send us synthesized DEL operations for expired keys.
     *
     * Still we try to return the right information to the caller,
     * that is, 0 if we think the key should be still valid, 1 if
     * we think the key is expired at this time. */
    if (server.masterhost != NULL) return 1;

    /* Delete the key */
    server.stat_expiredkeys++;
    propagateExpire(db,key,server.lazyfree_lazy_expire);
    notifyKeyspaceEvent(NOTIFY_EXPIRED,
        "expired",key,db->id);
    return server.lazyfree_lazy_expire ? dbAsyncDelete(db,key) :
                                         dbSyncDelete(db,key);
}

/* Check if the key is expired. */
int keyIsExpired(redisDb *db, robj *key) {
    mstime_t when = getExpire(db,key);
    mstime_t now;

    if (when < 0) return 0; /* No expire for this key */

    /* Don't expire anything while loading. It will be done later. */
    if (server.loading) return 0;

    /* If we are in the context of a Lua script, we pretend that time is
     * blocked to when the Lua script started. This way a key can expire
     * only the first time it is accessed and not in the middle of the
     * script execution, making propagation to slaves / AOF consistent.
     * See issue #1525 on Github for more information. */
    if (server.lua_caller) {
        now = server.lua_time_start;
    }
    /* If we are in the middle of a command execution, we still want to use
     * a reference time that does not change: in that case we just use the
     * cached time, that we update before each call in the call() function.
     * This way we avoid that commands such as RPOPLPUSH or similar, that
     * may re-open the same key multiple times, can invalidate an already
     * open object in a next call, if the next call will see the key expired,
     * while the first did not. */
    else if (server.fixed_time_expire > 0) {
        now = server.mstime;
    }
    /* For the other cases, we want to use the most fresh time we have. */
    else {
        now = mstime();
    }

    /* The key expired if the current (virtual or real) time is greater
     * than the expire time of the key. */
    return now > when;
}

/* Return the expire time of the specified key, or -1 if no expire
 * is associated with this key (i.e. the key is non volatile) */
long long getExpire(redisDb *db, robj *key) {
    dictEntry *de;

    /* No expire? return ASAP */
    if (dictSize(db->expires) == 0 ||
       (de = dictFind(db->expires,key->ptr)) == NULL) return -1;

    /* The entry was found in the expire dict, this means it should also
     * be present in the main dict (safety check). */
    serverAssertWithInfo(NULL,key,dictFind(db->dict,key->ptr) != NULL);
    return dictGetSignedIntegerVal(de);
}

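To make the lazy path concrete, here is a minimal sketch of the same idea in a toy store. It is not Redis code: toy_db, toy_set, toy_get and toy_used are hypothetical names invented for this illustration, the store is a tiny fixed-size array, and the caller passes the clock in as now_ms so the behavior is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy store for illustration only; every name below is hypothetical. */
#define TOY_MAX_KEYS 8

typedef struct {
    const char *key;      /* NULL means the slot is free */
    const char *value;
    long long expire_ms;  /* absolute expiry time in ms, -1 = no expiry */
} toy_entry;

typedef struct {
    toy_entry slots[TOY_MAX_KEYS];
} toy_db;

/* Find the slot holding 'key', or NULL if it is absent. */
toy_entry *toy_find(toy_db *db, const char *key) {
    assert(db != NULL && key != NULL);
    for (size_t i = 0; i < TOY_MAX_KEYS; i++) {
        if (db->slots[i].key && strcmp(db->slots[i].key, key) == 0)
            return &db->slots[i];
    }
    return NULL;
}

/* Store a key with an absolute expiry. Returns 0 on success, -1 if full. */
int toy_set(toy_db *db, const char *key, const char *value,
            long long expire_ms) {
    toy_entry *e = toy_find(db, key);
    /* Key not present yet: take the first free slot. */
    for (size_t i = 0; e == NULL && i < TOY_MAX_KEYS; i++) {
        if (db->slots[i].key == NULL) e = &db->slots[i];
    }
    if (e == NULL) return -1;
    e->key = key;
    e->value = value;
    e->expire_ms = expire_ms;
    return 0;
}

/* Lazy expiration: the expiry is checked only when the key is read.
 * An expired key is deleted on the spot and reported as missing. */
const char *toy_get(toy_db *db, const char *key, long long now_ms) {
    toy_entry *e = toy_find(db, key);
    if (e == NULL) return NULL;
    if (e->expire_ms >= 0 && now_ms > e->expire_ms) {
        e->key = NULL;    /* free the slot */
        e->value = NULL;
        return NULL;
    }
    return e->value;
}

/* Number of slots still occupied, expired or not. */
size_t toy_used(const toy_db *db) {
    size_t used = 0;
    for (size_t i = 0; i < TOY_MAX_KEYS; i++) {
        if (db->slots[i].key != NULL) used++;
    }
    return used;
}
```

If two keys are set to expire at time 100 and only one of them is read at time 101, only that one is removed. The other keeps its slot until somebody asks for it, which is the weakness of relying on lazy expiration alone.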

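A few things to note in expireIfNeeded. First, lazy deletion uses the lazyfree_lazy_expire setting to decide between synchronous and asynchronous deletion. Second, a slave does not perform the deletion itself (it only reports the key as expired), because the master sends it a DEL command when the key expires there. What goes wrong if this is the only strategy? Keys that are never accessed again would never be deleted and would keep occupying memory. That is why Redis combines lazy expiration with periodic expiration. So how does the periodic strategy run?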

/* This function handles 'background' operations we are required to do
 * incrementally in Redis databases, such as active key expiring, resizing,
 * rehashing. */
void databasesCron(void) {
    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled) {
        if (server.masterhost == NULL) {
            activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
        } else {
            expireSlaveKeys();
        }
    }

    /* Defrag keys gradually. */
    activeDefragCycle();

    /* Perform hash tables rehashing if needed, but only if there are no
     * other processes saving the DB on disk. Otherwise rehashing is bad
     * as will cause a lot of copy-on-write of memory pages. */
    if (!hasActiveChildProcess()) {
        /* We use global counters so if we stop the computation at a given
         * DB we'll be able to start from the successive in the next
         * cron loop iteration. */
        static unsigned int resize_db = 0;
        static unsigned int rehash_db = 0;
        int dbs_per_call = CRON_DBS_PER_CALL;
        int j;

        /* Don't test more DBs than we have. */
        if (dbs_per_call > server.dbnum) dbs_per_call = server.dbnum;

        /* Resize */
        for (j = 0; j < dbs_per_call; j++) {
            tryResizeHashTables(resize_db % server.dbnum);
            resize_db++;
        }

        /* Rehash */
        if (server.activerehashing) {
            for (j = 0; j < dbs_per_call; j++) {
                int work_done = incrementallyRehash(rehash_db);
                if (work_done) {
                    /* If the function did some work, stop here, we'll do
                     * more at the next cron loop. */
                    break;
                } else {
                    /* If this db didn't need rehash, we'll try the next one. */
                    rehash_db++;
                    rehash_db %= server.dbnum;
                }
            }
        }
    }
}

/* Try to expire a few timed out keys. The algorithm used is adaptive and
 * will use few CPU cycles if there are few expiring keys, otherwise
 * it will get more aggressive to avoid that too much memory is used by
 * keys that can be removed from the keyspace.
 *
 * Every expire cycle tests multiple databases: the next call will start
 * again from the next db, with the exception of exists for time limit: in that
 * case we restart again from the last database we were processing. Anyway
 * no more than CRON_DBS_PER_CALL databases are tested at every iteration.
 *
 * The function can perform more or less work, depending on the "type"
 * argument. It can execute a "fast cycle" or a "slow cycle". The slow
 * cycle is the main way we collect expired cycles: this happens with
 * the "server.hz" frequency (usually 10 hertz).
 *
 * However the slow cycle can exit for timeout, since it used too much time.
 * For this reason the function is also invoked to perform a fast cycle
 * at every event loop cycle, in the beforeSleep() function. The fast cycle
 * will try to perform less work, but will do it much more often.
 *
 * The following are the details of the two expire cycles and their stop
 * conditions:
 *
 * If type is ACTIVE_EXPIRE_CYCLE_FAST the function will try to run a
 * "fast" expire cycle that takes no longer than EXPIRE_FAST_CYCLE_DURATION
 * microseconds, and is not repeated again before the same amount of time.
 * The cycle will also refuse to run at all if the latest slow cycle did not
 * terminate because of a time limit condition.
 *
 * If type is ACTIVE_EXPIRE_CYCLE_SLOW, that normal expire cycle is
 * executed, where the time limit is a percentage of the REDIS_HZ period
 * as specified by the ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC define. In the
 * fast cycle, the check of every database is interrupted once the number
 * of already expired keys in the database is estimated to be lower than
 * a given percentage, in order to avoid doing too much work to gain too
 * little memory.
 *
 * The configured expire "effort" will modify the baseline parameters in
 * order to do more work in both the fast and slow expire cycles. */
#define ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP 20 /* Keys for each DB loop. */
#define ACTIVE_EXPIRE_CYCLE_FAST_DURATION 1000 /* Microseconds. */
#define ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC 25 /* Max % of CPU to use. */
#define ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE 10 /* % of stale keys after which
                                                   we do extra efforts. */

void activeExpireCycle(int type) {
    /* Adjust the running parameters according to the configured expire
     * effort. The default effort is 1, and the maximum configurable effort
     * is 10. */
    unsigned long
    effort = server.active_expire_effort-1, /* Rescale from 0 to 9. */
    config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
                           ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort,
    config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
                                 ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort,
    config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
                                  2*effort,
    config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE-
                                    effort;

    /* This function has some global state in order to continue the work
     * incrementally across calls. */
    static unsigned int current_db = 0; /* Last DB tested. */
    static int timelimit_exit = 0;      /* Time limit hit in previous call? */
    static long long last_fast_cycle = 0; /* When last fast cycle ran. */

    int j, iteration = 0;
    int dbs_per_call = CRON_DBS_PER_CALL;
    long long start = ustime(), timelimit, elapsed;

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
    if (clientsArePaused()) return;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST) {
        /* Don't start a fast cycle if the previous cycle did not exit
         * for time limit, unless the percentage of estimated stale keys is
         * too high. Also never repeat a fast cycle for the same period
         * as the fast cycle total duration itself. */
        if (!timelimit_exit &&
            server.stat_expired_stale_perc < config_cycle_acceptable_stale)
            return;

        if (start < last_fast_cycle + (long long)config_cycle_fast_duration*2)
            return;

        last_fast_cycle = start;
    }

    /* We usually should test CRON_DBS_PER_CALL per iteration, with
     * two exceptions:
     *
     * 1) Don't test more DBs than we have.
     * 2) If last time we hit the time limit, we want to scan all DBs
     * in this iteration, as there is work to do in some DB and we don't want
     * expired keys to use memory for too much time. */
    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;

    /* We can use at max 'config_cycle_slow_time_perc' percentage of CPU
     * time per iteration. Since this function gets called with a frequency of
     * server.hz times per second, the following is the max amount of
     * microseconds we can spend in this function. */
    timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;
    timelimit_exit = 0;
    if (timelimit <= 0) timelimit = 1;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST)
        timelimit = config_cycle_fast_duration; /* in microseconds. */

    /* Accumulate some global stats as we expire keys, to have some idea
     * about the number of keys that are already logically expired, but still
     * existing inside the database. */
    long total_sampled = 0;
    long total_expired = 0;

    for (j = 0; j < dbs_per_call && timelimit_exit == 0; j++) {
        /* Expired and checked in a single loop. */
        unsigned long expired, sampled;

        redisDb *db = server.db+(current_db % server.dbnum);

        /* Increment the DB now so we are sure if we run out of time
         * in the current DB we'll restart from the next. This allows to
         * distribute the time evenly across DBs. */
        current_db++;

        /* Continue to expire if at the end of the cycle more than 25%
         * of the keys were expired. */
        do {
            unsigned long num, slots;
            long long now, ttl_sum;
            int ttl_samples;
            iteration++;

            /* If there is nothing to expire try next DB ASAP. */
            if ((num = dictSize(db->expires)) == 0) {
                db->avg_ttl = 0;
                break;
            }
            slots = dictSlots(db->expires);
            now = mstime();

            /* When there are less than 1% filled slots, sampling the key
             * space is expensive, so stop here waiting for better times...
             * The dictionary will be resized asap. */
            if (num && slots > DICT_HT_INITIAL_SIZE &&
                (num*100/slots < 1)) break;

            /* The main collection cycle. Sample random keys among keys
             * with an expire set, checking for expired ones. */
            expired = 0;
            sampled = 0;
            ttl_sum = 0;
            ttl_samples = 0;

            if (num > config_keys_per_loop)
                num = config_keys_per_loop;

            /* Here we access the low level representation of the hash table
             * for speed concerns: this makes this code coupled with dict.c,
             * but it hardly changed in ten years.
             *
             * Note that certain places of the hash table may be empty,
             * so we want also a stop condition about the number of
             * buckets that we scanned. However scanning for free buckets
             * is very fast: we are in the cache line scanning a sequential
             * array of NULL pointers, so we can scan a lot more buckets
             * than keys in the same time. */
            long max_buckets = num*20;
            long checked_buckets = 0;

            while (sampled < num && checked_buckets < max_buckets) {
                for (int table = 0; table < 2; table++) {
                    if (table == 1 && !dictIsRehashing(db->expires)) break;

                    unsigned long idx = db->expires_cursor;
                    idx &= db->expires->ht[table].sizemask;
                    dictEntry *de = db->expires->ht[table].table[idx];
                    long long ttl;

                    /* Scan the current bucket of the current table. */
                    checked_buckets++;
                    while(de) {
                        /* Get the next entry now since this entry may get
                         * deleted. */
                        dictEntry *e = de;
                        de = de->next;

                        ttl = dictGetSignedIntegerVal(e)-now;
                        if (activeExpireCycleTryExpire(db,e,now)) expired++;
                        if (ttl > 0) {
                            /* We want the average TTL of keys yet
                             * not expired. */
                            ttl_sum += ttl;
                            ttl_samples++;
                        }
                        sampled++;
                    }
                }
                db->expires_cursor++;
            }
            total_expired += expired;
            total_sampled += sampled;

            /* Update the average TTL stats for this database. */
            if (ttl_samples) {
                long long avg_ttl = ttl_sum/ttl_samples;

                /* Do a simple running average with a few samples.
                 * We just use the current estimate with a weight of 2%
                 * and the previous estimate with a weight of 98%. */
                if (db->avg_ttl == 0) db->avg_ttl = avg_ttl;
                db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
            }

            /* We can't block forever here even if there are many keys to
             * expire. So after a given amount of milliseconds return to the
             * caller waiting for the other active expire cycle. */
            if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
                elapsed = ustime()-start;
                if (elapsed > timelimit) {
                    timelimit_exit = 1;
                    server.stat_expired_time_cap_reached_count++;
                    break;
                }
            }
            /* We don't repeat the cycle for the current database if there are
             * an acceptable amount of stale keys (logically expired but yet
             * not reclaimed). */
        } while ((expired*100/sampled) > config_cycle_acceptable_stale);
    }

    elapsed = ustime()-start;
    server.stat_expire_cycle_time_used += elapsed;
    latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);

    /* Update our estimate of keys existing but yet to be expired.
     * Running average with this sample accounting for 5%. */
    double current_perc;
    if (total_sampled) {
        current_perc = (double)total_expired/total_sampled;
    } else
        current_perc = 0;
    server.stat_expired_stale_perc = (current_perc*0.05)+
                                     (server.stat_expired_stale_perc*0.95);
}


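Periodic expiration comes in two types, fast and slow, called from beforeSleep and databasesCron respectively. The fast cycle has two limits: its running time is capped by ACTIVE_EXPIRE_CYCLE_FAST_DURATION, and two fast cycles must be at least twice ACTIVE_EXPIRE_CYCLE_FAST_DURATION apart. Both are further scaled by the configured server.active_expire_effort, which defaults to 1 and maxes out at 10: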

config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
                             ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort
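The gating of the fast cycle can be pulled out of activeExpireCycle and looked at on its own. The following is only a sketch under stated assumptions: fast_cycle_state, fast_cycle_duration_us and fast_cycle_may_run are hypothetical names, the state that Redis keeps in static variables is passed in explicitly, and the stale-percentage comparison simply mirrors the one in the quoted source.

```c
#include <assert.h>

#define ACTIVE_EXPIRE_CYCLE_FAST_DURATION 1000 /* Microseconds. */
#define ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE 10

/* Hypothetical holder for what activeExpireCycle() keeps as statics. */
typedef struct {
    int timelimit_exit;        /* did the previous cycle hit its time limit? */
    long long last_fast_cycle; /* start time of the last fast cycle, in us */
} fast_cycle_state;

/* Fast-cycle duration for an effort setting in the range 1..10. */
long long fast_cycle_duration_us(int active_expire_effort) {
    assert(active_expire_effort >= 1 && active_expire_effort <= 10);
    long long effort = active_expire_effort - 1; /* rescale to 0..9 */
    return ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
           ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort;
}

/* Returns 1 if a fast cycle may start at 'now_us', 0 if it must be skipped.
 * On success the start time is recorded, as in the quoted source. */
int fast_cycle_may_run(fast_cycle_state *st, long long now_us,
                       int active_expire_effort,
                       double stat_expired_stale_perc) {
    long long duration = fast_cycle_duration_us(active_expire_effort);
    long long acceptable_stale =
        ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE - (active_expire_effort - 1);

    /* Skip unless the previous cycle ran out of time or the estimated
     * share of stale keys is too high. */
    if (!st->timelimit_exit && stat_expired_stale_perc < acceptable_stale)
        return 0;

    /* Keep at least twice the fast-cycle duration between two fast cycles. */
    if (now_us < st->last_fast_cycle + duration*2)
        return 0;

    st->last_fast_cycle = now_us;
    return 1;
}
```

With the default effort of 1 the fast cycle is capped at 1000 microseconds and cannot start again within 2000 microseconds of the previous one; with effort 10 the cap grows to 3250 microseconds.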


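The cycle then picks a number of keys that have an expiration time (stored in expires) from a number of databases. The number of keys sampled per loop is controlled by

config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
                       ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort

The time limit of the slow cycle is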

config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
                              2*effort
timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;
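As a worked example of these two formulas, the arithmetic can be wrapped in small helper functions. The function names keys_per_loop and slow_cycle_timelimit_us are made up for this sketch, and hz is passed in as a parameter instead of being read from server.hz.

```c
#include <assert.h>

#define ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP 20  /* Keys for each DB loop. */
#define ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC 25 /* Max % of CPU to use. */

/* Keys sampled per loop for an effort setting in the range 1..10. */
unsigned long keys_per_loop(int active_expire_effort) {
    assert(active_expire_effort >= 1 && active_expire_effort <= 10);
    unsigned long effort = active_expire_effort - 1; /* rescale to 0..9 */
    return ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
           ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort;
}

/* Time budget of one slow cycle in microseconds: a percentage of the
 * period between two cron runs, which is 1000000/hz microseconds. */
long long slow_cycle_timelimit_us(int active_expire_effort, int hz) {
    assert(active_expire_effort >= 1 && active_expire_effort <= 10);
    assert(hz > 0);
    long long effort = active_expire_effort - 1;
    long long slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC + 2*effort;
    long long timelimit = slow_time_perc*1000000/hz/100;
    if (timelimit <= 0) timelimit = 1; /* same floor as the quoted source */
    return timelimit;
}
```

With hz at 10 and the default effort of 1, a slow cycle samples 20 keys per loop and may run for at most 25000 microseconds (25 ms, a quarter of the 100 ms period). Raising the effort to 10 gives 65 keys per loop and a 43000 microsecond budget.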


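There is one more exit condition: the loop keeps resampling the current database only while the share of expired keys in the latest sample is above the acceptable stale percentage (config_cycle_acceptable_stale). Once a sample comes in at or below that percentage, the current db is left alone and the cycle moves on to the next db.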
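The condition at the bottom of the do/while loop can be sketched as a standalone predicate. should_resample_db is a hypothetical name, and the sampled == 0 guard is an addition of this sketch to avoid dividing by zero; it is not part of the excerpt quoted above.

```c
#include <assert.h>

#define ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE 10 /* % of stale keys */

/* Returns 1 if the current database should be sampled again, i.e. the
 * percentage of expired keys in the last sample is above the acceptable
 * stale percentage for the given effort (1..10). */
int should_resample_db(unsigned long expired, unsigned long sampled,
                       int active_expire_effort) {
    assert(active_expire_effort >= 1 && active_expire_effort <= 10);
    unsigned long effort = active_expire_effort - 1; /* rescale to 0..9 */
    unsigned long acceptable_stale =
        ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE - effort;

    /* Guard added for this sketch only: nothing sampled, nothing to repeat. */
    if (sampled == 0) return 0;

    /* Integer percentage, as in the quoted loop condition. */
    return (expired*100/sampled) > acceptable_stale;
}
```

At the default effort, a sample of 20 keys with 3 expired (15%) triggers another pass over the same db, while 2 expired (10%) does not. At effort 10 the threshold drops to 1%, so even 1 expired key out of 20 keeps the loop on the current db.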

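This article is licensed under Attribution 4.0 International (CC BY 4.0). You are welcome to repost or adapt it, provided you credit the source.

Author: Nicksxs

Created: 2020-04-12

Article link: redis精讲系列介绍七-过期策略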


Original link: [http://xie.infoq.cn/article/578b6d83d4c4cbbc82dbcabe0]. Please contact the author before reposting.